[
https://issues.apache.org/jira/browse/SQOOP-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583729#comment-15583729
]
Ruslan Dautkhanov commented on SQOOP-2907:
------------------------------------------
Any workarounds for this?
Is there any way to generate, from a parquet file's schema, the .metadata that
KiteSDK/Sqoop expect?
I was trying the following (consolidated into a single script sketch below):
# $ beeline -e 'create table avro_table stored as avro as select * from parquet_table where 1=0'
# $ hadoop fs -get /hivewarehouse/avro_table/000000_0 ./
# $ avro-tools getschema ./000000_0 > 000000_0.schema
# $ kite-dataset -v create amf_trans -s 000000_0.schema
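For repeatability, here is the same four-step workaround as one script. This is
a sketch only: the beeline JDBC URL is a placeholder, and it assumes the CTAS
produced a single output file named 000000_0.
{noformat}
#!/bin/bash
# Hypothetical wrapper around the four steps above; adjust names and URLs.
set -e

BEELINE_URL='jdbc:hive2://localhost:10000'   # placeholder, replace with your own
SRC_TABLE='parquet_table'
AVRO_TABLE='avro_table'
DATASET='amf_trans'

# 1. Create an empty avro copy of the parquet table, just to materialize an avro schema.
beeline -u "$BEELINE_URL" -e \
  "create table $AVRO_TABLE stored as avro as select * from $SRC_TABLE where 1=0"

# 2. Pull the (empty) avro file locally; assumes a single output file 000000_0.
hadoop fs -get "/hivewarehouse/$AVRO_TABLE/000000_0" ./

# 3. Extract the avro schema from the local copy.
avro-tools getschema ./000000_0 > "$DATASET.schema"

# 4. Create the Kite dataset, which writes the .metadata directory.
kite-dataset -v create "$DATASET" -s "$DATASET.schema"
{noformat}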
The last of these four commands finally produced a .metadata directory, but
once I tried to run sqoop export, I got the following exception:
{noformat}
16/10/17 16:10:25 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetIOException: Unable to load descriptor file:hdfs://epsdatalake/hivewarehouse/disc_dv.db/amf_trans_dv_09142016/.metadata/descriptor.properties for dataset:amf_trans_dv_09142016
org.kitesdk.data.DatasetIOException: Unable to load descriptor file:hdfs://epsdatalake/hivewarehouse/disc_dv.db/amf_trans_dv_09142016/.metadata/descriptor.properties for dataset:amf_trans_dv_09142016
	at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.load(FileSystemMetadataProvider.java:127)
	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:197)
	at org.kitesdk.data.Datasets.load(Datasets.java:108)
	at org.kitesdk.data.Datasets.load(Datasets.java:140)
	at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:92)
	at org.kitesdk.data.mapreduce.DatasetKeyInputFormat$ConfigBuilder.readFrom(DatasetKeyInputFormat.java:139)
	at org.apache.sqoop.mapreduce.JdbcExportJob.configureInputFormat(JdbcExportJob.java:84)
	at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:424)
	at org.apache.sqoop.manager.oracle.OraOopConnManager.exportTable(OraOopConnManager.java:320)
	at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:81)
	at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:100)
{noformat}
sqoop export's KiteSDK looks for a *.metadata/descriptor.properties* file,
but the .metadata that the kite-dataset utility generated contains only
*.metadata/schemas/1.asvc*.
The process has to be repeatable / scriptable, which is why we were looking
at different options to generate .metadata automatically, including the
kite-dataset commands above. It would be awesome if sqoop generated the
.metadata that KiteSDK expects whenever .metadata is not found.
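Another variant we could try (an untested sketch, not something from this
ticket): point kite-dataset at the table's HDFS location itself via a dataset
URI, so that whatever .metadata Kite writes ends up next to the parquet files
where sqoop export looks for it. The URI form and the --format flag below are
assumptions based on the kite-dataset CLI:
{noformat}
# Hypothetical variant: create the dataset in place at the Hive warehouse path,
# so .metadata is written next to the data; --format parquet matches the files.
kite-dataset create \
  dataset:hdfs:/hivewarehouse/disc_dv.db/amf_trans_dv_09142016 \
  --schema 000000_0.schema \
  --format parquet
{noformat}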
> Export parquet files to RDBMS: don't require .metadata for parquet files
> ------------------------------------------------------------------------
>
> Key: SQOOP-2907
> URL: https://issues.apache.org/jira/browse/SQOOP-2907
> Project: Sqoop
> Issue Type: Improvement
> Components: metastore
> Affects Versions: 1.4.6
> Environment: sqoop 1.4.6
> export parquet files to Oracle
> Reporter: Ruslan Dautkhanov
>
> Kite currently requires .metadata.
> Parquet files have their own metadata stored alongside the data files (see
> the example after this quote).
> It would be great if the export operation of parquet files to an RDBMS did
> not require .metadata.
> Most of our files are created by Spark and Hive, and neither creates
> .metadata; only Kite does.
> This makes the usability of sqoop export for parquet files very limited.
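As an illustration of the "parquet files carry their own metadata" point
above: the schema is embedded in every parquet data file and can be dumped
with the stock parquet-tools CLI (the file name below is a placeholder):
{noformat}
# print the schema embedded in a parquet data file (path is illustrative)
parquet-tools schema hdfs://epsdatalake/hivewarehouse/disc_dv.db/amf_trans_dv_09142016/<data-file>
{noformat}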
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)