[ https://issues.apache.org/jira/browse/SQOOP-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957738#comment-16957738 ]
Pankaj Arora commented on SQOOP-2874:
-------------------------------------
Is there any ETA for the issue?
> Highlight Sqoop import with --as-parquetfile use cases (Dataset name <NAME>
> is not alphanumeric (plus '_'))
> -----------------------------------------------------------------------------------------------------------
>
> Key: SQOOP-2874
> URL: https://issues.apache.org/jira/browse/SQOOP-2874
> Project: Sqoop
> Issue Type: Improvement
> Components: docs
> Reporter: Markus Kemper
> Assignee: Markus Kemper
> Priority: Major
> Attachments: Jira_SQOOP-2874_TestCases.txt
>
>
> Hello Sqoop Community,
> Would it be possible to request some documentation enhancements?
> The ask here is to proactively help raise awareness and improve user
> experience with a few specific use cases [1] where some Sqoop commands have
> restricted character options when using import with --as-parquetfile.
> My understanding is Sqoop1 currently relies on Kite Datasets to write Parquet
> files. From the Kite documentation [3] we see that to ensure compatibility
> (with Hive, etc.), Kite imposes some restrictions on Names and Namespaces
> which bubble up in Sqoop.
> The following Sqoop use cases when using import with --as-parquetfile result
> in the error [2] below. Full test cases for each scenario are attached. If
> it is an option to enhance the Sqoop documentation for these use cases, I am
> happy to provide proposed changes; let me know.
> [1] Use Cases:
> 1. sqoop import --as-parquetfile + --target-dir /<path>/<rdbms_database>.<table>
> 1.1. The '.' is not allowed
> 2. sqoop import --as-parquetfile + --table <rdbms_database>.<table> + (no --target-dir)
> 2.1. The '.' is not allowed; this is essentially the same as (1)
> 3. sqoop import --as-parquetfile + --hive-import --table <hive_database>.<table>
> 3.1. The proper usage is to use --hive-database with --hive-table; however,
> with --as-textfile, --hive-table works with <hive_database>.<table>
> [2] Kite Error:
> 16/03/06 08:45:56 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not alphanumeric (plus '_')
> org.kitesdk.data.ValidationException: Dataset name DATABASE.TABLE is not alphanumeric (plus '_')
>     at org.kitesdk.data.ValidationException.check(ValidationException.java:55)
>     at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:105)
>     at org.kitesdk.data.spi.Compatibility.check(Compatibility.java:68)
>     at org.kitesdk.data.spi.filesystem.FileSystemMetadataProvider.create(FileSystemMetadataProvider.java:209)
>     at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.create(FileSystemDatasetRepository.java:137)
>     at org.kitesdk.data.Datasets.create(Datasets.java:239)
>     at org.kitesdk.data.Datasets.create(Datasets.java:307)
>     at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:141)
>     at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:119)
>     at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:130)
>     at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
>     at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
>     at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:444)
>     at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
>     at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
>     at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
>     at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
>     at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
> [3] Kite Documentation:
> http://kitesdk.org/docs/1.0.0/introduction-to-datasets.html
> Names and Namespaces
> URIs also define a name and namespace for your dataset. Kite uses these
> values when the underlying system has the same concept (for example, Hive).
> The name and namespace are typically the last two values in a URI. For
> example, if you create a dataset using the URI
> dataset:hive:fact_tables/ratings, Kite stores a Hive table ratings in the
> fact_tables Hive database. If you create a dataset using the URI
> dataset:hdfs:/user/cloudera/fact_tables/ratings, Kite stores an HDFS dataset
> named ratings in the fact_tables namespace. To ensure compatibility with
> Hive and other underlying systems, names and namespaces in URIs must be made
> of alphanumeric or underscore (_) characters and cannot start with a number.
> Thanks, Markus
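
For anyone hitting this in the meantime, the naming rule quoted from the Kite documentation can be checked before running the import. The following is only a rough sketch of that rule as worded in the docs (alphanumeric or underscore, not starting with a number). It is not Kite's actual implementation, it assumes ASCII names, and both function names are hypothetical helpers made up for illustration.

```python
import re

# Rule as worded in the quoted Kite documentation: alphanumeric or
# underscore characters only, and the first character is not a digit.
# Assumption: ASCII only; Kite's own check may differ in detail.
_NAME_RULE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_kite_compatible_name(name: str) -> bool:
    """Return True if the whole name satisfies the documented rule."""
    return _NAME_RULE.fullmatch(name) is not None


def suggest_name(name: str) -> str:
    """Hypothetical helper: replace each disallowed character with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Prefix an underscore if the result is empty or starts with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


if __name__ == "__main__":
    for candidate in ("DATABASE.TABLE", "DATABASE_TABLE", "1table"):
        print(candidate, is_kite_compatible_name(candidate), suggest_name(candidate))
```

A name such as DATABASE.TABLE fails this check because of the '.', which matches the ValidationException in the report. Whether a renamed directory or table is acceptable depends on the job, so the suggestion helper is only a starting point.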