[ https://issues.apache.org/jira/browse/SQOOP-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637836#comment-15637836 ]
Markus Kemper commented on SQOOP-3046:
--------------------------------------
Linking SQOOP-3010 in the event that this capability is not technically
possible.
> Add support for (import + --hcatalog* + --as-parquetfile)
> ----------------------------------------------------------
>
> Key: SQOOP-3046
> URL: https://issues.apache.org/jira/browse/SQOOP-3046
> Project: Sqoop
> Issue Type: Improvement
> Components: hive-integration
> Reporter: Markus Kemper
>
> This is a request to identify a way to support Sqoop import with --hcatalog
> options when writing Parquet data files. Such imports currently fail at
> runtime because HCatalog obtains its writer through the old-style
> org.apache.hadoop.mapred.OutputFormat#getRecordWriter() method, which Hive's
> MapredParquetOutputFormat deliberately leaves unimplemented. The code snip
> and test case below demonstrate the issue.
> CODE SNIP
> {noformat}
> ../MapredParquetOutputFormat.java
> 69    @Override
> 70    public RecordWriter<Void, ParquetHiveRecord> getRecordWriter(
> 71        final FileSystem ignored,
> 72        final JobConf job,
> 73        final String name,
> 74        final Progressable progress
> 75        ) throws IOException {
> 76      throw new RuntimeException("Should never be used");
> 77    }
> {noformat}
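> This throw is hit because HCatalog's FileOutputFormatContainer drives the
> old-style org.apache.hadoop.mapred.OutputFormat interface, while Hive itself
> never calls getRecordWriter() for Parquet: Hive writes through the separate
> HiveOutputFormat#getHiveRecordWriter() entry point. For illustration, the
> method MapredParquetOutputFormat does implement has roughly this signature
> (paraphrased from Hive's org.apache.hadoop.hive.ql.io.HiveOutputFormat; see
> the Hive sources for the exact form):
> {noformat}
> // Hive's FileSinkOperator obtains Parquet writers through this method;
> // HCatalog's FileOutputFormatContainer does not, hence the RuntimeException.
> FileSinkOperator.RecordWriter getHiveRecordWriter(
>     JobConf jc,
>     Path finalOutPath,
>     Class<? extends Writable> valueClass,
>     boolean isCompressed,
>     Properties tableProperties,
>     Progressable progress) throws IOException;
> {noformat}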
> TEST CASE:
> {noformat}
> STEP 01 - Create MySQL Table
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c_int int, c_date date, c_timestamp timestamp)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "describe t1"
> ------------------------------------------------------------------------------------------
> | Field       | Type      | Null | Key | Default           | Extra                       |
> ------------------------------------------------------------------------------------------
> | c_int       | int(11)   | YES  |     | (null)            |                             |
> | c_date      | date      | YES  |     | (null)            |                             |
> | c_timestamp | timestamp | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
> ------------------------------------------------------------------------------------------
> STEP 02 - Insert and Select Row
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date(), current_timestamp())"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
> ----------------------------------------------
> | c_int | c_date     | c_timestamp           |
> ----------------------------------------------
> | 1     | 2016-10-26 | 2016-10-26 14:30:33.0 |
> ----------------------------------------------
> STEP 03 - Drop Hive Table and Import with --hcatalog Options (stored as parquet)
> beeline -u jdbc:hive2:// -e "use default; drop table t1"
> sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hcatalog-database default --hcatalog-table t1 --create-hcatalog-table --hcatalog-storage-stanza 'stored as parquet' --num-mappers 1
> [sqoop console debug]
> 16/11/02 20:25:15 INFO mapreduce.Job: Task Id : attempt_1478089149450_0046_m_000000_0, Status : FAILED
> Error: java.lang.RuntimeException: Should never be used
>     at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:76)
>     at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:102)
>     at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> [yarn maptask debug]
> 2016-11-02 20:25:15,565 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: 1=1 AND 1=1
> 2016-11-02 20:25:15,583 DEBUG [main] org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat: Creating db record reader for db product: MYSQL
> 2016-11-02 20:25:15,613 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 2016-11-02 20:25:15,614 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
> 2016-11-02 20:25:15,620 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
> 2016-11-02 20:25:15,633 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Should never be used
>     at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getRecordWriter(MapredParquetOutputFormat.java:76)
>     at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:102)
>     at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {noformat}
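> One possible direction for a fix is to bridge the two interfaces: when the
> wrapped OutputFormat is a HiveOutputFormat, obtain the writer via
> getHiveRecordWriter() and adapt it to the mapred RecordWriter that HCatalog
> expects. The sketch below is hypothetical (the class name HiveWriterAdapter is
> invented here, and a real fix would also need to resolve the output path,
> table properties, and compression settings from the HCatalog job context):
> {noformat}
> import java.io.IOException;
> import java.util.Properties;
>
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hive.ql.exec.FileSinkOperator;
> import org.apache.hadoop.hive.ql.io.HiveOutputFormat;
> import org.apache.hadoop.io.Writable;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.RecordWriter;
> import org.apache.hadoop.mapred.Reporter;
> import org.apache.hadoop.util.Progressable;
>
> /** Hypothetical adapter exposing a Hive-native writer through the mapred API. */
> public class HiveWriterAdapter<V extends Writable> implements RecordWriter<Void, V> {
>
>   private final FileSinkOperator.RecordWriter hiveWriter;
>
>   public HiveWriterAdapter(HiveOutputFormat<?, V> hiveFormat, JobConf job,
>       Path outPath, Class<? extends Writable> valueClass, Properties tableProps,
>       Progressable progress) throws IOException {
>     // MapredParquetOutputFormat only supports this entry point, not the plain
>     // OutputFormat#getRecordWriter() that HCatalog calls today.
>     this.hiveWriter = hiveFormat.getHiveRecordWriter(job, outPath, valueClass,
>         false /* isCompressed */, tableProps, progress);
>   }
>
>   @Override
>   public void write(Void key, V value) throws IOException {
>     hiveWriter.write(value);
>   }
>
>   @Override
>   public void close(Reporter reporter) throws IOException {
>     hiveWriter.close(false /* abort */);
>   }
> }
> {noformat}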