[
https://issues.apache.org/jira/browse/SQOOP-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jarek Jarcec Cecho updated SQOOP-2783:
--------------------------------------
Attachment: SQOOP-2783.patch
Attaching a simple patch that changes the default schema name as suggested. I've
verified that the patch works on a real cluster.
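For reference, the shape of the change is roughly the following. This is a
hypothetical sketch, not the patch itself; where exactly the default name is
decided in the Sqoop code base may differ:
{code}
// Hypothetical helper illustrating the rename: for a free-form query import
// (no table name), do not call the record "QueryResult", because that is
// also the name of the generated record class on the task classpath.
public static String getDefaultRecordName(String tableName) {
  if (tableName != null) {
    return tableName;
  }
  // Previously "QueryResult"; using a name that no generated class carries
  // avoids the reflected-schema clash seen on a real cluster.
  return "AutoGeneratedSchema";
}
{code}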
> Query import with parquet fails on incompatible schema
> ------------------------------------------------------
>
> Key: SQOOP-2783
> URL: https://issues.apache.org/jira/browse/SQOOP-2783
> Project: Sqoop
> Issue Type: Bug
> Reporter: Jarek Jarcec Cecho
> Assignee: Jarek Jarcec Cecho
> Fix For: 1.4.7
>
> Attachments: SQOOP-2783.patch
>
>
> This is a follow-up on SQOOP-2582, where we added support for query import
> into Parquet. It seems that when run on a real cluster (rather than a
> mini-cluster), the job fails with an exception similar to this one:
> {code}
> 16/01/08 09:47:13 INFO mapreduce.Job: Task Id : attempt_1452259292738_0001_m_000000_2, Status : FAILED
> Error: org.kitesdk.data.IncompatibleSchemaException: The type cannot be used to read from or write to the dataset:
> Type schema: {"type":"record","name":"QueryResult","fields":[{"name":"PROTOCOL_VERSION","type":"int"},{"name":"__cur_result_set","type":["null",{"type":"record","name":"ResultSet","namespace":"java.sql","fields":[]}],"default":null},{"name":"c1_int","type":["null","int"],"default":null},{"name":"c2_date","type":["null",{"type":"record","name":"Date","namespace":"java.sql","fields":[]}],"default":null},{"name":"c3_timestamp","type":["null",{"type":"record","name":"Timestamp","namespace":"java.sql","fields":[]}],"default":null},{"name":"c4_varchar20","type":["null","string"],"default":null},{"name":"__parser","type":["null",{"type":"record","name":"RecordParser","namespace":"com.cloudera.sqoop.lib","fields":[{"name":"delimiters","type":["null",{"type":"record","name":"DelimiterSet","fields":[{"name":"fieldDelim","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"recordDelim","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"enclosedBy","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"escapedBy","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"encloseRequired","type":"boolean"}]}],"default":null},{"name":"outputs","type":["null",{"type":"array","items":"string","java-class":"java.util.ArrayList"}],"default":null}]}],"default":null}]}
> Dataset schema: {"type":"record","name":"QueryResult","doc":"Sqoop import of QueryResult","fields":[{"name":"c1_int","type":["null","int"],"default":null,"columnName":"c1_int","sqlType":"4"},{"name":"c2_date","type":["null","long"],"default":null,"columnName":"c2_date","sqlType":"91"},{"name":"c3_timestamp","type":["null","long"],"default":null,"columnName":"c3_timestamp","sqlType":"93"},{"name":"c4_varchar20","type":["null","string"],"default":null,"columnName":"c4_varchar20","sqlType":"12"}],"tableName":"QueryResult"}
>     at org.kitesdk.data.IncompatibleSchemaException.check(IncompatibleSchemaException.java:55)
>     at org.kitesdk.data.spi.AbstractRefinableView.<init>(AbstractRefinableView.java:90)
>     at org.kitesdk.data.spi.filesystem.FileSystemView.<init>(FileSystemView.java:71)
>     at org.kitesdk.data.spi.filesystem.FileSystemPartitionView.<init>(FileSystemPartitionView.java:57)
>     at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:116)
>     at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:129)
>     at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:696)
>     at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
>     at org.kitesdk.data.spi.AbstractDatasetRepository.load(AbstractDatasetRepository.java:40)
>     at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadJobDataset(DatasetKeyOutputFormat.java:591)
>     at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptDataset(DatasetKeyOutputFormat.java:602)
>     at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptView(DatasetKeyOutputFormat.java:615)
>     at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.getRecordWriter(DatasetKeyOutputFormat.java:448)
>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {code}
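> To make the mismatch above concrete: the "Type schema" is a reflected view of the
> generated {{QueryResult}} class (note the {{__cur_result_set}} and {{__parser}}
> fields and the {{java-class}} annotations), while the "Dataset schema" carries only
> the real columns. Kite performs its own compatibility check, but stock Avro reports
> the same class of conflict. A minimal standalone sketch, trimmed to two columns;
> this is illustrative code, not Sqoop/Kite source:
> {code}
> import org.apache.avro.Schema;
> import org.apache.avro.SchemaBuilder;
> import org.apache.avro.SchemaCompatibility;
>
> public class SchemaMismatchDemo {
>   public static void main(String[] args) {
>     // Dataset schema: only the real columns, as Sqoop created the dataset.
>     Schema dataset = SchemaBuilder.record("QueryResult").fields()
>         .optionalInt("c1_int")
>         .optionalString("c4_varchar20")
>         .endRecord();
>
>     // Reflected "type" schema: same record name, but with a
>     // Sqoop-internal field (no default value) mixed in.
>     Schema reflected = SchemaBuilder.record("QueryResult").fields()
>         .requiredInt("PROTOCOL_VERSION")
>         .optionalInt("c1_int")
>         .optionalString("c4_varchar20")
>         .endRecord();
>
>     // Reading the dataset through the reflected type fails because
>     // PROTOCOL_VERSION has no default value to fill in: prints INCOMPATIBLE.
>     System.out.println(SchemaCompatibility
>         .checkReaderWriterCompatibility(reflected, dataset)
>         .getType());
>   }
> }
> {code}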
> Looking into the Sqoop and Kite source code, I was not able to precisely
> identify where the problem is, until I found SQOOP-1395/SQOOP-2294, which
> describe a similar problem for table-based imports. I do not fully
> understand why the test added in SQOOP-2582 is not failing, but I assume
> it's due to classpath differences between the mini-cluster and a real
> cluster.
> I would suggest changing the Avro schema name generated for query imports
> from {{QueryResult}} to something more generic, such as
> {{AutoGeneratedSchema}}, which will avoid this problem. I'm not
> particularly concerned about backward compatibility here, because it makes
> little sense to depend on a name that is auto-generated for every single
> query-based import.
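> A plausible mechanism for why the name matters, assuming Avro's class lookup
> is involved (this is my reading of the classpath theory above, not confirmed
> from the Kite code): Avro resolves a record schema to a Java class by its full
> name, and on a real cluster the generated {{QueryResult}} class is on the task
> classpath, so a reflected schema can win; a name that matches no class falls
> back to plain generic records. Sketch:
> {code}
> import org.apache.avro.Schema;
> import org.apache.avro.SchemaBuilder;
> import org.apache.avro.specific.SpecificData;
>
> public class NameLookupDemo {
>   public static void main(String[] args) {
>     Schema renamed = SchemaBuilder.record("AutoGeneratedSchema").fields()
>         .optionalInt("c1_int")
>         .endRecord();
>     // No class named AutoGeneratedSchema exists on the classpath, so the
>     // lookup prints null and generic records are used instead of the
>     // reflected/specific class.
>     System.out.println(SpecificData.get().getClass(renamed));
>   }
> }
> {code}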
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)