Jarek Jarcec Cecho created SQOOP-2783:
-----------------------------------------
Summary: Query import with parquet fails on incompatible schema
Key: SQOOP-2783
URL: https://issues.apache.org/jira/browse/SQOOP-2783
Project: Sqoop
Issue Type: Bug
Reporter: Jarek Jarcec Cecho
Assignee: Jarek Jarcec Cecho
Fix For: 1.4.7
This is a follow-up to SQOOP-2582, where we added support for query-based import into Parquet. It seems that when run on a real cluster (rather than a mini-cluster), the job fails with an exception similar to this one:
{code}
16/01/08 09:47:13 INFO mapreduce.Job: Task Id : attempt_1452259292738_0001_m_000000_2, Status : FAILED
Error: org.kitesdk.data.IncompatibleSchemaException: The type cannot be used to read from or write to the dataset:
Type schema: {"type":"record","name":"QueryResult","fields":[{"name":"PROTOCOL_VERSION","type":"int"},{"name":"__cur_result_set","type":["null",{"type":"record","name":"ResultSet","namespace":"java.sql","fields":[]}],"default":null},{"name":"c1_int","type":["null","int"],"default":null},{"name":"c2_date","type":["null",{"type":"record","name":"Date","namespace":"java.sql","fields":[]}],"default":null},{"name":"c3_timestamp","type":["null",{"type":"record","name":"Timestamp","namespace":"java.sql","fields":[]}],"default":null},{"name":"c4_varchar20","type":["null","string"],"default":null},{"name":"__parser","type":["null",{"type":"record","name":"RecordParser","namespace":"com.cloudera.sqoop.lib","fields":[{"name":"delimiters","type":["null",{"type":"record","name":"DelimiterSet","fields":[{"name":"fieldDelim","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"recordDelim","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"enclosedBy","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"escapedBy","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"encloseRequired","type":"boolean"}]}],"default":null},{"name":"outputs","type":["null",{"type":"array","items":"string","java-class":"java.util.ArrayList"}],"default":null}]}],"default":null}]}
Dataset schema: {"type":"record","name":"QueryResult","doc":"Sqoop import of QueryResult","fields":[{"name":"c1_int","type":["null","int"],"default":null,"columnName":"c1_int","sqlType":"4"},{"name":"c2_date","type":["null","long"],"default":null,"columnName":"c2_date","sqlType":"91"},{"name":"c3_timestamp","type":["null","long"],"default":null,"columnName":"c3_timestamp","sqlType":"93"},{"name":"c4_varchar20","type":["null","string"],"default":null,"columnName":"c4_varchar20","sqlType":"12"}],"tableName":"QueryResult"}
	at org.kitesdk.data.IncompatibleSchemaException.check(IncompatibleSchemaException.java:55)
	at org.kitesdk.data.spi.AbstractRefinableView.<init>(AbstractRefinableView.java:90)
	at org.kitesdk.data.spi.filesystem.FileSystemView.<init>(FileSystemView.java:71)
	at org.kitesdk.data.spi.filesystem.FileSystemPartitionView.<init>(FileSystemPartitionView.java:57)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:116)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:129)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:696)
	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
	at org.kitesdk.data.spi.AbstractDatasetRepository.load(AbstractDatasetRepository.java:40)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadJobDataset(DatasetKeyOutputFormat.java:591)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptDataset(DatasetKeyOutputFormat.java:602)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptView(DatasetKeyOutputFormat.java:615)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.getRecordWriter(DatasetKeyOutputFormat.java:448)
	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
{code}
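For reference, the failure shows up with a free-form query import into Parquet along these lines (connection string, credentials, table, and target directory below are illustrative, not the exact reproduction):
{code}
sqoop import \
  --connect jdbc:mysql://db.example.com/test \
  --username sqoop \
  --query 'SELECT c1_int, c2_date, c3_timestamp, c4_varchar20 FROM t1 WHERE $CONDITIONS' \
  --split-by c1_int \
  --target-dir /tmp/sqoop-parquet-out \
  --as-parquetfile
{code}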
Looking into the Sqoop and Kite source code, I was not able to pinpoint where the problem is until I found SQOOP-1395/SQOOP-2294, which describe a similar problem for table-based imports. I do not fully understand why the test added in SQOOP-2582 is not failing, but I assume it's due to classpath differences between the mini-cluster and a real cluster.
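Notably, the "Type schema" in the trace looks exactly like what Avro reflection produces for the generated {{QueryResult}} class: the {{__cur_result_set}} and {{__parser}} fields come from the generated record code, the {{java-class}} annotations are characteristic of {{ReflectData}}, and {{java.sql.Date}} appears as an empty record. A minimal sketch that shows this (the class below is a stand-in for the generated code, and nullability details are glossed over):
{code}
import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;

public class ReflectSchemaDemo {
    // Stand-in for the Sqoop-generated record class; field names are taken
    // from the trace above, the class itself is illustrative.
    static class QueryResult {
        Integer c1_int;
        java.sql.Date c2_date;
        String c4_varchar20;
    }

    public static void main(String[] args) {
        // All instance fields of java.util.Date (and thus java.sql.Date) are
        // transient, so Avro reflection maps the column to an empty record --
        // the {"name":"Date","namespace":"java.sql","fields":[]} entry above.
        Schema s = ReflectData.get().getSchema(QueryResult.class);
        System.out.println(s.toString(true));
    }
}
{code}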
I would suggest changing the generated Avro schema name from {{QueryResult}} to something more generic, such as {{AutoGeneratedSchema}}, which would avoid this problem. I'm not particularly concerned about backward compatibility here, because it doesn't make much sense to depend on a name that is auto-generated for every single query-based import.
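Concretely, the change boils down to falling back to a fixed record name whenever there is no table name to derive one from. A minimal sketch of the idea (class and method names below are illustrative, not the actual Sqoop internals):
{code}
import org.apache.avro.Schema;

public class SchemaNaming {
    // Illustrative naming rule: table imports keep the table-derived name,
    // free-form query imports get a fixed generic name instead of "QueryResult".
    static String recordName(String tableName) {
        return (tableName != null) ? tableName : "AutoGeneratedSchema";
    }

    public static void main(String[] args) {
        Schema schema = Schema.createRecord(
                recordName(null), "Sqoop import of a free-form query",
                "sqoop", false);
        System.out.println(schema.getFullName()); // sqoop.AutoGeneratedSchema
    }
}
{code}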