Jarek Jarcec Cecho created SQOOP-2783:
-----------------------------------------

             Summary: Query import with parquet fails on incompatible schema
                 Key: SQOOP-2783
                 URL: https://issues.apache.org/jira/browse/SQOOP-2783
             Project: Sqoop
          Issue Type: Bug
            Reporter: Jarek Jarcec Cecho
            Assignee: Jarek Jarcec Cecho
             Fix For: 1.4.7


This is a follow-up on SQOOP-2582, where we added support for query import into 
parquet. It seems that when run on a real cluster (rather than a mini-cluster), 
the job fails with an exception similar to this one:

{code}
16/01/08 09:47:13 INFO mapreduce.Job: Task Id : attempt_1452259292738_0001_m_000000_2, Status : FAILED
Error: org.kitesdk.data.IncompatibleSchemaException: The type cannot be used to read from or write to the dataset:
Type schema: {"type":"record","name":"QueryResult","fields":[{"name":"PROTOCOL_VERSION","type":"int"},{"name":"__cur_result_set","type":["null",{"type":"record","name":"ResultSet","namespace":"java.sql","fields":[]}],"default":null},{"name":"c1_int","type":["null","int"],"default":null},{"name":"c2_date","type":["null",{"type":"record","name":"Date","namespace":"java.sql","fields":[]}],"default":null},{"name":"c3_timestamp","type":["null",{"type":"record","name":"Timestamp","namespace":"java.sql","fields":[]}],"default":null},{"name":"c4_varchar20","type":["null","string"],"default":null},{"name":"__parser","type":["null",{"type":"record","name":"RecordParser","namespace":"com.cloudera.sqoop.lib","fields":[{"name":"delimiters","type":["null",{"type":"record","name":"DelimiterSet","fields":[{"name":"fieldDelim","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"recordDelim","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"enclosedBy","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"escapedBy","type":{"type":"int","java-class":"java.lang.Character"}},{"name":"encloseRequired","type":"boolean"}]}],"default":null},{"name":"outputs","type":["null",{"type":"array","items":"string","java-class":"java.util.ArrayList"}],"default":null}]}],"default":null}]}
Dataset schema: {"type":"record","name":"QueryResult","doc":"Sqoop import of QueryResult","fields":[{"name":"c1_int","type":["null","int"],"default":null,"columnName":"c1_int","sqlType":"4"},{"name":"c2_date","type":["null","long"],"default":null,"columnName":"c2_date","sqlType":"91"},{"name":"c3_timestamp","type":["null","long"],"default":null,"columnName":"c3_timestamp","sqlType":"93"},{"name":"c4_varchar20","type":["null","string"],"default":null,"columnName":"c4_varchar20","sqlType":"12"}],"tableName":"QueryResult"}
        at org.kitesdk.data.IncompatibleSchemaException.check(IncompatibleSchemaException.java:55)
        at org.kitesdk.data.spi.AbstractRefinableView.<init>(AbstractRefinableView.java:90)
        at org.kitesdk.data.spi.filesystem.FileSystemView.<init>(FileSystemView.java:71)
        at org.kitesdk.data.spi.filesystem.FileSystemPartitionView.<init>(FileSystemPartitionView.java:57)
        at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:116)
        at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:129)
        at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:696)
        at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
        at org.kitesdk.data.spi.AbstractDatasetRepository.load(AbstractDatasetRepository.java:40)
        at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadJobDataset(DatasetKeyOutputFormat.java:591)
        at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptDataset(DatasetKeyOutputFormat.java:602)
        at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateTaskAttemptView(DatasetKeyOutputFormat.java:615)
        at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.getRecordWriter(DatasetKeyOutputFormat.java:448)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1707)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
{code}

Looking into the Sqoop and Kite source code, I was not able to pinpoint the 
problem precisely until I found SQOOP-1395/SQOOP-2294, which describe a similar 
problem for table-based import. I do not fully understand why the test added in 
SQOOP-2582 is not failing, but I assume it's due to classpath differences 
between the mini-cluster and a real cluster.

I would suggest changing the generated Avro schema name from {{QueryResult}} to 
something more generic, such as {{AutoGeneratedSchema}}, which will avoid this 
problem. I'm not particularly concerned about backward compatibility here, 
because it doesn't make much sense to depend on a name that is generated anew 
for every single query-based import.
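To illustrate the intent of the proposal (not the actual Sqoop implementation, which builds the schema in its Avro code generator rather than via string manipulation), here is a minimal, purely hypothetical sketch of swapping the record name in a generated schema JSON. The class and method names are invented for this example:

{code}
public class SchemaNameFix {
    // Hypothetical helper: rename the generated Avro record so that a
    // query-based import does not produce a "QueryResult" record whose
    // name clashes with Kite's schema compatibility check.
    static String withGenericName(String schemaJson) {
        return schemaJson.replaceFirst(
                "\"name\":\"QueryResult\"",
                "\"name\":\"AutoGeneratedSchema\"");
    }

    public static void main(String[] args) {
        String generated =
                "{\"type\":\"record\",\"name\":\"QueryResult\",\"fields\":[]}";
        // Prints the schema with the generic record name substituted in.
        System.out.println(withGenericName(generated));
    }
}
{code}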



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
