[ https://issues.apache.org/jira/browse/SQOOP-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sandish Kumar HN reassigned SQOOP-3151:
---------------------------------------

    Assignee: Sandish Kumar HN

> Sqoop export HDFS file type auto detection can pick wrong type
> --------------------------------------------------------------
>
>                 Key: SQOOP-3151
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3151
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Boglarka Egyed
>            Assignee: Sandish Kumar HN
>
> It appears that Sqoop export tries to detect the file format by reading the
> first 3 characters of a file. Based on that header, the appropriate file
> reader is used. However, if the exported data happens to begin with one of
> the header sequences, the wrong reader is chosen, resulting in a misleading
> error.
>
> For example, suppose someone is exporting a table in which one of the field
> values is "PART". Since Sqoop sees the letters "PAR", it invokes the Kite SDK
> because it assumes the file is in Parquet format. This leads to a misleading
> error:
>
> ERROR sqoop.Sqoop: Got exception running Sqoop:
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
>
> This can be reproduced easily, using Hive as a real world example:
>
> > create table test (val string);
> > insert into test values ('PAR');
>
> Then run a sqoop export against the table data:
>
> $ sqoop export --connect $MYCONN --username $MYUSER --password $MYPWD -m 1
> --export-dir /user/hive/warehouse/test --table $MYTABLE
>
> Sqoop will fail with the following:
>
> ERROR sqoop.Sqoop: Got exception running Sqoop:
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
>
> Changing the value from 'PAR' to something else, like 'Obj' (Avro) or 'SEQ'
> (sequencefile), results in similar errors.
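
The failure mode described in the issue can be sketched in a few lines of
Python. This is an illustrative model only, not Sqoop's actual code: the
function name detect_file_type and the MAGIC_PREFIXES table are hypothetical,
and the three prefixes are simply the ones named in the description ("PAR",
"Obj", "SEQ"). It assumes, as the description suggests, that anything not
matching a prefix is treated as plain text.

```python
# Hypothetical model of 3-byte header sniffing; names are illustrative,
# not taken from the Sqoop code base.
MAGIC_PREFIXES = {
    b"PAR": "parquet",
    b"Obj": "avro",
    b"SEQ": "sequencefile",
}


def detect_file_type(data: bytes) -> str:
    """Guess the file type from the first 3 bytes, defaulting to text."""
    return MAGIC_PREFIXES.get(data[:3], "text")


# A delimited text file whose first field value is "PART" collides with
# the Parquet prefix, so the wrong reader would be selected.
print(detect_file_type(b"PART\n"))   # parquet
print(detect_file_type(b"Obj\n"))    # avro
print(detect_file_type(b"SEQ\n"))    # sequencefile
print(detect_file_type(b"hello\n"))  # text
```

Under a check like this, a text file that starts with "PART" cannot be told
apart from a Parquet file, which would explain why the Kite SDK reader is
invoked and then fails looking for a .metadata descriptor.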
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)