[
https://issues.apache.org/jira/browse/SQOOP-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sandish Kumar HN reassigned SQOOP-3151:
---------------------------------------
Assignee: Sandish Kumar HN
> Sqoop export HDFS file type auto detection can pick wrong type
> --------------------------------------------------------------
>
> Key: SQOOP-3151
> URL: https://issues.apache.org/jira/browse/SQOOP-3151
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.4.6
> Reporter: Boglarka Egyed
> Assignee: Sandish Kumar HN
>
> It appears that Sqoop export tries to detect the file format by reading the
> first 3 characters of a file. Based on that header, the appropriate file
> reader is used. However, if the exported data happens to begin with one of
> these header sequences, the wrong reader is chosen, resulting in a misleading
> error.
> For example, suppose someone exports a table whose data file begins with the
> field value "PART". Because Sqoop sees the letters "PAR", it assumes the file
> is in Parquet format and invokes the Kite SDK. This leads to a misleading error:
> ERROR sqoop.Sqoop: Got exception running Sqoop:
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
> This can be reproduced easily, using Hive as a real world example:
> > create table test (val string);
> > insert into test values ('PAR');
> Then run a sqoop export against the table data:
> $ sqoop export --connect $MYCONN --username $MYUSER --password $MYPWD -m 1
> --export-dir /user/hive/warehouse/test --table $MYTABLE
> Sqoop will fail with the following:
> ERROR sqoop.Sqoop: Got exception running Sqoop:
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
> Changing the value from 'PAR' to something else, like 'Obj' (Avro) or 'SEQ'
> (SequenceFile), results in similar errors.
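The misdetection described above can be illustrated with a minimal sketch. This is not Sqoop's actual code: the function name `detect_by_prefix` and the table `MAGIC_PREFIXES` are hypothetical, and the only behavior assumed is what the report states, namely that the format is guessed from the first 3 characters of the file.

```python
# Hypothetical sketch of 3-character prefix detection; NOT Sqoop's real code.
# The prefixes are the ones named in the report: 'SEQ', 'Obj', 'PAR'.
MAGIC_PREFIXES = {
    b"SEQ": "sequencefile",
    b"Obj": "avro",
    b"PAR": "parquet",
}


def detect_by_prefix(data: bytes) -> str:
    """Guess the file format from the first 3 bytes only.

    Anything that does not match a known prefix is treated as plain text.
    """
    return MAGIC_PREFIXES.get(data[:3], "text")


# A plain-text Hive data file whose first row starts with "PART" or "PAR"
# has the same leading bytes as a Parquet file, so it is misclassified.
print(detect_by_prefix(b"PART\n"))   # parquet
print(detect_by_prefix(b"PAR\n"))    # parquet
print(detect_by_prefix(b"Obj\n"))    # avro
print(detect_by_prefix(b"SEQ\n"))    # sequencefile
print(detect_by_prefix(b"hello\n"))  # text
```

Under this assumption, any text file whose first row begins with one of the three prefixes is handed to the wrong reader, which matches the errors reported here.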