[https://issues.apache.org/jira/browse/SQOOP-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186552#comment-17186552]
Ram commented on SQOOP-3151:
----------------------------
[[email protected]] [~BoglarkaEgyed]
We are using *Sqoop 1.4.7* to export Parquet data that is stored in HDFS as
*plain Parquet files, NOT a Hive table*.
We are still facing the same issue -
{code:java}
20/08/28 13:37:02 ERROR sqoop.Sqoop: Got exception running Sqoop:
org.kitesdk.data.DatasetIOException: Cannot access descriptor location:
hdfs:///<location>/part-00000-f9f92493-36a1-4714-bcc6-291c118cf599-c000/snappy/parquet/.metadata
org.kitesdk.data.DatasetIOException: Cannot access descriptor location:
hdfs:///<location>/part-00000-f9f92493-36a1-4714-bcc6-291c118cf599-c000/snappy/parquet/.metadata{code}
The command we're running -
{code:java}
/sqoop-1.4.7.bin__hadoop-2.6.0/bin/sqoop export --connect
jdbc:postgresql://<postgres_db_details> --username <username> --password
<password> --table <table_name> --export-dir
hdfs:///<location>/part-00000-f9f92493-36a1-4714-bcc6-291c118cf599-c000.parquet
{code}
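For context, the error above is consistent with the detection behavior this issue describes: a plain Parquet file begins with the magic bytes "PAR1", so Sqoop's header sniff hands the path to the Kite SDK, and Kite then looks for a dataset descriptor under .metadata, which plain Parquet files (e.g. those written by Spark jobs) do not have. A minimal sketch to confirm what the first bytes of the export file look like, using the standard Hadoop FileSystem API (the class name here is illustrative, not part of Sqoop):
{code:java}
// Illustrative utility, not Sqoop code: print the leading bytes of an HDFS
// file to see which format header Sqoop's sniffing would match.
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PrintFileHeader {
    public static void main(String[] args) throws Exception {
        Path file = new Path(args[0]); // e.g. the part-*.parquet file above
        try (FileSystem fs = file.getFileSystem(new Configuration());
             FSDataInputStream in = fs.open(file)) {
            byte[] header = new byte[4];
            in.readFully(header);
            // Plain Parquet files begin with the 4-byte magic "PAR1".
            System.out.println(new String(header, StandardCharsets.US_ASCII));
        }
    }
}
{code}
If this prints PAR1, Sqoop will take the Kite code path and fail as shown above, since there is no .metadata descriptor alongside a plain Parquet file.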
Postgres JAR - postgresql-42.2.11.jar
Please suggest a solution as soon as possible.
> Sqoop export HDFS file type auto detection can pick wrong type
> --------------------------------------------------------------
>
> Key: SQOOP-3151
> URL: https://issues.apache.org/jira/browse/SQOOP-3151
> Project: Sqoop
> Issue Type: Bug
> Affects Versions: 1.4.6
> Reporter: Boglarka Egyed
> Assignee: Sandish Kumar HN
> Priority: Major
>
> It appears that Sqoop export tries to detect the file format by reading the
> first 3 characters of a file. Based on that header, the appropriate file
> reader is used. However, if the first row of the exported data happens to
> begin with one of those header sequences, the wrong reader is chosen,
> resulting in a misleading error.
> For example, suppose someone exports a table in which one of the field values
> is "PART". Because Sqoop sees the letters "PAR", it invokes the Kite SDK,
> assuming the file is in Parquet format. This leads to a misleading error:
> ERROR sqoop.Sqoop: Got exception running Sqoop:
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
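> As a rough illustration (hypothetical names and logic, not Sqoop's actual
> implementation), the sniff described above amounts to something like:
> {code:java}
> // Hypothetical 3-byte header sniff: any text file whose first row starts
> // with one of these magic prefixes gets misclassified.
> import java.io.IOException;
> import java.nio.charset.StandardCharsets;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class HeaderSniff {
>     static String guessFormat(Path file, Configuration conf) throws IOException {
>         byte[] header = new byte[3];
>         try (FileSystem fs = file.getFileSystem(conf);
>              FSDataInputStream in = fs.open(file)) {
>             in.readFully(header);
>         }
>         String magic = new String(header, StandardCharsets.US_ASCII);
>         if ("PAR".equals(magic)) return "parquet";      // also matches a row starting "PAR..."
>         if ("SEQ".equals(magic)) return "sequencefile"; // also matches a row starting "SEQ..."
>         if ("Obj".equals(magic)) return "avro";         // also matches a row starting "Obj..."
>         return "text";
>     }
> }
> {code}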
> This can be reproduced easily, using Hive as a real-world example:
> > create table test (val string);
> > insert into test values ('PAR');
> Then run a sqoop export against the table data:
> $ sqoop export --connect $MYCONN --username $MYUSER --password $MYPWD -m 1
> --export-dir /user/hive/warehouse/test --table $MYTABLE
> Sqoop will fail with the following:
> ERROR sqoop.Sqoop: Got exception running Sqoop:
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
> org.kitesdk.data.DatasetNotFoundException: Descriptor location does not
> exist: hdfs://<path>.metadata
> Changing the value from "PAR" to another magic prefix, such as 'Obj' (Avro)
> or 'SEQ' (SequenceFile), will result in similar errors.