[ https://issues.apache.org/jira/browse/SQOOP-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkat Ramachandran updated SQOOP-2387:
---------------------------------------
    Attachment: SQOOP-2387.2.patch

Attaching another patch with all the unit tests passing (including the Avro 
import tests). The approach here is different from the first patch.

Sqoop applies a column-cleansing step that transforms column names when 
generating the ORM class, and this works well end to end when the output is 
HDFS (text or Avro).
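
For context, the cleansing amounts to mapping each column name to a legal 
Java identifier, character by character. Here is a minimal sketch of that 
rule (class and method names are illustrative, not the actual Sqoop code):

{code}
public class IdentifierCleanser {

  static String toJavaIdentifier(String candidate) {
    StringBuilder sb = new StringBuilder();
    for (char c : candidate.toCharArray()) {
      // Any character not legal in a Java identifier becomes '_'.
      sb.append(Character.isJavaIdentifierPart(c) ? c : '_');
    }
    // An identifier cannot start with a digit; prefix it if needed.
    if (sb.length() > 0 && !Character.isJavaIdentifierStart(sb.charAt(0))) {
      sb.insert(0, '_');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(toJavaIdentifier("mg-version")); // prints mg_version
  }
}
{code}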

But it does not work when the destination is Hive/HCat, because the DDL 
contains the original database column names. This patch uses the cleansed 
column names when creating the DDL for Hive/HCat.

IMO, this way the column names are consistent whether the target is Avro or 
Hive/HCat (with special characters replaced by _).
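
Concretely, the idea is to run the same cleansing over the column list when 
emitting the Hive/HCat DDL. A hedged sketch under assumed names (not the 
actual patch code; the STRING type stands in for real SQL-to-Hive type 
mapping):

{code}
import java.util.List;
import java.util.stream.Collectors;

public class HiveDdlSketch {

  // Same cleansing rule as the ORM side: special chars become '_'.
  static String cleanse(String column) {
    return column.replaceAll("[^A-Za-z0-9_]", "_");
  }

  // Emit the column list from CLEANSED names so the Hive/HCat DDL
  // matches what the generated ORM class (and Avro schema) will use.
  static String columnDdl(List<String> dbColumns) {
    return dbColumns.stream()
        .map(c -> cleanse(c) + " STRING")
        .collect(Collectors.joining(", "));
  }

  public static void main(String[] args) {
    System.out.println("CREATE TABLE some_table ("
        + columnDdl(List.of("id", "mg-version")) + ")");
    // -> CREATE TABLE some_table (id STRING, mg_version STRING)
  }
}
{code}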
 

> NPE thrown when sqoop tries to import table with column name containing some 
> special character
> ----------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-2387
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2387
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hive-integration
>    Affects Versions: 1.4.5, 1.4.6
>         Environment: HDP 2.2.0.0-2041
>            Reporter: Pavel Benes
>            Priority: Critical
>         Attachments: SQOOP-2387.1.patch, SQOOP-2387.2.patch, 
> SQOOP-2387.patch, joblog.txt, sqoop.log
>
>
> This sqoop import:
> {code}
> sqoop import --connect jdbc:mysql://some.merck.com:1234/dbname --username XXX 
> --password YYY --table some_table --hcatalog-database some_database 
> --hcatalog-table some_table --hive-partition-key mg_version 
> --hive-partition-value 2015-05-28-13-18 -m 1 --verbose 
> --fetch-size -2147483648
> {code}
> fails with this error:
> {code}
> 2015-06-01 13:20:39,209 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException
>       at org.apache.hive.hcatalog.data.schema.HCatSchema.get(HCatSchema.java:105)
>       at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper.convertToHCatRecord(SqoopHCatImportHelper.java:194)
>       at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportMapper.map(SqoopHCatImportMapper.java:52)
>       at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportMapper.map(SqoopHCatImportMapper.java:34)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {code}
> It seems that the error is caused by a column name containing a hyphen ('-'). 
> Column names are converted to Java identifiers, but later the converted name 
> cannot be found in the HCatalog schema.
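
For illustration, the NPE is consistent with an Integer-unboxing failure on a 
schema lookup: the HCat schema is keyed by the original column names, so 
looking up the cleansed name returns null. A minimal sketch of that failure 
mode (map and method names are assumptions, not the actual HCatSchema 
internals):

{code}
import java.util.HashMap;
import java.util.Map;

public class SchemaLookupNpe {

  // Position map keyed by the ORIGINAL database column names.
  private static final Map<String, Integer> POSITIONS = new HashMap<>();
  static {
    POSITIONS.put("mg-version", 0);
  }

  static int getPosition(String fieldName) {
    // Map.get returns null for a missing key; auto-unboxing the null
    // Integer into int throws NullPointerException, mirroring the
    // failure seen at HCatSchema.get(HCatSchema.java:105).
    return POSITIONS.get(fieldName);
  }

  public static void main(String[] args) {
    // Sqoop asks for the cleansed name, which is absent -> NPE.
    System.out.println(getPosition("mg_version"));
  }
}
{code}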



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
