Vlad MARIN created SQOOP-2777:
---------------------------------

             Summary: Sqoop 1.4.6 import failure on special SQL Server column 
names
                 Key: SQOOP-2777
                 URL: https://issues.apache.org/jira/browse/SQOOP-2777
             Project: Sqoop
          Issue Type: Bug
          Components: codegen, connectors/sqlserver
    Affects Versions: 1.4.6
         Environment: Hortonworks Data Platform v2.3.0.0 (build 2557)
RHEL v6.7
Microsoft JDBC driver v4.1
            Reporter: Vlad MARIN


I'm using Sqoop to import various tables from an MS SQL Server instance. The 
majority of imports are fine but a couple of tables fail to import due to what 
apparently is the specific naming of some table columns. Such columns are 
actually named "group", "merge", "order", "from", etc.

It looks like when executing the final Java statement, SQL Server thinks those 
are actual SQL keywords instead of column names, and hence tries to execute 
commands like "group by" or "order by". Or at least so it seams...

Eliminating those specific columns from the Sqoop import job ends up in a 
successful table import. But of course, I need those columns as well and 
changing their name in SQL Server is not an option (production DB with 
thousands of users and many other live applications connected to it constantly).

Sqoop command sample:
{noformat}
sqoop import 
--connect 
'jdbc:sqlserver://SERVER\INSTANCE;database=DB;username=USR;password=PWD' 
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver 
--table TABLE 
--columns 
IdGGD,IdGGM,IdImage,LinkName,Invert,Merge,Mirror,Order,GGMXTId,GGMXTName,GGMXTP,MarkerP,MarkerW,MarkerH,MarkerX1,MarkerY1,Instruction
 
--direct 
--outdir /some/local/directory 
--as-textfile 
--target-dir /some/HDFS/directory 
--null-string NULL 
--null-non-string NULL 
--append
{noformat}

The obtained error looks like this:
{noformat}
2016-01-05 18:55:42,469 INFO  - [main:] ~  map 0% reduce 0% (Job:1367)
2016-01-05 18:55:47,531 INFO  - [main:] ~ Task Id : 
attempt_1451990134058_0040_m_000000_0, Status : FAILED (Job:1406)
Error: java.io.IOException: SQLException in nextKeyValue
        at 
org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
        at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
        at 
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at 
org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax 
near the keyword 'Merge'.
        at 
com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:217)
        at 
com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1635)
        at 
com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:426)
        at 
com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:372)
        at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:6276)
        at 
com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1793)
        at 
com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:184)
        at 
com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:159)
        at 
com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:284)
        at 
org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
        at 
org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
        ... 12 more
{noformat}

One workaround that I found is to use the Sqoop --query option instead of the 
--table one. But even in this case I can only import all the columns of the 
specific table, without being able to select specific columns. 

Sqoop comand sample:
{noformat}
sqoop import 
--connect 
'jdbc:sqlserver://SERVER\INSTANCE;database=DB;username=USR;password=PWD' 
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver 
--query 'SELECT * FROM TABLE WHERE $CONDITIONS' 
--num-mappers 1 
--direct 
--outdir /some/local/directory 
--as-textfile 
--target-dir /some/HDFS/directory 
--null-string NULL 
--null-non-string NULL 
--append
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to