Vlad MARIN created SQOOP-2777:
---------------------------------
Summary: Sqoop 1.4.6 import failure on special SQL Server column
names
Key: SQOOP-2777
URL: https://issues.apache.org/jira/browse/SQOOP-2777
Project: Sqoop
Issue Type: Bug
Components: codegen, connectors/sqlserver
Affects Versions: 1.4.6
Environment: Hortonworks Data Platform v2.3.0.0 (build 2557)
RHEL v6.7
Microsoft JDBC driver v4.1
Reporter: Vlad MARIN
I'm using Sqoop to import various tables from an MS SQL Server instance. The
majority of imports are fine but a couple of tables fail to import due to what
apparently is the specific naming of some table columns. Such columns are
actually named "group", "merge", "order", "from", etc.
It looks like when executing the final Java statement, SQL Server thinks those
are actual SQL keywords instead of column names, and hence tries to execute
commands like "group by" or "order by". Or at least so it seams...
Eliminating those specific columns from the Sqoop import job ends up in a
successful table import. But of course, I need those columns as well and
changing their name in SQL Server is not an option (production DB with
thousands of users and many other live applications connected to it constantly).
Sqoop command sample:
{noformat}
sqoop import
--connect
'jdbc:sqlserver://SERVER\INSTANCE;database=DB;username=USR;password=PWD'
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver
--table TABLE
--columns
IdGGD,IdGGM,IdImage,LinkName,Invert,Merge,Mirror,Order,GGMXTId,GGMXTName,GGMXTP,MarkerP,MarkerW,MarkerH,MarkerX1,MarkerY1,Instruction
--direct
--outdir /some/local/directory
--as-textfile
--target-dir /some/HDFS/directory
--null-string NULL
--null-non-string NULL
--append
{noformat}
The obtained error looks like this:
{noformat}
2016-01-05 18:55:42,469 INFO - [main:] ~ map 0% reduce 0% (Job:1367)
2016-01-05 18:55:47,531 INFO - [main:] ~ Task Id :
attempt_1451990134058_0040_m_000000_0, Status : FAILED (Job:1406)
Error: java.io.IOException: SQLException in nextKeyValue
at
org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at
org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax
near the keyword 'Merge'.
at
com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:217)
at
com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1635)
at
com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:426)
at
com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:372)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:6276)
at
com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1793)
at
com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:184)
at
com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:159)
at
com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:284)
at
org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
at
org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
... 12 more
{noformat}
One workaround that I found is to use the Sqoop --query option instead of the
--table one. But even in this case I can only import all the columns of the
specific table, without being able to select specific columns.
Sqoop comand sample:
{noformat}
sqoop import
--connect
'jdbc:sqlserver://SERVER\INSTANCE;database=DB;username=USR;password=PWD'
--driver com.microsoft.sqlserver.jdbc.SQLServerDriver
--query 'SELECT * FROM TABLE WHERE $CONDITIONS'
--num-mappers 1
--direct
--outdir /some/local/directory
--as-textfile
--target-dir /some/HDFS/directory
--null-string NULL
--null-non-string NULL
--append
{noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)