[ https://issues.apache.org/jira/browse/MAPREDUCE-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797336#action_12797336 ]
Leonid Furman commented on MAPREDUCE-1339:
------------------------------------------

Hi Aron,

I am connecting to an Oracle database. The issue with DataDrivenDBInputFormat explains the problem I am experiencing.

Thank you!
Leonid.

> Sqoop full table import job times out when using the split-by attribute
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1339
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1339
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/sqoop
>    Affects Versions: 0.22.0
>            Reporter: Leonid Furman
>            Priority: Critical
>             Fix For: 0.22.0
>
>
> Problem
> ------------
> When running the sqoop command for a full table import with the split-by
> attribute specified, as follows:
> sqoop --connect CONNECT_STRING --username USER_NAME --password PASSWORD
> --table TABLE_NAME --fields-terminated-by \\0x01 --as-textfile
> --warehouse-dir OUTPUT_DIR --split-by RECORD_ID
> Sqoop transforms the split-by attribute into an ORDER BY clause and runs
> the following query against the database (say, Oracle):
> SELECT * FROM TABLE_NAME ORDER BY RECORD_ID
> If the table has, for example, 20 million records, the ORDER BY clause
> increases the query running time significantly, eventually causing a
> timeout and resulting in no output being written to the Hadoop file system.
> Proposed solution
> -------------------------
> Do not append the ORDER BY clause to the SQL query if no WHERE clause is
> specified.
> Can there be any issues with this solution?
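
For illustration only, here is a minimal Python sketch of range-based splitting, the general idea behind bounding each mapper's query with a WHERE clause on the split column instead of sorting the whole table. It is not Sqoop's or DataDrivenDBInputFormat's actual code. The function name `split_queries` is hypothetical, the split column is assumed to be an integer, and the `lo`/`hi` bounds are assumed to have been obtained beforehand (for example from a MIN/MAX query).

```python
def split_queries(table, column, lo, hi, num_splits):
    """Build one bounded SELECT per split over the integer range [lo, hi].

    Hypothetical helper: table and column are assumed to be trusted
    identifiers, and lo/hi are assumed to be the known minimum and
    maximum values of the split column.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be less than lo")

    total = hi - lo + 1
    # Never create more splits than there are distinct values.
    num_splits = min(num_splits, total)

    queries = []
    start = lo
    for i in range(num_splits):
        # Spread the remainder over the first splits so that
        # split sizes differ by at most one value.
        size = total // num_splits + (1 if i < total % num_splits else 0)
        end = start + size  # exclusive upper bound
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {column} >= {start} AND {column} < {end}"
        )
        start = end
    return queries


if __name__ == "__main__":
    for query in split_queries("TABLE_NAME", "RECORD_ID", 1, 20, 4):
        print(query)
```

Each generated query covers a disjoint range of RECORD_ID and contains no ORDER BY, so the database is not asked to sort the full table in order to serve a single split.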