[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797336#action_12797336
 ] 

Leonid Furman commented on MAPREDUCE-1339:
------------------------------------------

Hi Aron,

I am connecting to an Oracle database. The issue with DataDrivenDBInputFormat 
explains the problem I am experiencing.

Thank you!
Leonid.

> Sqoop full table import job times out when using the split-by attribute
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1339
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1339
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/sqoop
>    Affects Versions: 0.22.0
>            Reporter: Leonid Furman
>            Priority: Critical
>             Fix For: 0.22.0
>
>
> Problem
> ------------
> When running the sqoop command for a full table import with the split-by 
> attribute specified, as follows:
> sqoop --connect CONNECT_STRING --username USER_NAME --password PASSWORD 
> --table TABLE_NAME --fields-terminated-by \\0x01 --as-textfile  
> --warehouse-dir OUTPUT_DIR --split-by RECORD_ID
> Sqoop transforms the split-by attribute into an ORDER BY clause and runs 
> the following SQL query against the database (say, Oracle):
> SELECT * FROM TABLE_NAME ORDER BY RECORD_ID
> If the table has, for example, 20 million records, the ORDER BY clause will 
> increase the query running time significantly, eventually causing a timeout 
> and resulting in no output being written to the Hadoop file system.
> Proposed solution
> -------------------------
> Do not append the ORDER BY clause to the SQL query if no WHERE clause is 
> specified.
> Can there be any issues with this solution?
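
For illustration only, here is a minimal sketch of range-based splitting, the general idea of giving each map task a bounded WHERE range on the split column so that no global ORDER BY is needed. It is not Sqoop's or DataDrivenDBInputFormat's actual code. The function name split_queries is hypothetical, and the sketch assumes an integer split column whose minimum and maximum were already fetched with a separate query such as SELECT MIN(RECORD_ID), MAX(RECORD_ID) FROM TABLE_NAME.

```python
def split_queries(table, split_col, lo, hi, num_splits):
    """Build one bounded SELECT per split over the integer range [lo, hi].

    Hypothetical helper for illustration; assumes split_col is an integer
    column and that lo/hi are its MIN and MAX values in the table.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if lo > hi:
        raise ValueError("lo must not exceed hi")

    total = hi - lo + 1
    # Never create more splits than there are distinct candidate values.
    num_splits = min(num_splits, total)
    # Spread any remainder over the first `extra` splits.
    base, extra = divmod(total, num_splits)

    queries = []
    start = lo
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)
        end = start + size  # exclusive upper bound of this split
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {split_col} >= {start} AND {split_col} < {end}"
        )
        start = end
    return queries
```

For example, split_queries("TABLE_NAME", "RECORD_ID", 1, 20, 4) yields four queries, the first of which covers RECORD_ID >= 1 AND RECORD_ID < 6. The ranges are contiguous and non-overlapping, so every row is read exactly once without sorting the whole table.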

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
