[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql
[ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742345#action_12742345 ]

Martin Dittus commented on MAPREDUCE-685:

We just found that PostgreSQL shows the same behaviour. What do you think of making this a generic fix instead? It seems Postgres has the same mechanism to enable streaming of ResultSets: http://jdbc.postgresql.org/documentation/83/query.html

"Changing code to cursor mode is as simple as setting the fetch size of the Statement to the appropriate size. Setting the fetch size back to 0 will cause all rows to be cached (the default behaviour)."

Sqoop will fail with OutOfMemory on large tables using mysql

Key: MAPREDUCE-685
URL: https://issues.apache.org/jira/browse/MAPREDUCE-685
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
Fix For: 0.21.0
Attachments: MAPREDUCE-685.3.patch, MAPREDUCE-685.patch, MAPREDUCE-685.patch.2

The default MySQL JDBC client behavior is to buffer the entire ResultSet in the client before allowing the user to use the ResultSet object. On large SELECTs, this can cause OutOfMemory exceptions, even when the client intends to close the ResultSet after reading only a few rows. The MySQL ConnManager should configure its connection to use row-at-a-time delivery of results to the client.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
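Martin's point about PostgreSQL cursor mode can be illustrated with the minimal sketch below. `PgCursorRead`, `RowHandler`, and `streamRows` are hypothetical names: they are not part of Sqoop or the PostgreSQL JDBC driver. The sketch assumes the conditions listed on the linked pgjdbc page for cursor-based fetching, which are that autocommit is off, the Statement is `TYPE_FORWARD_ONLY`, and the query is a single statement. If those conditions are not met, the driver fetches the whole result at once regardless of the fetch size.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper (not Sqoop or pgjdbc API): streams a query through a cursor. */
public final class PgCursorRead {

    /** Per-row callback that, unlike java.util.function.Consumer, may throw SQLException. */
    @FunctionalInterface
    public interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }

    private PgCursorRead() {}

    /**
     * Runs a single-statement query, fetching fetchSize rows per round trip
     * instead of caching the whole ResultSet, and returns the number of rows seen.
     */
    public static long streamRows(Connection conn, String query, int fetchSize,
                                  RowHandler handler) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        // pgjdbc only uses a cursor inside an explicit transaction.
        conn.setAutoCommit(false);
        long rows = 0;
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            // A positive fetch size enables cursor mode; 0 restores the
            // default of caching every row in client memory.
            stmt.setFetchSize(fetchSize);
            try (ResultSet rs = stmt.executeQuery(query)) {
                while (rs.next()) {
                    handler.handle(rs);
                    rows++;
                }
            }
            conn.commit(); // end the transaction that held the cursor open
        } finally {
            // Put the connection back the way the caller left it.
            conn.setAutoCommit(oldAutoCommit);
        }
        return rows;
    }
}
```

Saving and restoring the caller's autocommit setting keeps the helper from silently changing connection state that other code (for example, a ConnManager) may rely on.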
[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql
[ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742426#action_12742426 ]

Aaron Kimball commented on MAPREDUCE-685:

Because it's not actually the same fix ;) PostgreSQL wants you to do {{statement.setFetchSize(something_reasonable)}}, e.g., 40. MySQL wants you to do {{statement.setFetchSize(INT_MIN)}} (i.e., Integer.MIN_VALUE). The only cursor modes MySQL supports are fully buffered (fetch size = 0) and fully row-wise cursors (fetch size = INT_MIN).

That having been said, I have just finished a PostgreSQL patch that's ready to post here this week :) I'm just waiting for some existing patches to get committed first so that it applies cleanly.
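The difference Aaron describes fits into a single per-driver choice of fetch size. The following is a minimal sketch of that choice, not Sqoop's ConnManager code. `StreamingFetch`, `Vendor`, `streamingFetchSize`, and `createStreamingStatement` are hypothetical names. The PostgreSQL batch size is left to the caller, and 40 is only the example value from the comment. The MySQL branch assumes Connector/J's default configuration, in which a forward-only, read-only Statement with a fetch size of `Integer.MIN_VALUE` signals row-by-row streaming.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical sketch; not Sqoop's ConnManager code. */
public final class StreamingFetch {

    /** The two drivers discussed in this thread. */
    public enum Vendor { MYSQL, POSTGRESQL }

    private StreamingFetch() {}

    /** Fetch size that asks each driver to stream rows rather than buffer them all. */
    public static int streamingFetchSize(Vendor vendor, int pgBatchSize) {
        switch (vendor) {
            case MYSQL:
                // Connector/J (default mode) streams row by row only for this sentinel;
                // a fetch size of 0, the default, buffers the entire ResultSet.
                return Integer.MIN_VALUE;
            case POSTGRESQL:
                // pgjdbc fetches pgBatchSize rows per round trip through a cursor.
                if (pgBatchSize <= 0) {
                    throw new IllegalArgumentException("batch size must be positive");
                }
                return pgBatchSize;
            default:
                throw new IllegalArgumentException("unknown vendor: " + vendor);
        }
    }

    /** Creates a forward-only, read-only Statement configured for streaming. */
    public static Statement createStreamingStatement(Connection conn, Vendor vendor,
                                                     int pgBatchSize) throws SQLException {
        // Validate before touching the connection so nothing leaks on bad input.
        int fetchSize = streamingFetchSize(vendor, pgBatchSize);
        if (vendor == Vendor.POSTGRESQL) {
            // pgjdbc ignores the fetch size while autocommit is on.
            conn.setAutoCommit(false);
        }
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(fetchSize);
        return stmt;
    }
}
```

The `Integer.MIN_VALUE` sentinel is a Connector/J convention. The `java.sql.Statement.setFetchSize` contract otherwise requires `rows >= 0`, so this value should not be passed to other drivers. In addition, while a MySQL streaming ResultSet is open, its rows must be read (or the ResultSet closed) before any other query can be issued on the same connection.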
[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql
[ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730104#action_12730104 ]

Hudson commented on MAPREDUCE-685:

Integrated in Hadoop-Mapreduce-trunk #20 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/20/])
[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql
[ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728071#action_12728071 ]

Tom White commented on MAPREDUCE-685:

The latest patch no longer applies. Can you please regenerate it, Aaron?
[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql
[ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727294#action_12727294 ]

Aaron Kimball commented on MAPREDUCE-685:

These test errors are unrelated to this patch.
[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql
[ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727224#action_12727224 ]

Hadoop QA commented on MAPREDUCE-685:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12412298/MAPREDUCE-685.patch.2
against trunk revision 790971.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/354/console

This message is automatically generated.
[jira] Commented: (MAPREDUCE-685) Sqoop will fail with OutOfMemory on large tables using mysql
[ https://issues.apache.org/jira/browse/MAPREDUCE-685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726175#action_12726175 ]

Aaron Kimball commented on MAPREDUCE-685:

Removed SQL_BIG_RESULT. Also, good call regarding the null check; there's no reason not to pass it straight through. I've modified the API for execute() accordingly.