[jira] Commented: (MAPREDUCE-1224) Calling SELECT t.* from table AS t to get meta information is too expensive for big tables
[ https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796710#action_12796710 ] Hudson commented on MAPREDUCE-1224: --- Integrated in Hadoop-Mapreduce-trunk #196 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/196/]) Calling SELECT t.* from table AS t to get meta information is too expensive for big tables -- Key: MAPREDUCE-1224 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Affects Versions: 0.20.1 Environment: all platforms, generic jdbc driver Reporter: Spencer Ho Assignee: Spencer Ho Fix For: 0.22.0 Attachments: MAPREDUCE-1224.patch, SqlManager.java The SqlManager uses the query, SELECT t.* from table AS t to get table spec is too expensive for big tables, and it was called twice to generate column names and types. For tables that are big enough to be map-reduced, this is too expensive to make sqoop useful. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1224) Calling SELECT t.* from table AS t to get meta information is too expensive for big tables
[ https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783776#action_12783776 ] Spencer Ho commented on MAPREDUCE-1224: --- @Aaron, This particular case that triggered the patch submission is for Microsoft SQL Server. For MySQL, I am using direct mode which works for most of the cases. Calling SELECT t.* from table AS t to get meta information is too expensive for big tables -- Key: MAPREDUCE-1224 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Affects Versions: 0.20.1 Environment: all platforms, generic jdbc driver Reporter: Spencer Ho Attachments: MAPREDUCE-1224.patch, SqlManager.java The SqlManager uses the query, SELECT t.* from table AS t to get table spec is too expensive for big tables, and it was called twice to generate column names and types. For tables that are big enough to be map-reduced, this is too expensive to make sqoop useful. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1224) Calling SELECT t.* from table AS t to get meta information is too expensive for big tables
[ https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783810#action_12783810 ] Aaron Kimball commented on MAPREDUCE-1224: -- Good to know that this works with SQL Server as well. Thanks for the patch. Calling SELECT t.* from table AS t to get meta information is too expensive for big tables -- Key: MAPREDUCE-1224 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Affects Versions: 0.20.1 Environment: all platforms, generic jdbc driver Reporter: Spencer Ho Attachments: MAPREDUCE-1224.patch, SqlManager.java The SqlManager uses the query, SELECT t.* from table AS t to get table spec is too expensive for big tables, and it was called twice to generate column names and types. For tables that are big enough to be map-reduced, this is too expensive to make sqoop useful. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1224) Calling SELECT t.* from table AS t to get meta information is too expensive for big tables
[ https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12783192#action_12783192 ] Aaron Kimball commented on MAPREDUCE-1224: -- @Jeff Sqoop is already using the ResultSetMetaData associated with the query, rather than trying to read the DatabaseMetaData directly. Especially when we eventually support arbitrary user-supplied queries, this will be necessary. It can also be tricky to set all the parameters for a DatabaseMetaData correctly in a generic way. But to get at ResultSetMetaData (which definitely includes the proper typing information), a query must be submitted. @Spenser This is a good catch and improvement! What database are you testing against? This patch passes unit tests against HSQLDB, PostgreSQL, and Oracle, so +1 from me. For PostgreSQL and MySQL, Sqoop uses {{connection.setFetchSize()}} to specify a row-buffered (rather than table-buffered) result, so it returns fast. But unfortunately, {{setFetchSize()}} is, like everything else in JDBC, poorly specified, so there isn't a good way to do this generically. This is a good way to ensure that the query returns quickly even if the database does not respect a row-buffered connection. Calling SELECT t.* from table AS t to get meta information is too expensive for big tables -- Key: MAPREDUCE-1224 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Affects Versions: 0.20.1 Environment: all platforms, generic jdbc driver Reporter: Spencer Ho Attachments: MAPREDUCE-1224.patch, SqlManager.java The SqlManager uses the query, SELECT t.* from table AS t to get table spec is too expensive for big tables, and it was called twice to generate column names and types. For tables that are big enough to be map-reduced, this is too expensive to make sqoop useful. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1224) Calling SELECT t.* from table AS t to get meta information is too expensive for big tables
[ https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780867#action_12780867 ] Jeff Hammerbacher commented on MAPREDUCE-1224: -- Should we try using actual JDBC metadata calls first? See, e.g., http://blog.codebeach.com/2008/12/database-metadata-with-jdbc.html Calling SELECT t.* from table AS t to get meta information is too expensive for big tables -- Key: MAPREDUCE-1224 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Affects Versions: 0.20.1 Environment: all platforms, generic jdbc driver Reporter: Spencer Ho Attachments: MAPREDUCE-1224.patch, SqlManager.java The SqlManager uses the query, SELECT t.* from table AS t to get table spec is too expensive for big tables, and it was called twice to generate column names and types. For tables that are big enough to be map-reduced, this is too expensive to make sqoop useful. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1224) Calling SELECT t.* from table AS t to get meta information is too expensive for big tables
[ https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780906#action_12780906 ] Hadoop QA commented on MAPREDUCE-1224: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12425689/MAPREDUCE-1224.patch against trunk revision 882790. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/256/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/256/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/256/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/256/console This message is automatically generated. Calling SELECT t.* from table AS t to get meta information is too expensive for big tables -- Key: MAPREDUCE-1224 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Affects Versions: 0.20.1 Environment: all platforms, generic jdbc driver Reporter: Spencer Ho Attachments: MAPREDUCE-1224.patch, SqlManager.java The SqlManager uses the query, SELECT t.* from table AS t to get table spec is too expensive for big tables, and it was called twice to generate column names and types. For tables that are big enough to be map-reduced, this is too expensive to make sqoop useful. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1224) Calling SELECT t.* from table AS t to get meta information is too expensive for big tables
[ https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12780349#action_12780349 ] Todd Lipcon commented on MAPREDUCE-1224: Perhaps this could be changed to add WHERE 1 = 0. Any SQL optimizer should evaluate this very quickly and return an empty result set, allowing metadata to be grabbed without actually doing work. Aaron? Calling SELECT t.* from table AS t to get meta information is too expensive for big tables -- Key: MAPREDUCE-1224 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/sqoop Affects Versions: 0.20.1 Environment: all platforms, generic jdbc driver Reporter: Spencer Ho The SqlManager uses the query, SELECT t.* from table AS t to get table spec is too expensive for big tables, and it was called twice to generate column names and types. For tables that are big enough to be map-reduced, this is too expensive to make sqoop useful. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.