[ https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783192#action_12783192 ]
Aaron Kimball commented on MAPREDUCE-1224:
------------------------------------------

@Jeff Sqoop already uses the ResultSetMetaData associated with the query, rather than trying to read the DatabaseMetaData directly. This will be necessary once we support arbitrary user-supplied queries. It can also be tricky to set all the parameters for a DatabaseMetaData lookup correctly in a generic way. But to get at the ResultSetMetaData (which definitely includes the proper typing information), a query must be submitted.

@Spencer This is a good catch and improvement! What database are you testing against? This patch passes unit tests against HSQLDB, PostgreSQL, and Oracle, so +1 from me.

For PostgreSQL and MySQL, Sqoop uses {{statement.setFetchSize()}} to request a row-buffered (rather than table-buffered) result, so the query returns fast. Unfortunately, {{setFetchSize()}} is, like everything else in JDBC, poorly specified, so there isn't a good way to do this generically. The patch is a good way to ensure that the query returns quickly even if the database does not respect a row-buffered connection.

> Calling "SELECT t.* from <table> AS t" to get meta information is too expensive for big tables
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1224
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/sqoop
>    Affects Versions: 0.20.1
>         Environment: all platforms, generic jdbc driver
>            Reporter: Spencer Ho
>         Attachments: MAPREDUCE-1224.patch, SqlManager.java
>
>
> The SqlManager uses the query "SELECT t.* from <table> AS t" to get the table
> spec. This is too expensive for big tables, and the query is run twice, once
> for column names and once for column types. For tables that are big enough to
> be map-reduced, this is too expensive to make sqoop useful.

--
This message is automatically generated by JIRA.
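
Editorial sketch of the idea discussed in the comment above: obtain column names and types from the ResultSetMetaData of a query that is cheap to execute, and collect both in a single pass instead of running the query twice. This is only an illustration under stated assumptions. The class {{MetadataProbe}} and its methods are hypothetical and are not taken from Sqoop or from the attached patch, and the always-false {{WHERE 1 = 0}} predicate is one common way to keep such a query cheap, not necessarily what MAPREDUCE-1224.patch does. The {{AS t}} alias form follows the query quoted in the issue; some databases may need a different form.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Sqoop or the patch. */
final class MetadataProbe {

  private MetadataProbe() { }

  /**
   * Builds a query whose predicate is always false, so the database can
   * describe the columns without returning any rows (an assumption, see above).
   * The table name is concatenated as-is, so it must be a trusted identifier.
   */
  static String buildMetadataQuery(String tableName) {
    return "SELECT t.* FROM " + tableName + " AS t WHERE 1 = 0";
  }

  /**
   * Returns a map of column name to java.sql.Types code, in column order.
   * Names and types are read from one ResultSetMetaData in a single query.
   */
  static Map<String, Integer> columnTypes(Connection conn, String tableName)
      throws SQLException {
    Map<String, Integer> types = new LinkedHashMap<>();
    try (Statement stmt = conn.createStatement()) {
      stmt.setMaxRows(1);   // defensive cap on the number of rows returned
      stmt.setFetchSize(1); // only a hint; drivers are free to ignore it
      try (ResultSet rs = stmt.executeQuery(buildMetadataQuery(tableName))) {
        ResultSetMetaData meta = rs.getMetaData();
        for (int i = 1; i <= meta.getColumnCount(); i++) { // JDBC columns are 1-based
          types.put(meta.getColumnName(i), meta.getColumnType(i));
        }
      }
    }
    return types;
  }
}
```

With an open {{Connection}}, a call such as {{MetadataProbe.columnTypes(conn, "employees")}} would yield the column names and their {{java.sql.Types}} codes without scanning the table, whether or not the driver honors the fetch-size hint.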