[jira] Commented: (MAPREDUCE-1224) Calling "SELECT t.* from <table> AS t" to get meta information is too expensive for big tables

Fri, 27 Nov 2009 17:05:45 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783192#action_12783192
 ] 

Aaron Kimball commented on MAPREDUCE-1224:
------------------------------------------

@Jeff Sqoop is already using the ResultSetMetaData associated with the query, 
rather than trying to read the DatabaseMetaData directly. Especially when we 
eventually support arbitrary user-supplied queries, this will be necessary. It 
can also be tricky to set all the parameters for a DatabaseMetaData correctly 
in a generic way. But to get at ResultSetMetaData (which definitely includes 
the proper typing information), a query must be submitted.
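
A minimal sketch of the ResultSetMetaData approach described above, using only 
standard java.sql calls. The class and method names (ColumnTypeSketch, 
metadataQuery, columnTypes) are hypothetical and are not taken from 
SqlManager.java; the query text is the one quoted in the issue title. It reads 
column names and types from a single query submission.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not the actual Sqoop SqlManager code.
class ColumnTypeSketch {

    /** Builds the metadata query quoted in the issue for the given table. */
    static String metadataQuery(String table) {
        return "SELECT t.* from " + table + " AS t";
    }

    /**
     * Submits the query once and reads both column names and
     * java.sql.Types codes from the ResultSetMetaData, in column order.
     */
    static Map<String, Integer> columnTypes(Connection conn, String table)
            throws SQLException {
        Map<String, Integer> types = new LinkedHashMap<>();
        try (PreparedStatement stmt = conn.prepareStatement(metadataQuery(table));
             ResultSet rs = stmt.executeQuery()) {
            ResultSetMetaData md = rs.getMetaData();
            // JDBC column indexes start at 1.
            for (int i = 1; i <= md.getColumnCount(); i++) {
                types.put(md.getColumnName(i), md.getColumnType(i));
            }
        }
        return types;
    }
}
```

Whether executeQuery() returns quickly here depends on how the driver buffers 
rows, which is the cost this issue is about.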

@Spencer This is a good catch and improvement! What database are you testing 
against? This patch passes unit tests against HSQLDB, PostgreSQL, and Oracle, 
so +1 from me. 

For PostgreSQL and MySQL, Sqoop uses {{Statement.setFetchSize()}} to request a 
row-buffered (rather than table-buffered) result, so the query returns fast. 
Unfortunately, {{setFetchSize()}} is, like everything else in JDBC, poorly 
specified, so there isn't a good way to do this generically. The patch is a 
good way to ensure that the query returns quickly even if the database does 
not respect a row-buffered connection.
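
To illustrate why the fetch-size hint is hard to apply generically, here is a 
sketch based on the documented behaviour of the two drivers: MySQL Connector/J 
streams rows only for a forward-only, read-only statement with a fetch size of 
Integer.MIN_VALUE, while the PostgreSQL driver uses a cursor only when 
autocommit is off and the fetch size is positive. The class and method names, 
the URL-prefix check, and the value 50 are assumptions for illustration; this 
thread does not state the exact values Sqoop uses.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical illustration; not the actual Sqoop code.
class FetchSizeSketch {

    /** Picks a driver-specific fetch-size hint from the JDBC URL prefix. */
    static int fetchSizeFor(String jdbcUrl) {
        if (jdbcUrl.startsWith("jdbc:mysql:")) {
            // Connector/J treats exactly this value as "stream row by row".
            return Integer.MIN_VALUE;
        }
        if (jdbcUrl.startsWith("jdbc:postgresql:")) {
            // Any positive value enables cursor-based fetching; 50 is arbitrary.
            return 50;
        }
        // 0 means "no hint"; the driver decides how to buffer.
        return 0;
    }

    /** Creates a statement configured for row-buffered results where supported. */
    static Statement rowBufferedStatement(Connection conn, String jdbcUrl)
            throws SQLException {
        if (jdbcUrl.startsWith("jdbc:postgresql:")) {
            // The PostgreSQL driver only uses a cursor inside a transaction.
            conn.setAutoCommit(false);
        }
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(fetchSizeFor(jdbcUrl));
        return stmt;
    }
}
```

For any other driver the sketch sets no hint at all, so the full result may 
still be buffered; that is the case where a query-level fix is needed.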


> Calling "SELECT t.* from <table> AS t" to get meta information is too 
> expensive for big tables
> ----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1224
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1224
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/sqoop
>    Affects Versions: 0.20.1
>         Environment: all platforms, generic jdbc driver
>            Reporter: Spencer Ho
>         Attachments: MAPREDUCE-1224.patch, SqlManager.java
>
>
> The SqlManager uses the query "SELECT t.* from <table> AS t" to get the table 
> spec, which is too expensive for big tables, and the query is issued twice to 
> generate column names and types.  For tables that are big enough to be 
> map-reduced, this cost is high enough to make sqoop impractical.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
