> On July 11, 2013, 9:54 p.m., Jarek Cecho wrote:
> > connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java,
> >  line 184
> > <https://reviews.apache.org/r/12451/diff/1/?file=319957#file319957line184>
> >
> >     I'm a bit concerned about using the count() aggregate function, as it 
> > might lead to another full table scan, which could significantly hurt 
> > performance. Maybe we could make the check for nulls in the split-by column 
> > optional?
> 
> Mengwei Ding wrote:
>     Yes, this is an issue. I will use 'count(1)' instead.
> 
> Jarek Cecho wrote:
>     I'm afraid that count(1) won't help either. If the database engine does 
> not store the precise number of rows (such as InnoDB in MySQL), queries of 
> the form "select count(*/1) from table" will result in a full table scan, 
> which can be quite a heavy operation.

Yes, I did some research just now. Null values are not indexed in the 
database, so retrieving all of them requires scanning the whole table. I just 
thought of another idea: we don't necessarily need to check whether the column 
has nulls; instead, we could always add an extra partition for nulls. That way 
we reduce the number of full table scans to one, since we cannot avoid a full 
table scan entirely. By the way, what do you mean by making the null check on 
the split-by column optional?
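
To make the extra-partition idea concrete, here is a minimal sketch. It is not 
the code in GenericJdbcImportPartitioner: the class name NullPartitionSketch 
and the helper buildConditions are hypothetical, and it assumes an integer 
split-by column whose non-null min and max (min <= max) are already known. It 
builds one WHERE condition per range and then always appends an IS NULL 
partition, so no count() query is needed to decide whether nulls exist.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Sqoop code base.
class NullPartitionSketch {

    // Builds one WHERE condition per partition for an integer split-by column
    // covering [min, max], then unconditionally adds a partition for NULLs.
    // Assumes min <= max and numPartitions >= 1.
    public static List<String> buildConditions(String column, long min, long max,
                                               int numPartitions) {
        List<String> conditions = new ArrayList<>();
        long span = max - min + 1;
        long lower = min;
        for (int i = 0; i < numPartitions; i++) {
            // Spread the remainder over the first few partitions.
            long size = span / numPartitions + (i < span % numPartitions ? 1 : 0);
            if (size == 0) {
                continue; // more partitions requested than distinct values
            }
            long upper = lower + size - 1;
            conditions.add(lower + " <= " + column + " AND " + column + " <= " + upper);
            lower = upper + 1;
        }
        // The extra partition: rows whose split-by column is NULL would
        // otherwise match none of the range conditions above.
        conditions.add(column + " IS NULL");
        return conditions;
    }

    public static void main(String[] args) {
        for (String condition : buildConditions("id", 1, 10, 3)) {
            System.out.println(condition);
        }
    }
}
```

The trade-off in this sketch is that the null partition is created even when 
the column contains no nulls, in which case that partition simply returns no 
rows.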


- Mengwei


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12451/#review23028
-----------------------------------------------------------


On July 10, 2013, 7:02 p.m., Mengwei Ding wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12451/
> -----------------------------------------------------------
> 
> (Updated July 10, 2013, 7:02 p.m.)
> 
> 
> Review request for Sqoop and Jarek Cecho.
> 
> 
> Bugs: SQOOP-1049
>     https://issues.apache.org/jira/browse/SQOOP-1049
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> commit 47e73c30b49be0168459d76bf8993205c7a4f4fc
> Author: Mengwei Ding <mengwei.d...@gmail.com>
> Date:   Wed Jul 10 11:41:05 2013 -0700
> 
>     SQOOP-1049: Sqoop2: Record not imported if partition column value is NULL
> 
> :100644 100644 abcc89d... a940d15... M connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnectorConstants.java
> :100644 100644 671bb4a... d331ae8... M connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnectorError.java
> :100644 100644 96818ba... 357fefb... M connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java
> :100644 100644 4401800... ff80ed3... M connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java
> 
> 
> Diffs
> -----
> 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnectorConstants.java abcc89d 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnectorError.java 671bb4a 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java 96818ba 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportPartitioner.java 4401800 
> 
> Diff: https://reviews.apache.org/r/12451/diff/
> 
> 
> Testing
> -------
> 
> I ran a manual test in which I successfully imported a table with some 
> null values in the partition column.
> 
> 
> Thanks,
> 
> Mengwei Ding
> 
>
