> On March 20, 2017, 4:07 p.m., Attila Szabo wrote:
> > Hey Eric,
> > 
> > For the first shot I do understand your motivation here, and I have also
> > attached some comments/requests around the implementation.
> > 
> > I'm a bit concerned about the big picture as well, because I feel that in
> > its current form this would be a breaking change.
> > 
> > E.g. so far we have created tables in Hive with DOUBLE columns, but after
> > applying the change all of these imports + tables would contain DECIMAL
> > values. Wouldn't this change confuse the end users? (For me it would be
> > very unexpected, especially after a version upgrade!)
> > 
> > Questions on my side:
> > Am I right that your change would only affect Sqoop imports when the
> > import ends up creating a new Hive table?
> > Wouldn't this cause problems on the end user side (e.g. in case of big
> > incremental data ingest cases with many tables)?
> > What will happen with jobs inserting data into an already existing Hive
> > table? Have you tested that scenario as well?
> > Although your change seems to be much more type/Hive agnostic, shouldn't
> > we introduce a fallback scenario for the users (e.g. turning off the
> > Decimal->Decimal mapping with an option)? Or should we just advise
> > explicit type mapping for the users in this case?
> > 
> > Thanks for your answers,
> > Attila
Hi Attila,

Thanks for your feedback. To answer your questions:

> Am I right that your change would only affect Sqoop imports when the
> import ends up creating a new Hive table?

Yes, it only takes effect when a new Hive table is created; append mode is not
affected.

> Wouldn't this cause problems on the end user side (e.g. in case of big
> incremental data ingest cases with many tables)?

I am not sure I follow the example here. Could you give me a bit more detail
about the use case you have in mind?

> What will happen with jobs inserting data into an already existing Hive
> table? Have you tested that scenario as well?

I have tested this manually: the CREATE TABLE IF NOT EXISTS statement is
ignored and the pre-existing table definition is left unchanged. After that
Sqoop runs the LOAD DATA INPATH command, so nothing changes for existing
tables.

> Although your change seems to be much more type/Hive agnostic, shouldn't
> we introduce a fallback scenario for the users (e.g. turning off the
> Decimal->Decimal mapping with an option)? Or should we just advise
> explicit type mapping for the users in this case?

I would think that using --map-column-hive should be sufficient (see the
example command below); please correct me if I am wrong here.

Thanks
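To illustrate the explicit mapping, a user could override the Hive type of an
individual column like this (the connection string, table and column names are
made up purely for illustration):

    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username sqoop_user \
      --table ORDERS \
      --hive-import \
      --map-column-hive ORDER_TOTAL=DOUBLE

With --map-column-hive the user keeps the old DOUBLE behaviour for that
column, while all other decimal columns pick up the new DECIMAL mapping.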
> On March 20, 2017, 4:07 p.m., Attila Szabo wrote:
> > src/java/org/apache/sqoop/hive/TableDefWriter.java
> > Lines 126-130 (patched)
> > <https://reviews.apache.org/r/57551/diff/4/?file=1668178#file1668178line127>
> >
> > Why do we add "null" values to a typed Integer list here?
> > It looks a bit weird at first sight, and could be the source of an
> > unexpected NPE.
> >
> > Could you please provide some more information here?

Hi Attila, the function connManager.getColumnInfoForQuery returns the column
info as an ArrayList with three Integer values (typeId, precision and scale).
Since precision and scale do not apply to other column types, storing a value
of "0" there would be misleading and would make it harder for any code that
has to decide whether precision and scale need to be included. So I think
storing NULL makes sense, so that callers can check for it explicitly. But if
you think there is a better way, please let me know and I am happy to change
it.


> On March 20, 2017, 4:07 p.m., Attila Szabo wrote:
> > src/java/org/apache/sqoop/hive/TableDefWriter.java
> > Lines 142-145 (patched)
> > <https://reviews.apache.org/r/57551/diff/4/?file=1668178#file1668178line143>
> >
> > We should not depend on null references.
> >
> > Please rather use an empty list and check whether it is still empty
> > after the retrieval of the columnInfo.

The function connManager.getColumnInfoForQuery can return NULL, so I think we
still need to check for that. I did try adding a "columnTypes.size() == 0"
check, but it breaks quite a few test cases in TestTableDefWriter, so I will
leave that check out.


> On March 20, 2017, 4:07 p.m., Attila Szabo wrote:
> > src/java/org/apache/sqoop/hive/TableDefWriter.java
> > Lines 192-200 (patched)
> > <https://reviews.apache.org/r/57551/diff/4/?file=1668178#file1668178line193>
> >
> > I'm quite confused here as well by the null values (especially
> > because if we put null values into positions 1 and 2 in the code
> > before, then this will just assign null to variables that already
> > hold null values).
> >
> > Could you please shed some more light here?

There are two possible ways to get the columnTypes, see the code below:

    if (externalColTypes != null) {
      columnTypes = new SqlTypeMap<String, List<Integer>>();
      // Use pre-defined column types.
      for (String key : externalColTypes.keySet()) {
        List<Integer> info = new ArrayList<Integer>();
        info.add(externalColTypes.get(key));
        info.add(null);
        info.add(null);
        columnTypes.put(key, info);
      }
    } else {
      // Get these from the database.
      if (null != inputTableName) {
        columnTypes = connManager.getColumnInfo(inputTableName);
      } else {
        columnTypes = connManager.getColumnInfoForQuery(options.getSqlQuery());
      }
    }

So if we hit the "if" branch, precision and scale will be NULL, but if we hit
the "else" branch, the connection manager returns real precision and scale
values. The code you are commenting on simply retrieves the precision and
scale values so that they can be checked later (I have put a rough sketch of
that check at the end of this mail).


> On March 20, 2017, 4:07 p.m., Attila Szabo wrote:
> > src/java/org/apache/sqoop/hive/TableDefWriter.java
> > Lines 210-221 (patched)
> > <https://reviews.apache.org/r/57551/diff/4/?file=1668178#file1668178line211>
> >
> > Although this is a good-to-know / nice-catch thingy, we at least need
> > some log statements here (e.g. with info/debug level). We should not
> > do anything like this silently!
> >
> > Please introduce logging statement(s) with the required level of
> > information!

I will add the LOG statement in the next patch. Thanks for the suggestion, it
totally makes sense.


- Eric


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57551/#review169453
-----------------------------------------------------------


On March 28, 2017, 4:35 a.m., Eric Lin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57551/
> -----------------------------------------------------------
> 
> (Updated March 28, 2017, 4:35 a.m.)
> 
> 
> Review request for Sqoop, Attila Szabo and Szabolcs Vasas.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Currently Sqoop converts DECIMAL from an RDBMS into DOUBLE in Hive, which is
> not correct because users lose precision. Since Hive has supported DECIMAL
> for a long time, we should support DECIMAL to DECIMAL conversion from Sqoop
> to Hive.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/hive/HiveTypes.java ad00535 
>   src/java/org/apache/sqoop/hive/TableDefWriter.java 32fcca3 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java 33e0cc4 
>   src/test/com/cloudera/sqoop/hive/TestTableDefWriter.java dbf0dde 
>   testdata/hive/scripts/decimalImport.q PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/57551/diff/5/
> 
> 
> Testing
> -------
> 
> Test case + manual test
> 
> 
> Thanks,
> 
> Eric Lin
> 
>
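As mentioned above, here is a rough, self-contained sketch of the null
handling and the logging I have in mind. The class and method names are made
up just for this mail; this is not the actual patch code, which lives in
TableDefWriter.java and HiveTypes.java:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    public class DecimalMappingSketch {

      public static final Log LOG =
          LogFactory.getLog(DecimalMappingSketch.class.getName());

      // Hive caps DECIMAL precision at 38.
      private static final int HIVE_MAX_DECIMAL_PRECISION = 38;

      /**
       * Builds the (typeId, precision, scale) triple discussed above.
       * Precision and scale stay null when they do not apply, so callers
       * can tell "not applicable" apart from a real value of 0.
       */
      static List<Integer> columnInfo(int sqlType, Integer precision,
          Integer scale) {
        List<Integer> info = new ArrayList<Integer>();
        info.add(sqlType);
        info.add(precision); // null for non-numeric columns
        info.add(scale);     // null for non-numeric columns
        return info;
      }

      /** Maps a SQL DECIMAL/NUMERIC column to a Hive type string. */
      static String toHiveDecimal(Integer precision, Integer scale) {
        if (precision == null || scale == null
            || precision <= 0 || precision > HIVE_MAX_DECIMAL_PRECISION) {
          // Do not fall back silently: log it, as requested in the review.
          LOG.warn("Cannot map precision/scale (" + precision + ", " + scale
              + ") to a Hive DECIMAL; falling back to DECIMAL with its"
              + " default precision and scale.");
          return "DECIMAL";
        }
        return "DECIMAL(" + precision + "," + scale + ")";
      }
    }

For example, columnInfo(java.sql.Types.DECIMAL, 12, 2) would end up as
DECIMAL(12,2) in the generated DDL, while a VARCHAR column would simply carry
null precision and scale and never reach toHiveDecimal.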