Re: Review Request 67628: Implement an alternative solution for Parquet reading and writing

2018-06-22 Thread Szabolcs Vasas

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67628/
---

(Updated June 22, 2018, 4:36 p.m.)


Review request for Sqoop.


Bugs: SQOOP-3328
https://issues.apache.org/jira/browse/SQOOP-3328


Repository: sqoop-trunk


Description
---

The new implementation uses classes from the parquet.hadoop packages.
TestParquetIncrementalImportMerge has been introduced to cover some gaps we had 
in the Parquet merge support.
The test infrastructure is also modified a bit, which was needed because of 
TestParquetIncrementalImportMerge.

Note that this JIRA does not cover the Hive Parquet import support; I will 
create another JIRA for that.


Diffs (updated)
-

  src/java/org/apache/sqoop/SqoopOptions.java 
d9984af369f901c782b1a74294291819e7d13cdd 
  src/java/org/apache/sqoop/avro/AvroUtil.java 
57c2062568778c5bb53cd4118ce4f030e4ff33f2 
  src/java/org/apache/sqoop/manager/ConnManager.java 
c80dd5d9cbaa9b114c12b693e9a686d2cbbe51a3 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 
3b5421028d3006e790ed4b711a06dbdb4035b8a0 
  src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 
17c9ed39b1e613a6df36b54cd5395b80e5f8fb0b 
  src/java/org/apache/sqoop/mapreduce/parquet/ParquetConstants.java 
ae53a96bddc523a52384715dd97705dc3d9db607 
  src/java/org/apache/sqoop/mapreduce/parquet/ParquetExportJobConfigurator.java 
8d7b87f6d6832ce8d81d995af4c4bd5eeae38e1b 
  src/java/org/apache/sqoop/mapreduce/parquet/ParquetImportJobConfigurator.java 
fa1bc7d1395fbbbceb3cb72802675aebfdb27898 
  
src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorFactory.java 
ed5103f1d84540ef2fa5de60599e94aa69156abe 
  
src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorFactoryProvider.java
 2286a52030778925349ebb32c165ac062679ff71 
  
src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorImplementation.java
 PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/parquet/ParquetMergeJobConfigurator.java 
67fdf6602bcbc6c091e1e9bf4176e56658ce5222 
  
src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportJobConfigurator.java
 PRE-CREATION 
  
src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportMapper.java
 PRE-CREATION 
  
src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java
 PRE-CREATION 
  
src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportMapper.java
 PRE-CREATION 
  
src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetJobConfiguratorFactory.java
 PRE-CREATION 
  
src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetMergeJobConfigurator.java
 PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteMergeParquetReducer.java 
7f21205e1c4be4200f7248d3f1c8513e0c8e490c 
  
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportJobConfigurator.java
 ca02c7bdcaf2fa981e15a6a96b111dec38ba2b25 
  src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportMapper.java 
2d88a9c8ea4eb32001e1eb03e636d9386719 
  
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportJobConfigurator.java
 87828d1413eb71761aed44ad3b138535692f9c97 
  src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportMapper.java 
20adf6e422cc4b661a74c8def114d44a14787fc6 
  
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetJobConfiguratorFactory.java
 055e1166b07aeef711cd162052791500368c628d 
  
src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetMergeJobConfigurator.java
 9fecf282885f7aeac011a66f7d5d05512624976f 
  src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetUtils.java 
e68bba90d8b08ac3978fcc9ccae612bdf02388e8 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java 
c62ee98c2b22d819c9a994884b254f76eb518b6a 
  src/java/org/apache/sqoop/tool/ImportTool.java 
2c474b7eeeff02b59204e4baca8554d668b6c61e 
  src/java/org/apache/sqoop/tool/MergeTool.java 
4c20f7d151514b26a098dafdc1ee265cbde5ad20 
  src/test/org/apache/sqoop/TestBigDecimalExport.java 
ccea17345c0c8a2bdb7c8fd141f37e3c822ee41e 
  src/test/org/apache/sqoop/TestMerge.java 
11806fea6c59ea897bc1aa23f6657ed172d093d5 
  src/test/org/apache/sqoop/TestParquetExport.java 
43dabb57b7862b607490369e09b197b6de65a147 
  src/test/org/apache/sqoop/TestParquetImport.java 
27d407aa3f9f2781f675294fa98431bc46f3dcfa 
  src/test/org/apache/sqoop/TestParquetIncrementalImportMerge.java PRE-CREATION 
  src/test/org/apache/sqoop/TestSqoopOptions.java 
bb7c20ddcb8fb5fc9c3b1edfb73fecb739bba269 
  src/test/org/apache/sqoop/hive/TestHiveServer2TextImport.java 
f6d591b73373fdf33b27202cb8116025fb694ef1 
  src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java 
a5f85a06ba21b01e99c1655450d36016c2901cc0 
  src/test/org/apache/sqoop/testutil/ImportJobTestCase.java 
dbefe209770885063d1b4d0c3940d078b8d91cad 
  src/test/org/apache/sqoop/tool/TestBaseSqoopTool.java 

Re: Review Request 67628: Implement an alternative solution for Parquet reading and writing

2018-06-22 Thread Szabolcs Vasas


> On June 18, 2018, 1:59 p.m., Fero Szabo wrote:
> > src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopMergeParquetReducer.java
> > Lines 30 (patched)
> > 
> >
> >

This is not a bug: we need to specify a key of type Void because of the 
parquet.hadoop.ParquetRecordWriter#write method, which is used under the hood. 
Since Void cannot be instantiated, null is the only applicable parameter here.
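A minimal sketch of the point being made (the writer interface and class names here are hypothetical stand-ins, not Sqoop's actual mapper code; only the Void-keyed write signature mirrors parquet.hadoop.ParquetRecordWriter#write):

```java
public class VoidKeyExample {
    // Hypothetical writer interface, keyed the same way as
    // parquet.hadoop.ParquetRecordWriter#write(Void, T).
    interface KeyedWriter<K, V> {
        void write(K key, V value);
    }

    static String demo() {
        StringBuilder out = new StringBuilder();
        KeyedWriter<Void, String> writer = (key, value) -> out.append(value);
        // java.lang.Void has no accessible constructor, so null is the only
        // value that can ever be supplied for a Void-typed key.
        writer.write(null, "record1");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints record1
    }
}
```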


> On June 18, 2018, 1:59 p.m., Fero Szabo wrote:
> > src/test/org/apache/sqoop/manager/sqlserver/SQLServerHiveImportTest.java
> > Line 145 (original), 147 (patched)
> > 
> >
> > Might be a good idea to remove the Sysout while this is modified. 
> > 
> > Also, you could consider using the new builder pattern, though I see 
> > that might be better suited for a separate Jira.

I have removed my changes from this class; they will be in an upcoming patch.


> On June 18, 2018, 1:59 p.m., Fero Szabo wrote:
> > src/test/org/apache/sqoop/manager/sqlserver/SQLServerHiveImportTest.java
> > Line 147 (original), 149 (patched)
> > 
> >
> > So, Hadoop flags are no longer included. Was that your intention here? 
> > If so, the 'includeHadoopFlags' boolean parameter is confusing for me.

I have removed my changes from this class; they will be in an upcoming patch.


> On June 18, 2018, 1:59 p.m., Fero Szabo wrote:
> > src/test/org/apache/sqoop/manager/sqlserver/SQLServerHiveImportTest.java
> > Line 147 (original), 149 (patched)
> > 
> >
> >

I have removed my changes from this class; they will be in an upcoming patch.


> On June 18, 2018, 1:59 p.m., Fero Szabo wrote:
> > src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java
> > Lines 669 (patched)
> > 
> >
> > Might not be an issue, but shouldn't this handle dates as well?
> > 
> > I can't think of any other text-like types.
> > 
> > BigDecimals are probably OK.

Converting a date to a string can be done in many different ways, and it is 
usually done on the caller side of this method, so I would not put it into this 
implementation now.


- Szabolcs


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67628/#review204919
---



Re: Review Request 67628: Implement an alternative solution for Parquet reading and writing

2018-06-22 Thread Szabolcs Vasas


> On June 18, 2018, 10:10 a.m., daniel voros wrote:
> > Hey Szabolcs,
> > 
> > Thank you for submitting this! Verified UTs, opened some minor issues.
> > 
> > Could you please add a few lines of Javadoc to the new classes to make it 
> > clear what they're used for?
> > 
> > Thanks,
> > Daniel

Hi Dani,

Thank you for reviewing the code. I have added lots of Javadoc to my classes; I 
am not sure it makes sense everywhere, but I think it helps in understanding the 
purpose of the new classes.


> On June 18, 2018, 10:10 a.m., daniel voros wrote:
> > src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorFactoryProvider.java
> > Lines 51 (patched)
> > 
> >
> > Wrong error msg: Is unknown? Or is _not_ set?

I have done some refactoring and introduced a new enum, so this class does not 
exist anymore.
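The enum-based selection might look roughly like this sketch (the enum name is modelled on ParquetJobConfiguratorImplementation from the diff list, but the factory wiring is an assumption for illustration, not the actual patch):

```java
public class EnumFactorySketch {
    // Placeholder for the real factory interface.
    interface ParquetJobConfiguratorFactory { String name(); }

    // Hypothetical sketch: each constant knows how to create its factory,
    // so there is no longer a string-based provider that can fail with a
    // misleading "unknown implementation" message.
    enum ParquetJobConfiguratorImplementation {
        KITE, HADOOP;

        ParquetJobConfiguratorFactory createFactory() {
            return this::name; // placeholder factory keyed by the constant
        }
    }

    public static void main(String[] args) {
        // valueOf fails fast on an unknown value, replacing the old
        // error-message branch in ParquetJobConfiguratorFactoryProvider.
        ParquetJobConfiguratorImplementation impl =
                ParquetJobConfiguratorImplementation.valueOf("HADOOP");
        System.out.println(impl.createFactory().name()); // prints HADOOP
    }
}
```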


- Szabolcs


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67628/#review204915
---



[jira] [Commented] (SQOOP-3337) Invalid Argument arrays in SQLServerManagerImportTest

2018-06-22 Thread Fero Szabo (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520445#comment-16520445
 ] 

Fero Szabo commented on SQOOP-3337:
---

The fix in SQOOP-3334 will help solve this issue easily.

> Invalid Argument arrays in SQLServerManagerImportTest 
> --
>
> Key: SQOOP-3337
> URL: https://issues.apache.org/jira/browse/SQOOP-3337
> Project: Sqoop
>  Issue Type: Bug
>Reporter: Fero Szabo
>Assignee: Fero Szabo
>Priority: Major
>
> The argument array builder is only initialized once per test configuration, 
> so the 5 tests are reusing the same one. Each test case adds its own tool 
> option, meaning that starting from the second case, an invalid array is 
> generated. For example, the last case contains the extra tool options from 
> all of the test cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SQOOP-3337) Invalid Argument arrays in SQLServerManagerImportTest

2018-06-22 Thread Fero Szabo (JIRA)
Fero Szabo created SQOOP-3337:
-

 Summary: Invalid Argument arrays in SQLServerManagerImportTest 
 Key: SQOOP-3337
 URL: https://issues.apache.org/jira/browse/SQOOP-3337
 Project: Sqoop
  Issue Type: Bug
Reporter: Fero Szabo
Assignee: Fero Szabo


The argument array builder is only initialized once per test configuration, so 
the 5 tests are reusing the same one. Each test case adds its own tool option, 
meaning that starting from the second case, an invalid array is generated. For 
example, the last case contains the extra tool options from all of the test 
cases.
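The failure mode can be sketched like this (a hypothetical simplified builder, not the real ArgumentArrayBuilder API; the option names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class SharedBuilderBug {
    // Simplified list-backed builder: options accumulate across build() calls.
    static class ArgBuilder {
        private final List<String> args = new ArrayList<>();
        ArgBuilder withToolOption(String opt) { args.add("--" + opt); return this; }
        String[] build() { return args.toArray(new String[0]); }
    }

    static String[] secondBuild() {
        ArgBuilder shared = new ArgBuilder();            // initialized once per configuration
        shared.withToolOption("table-hints").build();    // first test case
        return shared.withToolOption("non-resilient").build(); // second test case
    }

    public static void main(String[] args) {
        // The second build also carries the first test case's option:
        System.out.println(secondBuild().length); // prints 2, not 1
    }
}
```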



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Review Request 67629: SQOOP-3334 Improve ArgumentArrayBuilder, so arguments are replaceable

2018-06-22 Thread Fero Szabo via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67629/
---

(Updated June 22, 2018, 2:08 p.m.)


Review request for Sqoop, Boglarka Egyed and Szabolcs Vasas.


Bugs: SQOOP-3334
https://issues.apache.org/jira/browse/SQOOP-3334


Repository: sqoop-trunk


Description
---

Changed the implementation so that it uses maps instead of lists.
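A rough sketch of the idea (not the actual ArgumentArrayBuilder code; method names are assumptions): keying options by name in a map makes a later call replace an earlier value instead of appending a duplicate.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapBackedArgs {
    // LinkedHashMap keeps insertion order while letting a repeated
    // option name overwrite the previous value.
    private final Map<String, String> options = new LinkedHashMap<>();

    MapBackedArgs withOption(String name, String value) {
        options.put(name, value); // same key overwrites -> replaceable
        return this;
    }

    String[] build() {
        return options.entrySet().stream()
                .flatMap(e -> java.util.stream.Stream.of("--" + e.getKey(), e.getValue()))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] built = new MapBackedArgs()
                .withOption("num-mappers", "1")
                .withOption("num-mappers", "2") // replaces, does not append
                .build();
        System.out.println(String.join(" ", built)); // prints --num-mappers 2
    }
}
```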


Diffs (updated)
-

  src/test/org/apache/sqoop/testutil/ArgumentArrayBuilder.java 00ce4fe8 
  src/test/org/apache/sqoop/testutil/TestArgumentArrayBuilder.java PRE-CREATION 


Diff: https://reviews.apache.org/r/67629/diff/2/

Changes: https://reviews.apache.org/r/67629/diff/1-2/


Testing
---

Added 2 new unit tests.
Ran 3rdparty and unit tests.


Thanks,

Fero Szabo



Re: Review Request 67689: Use hive executable in (non-JDBC) Hive imports

2018-06-22 Thread Fero Szabo via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67689/#review205234
---



Hi Dani,

I had a look at your patch and it basically looks good to me: it applied 
cleanly on my system and all tests passed.

My only concern is that we lose a bit of test coverage. Wouldn't it make sense 
to reimplement the test case you deleted in a different way? As far as I can 
see, it was the only test for external Hive tables...

It might take some effort to do this, though; I didn't have time to understand 
how it works exactly. One might be able to reuse the code in the TestHiveImport 
class.


src/java/org/apache/sqoop/hive/HiveImport.java
Line 330 (original)


Was this config effectively only used in testing and no longer needed?


- Fero Szabo


On June 21, 2018, 1:39 p.m., daniel voros wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67689/
> ---
> 
> (Updated June 21, 2018, 1:39 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3323
> https://issues.apache.org/jira/browse/SQOOP-3323
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> When doing Hive imports the old way (not via JDBC that was introduced in 
> SQOOP-3309) we're trying to use the CliDriver class from Hive and fall back 
> to the hive executable (a.k.a. Hive Cli) if that class is not found.
> 
> Since CliDriver and the hive executable that's relying on it are deprecated 
> (see also HIVE-10511), we should switch to using beeline to talk to Hive. 
> With recent additions (e.g. HIVE-18963) this should be easier than before.
> 
> As a first step we could switch to using hive executable. With HIVE-19728 it 
> will be possible (in Hive 3.1) to configure hive to actually run beeline when 
> using the hive executable. This way we could leave it to the user to decide 
> whether to use the deprecated cli or use beeline instead.
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/hive/HiveImport.java 5da00a74 
>   src/test/org/apache/sqoop/TestIncrementalImport.java 1ab98021 
>   src/test/org/apache/sqoop/TestSqoopJobDataPublisher.java b3579ac1 
>   src/test/org/apache/sqoop/hive/TestHiveImport.java 436f0e51 
>   
> src/test/org/apache/sqoop/manager/postgresql/PostgresqlExternalTableImportTest.java
>  dd4cfb48 
> 
> 
> Diff: https://reviews.apache.org/r/67689/diff/1/
> 
> 
> Testing
> ---
> 
> run thirdparty and normal UTs, also tested on a cluster
> 
> I'm removing PostgresqlExternalTableImportTest since it was relying on the 
> CliDriver path to do an actual Hive import.
> 
> 
> Thanks,
> 
> daniel voros
> 
>
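The CliDriver-to-executable fallback described in the quoted request can be sketched as follows (hypothetical class and method names; only the reflective probe for org.apache.hadoop.hive.cli.CliDriver is taken from the description):

```java
public class HiveFallbackSketch {
    // Probe for Hive's CliDriver on the classpath; fall back to the
    // external hive executable if it is not there.
    static String chooseHiveRunner() {
        try {
            Class.forName("org.apache.hadoop.hive.cli.CliDriver");
            return "CliDriver";
        } catch (ClassNotFoundException e) {
            return "hive executable";
        }
    }

    public static void main(String[] args) {
        // Output depends on whether Hive is on the classpath.
        System.out.println(chooseHiveRunner());
    }
}
```

The patch's point is that the first branch can be dropped entirely, leaving only the executable path.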



Review Request 67675: SQOOP-3332 Extend Documentation of --resilient option

2018-06-22 Thread Fero Szabo via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67675/
---

Review request for Sqoop, Boglarka Egyed, daniel voros, and Szabolcs Vasas.


Bugs: SQOOP-3332
https://issues.apache.org/jira/browse/SQOOP-3332


Repository: sqoop-trunk


Description
---

This is the documentation part of SQOOP-.


Diffs
-

  src/docs/user/connectors.txt f1c7aebe 
  src/java/org/apache/sqoop/manager/SQLServerManager.java c98ad2db 
  src/java/org/apache/sqoop/manager/SqlServerManagerContextConfigurator.java 
cf58f631 


Diff: https://reviews.apache.org/r/67675/diff/1/


Testing
---

Unit tests, 3rdparty tests, ant docs.

I've also investigated how export and import work: 

Import has its retry mechanism in 
org.apache.sqoop.mapreduce.db.SQLServerDBRecordReader#nextKeyValue.
In case of error, it re-calculates the db query, thus the implicit requirements

Export has its retry loop in 
org.apache.sqoop.mapreduce.SQLServerAsyncDBExecThread#write.
It doesn't recalculate the query and is thus a lot safer.
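The export-side pattern can be sketched generically (hypothetical names; the real loop lives in SQLServerAsyncDBExecThread#write): the statement is fixed up front and simply re-executed on failure, which is why it is the safer of the two.

```java
public class RetrySketch {
    interface Db { void execute(String sql) throws Exception; }

    // Retries the same pre-built statement; nothing is recalculated
    // between attempts. Returns the attempt that succeeded.
    static int writeWithRetry(Db db, String sql, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                db.execute(sql); // same query every attempt
                return attempt;
            } catch (Exception e) {
                if (attempt >= maxAttempts) throw e;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // A database stub that fails twice, then succeeds:
        int[] calls = {0};
        int attempts = writeWithRetry(
                sql -> { if (++calls[0] < 3) throw new Exception("transient"); },
                "INSERT ...", 5);
        System.out.println(attempts); // prints 3
    }
}
```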


Thanks,

Fero Szabo



[jira] [Commented] (SQOOP-3323) Use hive executable in (non-JDBC) Hive imports

2018-06-22 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520178#comment-16520178
 ] 

Daniel Voros commented on SQOOP-3323:
-

Attached review request.

> Use hive executable in (non-JDBC) Hive imports
> --
>
> Key: SQOOP-3323
> URL: https://issues.apache.org/jira/browse/SQOOP-3323
> Project: Sqoop
>  Issue Type: Improvement
>  Components: hive-integration
>Affects Versions: 3.0.0
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 3.0.0
>
>
> When doing Hive imports the old way (not via JDBC that was introduced in 
> SQOOP-3309) we're trying to use the {{CliDriver}} class from Hive and fall 
> back to the {{hive}} executable (a.k.a. [Hive 
> Cli|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]) if 
> that class is not found.
> Since {{CliDriver}} and the {{hive}} executable that's relying on it are 
> [deprecated|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli]
>  (see also HIVE-10511), we should switch to using {{beeline}} to talk to 
> Hive. With recent additions (e.g. HIVE-18963) this should be easier than 
> before.
> As a first step we could switch to using {{hive}} executable. With HIVE-19728 
> it will be possible (in Hive 3.1) to configure hive to actually run beeline 
> when using the {{hive}} executable. This way we could leave it to the user to 
> decide whether to use the deprecated cli or use beeline instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SQOOP-3336) Splitting on integer column can create more splits than necessary

2018-06-22 Thread Daniel Voros (JIRA)


[ 
https://issues.apache.org/jira/browse/SQOOP-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520176#comment-16520176
 ] 

Daniel Voros commented on SQOOP-3336:
-

Attached review request.

This also affects splitting on date/timestamp columns, since DateSplitter uses 
the same logic.

> Splitting on integer column can create more splits than necessary
> -
>
> Key: SQOOP-3336
> URL: https://issues.apache.org/jira/browse/SQOOP-3336
> Project: Sqoop
>  Issue Type: Bug
>Affects Versions: 1.4.7
>Reporter: Daniel Voros
>Assignee: Daniel Voros
>Priority: Major
> Fix For: 1.5.0, 3.0.0
>
>
> Running an import with {{-m 2}} will result in three splits if there are only 
> three consecutive integers in the table ({{\{1, 2, 3\}}}).
> Work is (probably) spread more evenly between mappers this way, but ending up 
> with more files than expected could be an issue.
> Split-limit can also result in more values than asked for in the last chunk 
> (due to the closed interval in the end).
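The rounding behaviour described above can be reproduced with a simplified split calculation (this is an illustrative sketch, not Sqoop's exact IntegerSplitter code): with min=1, max=3 and 2 requested splits, the computed step yields three closed ranges.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitCountSketch {
    // Simplified integer splitting: step is derived from the range and the
    // requested split count, and ranges are emitted until max is covered.
    static List<long[]> split(long min, long max, int numSplits) {
        List<long[]> splits = new ArrayList<>();
        long step = Math.max(1, (max - min) / numSplits); // (3-1)/2 = 1
        for (long lo = min; lo <= max; lo += step) {
            splits.add(new long[]{lo, Math.min(lo + step, max)});
        }
        return splits;
    }

    public static void main(String[] args) {
        // -m 2 over {1, 2, 3} ends up with three splits, not two:
        System.out.println(split(1, 3, 2).size()); // prints 3
    }
}
```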



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)