> On Jan. 13, 2014, 1:36 a.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java,
> >  line 111
> > <https://reviews.apache.org/r/16812/diff/2/?file=421137#file421137line111>
> >
> >     Won't work with: 1,"Hi, here is Jarcec"
> >     
> >     Potentially use apache commons csv?

I'm afraid that Apache Commons CSV do not have any official build that we can 
use. Is there any light other CSV parsing library?


> On Jan. 13, 2014, 1:36 a.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java,
> >  lines 155-162
> > <https://reviews.apache.org/r/16812/diff/2/?file=421137#file421137line155>
> >
> >     Potentially use StringUtils.join

+1


> On Jan. 13, 2014, 1:36 a.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java,
> >  lines 170-172
> > <https://reviews.apache.org/r/16812/diff/2/?file=421137#file421137line170>
> >
> >     Is expected and desirable that user can't reset schema to NULL?

I'm not sure if it's the right thing to do here, but generally saying the 
schema is required.


> On Jan. 13, 2014, 1:36 a.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java,
> >  lines 275-285
> > <https://reviews.apache.org/r/16812/diff/2/?file=421137#file421137line275>
> >
> >     This will convert the binary string to an array of 0 and 1, right? The 
> > intermediate data format is suggesting the MySQLdump approach that is 
> > serializing the bytes as they are though. The 0 and 1 seems to be quite 
> > inefficient.
> 
> Abraham Elmahrek wrote:
>     How about just a string representation of the bytes represented as 
> "ISO-8859-1". This is an 8-bit charset that is available in every JVM 
> http://docs.oracle.com/javase/7/docs/api/java/nio/charset/Charset.html. The 
> Java string implementation should quickly convert the bytes to a string.

+1, our docs are suggesting MySQL dump approach that we should use. I would 
suggest to simply write the bytes as they are without any extra encoding step.


> On Jan. 13, 2014, 1:36 a.m., Abraham Elmahrek wrote:
> > connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java,
> >  lines 313-315
> > <https://reviews.apache.org/r/16812/diff/2/?file=421137#file421137line313>
> >
> >     The conditional escaped argument seems quite dangerous to me - what if 
> > the user has saved the data on disk with escaped = true and is using this 
> > class to read the data from disk? The default value is false, so this will 
> > remove some actual data, right?
> 
> Abraham Elmahrek wrote:
>     In the case of CSV, do you feel that it should always be escaped? That 
> does make sense so that we don't run into any issues, but it also would 
> require the consumer of the imported data to always unescape.

The intermediate data format should be very strict without much configuration 
options to make it as fast as possible. Having said that serialization format 
on disk is a different question.


> On Jan. 13, 2014, 1:36 a.m., Abraham Elmahrek wrote:
> > core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java, line 
> > 113
> > <https://reviews.apache.org/r/16812/diff/2/?file=421141#file421141line113>
> >
> >     "Can we make this a Class instead of String?"
> >     
> >     It seems we want a string representation of the classname so that we 
> > can pass it to different contexts during the execution of the job?

We are using "Class" for this purpose on other places and it seems more natural 
then string. Having class instead of string will also ensure that the class is 
indeed available in the JVM (e.g. fail sooner rather then later).


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16812/#review31600
-----------------------------------------------------------


On Jan. 13, 2014, 1:17 a.m., Abraham Elmahrek wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16812/
> -----------------------------------------------------------
> 
> (Updated Jan. 13, 2014, 1:17 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Previous review from hari is in first iteration of this review only.
> 
> commit 1c5122611fbd4a46a629421f7c55746cfc14f136
> Author: Hari Shreedharan <[email protected]>
> Date:   Fri Jan 10 15:07:08 2014 -0800
> 
>     SQOOP-777. Sqoop2: Pluggable Intermediate Data Format
>     
>     Implemented a pluggable intermediate data format that decouples the 
> internal representation of the data from the connector and the output 
> formats. Connectors can choose to implement and support a format that is more 
> efficient for them. Also separated the SqoopWritable so th
>     
>     I ran a full build - all tests including integration tests pass. I have 
> not added any new tests, yet. I will add unit tests for the new classes. 
> Also, I have not tried running this on an actual cluster - so things may be 
> broken. I'd like some initial feedback based on the c
>     
>     I also implemented escaping of characters. There is some work remaining 
> to support binary format, but it is mostly integration, the basic 
> implementation is in place.
> 
> :100644 100644 db11b5b... 9bfa07d... M  common/pom.xml
> :100644 100644 3e1adc7... f971240... M  
> common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
> :100644 100644 d81364e... e547875... M  
> common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java
> :100644 100644 8b630b2... 30c26a3... M  
> common/src/main/java/org/apache/sqoop/schema/type/Column.java
> :100644 100644 e0da80f... 9c70db9... M  
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java
> :100644 100644 ef39cdc... 075890f... M  
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
> :100644 100644 96818ba... 795ffdb... M  
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java
> :000000 100644 0000000... 6c33423... A  
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/util/InitializationUtils.java
> :100644 100644 aa1c4ff... 7220018... M  
> connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java
> :100644 100644 a7ed6ba... 28399f2... M  
> connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java
> :100644 100644 4056e14... f54837d... M  connector/connector-sdk/pom.xml
> :000000 100644 0000000... d4874f2... A  
> connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
> :000000 100644 0000000... 63e14d2... A  
> connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java
> :000000 100644 0000000... 6e5479f... A  
> connector/connector-sdk/src/test/java/org/apache/sqoop/connector/CSVIntermediateDataFormatTest.java
> :100644 100644 e052584... 3b2ef94... M  
> core/src/main/java/org/apache/sqoop/framework/JobManager.java
> :100644 100644 53d0039... 9f5c47d... M  
> core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java
> :100644 100644 f9a2a0e... 9754afd... M  execution/mapreduce/pom.xml
> :100644 100644 5c0a027... 7680e33... M  
> execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java
> :100644 100644 7fd9a01... 3604898... M  
> execution/mapreduce/src/main/java/org/apache/sqoop/job/JobConstants.java
> :100644 100644 1978ec6... 099cdd3... M  
> execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java
> :100644 100644 a07c511... ee6bf39... M  
> execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java
> :100644 100644 4621942... 7b799ca... M  
> execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java
> :000000 100644 0000000... 71ae980... A  
> execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java
> :100644 100644 356ae8a... b495cc9... M  
> execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java
> :100644 100644 92de37e... 8164ffe... M  
> execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java
> :100644 100644 90de6ef... b3503da... M  
> execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java
> :100644 100644 7dedee9... 16e59d8... M  
> execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java
> :100644 100644 98a2c51... 5aaceb3... M  
> execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java
> :100644 100644 e21f15b... 4e9e6ea... M  
> execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java
> :100644 100644 b7079dd... ddda423... M  
> execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java
> :100644 100644 f849aae... 2968411... M  
> execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java
> :100644 100644 7b264c6... 954990f... M  
> execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java
> :000000 100644 0000000... aea7de3... A  
> execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java
> :100644 100644 bee8ab7... 663dfb5... M  
> execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java
> :100644 100644 1e2f005... a722c74... M  pom.xml
> :100644 100644 0b240e8... 43f17d4... M  spi/pom.xml
> :100644 100644 2becc56... 298e8f5... M  
> spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java
> :100644 100644 6fc485b... 8d1a4e8... M  
> submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java
> 
> 
> Diffs
> -----
> 
>   common/pom.xml db11b5ba70093704bc3f7a0a98cf7d44172e2b7d 
>   common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 
> 3e1adc7084cf9ef1b93c8146110c751c7de34376 
>   common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java 
> d81364ef81bc747437f5b78520bacec8ee2613a3 
>   common/src/main/java/org/apache/sqoop/schema/type/Column.java 
> 8b630b28295700b204da9f5a948eaef106ca559b 
>   
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java
>  e0da80f77230fb836ef6ab36e3bd867d6cef553a 
>   
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
>  ef39cdcbc41280a0dec338c8c2ab27b7cf76385b 
>   
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java
>  96818ba27b25538643f9c5777ed8abebdf0eb1e9 
>   
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/util/InitializationUtils.java
>  PRE-CREATION 
>   
> connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java
>  aa1c4ff7ff138de5efc5bc1bca5b0d869d214a1c 
>   
> connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java
>  a7ed6bab95c76c4928414e7850cc3e52a3ce4595 
>   
> connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportInitializer.java
>  a33fa3631400c3382a72e6286b1efb3fca0c9cb9 
>   connector/connector-sdk/pom.xml 4056e14978cb5e679233e9fdde11d53f1ca2756b 
>   
> connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
>  PRE-CREATION 
>   
> connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java
>  PRE-CREATION 
>   
> connector/connector-sdk/src/test/java/org/apache/sqoop/connector/CSVIntermediateDataFormatTest.java
>  PRE-CREATION 
>   core/src/main/java/org/apache/sqoop/framework/JobManager.java 
> e0525846029bb394b2614a33595ebe6fcdd4aaf2 
>   core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 
> 53d003980a172e4e0acf18630a3496909c17cb5c 
>   execution/mapreduce/pom.xml f9a2a0e57b499cc785bdb6548a2d6113e7b867c0 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java
>  5c0a02739a5f7598dc83ab82631b80410ab39213 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/JobConstants.java 
> 7fd9a017140fb91292a88e5a944ae549915c86e4 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java
>  1978ec61e07beb6b1e75bc4da8c698c738ac4e4c 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java
>  a07c5111e9e037dab4dacb128a53918f61599495 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java
>  462194238755e603dd60226a97cf357cf63e0f20 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java 
> PRE-CREATION 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java
>  356ae8ab6f3cad13b644e1f033278d9012938956 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 
> 92de37e1b7ffd3c4ce584e4d10c4e1f9af03c369 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java
>  90de6efa5f3145d0480e375d82bd5c6a9760a799 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java
>  7dedee9f12b11e3af5cf6ee5c81bbc377a04a4fd 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java 
> 98a2c518b65130ebe272faf721b77099a1e85a63 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java 
> e21f15b8be2aacb7463ab4c41cfce63cbb71cf57 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java 
> b7079dd47c80c776a508b07f0ee5152c1d3fd20e 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java 
> f849aaef10c3b995afbe6326353256affbceee67 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 
> 7b264c68ae761fbaf91bc51ed312de70b9ba3fd3 
>   
> execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java
>  PRE-CREATION 
>   
> execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java
>  bee8ab75f1f8e6508fca905aafe7c67f6b5d8d9a 
>   pom.xml 1e2f0057ca2e9cd635ccbce75e5d6e39eeabb163 
>   spi/pom.xml 0b240e8a1146cc7a06a2c50c181fb73b11e6dd8c 
>   spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 
> 2becc56364343f1cffdf91d9b17144666acbdd2e 
>   
> submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java
>  6fc485b66ea8c33d09d172675a104954d570f3a7 
> 
> Diff: https://reviews.apache.org/r/16812/diff/
> 
> 
> Testing
> -------
> 
> mvn test
> 
> 
> Thanks,
> 
> Abraham Elmahrek
> 
>

Reply via email to