-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12936/#review25304
-----------------------------------------------------------


Hi Hari,
thank you again for working on this JIRA, and please accept my apologies for the 
late response. I've taken a deep dive into the patch and I have a couple of 
comments and questions.


common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
<https://reviews.apache.org/r/12936/#comment49646>

    Extreme nit: Read data from the execution framework...
    
    (not necessarily MR)



common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
<https://reviews.apache.org/r/12936/#comment49647>

    Extreme nit: Read data from execution framework...



common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
<https://reviews.apache.org/r/12936/#comment49648>

    Nit: Read data from execution framework as a native format.
    
    This should be a framework-independent native format, right?



common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java
<https://reviews.apache.org/r/12936/#comment49649>

    Extreme nit: Write an array of objects into the execution framework...



common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java
<https://reviews.apache.org/r/12936/#comment49650>

    Extreme nit: Write data into execution framework....



common/src/main/java/org/apache/sqoop/schema/type/Column.java
<https://reviews.apache.org/r/12936/#comment49651>

    Can we add a descriptive Javadoc comment describing what exactly we are 
expecting to validate?



connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
<https://reviews.apache.org/r/12936/#comment50066>

    These imports seem to be unused.



connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
<https://reviews.apache.org/r/12936/#comment50067>

    We should also take into account exportJobConfiguration.table.schema when 
building the query.



connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java
<https://reviews.apache.org/r/12936/#comment50068>

    Let's clean up unused imports after this refactoring.



connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java
<https://reviews.apache.org/r/12936/#comment49652>

    This method seems to be generated automatically, but it looks like the 
fail() call was removed; is that intentional?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49656>

    The wiki [1] also says that 0x5C should be substituted.
    
    1: 
https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49658>

    Similarly to the above, it seems that we are missing the \\ replacement for 
0x5C.
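
    A minimal standalone sketch (not code from the patch) of the point: the 
backslash (0x5C) has to be escaped before any other substitution, otherwise the 
decoder cannot tell a literal backslash from an escape sequence. The exact 
escape set below (newline and comma) is illustrative only.

```java
public class EscapeDemo {
    // Hypothetical escape routine: 0x5C must be replaced before the
    // other substitutions, or the output becomes ambiguous on decode.
    static String escape(String s) {
        return s.replace("\\", "\\\\")   // 0x5C first
                .replace("\n", "\\n")
                .replace(",", "\\,");
    }

    public static void main(String[] args) {
        System.out.println(escape("a\\b,c"));  // a\\b\,c
    }
}
```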



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50050>

    Is it wise to consume the exception here without any counters or other 
reporting beyond a log message?
    
    It also seems that other parts of the code are not null-safe, so this error 
handling will most likely just cause an NPE somewhere else.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49660>

    The setData() method conditionally runs the validations; are they 
intentionally skipped here in setTextData()?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment49661>

    I'm afraid that String.split() won't work correctly for the following 
example input:
    
    1,"Hi, here is Jarcec"
    
    That input contains two columns, but three columns will be created when 
splitting by comma.
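
    To make the failure mode concrete, here is a small standalone sketch (not 
code from the patch); the quote-aware regex is one possible workaround and is 
illustrative only:

```java
public class CsvSplitDemo {
    // Splits only on commas followed by an even number of quotes,
    // i.e. commas that sit outside a quoted field.
    private static final String QUOTE_AWARE_COMMA =
        ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    public static void main(String[] args) {
        String row = "1,\"Hi, here is Jarcec\"";

        String[] naive = row.split(",");
        System.out.println(naive.length);      // 3 -- the quoted comma also splits

        String[] quoteAware = row.split(QUOTE_AWARE_COMMA);
        System.out.println(quoteAware.length); // 2 -- quoted comma is preserved
    }
}
```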



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50051>

    Nit: Please pass the correct array size instead of zero.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50052>

    I would suggest being strict here and accepting only the upper-case variant 
of NULL.
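
    A tiny sketch of what the strict matching would look like (illustrative 
names, not patch code): with a case-sensitive check, a literal string value 
"null" coming from the database stays data instead of being silently turned 
into a real NULL.

```java
public class NullTokenDemo {
    // Strict, case-sensitive check for the NULL marker.
    static boolean isNullToken(String field) {
        return "NULL".equals(field);
    }

    public static void main(String[] args) {
        System.out.println(isNullToken("NULL"));  // true
        System.out.println(isNullToken("null"));  // false -- kept as data
    }
}
```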



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50053>

    I would suggest failing fast on unsupported data types rather than silently 
passing them through as unescaped strings.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50055>

    Is there a reason to implement our own join rather than using 
StringUtils.join()?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50056>

    Is it expected and desirable that the user can't reset the schema to NULL?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50035>

    Nit: Would it make more sense to call this IntermediateDataFormat (without 
the type here) or CSVIntermediateDataFormat, since we are not allowing anything 
else?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50036>

    This will convert the binary data to an array of 0s and 1s, right? The 
intermediate data format wiki suggests the mysqldump approach, though, which 
serializes the bytes as they are. The 0s and 1s seem quite inefficient.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50037>

    Since the regular expressions are pre-built and therefore 
PatternSyntaxException should not be thrown, is there any other exception that 
we are expecting?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50060>

    The conditional escaped argument seems quite dangerous to me: what if the 
user has saved the data on disk with escaped = true and is using this class to 
read the data back from disk? The default value is false, so this will remove 
some actual data, right?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50038>

    Since the regular expressions are pre-built and therefore 
PatternSyntaxException should not be thrown, is there any other exception that 
we are expecting?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50058>

    Since a couple of the methods, such as setSchema(), getSchema(), and 
validateData(), will have to be implemented by every IDF in a very similar way, 
I'm wondering if it would make sense to convert the IDF into an abstract class 
and provide a base implementation.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java
<https://reviews.apache.org/r/12936/#comment50039>

    Can we add Javadoc for those two methods?
    
    Also, it seems that they are not used, so I'm wondering if we are planning 
to use them in the future or if they are an artifact of previous development.



core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java
<https://reviews.apache.org/r/12936/#comment50040>

    Can we make this a Class instead of a String?



execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java
<https://reviews.apache.org/r/12936/#comment50041>

    Nit: Can we keep the style from the previous lines and put this on a single 
line?



execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java
<https://reviews.apache.org/r/12936/#comment50063>

    The length variable seems to be used only in the readFields() method, so 
I'm wondering if a local variable wouldn't be a better fit.



execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java
<https://reviews.apache.org/r/12936/#comment50064>

    Converting the data to UTF-8 might be quite dangerous, especially for 
binary values, as they will be interpreted and possibly corrupted. This won't 
be an issue with the current implementation that stores binary data as a stream 
of 0s and 1s, but it will become an issue when (if) we switch to the format 
designed on the wiki.
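
    The corruption is easy to demonstrate in isolation (standalone sketch, not 
patch code): a raw byte that is not valid UTF-8 decodes to the replacement 
character U+FFFD and re-encodes as three completely different bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTripDemo {
    public static void main(String[] args) {
        byte[] original = { (byte) 0x80 };  // not a valid UTF-8 sequence

        // Decoding replaces the invalid byte with U+FFFD...
        String asText = new String(original, StandardCharsets.UTF_8);

        // ...so the round trip yields EF BF BD instead of the original 80.
        byte[] roundTripped = asText.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(roundTripped));  // [-17, -65, -67]
        System.out.println(roundTripped.length);            // 3, not 1
    }
}
```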



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java
<https://reviews.apache.org/r/12936/#comment50070>

    This import does not seem to be relevant any more.



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java
<https://reviews.apache.org/r/12936/#comment50042>

    Nit: Please keep this on a single line.



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java
<https://reviews.apache.org/r/12936/#comment50043>

    Nit: This change does not seem necessary.



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java
<https://reviews.apache.org/r/12936/#comment50071>

    This import does not seem to be relevant any more.



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java
<https://reviews.apache.org/r/12936/#comment50044>

    Nit: Can we please put this on a single line?



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java
<https://reviews.apache.org/r/12936/#comment50072>

    This import does not seem to be relevant any more.



execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java
<https://reviews.apache.org/r/12936/#comment50045>

    Nit: Please put this on a single line.



execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java
<https://reviews.apache.org/r/12936/#comment50069>

    This import does not seem relevant any more.



execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java
<https://reviews.apache.org/r/12936/#comment50046>

    Nit: Please put this on a single line.



execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java
<https://reviews.apache.org/r/12936/#comment50047>

    It seems that we are building the same schema in multiple test cases. Would 
it make sense to create a helper method for this rather than copy-pasting the 
code?



execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java
<https://reviews.apache.org/r/12936/#comment50065>

    Hari got 20 experience points for using awesome text messages in the tests!



spi/pom.xml
<https://reviews.apache.org/r/12936/#comment50048>

    I'm not sure whether we want to make all connectors depend on the SDK. The 
SDK should be a lightweight library of code for connectors; we should not force 
connectors to reuse anything. Is the dependency required?



spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java
<https://reviews.apache.org/r/12936/#comment50049>

    Nit: Please put this on a single line.


Jarcec

- Jarek Cecho


On Aug. 1, 2013, 3:41 a.m., Hari Shreedharan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/12936/
> -----------------------------------------------------------
> 
> (Updated Aug. 1, 2013, 3:41 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Implemented a pluggable intermediate data format that decouples the internal 
> representation of the data from the connector and the output formats. 
> Connectors can choose to implement and support a format that is more 
> efficient for them. Also separated the SqoopWritable so that we can use the 
> intermediate data format independent of (current) Hadoop. 
> 
> I ran a full build - all tests including integration tests pass. I have not 
> added any new tests, yet. I will add unit tests for the new classes. Also, I 
> have not tried running this on an actual cluster - so things may be broken. 
> I'd like some initial feedback based on the current patch. 
> 
> I also implemented escaping of characters. There is some work remaining to 
> support binary format, but it is mostly integration, the basic implementation 
> is in place.
> 
> 
> Diffs
> -----
> 
>   common/pom.xml db11b5b 
>   common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7 
>   common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364e 
>   common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b2 
>   
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java
>  e0da80f 
>   
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
>  7212843 
>   
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java
>  96818ba 
>   
> connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/util/InitializationUtils.java
>  PRE-CREATION 
>   
> connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java
>  aa1c4ff 
>   
> connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java
>  a7ed6ba 
>   connector/connector-sdk/pom.xml 4056e14 
>   
> connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
>  PRE-CREATION 
>   
> connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java
>  PRE-CREATION 
>   
> connector/connector-sdk/src/test/java/org/apache/sqoop/connector/CSVIntermediateDataFormatTest.java
>  PRE-CREATION 
>   core/src/main/java/org/apache/sqoop/framework/JobManager.java d0a087d 
>   core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 
> 53d0039 
>   execution/mapreduce/pom.xml f9a2a0e 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java
>  767080c 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/JobConstants.java 
> 7fd9a01 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java
>  1978ec6 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java
>  a07c511 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java
>  4621942 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java 
> PRE-CREATION 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java
>  356ae8a 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 
> 59cf391 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java
>  90de6ef 
>   
> execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java
>  739eb17 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java 
> b31161c 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java 
> e21f15b 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java 
> b7079dd 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java 
> f849aae 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 
> 7b264c6 
>   
> execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java
>  PRE-CREATION 
>   
> execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java
>  bee8ab7 
>   pom.xml 5ea0633 
>   spi/pom.xml 0b240e8 
>   spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 
> 2becc56 
>   
> submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java
>  6fc485b 
> 
> Diff: https://reviews.apache.org/r/12936/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Hari Shreedharan
> 
>
