-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42327/#review115691
-----------------------------------------------------------
connector/connector-hdfs/pom.xml (line 88)
<https://reviews.apache.org/r/42327/#comment176705>

    Does the exclusion from dependencyManagement apply here? I don't think
    you need both.


connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/HdfsExtractor.java (line 64)
<https://reviews.apache.org/r/42327/#comment176706>

    Minor: when you relocate dependencies like this, you are more likely to
    get conflicts when backporting changes or cherry-picking across
    branches. I recommend reverting a move like this and setting your
    editor to avoid relocations.


connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/HdfsExtractor.java (line 249)
<https://reviews.apache.org/r/42327/#comment176713>

    Why not use the existing Hadoop class? That's what we did for Kite:

    https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-core/src/main/java/org/kitesdk/data/spi/filesystem/InputFormatReader.java#L69

    Then you don't need to include your own anonymous class. There's a bit
    of overhead to get the right constructor, but I think that's a better
    strategy.


connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/HdfsExtractor.java (line 503)
<https://reviews.apache.org/r/42327/#comment176708>

    I think a better and cheaper check is to read the first 4 bytes (and/or
    the last 4 bytes) and make sure they are "PAR1". Every valid Parquet
    file begins and ends with that 4-byte magic pattern.


connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/hdfsWriter/HdfsParquetWriter.java (line 53)
<https://reviews.apache.org/r/42327/#comment176753>

    Does this make the default UNCOMPRESSED? I think I'd rather see SNAPPY
    by default.


connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/hdfsWriter/HdfsParquetWriter.java (line 58)
<https://reviews.apache.org/r/42327/#comment176754>

    Why use the OutputFormat and RecordWriter here instead of an
    AvroParquetWriter? That is more straightforward.
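To illustrate the magic-number check suggested above for HdfsExtractor, here is a minimal sketch. The class and method names are hypothetical, and it reads from a local file via java.io for simplicity; the actual connector code would read from an HDFS FSDataInputStream instead.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ParquetMagicCheck {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file begins and ends with the Parquet magic bytes. */
    public static boolean hasParquetMagic(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            // A valid Parquet file holds at least the header magic,
            // a 4-byte footer length, and the trailing magic.
            if (file.length() < MAGIC.length * 2 + 4) {
                return false;
            }
            byte[] head = new byte[MAGIC.length];
            byte[] tail = new byte[MAGIC.length];
            file.readFully(head);
            file.seek(file.length() - MAGIC.length);
            file.readFully(tail);
            return Arrays.equals(head, MAGIC) && Arrays.equals(tail, MAGIC);
        }
    }
}
```

This avoids constructing a reader just to probe the format: two short reads decide whether the file can possibly be Parquet.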
connector/connector-hdfs/src/test/java/org/apache/sqoop/connector/hdfs/TestLoader.java (line 156)
<https://reviews.apache.org/r/42327/#comment176755>

    Should this be assertTrue instead of assert?


connector/connector-hdfs/src/test/java/org/apache/sqoop/connector/hdfs/TestLoader.java (line 179)
<https://reviews.apache.org/r/42327/#comment176758>

    Why is default compression not supported?


connector/connector-hdfs/src/test/java/org/apache/sqoop/connector/hdfs/TestLoader.java (line 188)
<https://reviews.apache.org/r/42327/#comment176759>

    Why is this value changed?


connector/connector-sdk/src/main/java/org/apache/sqoop/connector/common/SqoopAvroUtils.java (line 46)
<https://reviews.apache.org/r/42327/#comment176762>

    I think this should be an error. Changing names without warning will
    confuse users. If the name is invalid, then it should be mapped
    somehow, or there should be an option (defaulted to false) to turn on
    this behavior.


pom.xml (line 716)
<https://reviews.apache.org/r/42327/#comment176703>

    You should only need to depend on parquet-avro and parquet-hadoop. The
    parquet-common module will be included transitively, and I don't think
    there is anything in there you would need to depend on directly.


pom.xml (line 720)
<https://reviews.apache.org/r/42327/#comment176704>

    Why exclude Avro? The dependency version should be managed by Maven to
    match the version depended on by Sqoop directly.


test/src/test/java/org/apache/sqoop/integration/connector/hdfs/ParquetTest.java (line 55)
<https://reviews.apache.org/r/42327/#comment176769>

    This only has positive test cases. I'd like to see at least a negative
    case for the null handling that I saw above. Any other cases where the
    CSV passed into the writer is unexpected would be good, to validate
    this isn't going to blow up at runtime when the data producer does
    something unexpected.
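To illustrate the "error instead of silent rename" suggestion for SqoopAvroUtils, here is a hypothetical validator (class and method names are mine, not from the patch) based on the name rule given in the Avro specification, which requires names to match [A-Za-z_][A-Za-z0-9_]*.

```java
import java.util.regex.Pattern;

public class AvroNameCheck {
    // Name rule from the Avro specification:
    // start with [A-Za-z_], followed by [A-Za-z0-9_]*.
    private static final Pattern AVRO_NAME =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Throws instead of silently rewriting an invalid name. */
    public static String requireValidAvroName(String name) {
        if (name == null || !AVRO_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException(
                    "Column name is not a valid Avro name: " + name);
        }
        return name;
    }
}
```

Failing loudly (or mapping names only behind an opt-in flag) keeps the schema the user sees consistent with the columns they asked for.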
test/src/test/java/org/apache/sqoop/integration/connector/hdfs/ParquetTest.java (line 96)
<https://reviews.apache.org/r/42327/#comment176763>

    Why is this a multiset?


- Ryan Blue


On Jan. 21, 2016, 11:06 a.m., Abraham Fine wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42327/
> -----------------------------------------------------------
> 
> (Updated Jan. 21, 2016, 11:06 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-2788
>     https://issues.apache.org/jira/browse/SQOOP-2788
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> read and write parquet in hdfsconnector
> 
> 
> Diffs
> -----
> 
>   connector/connector-hdfs/pom.xml 599631418ca63cc43d645c1ee1e7a73dc824b313 
>   connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/HdfsExtractor.java a813c479a07d68e14ed49936f642e762e5b37437 
>   connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/HdfsLoader.java 774221aaf5c8cdb8d26ca108fae239598b42229b 
>   connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/HdfsPartition.java 644de60581faf90ceb2fcef8d3e0544067791fcc 
>   connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/configuration/ToFormat.java 27d121f529ecb4d5bd79e2b8c74ab8f7cc15fb10 
>   connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/hdfsWriter/GenericHdfsWriter.java 2ccccc4a94a582c8b47ccdefa523d1fd1632e627 
>   connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/hdfsWriter/HdfsParquetWriter.java PRE-CREATION 
>   connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/hdfsWriter/HdfsSequenceWriter.java 75c2e7ef192d7d9628e622cc3c5ef176e33a73d0 
>   connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/hdfsWriter/HdfsTextWriter.java 78cf9732fdb89689b04d43e4af70ca5a43732dbf 
>   connector/connector-hdfs/src/test/java/org/apache/sqoop/connector/hdfs/TestLoader.java 11fcef2a38209c79928f582cf8aa03e889247f22 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/common/SqoopAvroUtils.java 985149cbb0d28b55a19d17076d996364d7f2ae90 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/AVROIntermediateDataFormat.java d78fa8b72ecfe62eeec240e01597e7f2a7e4dd76 
>   pom.xml cb8a973abc96af1de905cebd80d30177cbaf1cb4 
>   test/pom.xml 644a9c7dbc746d4a3268532bdcf0babd4faaafba 
>   test/src/test/java/org/apache/sqoop/integration/connector/hdfs/ParquetTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/42327/diff/
> 
> 
> Testing
> -------
> 
> integration tests pass
> 
> 
> Thanks,
> 
> Abraham Fine
> 