> On Jan. 21, 2016, 2:49 p.m., Ryan Blue wrote:
> > connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/HdfsExtractor.java,
> >  line 506
> > <https://reviews.apache.org/r/42327/diff/8/?file=1204487#file1204487line506>
> >
> >     I think a better and cheaper check is to read the first 4 bytes (and/or 
> > the last 4 bytes) and make sure they are "PAR1". The first 4 and the last 4 
> > bytes of all valid Parquet files contain that magic pattern.
> 
> Abraham Fine wrote:
>     We could run into a non-Parquet file that starts/ends with that magic 
> pattern, right? Isn't this approach more complete, since it prevents 
> false positives?

Checking the first and last 4 bytes should be sufficient. I'm not sure the 
last 4 even need to be checked, but together the two magic patterns definitely 
identify the file as Parquet.
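
For reference, here is a minimal sketch of that check against the Hadoop 
FileSystem API (illustrative names, not taken from the patch):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative sketch, not the patch's actual code.
public class ParquetMagicCheck {
  private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

  public static boolean hasParquetMagic(FileSystem fs, Path file) throws IOException {
    long len = fs.getFileStatus(file).getLen();
    // A valid Parquet file contains at least the header magic, a 4-byte
    // footer length, and the footer magic.
    if (len < 12) {
      return false;
    }
    byte[] head = new byte[4];
    byte[] tail = new byte[4];
    try (FSDataInputStream in = fs.open(file)) {
      in.readFully(0, head);       // first 4 bytes
      in.readFully(len - 4, tail); // last 4 bytes
    }
    return Arrays.equals(MAGIC, head) && Arrays.equals(MAGIC, tail);
  }
}
```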


> On Jan. 21, 2016, 2:49 p.m., Ryan Blue wrote:
> > connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/HdfsExtractor.java,
> >  line 252
> > <https://reviews.apache.org/r/42327/diff/8/?file=1204487#file1204487line252>
> >
> >     Why not use the existing Hadoop class? That's what we did for Kite: 
> > https://github.com/kite-sdk/kite/blob/master/kite-data/kite-data-core/src/main/java/org/kitesdk/data/spi/filesystem/InputFormatReader.java#L69
> >     
> >     Then you don't need to include your own anonymous class. There's a bit 
> > of overhead to get the right constructor but I think that's a better 
> > strategy.
> 
> Abraham Fine wrote:
>     I agree that the end result is prettier, but I'm not sure the added 
> complexity is worth it in this case.
>     
>     I would like to get a committer's view on this one.

I prefer the reflection approach, but I'm fine either way. In the long run, I'd 
like to see you submit a feature request (or patch) to Parquet for the ability 
to set a region when building a reader. The internals can handle this use case; 
we just don't expose it other than through the MR API. But since the file 
format is designed for selecting parts of a file, it makes sense to expose that 
feature.
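
For the record, a rough sketch of what reading a file region through the 
existing Hadoop input format could look like, assuming parquet-avro's 
AvroParquetInputFormat and the mapreduce API (the class and package names are 
my assumption about the Parquet version in use, so treat this as illustrative 
rather than the patch's code):

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;
import org.apache.parquet.avro.AvroParquetInputFormat;

// Illustrative sketch: read one region of a Parquet file through the
// existing input format instead of an anonymous class.
public class ParquetRegionReaderSketch {
  public static void readRegion(Configuration conf, Path file,
                                long start, long length) throws Exception {
    AvroParquetInputFormat<GenericRecord> inputFormat =
        new AvroParquetInputFormat<GenericRecord>();
    TaskAttemptContext context =
        new TaskAttemptContextImpl(conf, new TaskAttemptID());
    FileSplit split = new FileSplit(file, start, length, null);
    RecordReader<Void, GenericRecord> reader =
        inputFormat.createRecordReader(split, context);
    try {
      reader.initialize(split, context);
      while (reader.nextKeyValue()) {
        GenericRecord record = reader.getCurrentValue();
        // hand the record off to the extractor's data writer here
      }
    } finally {
      reader.close();
    }
  }
}
```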


> On Jan. 21, 2016, 2:49 p.m., Ryan Blue wrote:
> > pom.xml, line 720
> > <https://reviews.apache.org/r/42327/diff/8/?file=1204498#file1204498line720>
> >
> >     Why exclude Avro? The dependency version should be managed by Maven to 
> > match the version that Sqoop depends on directly.
> 
> Abraham Fine wrote:
>     This is an issue that I believe to be related to our connector class 
> loader.
>     
>     Not excluding Avro causes the following exception:
>     
>     ```
>     java.lang.LinkageError: loader constraint violation: when resolving 
> method 
> "org.apache.sqoop.connector.idf.AVROIntermediateDataFormat.getAvroSchema()Lorg/apache/avro/Schema;"
>  the class loader (instance of 
> org/apache/sqoop/classloader/ConnectorClassLoader) of the current class, 
> org/apache/sqoop/connector/hdfs/hdfsWriter/HdfsParquetWriter, and the class 
> loader (instance of sun/misc/Launcher$AppClassLoader) for the method's 
> defining class, org/apache/sqoop/connector/idf/AVROIntermediateDataFormat, 
> have different Class objects for the type org/apache/avro/Schema used in the 
> signature
>       at 
> org.apache.sqoop.connector.hdfs.hdfsWriter.HdfsParquetWriter.initialize(HdfsParquetWriter.java:60)
>       at org.apache.sqoop.connector.hdfs.HdfsLoader$1.run(HdfsLoader.java:95)
>       at org.apache.sqoop.connector.hdfs.HdfsLoader$1.run(HdfsLoader.java:61)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>       at org.apache.sqoop.connector.hdfs.HdfsLoader.load(HdfsLoader.java:61)
>       at org.apache.sqoop.connector.hdfs.HdfsLoader.load(HdfsLoader.java:46)
>       at 
> org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor$ConsumerThread$1.call(SqoopOutputFormatLoadExecutor.java:276)
>       at 
> org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor$ConsumerThread$1.call(SqoopOutputFormatLoadExecutor.java:257)
>       at 
> org.apache.sqoop.utils.ClassUtils.executeWithClassLoader(ClassUtils.java:281)
>       at 
> org.apache.sqoop.job.mr.SqoopOutputFormatLoadExecutor$ConsumerThread.run(SqoopOutputFormatLoadExecutor.java:256)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
>     ```
>     
>     This is occurring within individual mappers.

This looks like a bug in Sqoop's custom ClassLoader to me. Generally, this 
kind of problem means that the class loader is not delegating properly to its 
parent class loaders. I'm -1 on the exclusion fix because it just hides another 
issue. At a minimum, there should be a comment in the POM indicating that the 
exclusion works around a larger bug.
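
To make the delegation point concrete, here is a minimal parent-first sketch 
(hypothetical, not Sqoop's actual ConnectorClassLoader): classes that appear in 
shared API signatures, like org.apache.avro.Schema, have to come from the 
parent loader so both sides see the same Class object.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Illustrative sketch only, not Sqoop's ConnectorClassLoader. Loading
// shared API types in the child loader produces exactly the LinkageError
// quoted above, because the two loaders end up with different Class
// objects for the same name.
public class ConnectorClassLoaderSketch extends URLClassLoader {

  public ConnectorClassLoaderSketch(URL[] urls, ClassLoader parent) {
    super(urls, parent);
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve)
      throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      Class<?> c = findLoadedClass(name);
      if (c == null) {
        if (name.startsWith("java.") || name.startsWith("org.apache.avro.")) {
          // Parent-first for classes that cross the API boundary.
          c = getParent().loadClass(name);
        } else {
          try {
            // Connector-first for everything bundled with the connector.
            c = findClass(name);
          } catch (ClassNotFoundException e) {
            c = getParent().loadClass(name);
          }
        }
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }
}
```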


- Ryan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42327/#review115691
-----------------------------------------------------------


On Jan. 21, 2016, 5:04 p.m., Abraham Fine wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/42327/
> -----------------------------------------------------------
> 
> (Updated Jan. 21, 2016, 5:04 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-2788
>     https://issues.apache.org/jira/browse/SQOOP-2788
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Read and write Parquet in the HDFS connector
> 
> 
> Diffs
> -----
> 
>   connector/connector-hdfs/pom.xml 599631418ca63cc43d645c1ee1e7a73dc824b313 
>   
> connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/HdfsExtractor.java
>  a813c479a07d68e14ed49936f642e762e5b37437 
>   
> connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/HdfsLoader.java
>  774221aaf5c8cdb8d26ca108fae239598b42229b 
>   
> connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/HdfsPartition.java
>  644de60581faf90ceb2fcef8d3e0544067791fcc 
>   
> connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/configuration/ToFormat.java
>  27d121f529ecb4d5bd79e2b8c74ab8f7cc15fb10 
>   
> connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/hdfsWriter/GenericHdfsWriter.java
>  2ccccc4a94a582c8b47ccdefa523d1fd1632e627 
>   
> connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/hdfsWriter/HdfsParquetWriter.java
>  PRE-CREATION 
>   
> connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/hdfsWriter/HdfsSequenceWriter.java
>  75c2e7ef192d7d9628e622cc3c5ef176e33a73d0 
>   
> connector/connector-hdfs/src/main/java/org/apache/sqoop/connector/hdfs/hdfsWriter/HdfsTextWriter.java
>  78cf9732fdb89689b04d43e4af70ca5a43732dbf 
>   
> connector/connector-hdfs/src/test/java/org/apache/sqoop/connector/hdfs/TestLoader.java
>  11fcef2a38209c79928f582cf8aa03e889247f22 
>   
> connector/connector-sdk/src/main/java/org/apache/sqoop/connector/common/SqoopAvroUtils.java
>  985149cbb0d28b55a19d17076d996364d7f2ae90 
>   
> connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/AVROIntermediateDataFormat.java
>  d78fa8b72ecfe62eeec240e01597e7f2a7e4dd76 
>   pom.xml cb8a973abc96af1de905cebd80d30177cbaf1cb4 
>   test/pom.xml 644a9c7dbc746d4a3268532bdcf0babd4faaafba 
>   
> test/src/test/java/org/apache/sqoop/integration/connector/hdfs/ParquetTest.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/42327/diff/
> 
> 
> Testing
> -------
> 
> integration tests pass
> 
> 
> Thanks,
> 
> Abraham Fine
> 
>
