-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/#review206195
-----------------------------------------------------------



Hi!

I was trying to run this on a minicluster but got the following error:

```
2018-07-18 09:20:41,799 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:178)
        at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
        at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
        at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
        at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:115)
        at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:117)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:389)
        at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:350)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:653)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
```

This happens when a newer version of Parquet (1.8.1 IIRC) runs against an older 
Avro (1.7.7 in this case): Parquet's Avro converter calls 
`Schema.getLogicalType()`, which the older Avro does not have.
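
To confirm the mismatch on a given classpath, one option is to check reflectively whether the loaded `org.apache.avro.Schema` has `getLogicalType()`. This is only a minimal sketch: the class name `AvroApiCheck` and the helper `hasNoArgMethod` are made up for illustration, and it assumes the method is missing from Avro 1.7.7 (as the error above suggests) and present in later releases.

```java
import java.lang.reflect.Method;

final class AvroApiCheck {
    /**
     * Returns true if the named class can be loaded and exposes a public
     * no-argument method with the given name, false otherwise.
     * (Hypothetical helper, not part of Sqoop, Avro or Parquet.)
     */
    static boolean hasNoArgMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getMethod(methodName);
            return m != null;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            // Class not on the classpath, or the method is absent in this version.
            return false;
        }
    }

    public static void main(String[] args) {
        boolean ok = hasNoArgMethod("org.apache.avro.Schema", "getLogicalType");
        System.out.println("Schema.getLogicalType() available: " + ok);
    }
}
```

Run with the same classpath as the failing task, `false` would point to an Avro that is too old for the Parquet in use (or to Avro missing entirely).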

Where is parquet coming from?
  - 1.9 is coming from Sqoop since this new patch
  - Hive's hive-exec jar also bundles Parquet classes under their original 
package names (i.e. they are not relocated)

Which one gets picked seems random to me (it even changes between re-executions 
of mappers!). Both are in the distributed cache.
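
To see which jar actually wins in a given task attempt, the class's code source can be printed. A rough sketch, assuming nothing beyond the JDK; `ClassOrigin` and `locationOf` are hypothetical names, and the two class names in `main` are simply the ones from the stack trace above.

```java
import java.security.CodeSource;

final class ClassOrigin {
    /** Returns the jar/directory a class was loaded from, or a placeholder if unknown. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader have no code source.
        return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
    }

    /** Same as above, but looks the class up by name first. */
    static String locationOf(String className) {
        try {
            return locationOf(Class.forName(className));
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.parquet.avro.AvroSchemaConverter",
            "org.apache.avro.Schema"
        };
        for (String name : names) {
            System.out.println(name + " <- " + locationOf(name));
        }
    }
}
```

Logging the same information from inside a mapper (for example in its setup step) should show whether the Sqoop or the hive-exec copy of Parquet was picked for that attempt.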

Where is avro coming from?
  - There can be multiple versions under Sqoop/Hive, but it doesn't really 
matter: Hadoop is packaged with Avro under `share/hadoop/*/lib`, and the jars 
there take precedence over the user classpath. This can be changed with 
`mapreduce.job.user.classpath.first=true`, but then we'd have to make sure not 
to override anything that Hadoop relies on.

I've come across this issue before and solved it by shading the Parquet classes. 
Note that this could be harder to do with Sqoop's ant build scripts.

Some other minor observations:
  - Hadoop 3.1.0 still has Avro 1.7.7
  - Hive has been using incompatible versions of Avro and Parquet for a long 
time, but they're not relying on parts of Parquet that require Avro.

Szabolcs, I've been struggling with this for too long, and a fresh pair of eyes 
might help spot some other options! Can you please take a look and validate 
what I've found?

Regards,
Daniel

- daniel voros


On July 16, 2018, 3:56 p.m., Szabolcs Vasas wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67929/
> -----------------------------------------------------------
> 
> (Updated July 16, 2018, 3:56 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3329
>     https://issues.apache.org/jira/browse/SQOOP-3329
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> - Removed kitesdk dependency from ivy.xml
> - Removed Kite Dataset API based Parquet import implementation
> - Since Parquet library was a transitive dependency of the Kite SDK I added 
> org.apache.parquet.avro-parquet 1.9 as a direct dependency
> - In this dependency the parquet package has changed to org.apache.parquet so 
> I needed to make changes in several classes according to this
> - Removed all the Parquet related test cases from TestHiveImport. These 
> scenarios are already covered in TestHiveServer2ParquetImport.
> - Modified the documentation to reflect these changes.
> 
> 
> Diffs
> -----
> 
>   ivy.xml 1f587f3eb 
>   ivy/libraries.properties 565a8bf50 
>   src/docs/user/hive-notes.txt af97d94b3 
>   src/docs/user/import.txt a2c16d956 
>   src/java/org/apache/sqoop/SqoopOptions.java cc1b75281 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 1663b1d1a 
>   src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorImplementation.java 050c85488 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportJobConfigurator.java 2180cc20e 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java 90b910a34 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetMergeJobConfigurator.java 66ebc5b80 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteMergeParquetReducer.java 02816d77f 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportJobConfigurator.java 6ebc5a31b 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportMapper.java 122ff3fc9 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportJobConfigurator.java 7e179a27d 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportMapper.java 0a91e4a20 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetJobConfiguratorFactory.java bd07c09f4 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetMergeJobConfigurator.java ed045cd14 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetUtils.java a4768c932 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 87fc5e987 
>   src/test/org/apache/sqoop/TestMerge.java 2b3280a5a 
>   src/test/org/apache/sqoop/TestParquetExport.java 0fab1880c 
>   src/test/org/apache/sqoop/TestParquetImport.java b1488e8af 
>   src/test/org/apache/sqoop/TestParquetIncrementalImportMerge.java adad0cc11 
>   src/test/org/apache/sqoop/hive/TestHiveImport.java 436f0e512 
>   src/test/org/apache/sqoop/hive/TestHiveServer2ParquetImport.java b55179a4f 
>   src/test/org/apache/sqoop/tool/TestBaseSqoopTool.java dbda8b7f4 
>   src/test/org/apache/sqoop/util/ParquetReader.java f1c2fe10a 
> 
> 
> Diff: https://reviews.apache.org/r/67929/diff/1/
> 
> 
> Testing
> -------
> 
> Ran unit and third party tests.
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>
