> On July 18, 2018, 9:52 a.m., daniel voros wrote:
> > Hi!
> > 
> > I was trying to run this on a minicluster but got the following error:
> > 
> > ```
> > 2018-07-18 09:20:41,799 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
> >         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:178)
> >         at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
> >         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
> >         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
> >         at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
> >         at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
> >         at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:115)
> >         at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:117)
> >         at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:389)
> >         at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:350)
> >         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:653)
> >         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> >         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:422)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> >         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
> > ```
> > 
> > This happens when we have a newer version of Parquet (1.8.1 IIRC) with an older Avro (1.7.7 in this case).
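
The mismatch described above can be confirmed on a given classpath with a reflective probe. A minimal sketch (the Avro class/method names come from the stack trace; the checker itself is generic and only the `java.lang.String` call below is guaranteed to resolve):

```java
public class AvroMethodCheck {
    // True if the class resolves on this classpath and declares a public no-arg method with this name
    static boolean hasMethod(String className, String methodName) {
        try {
            Class.forName(className).getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // On a task's classpath, hasMethod("org.apache.avro.Schema", "getLogicalType")
        // is false with Avro 1.7.7 and true with Avro 1.8+, which is exactly what
        // parquet-avro 1.8+ assumes and why the NoSuchMethodError is thrown at runtime
        System.out.println(hasMethod("java.lang.String", "trim"));
    }
}
```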
> > 
> > Where is parquet coming from?
> >   - 1.9 is coming from Sqoop since this new patch
> >   - Hive's hive-exec jar also contains parquet classes shaded with the 
> > original packaging
> > 
> > Which one gets picked seems random to me (it even changes between re-executions of mappers!). Both are in the distributed cache.
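
When the winning jar seems random, it can help to log where the conflicting classes were actually loaded from inside the task JVM. A minimal sketch (only JDK classes are exercised below; the Avro/Parquet lookups are left as comments since they need the task's classpath):

```java
import java.security.CodeSource;

public class JarLocator {
    // Returns the jar/directory a class was loaded from; JDK bootstrap classes have no code source
    static String locationOf(Class<?> cls) {
        CodeSource cs = cls.getProtectionDomain().getCodeSource();
        return cs == null ? "JDK runtime" : cs.getLocation().toString();
    }

    public static void main(String[] args) {
        // In a mapper you would log e.g.:
        //   locationOf(org.apache.avro.Schema.class)
        //   locationOf(org.apache.parquet.avro.AvroSchemaConverter.class)
        // to see whether Sqoop's parquet 1.9 jar or Hive's shaded hive-exec won the race
        System.out.println(locationOf(String.class));
    }
}
```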
> > 
> > Where is avro coming from?
> >   - There can be multiple versions under Sqoop/Hive, but it doesn't really matter: Hadoop is packaged with Avro under `share/hadoop/*/lib`, and the jars there take precedence over the user classpath. This can be changed with `mapreduce.job.user.classpath.first=true`, but then we'd have to make sure not to override anything that Hadoop relies on.
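
For reference, the precedence flip mentioned above is a per-job setting; a sketch of what it would look like (property name as in Hadoop 2.x+; whether to put it in mapred-site.xml or pass it with `-D` is a deployment choice):

```xml
<!-- mapred-site.xml; equivalently: -Dmapreduce.job.user.classpath.first=true on the job -->
<property>
  <name>mapreduce.job.user.classpath.first</name>
  <value>true</value>
</property>
```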
> > 
> > I've come across this issue before and solved it by shading the Parquet classes. Note that this could be harder to do with Sqoop's ant build scripts.
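
To make the shading option concrete: with Maven this would be a relocation rule like the hypothetical fragment below (Sqoop's ant build would need an equivalent, e.g. via jarjar); the shaded package name is made up for illustration:

```xml
<!-- hypothetical maven-shade-plugin relocation; not Sqoop's actual build config -->
<relocation>
  <pattern>org.apache.parquet</pattern>
  <shadedPattern>org.apache.sqoop.shaded.org.apache.parquet</shadedPattern>
</relocation>
```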
> > 
> > Some other minor observations:
> >   - Hadoop 3.1.0 still has Avro 1.7.7
> >   - Hive has been using incompatible versions of Avro and Parquet for a 
> > long time, but they're not relying on parts of Parquet that require Avro.
> > 
> > Szabolcs, I've been struggling with this for too long, and a fresh pair of eyes might help spot some other options! Can you please take a look and validate what I've found?
> > 
> > Regards,
> > Daniel

Hi Dani,

Thanks for looking into this! 

What is this minicluster environment you are referring to, and how can I set it up on my side?

I have taken a quick look at the dependencies and I can see that Hive references Parquet 1.6, so that might cause an issue.
We can change this patch to keep the parquet-avro 1.6.0 dependency (which was brought in by Kite earlier) so we would be in line with the Hive dependencies, and later, with the Hadoop 3/Hive 3 upgrade, we could look at how to upgrade the Parquet dependency.
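
If we keep parquet-avro 1.6.0, the pin would look roughly like the fragment below. Note that the 1.6.x line predates the move to Apache, so it still lives under the com.twitter coordinates; the conf mapping is a guess, not Sqoop's actual ivy configuration:

```xml
<!-- hypothetical ivy.xml entry; parquet moved to org.apache.parquet only in 1.7.0 -->
<dependency org="com.twitter" name="parquet-avro" rev="1.6.0" conf="common->default"/>
```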

At this point we do not require Parquet 1.9; I just added it because it is a fairly recent version, but nothing in the patch relies on it.

I will upload the graphml dependency files for reference.


- Szabolcs


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/#review206195
-----------------------------------------------------------


On July 16, 2018, 3:56 p.m., Szabolcs Vasas wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67929/
> -----------------------------------------------------------
> 
> (Updated July 16, 2018, 3:56 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3329
>     https://issues.apache.org/jira/browse/SQOOP-3329
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> - Removed kitesdk dependency from ivy.xml
> - Removed Kite Dataset API based Parquet import implementation
> - Since the Parquet library was a transitive dependency of the Kite SDK, I added org.apache.parquet:parquet-avro 1.9 as a direct dependency
> - In this version the Parquet package has changed to org.apache.parquet, so I had to update several classes accordingly
> - Removed all the Parquet related test cases from TestHiveImport. These 
> scenarios are already covered in TestHiveServer2ParquetImport.
> - Modified the documentation to reflect these changes.
> 
> 
> Diffs
> -----
> 
>   ivy.xml 1f587f3eb 
>   ivy/libraries.properties 565a8bf50 
>   src/docs/user/hive-notes.txt af97d94b3 
>   src/docs/user/import.txt a2c16d956 
>   src/java/org/apache/sqoop/SqoopOptions.java cc1b75281 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 1663b1d1a 
>   src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorImplementation.java 050c85488 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportJobConfigurator.java 2180cc20e 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java 90b910a34 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetMergeJobConfigurator.java 66ebc5b80 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteMergeParquetReducer.java 02816d77f 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportJobConfigurator.java 6ebc5a31b 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportMapper.java 122ff3fc9 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportJobConfigurator.java 7e179a27d 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportMapper.java 0a91e4a20 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetJobConfiguratorFactory.java bd07c09f4 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetMergeJobConfigurator.java ed045cd14 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetUtils.java a4768c932 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 87fc5e987 
>   src/test/org/apache/sqoop/TestMerge.java 2b3280a5a 
>   src/test/org/apache/sqoop/TestParquetExport.java 0fab1880c 
>   src/test/org/apache/sqoop/TestParquetImport.java b1488e8af 
>   src/test/org/apache/sqoop/TestParquetIncrementalImportMerge.java adad0cc11 
>   src/test/org/apache/sqoop/hive/TestHiveImport.java 436f0e512 
>   src/test/org/apache/sqoop/hive/TestHiveServer2ParquetImport.java b55179a4f 
>   src/test/org/apache/sqoop/tool/TestBaseSqoopTool.java dbda8b7f4 
>   src/test/org/apache/sqoop/util/ParquetReader.java f1c2fe10a 
> 
> 
> Diff: https://reviews.apache.org/r/67929/diff/1/
> 
> 
> Testing
> -------
> 
> Ran unit and third party tests.
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>
