> On July 18, 2018, 9:52 a.m., daniel voros wrote:
> > Hi!
> > 
> > I was trying to run this on a minicluster but got the following error:
> > 
> > ```
> > 2018-07-18 09:20:41,799 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
> >     at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:178)
> >     at org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
> >     at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
> >     at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
> >     at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
> >     at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
> >     at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:115)
> >     at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:117)
> >     at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:389)
> >     at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:350)
> >     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:653)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> >     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
> >     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
> > ```
> > 
> > This happens when we have a newer version of Parquet (1.8.1 IIRC) together with an older Avro (1.7.7 in this case).
> > 
> > Where is parquet coming from?
> > - 1.9 is coming from Sqoop since this new patch
> > - Hive's hive-exec jar also contains parquet classes shaded with the original packaging
> > 
> > Which one gets picked seems random to me (it even changes between re-executions of mappers!). Both are in the distributed cache.
> > 
> > Where is avro coming from?
> > - There can be multiple versions under Sqoop/Hive, but it doesn't really matter: Hadoop is packaged with avro under `share/hadoop/*/lib`, and the jars there take precedence over the user classpath. This can be changed with `mapreduce.job.user.classpath.first=true`, but then we'd have to make sure not to override anything that Hadoop relies on.
> > 
> > I've come across this issue before and solved it by shading the parquet classes. Note that this could be harder to do with Sqoop's ant build scripts.
> > 
> > Some other minor observations:
> > - Hadoop 3.1.0 still ships Avro 1.7.7
> > - Hive has been using incompatible versions of Avro and Parquet for a long time, but it's not relying on the parts of Parquet that require Avro.
> > 
> > Szabolcs, I've been struggling with this for too long, and a fresh pair of eyes might help spot some other options! Can you please take a look and validate what I've found?
> > 
> > Regards,
> > Daniel
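Given the randomness Daniel describes in which parquet jar wins, a quick way to check is to print the code source of the conflicting class from inside the task classpath. This is a minimal diagnostic sketch (not part of the patch); the `WhichJar` class name and the idea of passing `org.apache.parquet.avro.AvroSchemaConverter` as the argument are illustrative, and the target class must actually be on the classpath when you run it:

```java
import java.security.CodeSource;

/**
 * Diagnostic sketch: report which jar (or directory) a class was loaded from.
 * Run with e.g. "org.apache.parquet.avro.AvroSchemaConverter" as the argument
 * on the cluster classpath to see whether Sqoop's parquet-avro or the classes
 * shaded into hive-exec were picked up.
 */
public class WhichJar {

    static String describe(String className) throws ClassNotFoundException {
        Class<?> c = Class.forName(className);
        // Bootstrap-loaded classes (e.g. java.lang.String) have a null CodeSource.
        CodeSource src = c.getProtectionDomain().getCodeSource();
        return className + " loaded from "
                + (src == null ? "the bootstrap class path" : src.getLocation());
    }

    public static void main(String[] args) throws Exception {
        // Default to this class itself so the sketch runs standalone.
        String name = args.length > 0 ? args[0] : WhichJar.class.getName();
        System.out.println(describe(name));
    }
}
```

Logging the same expression from a mapper would also show whether the picked jar really varies between task attempts.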
Hi Dani,

Thanks for looking into this! What is this minicluster environment you are referring to, and how can I set it up on my side?

I have taken a quick look at the dependencies and I can see that Hive references Parquet 1.6, so that might cause an issue. We can change this patch to keep the parquet-avro 1.6.0 dependency (which was brought in by Kite earlier) so we would be in line with the Hive dependencies, and later, with the Hadoop 3/Hive 3 upgrade, we could look at how to upgrade the Parquet dependency. At this point we do not require Parquet 1.9; I only added it because it is a fairly recent version, but nothing in the patch relies on it. I will upload the graphml dependency files for reference.

- Szabolcs


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67929/#review206195
-----------------------------------------------------------


On July 16, 2018, 3:56 p.m., Szabolcs Vasas wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67929/
> -----------------------------------------------------------
> 
> (Updated July 16, 2018, 3:56 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3329
>     https://issues.apache.org/jira/browse/SQOOP-3329
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> - Removed kitesdk dependency from ivy.xml
> - Removed the Kite Dataset API based Parquet import implementation
> - Since the Parquet library was a transitive dependency of the Kite SDK, I added org.apache.parquet:parquet-avro 1.9 as a direct dependency
> - In this dependency the parquet package has changed to org.apache.parquet, so I needed to change several classes accordingly
> - Removed all the Parquet related test cases from TestHiveImport. These scenarios are already covered in TestHiveServer2ParquetImport.
> - Modified the documentation to reflect these changes.
> 
> 
> Diffs
> -----
> 
>   ivy.xml 1f587f3eb 
>   ivy/libraries.properties 565a8bf50 
>   src/docs/user/hive-notes.txt af97d94b3 
>   src/docs/user/import.txt a2c16d956 
>   src/java/org/apache/sqoop/SqoopOptions.java cc1b75281 
>   src/java/org/apache/sqoop/avro/AvroUtil.java 1663b1d1a 
>   src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorImplementation.java 050c85488 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportJobConfigurator.java 2180cc20e 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java 90b910a34 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetMergeJobConfigurator.java 66ebc5b80 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteMergeParquetReducer.java 02816d77f 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportJobConfigurator.java 6ebc5a31b 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportMapper.java 122ff3fc9 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportJobConfigurator.java 7e179a27d 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportMapper.java 0a91e4a20 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetJobConfiguratorFactory.java bd07c09f4 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetMergeJobConfigurator.java ed045cd14 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetUtils.java a4768c932 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 87fc5e987 
>   src/test/org/apache/sqoop/TestMerge.java 2b3280a5a 
>   src/test/org/apache/sqoop/TestParquetExport.java 0fab1880c 
>   src/test/org/apache/sqoop/TestParquetImport.java b1488e8af 
>   src/test/org/apache/sqoop/TestParquetIncrementalImportMerge.java adad0cc11 
>   src/test/org/apache/sqoop/hive/TestHiveImport.java 436f0e512 
>   src/test/org/apache/sqoop/hive/TestHiveServer2ParquetImport.java b55179a4f 
>   src/test/org/apache/sqoop/tool/TestBaseSqoopTool.java dbda8b7f4 
>   src/test/org/apache/sqoop/util/ParquetReader.java f1c2fe10a 
> 
> 
> Diff: https://reviews.apache.org/r/67929/diff/1/
> 
> 
> Testing
> -------
> 
> Ran unit and third party tests.
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
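As a footnote to the classpath discussion in this thread: the `mapreduce.job.user.classpath.first` workaround Daniel mentions can be passed per job through Hadoop's generic `-D` options rather than set cluster-wide. A hedged sketch follows; the connect string, table, and target directory are placeholders, and (as Daniel notes) this only helps if none of the user jars override something Hadoop itself relies on:

```shell
# Sketch: prefer user jars (e.g. Sqoop's parquet-avro) over the Avro 1.7.7
# that Hadoop ships under share/hadoop/*/lib, for this one job only.
# Connection details below are placeholders.
sqoop import \
  -D mapreduce.job.user.classpath.first=true \
  --connect jdbc:mysql://db.example.com/testdb \
  --table employees \
  --as-parquetfile \
  --target-dir /user/example/employees_parquet
```

Note that generic `-D` options must come before the tool-specific arguments for Hadoop's option parser to pick them up.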