[ https://issues.apache.org/jira/browse/BEAM-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906020#comment-16906020 ]
Ryan Skraba edited comment on BEAM-5164 at 8/13/19 9:50 AM:
------------------------------------------------------------

I checked with a Spark local run on Spark 2.4.3, and there is no issue (expected, since it includes the "right" Parquet jars on the Spark-supplied classpath). The workaround that I proposed in the Stack Overflow question works for Spark 2.2, 2.3 and 2.4, but it is only a workaround... I added the following relocations to the "fat jar" build proposed in the Spark runner instructions:

{code}
<relocation>
  <pattern>org.apache.parquet</pattern>
  <shadedPattern>shaded.org.apache.parquet</shadedPattern>
</relocation>
<!-- Some packages are already shaded and on the original Spark classpath. Shade them again. -->
<relocation>
  <pattern>shaded.parquet</pattern>
  <shadedPattern>reshaded.parquet</shadedPattern>
</relocation>
<relocation>
  <pattern>org.apache.avro</pattern>
  <shadedPattern>shaded.org.apache.avro</shadedPattern>
</relocation>
{code}

> ParquetIOIT fails on Spark and Flink
> ------------------------------------
>
>                 Key: BEAM-5164
>                 URL: https://issues.apache.org/jira/browse/BEAM-5164
>             Project: Beam
>          Issue Type: Bug
>          Components: testing
>            Reporter: Lukasz Gajowy
>            Priority: Minor
>
> When run on a Spark or Flink remote cluster, ParquetIOIT fails with the
> following stacktrace:
> {code:java}
> org.apache.beam.sdk.io.parquet.ParquetIOIT > writeThenReadAll FAILED
>     org.apache.beam.sdk.Pipeline$PipelineExecutionException:
> java.lang.NoSuchMethodError:
> org.apache.parquet.hadoop.ParquetWriter$Builder.<init>(Lorg/apache/parquet/io/OutputFile;)V
>         at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:66)
>         at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:99)
>         at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:87)
>         at org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:116)
>         at org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:61)
>         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
>         at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>         at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>         at org.apache.beam.sdk.io.parquet.ParquetIOIT.writeThenReadAll(ParquetIOIT.java:133)
>     Caused by:
>     java.lang.NoSuchMethodError:
> org.apache.parquet.hadoop.ParquetWriter$Builder.<init>(Lorg/apache/parquet/io/OutputFile;)V
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
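For reference, a minimal sketch of where the relocations above would sit inside a maven-shade-plugin configuration in the fat-jar build's pom.xml. Everything outside the three `<relocation>` entries (the plugin version, execution binding, and surrounding structure) is an illustrative assumption, not taken from the Spark runner instructions:

{code:xml}
<!-- Sketch only: plugin version and execution layout are assumptions. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <!-- Bind the shade goal to the package phase so `mvn package` builds the fat jar. -->
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Move the bundled Parquet/Avro classes out of the way of the
               (older) versions that Spark already provides on its classpath. -->
          <relocation>
            <pattern>org.apache.parquet</pattern>
            <shadedPattern>shaded.org.apache.parquet</shadedPattern>
          </relocation>
          <!-- Packages already shaded and present on the original Spark classpath:
               shade them again. -->
          <relocation>
            <pattern>shaded.parquet</pattern>
            <shadedPattern>reshaded.parquet</shadedPattern>
          </relocation>
          <relocation>
            <pattern>org.apache.avro</pattern>
            <shadedPattern>shaded.org.apache.avro</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}

With relocation, the shade plugin rewrites both the bundled class files and the bytecode references to them, so the job uses its own Parquet builder (the one with the `ParquetWriter$Builder.<init>(OutputFile)` constructor) instead of the older one Spark puts on the classpath, which is what triggers the NoSuchMethodError.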