[jira] [Commented] (BEAM-5164) ParquetIOIT fails on Spark and Flink

Ryan Skraba (JIRA) Tue, 13 Aug 2019 02:51:15 -0700


    [ 
https://issues.apache.org/jira/browse/BEAM-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906020#comment-16906020
 ]


Ryan Skraba commented on BEAM-5164:
-----------------------------------

I checked with a spark local run on Spark 2.4.3, there is no issue (expected, 
since it includes the "right" parquet jars in the spark-supplied classpath).

The workaround that I proposed in the stack overflow question works for Spark 
2.2, 2.3 and 2.4, but is only a workaround... I added the following relocations 
to the "fat jar" build proposed in the Spark runner instructions:

```
            <relocation>
              <pattern>org.apache.parquet</pattern>
              <shadedPattern>shaded.org.apache.parquet</shadedPattern>
            </relocation>
            <!-- Some packages are shaded already, and on the original spark 
classpath. Shade them more. -->
            <relocation>
              <pattern>shaded.parquet</pattern>
              <shadedPattern>reshaded.parquet</shadedPattern>
            </relocation>
            <relocation>
              <pattern>org.apache.avro</pattern>
              <shadedPattern>shaded.org.apache.avro</shadedPattern>
            </relocation>
 ```

> ParquetIOIT fails on Spark and Flink
> ------------------------------------
>
>                 Key: BEAM-5164
>                 URL: https://issues.apache.org/jira/browse/BEAM-5164
>             Project: Beam
>          Issue Type: Bug
>          Components: testing
>            Reporter: Lukasz Gajowy
>            Priority: Minor
>
> When run on Spark or Flink remote cluster, ParquetIOIT fails with the 
> following stacktrace: 
> {code:java}
> org.apache.beam.sdk.io.parquet.ParquetIOIT > writeThenReadAll FAILED
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.NoSuchMethodError: 
> org.apache.parquet.hadoop.ParquetWriter$Builder.<init>(Lorg/apache/parquet/io/OutputFile;)V
> at 
> org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:66)
> at 
> org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:99)
> at 
> org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:87)
> at org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:116)
> at org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:61)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
> at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
> at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
> at 
> org.apache.beam.sdk.io.parquet.ParquetIOIT.writeThenReadAll(ParquetIOIT.java:133)
> Caused by:
> java.lang.NoSuchMethodError: 
> org.apache.parquet.hadoop.ParquetWriter$Builder.<init>(Lorg/apache/parquet/io/OutputFile;)V{code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

[jira] [Commented] (BEAM-5164) ParquetIOIT fails on Spark and Flink

Reply via email to