[ https://issues.apache.org/jira/browse/BEAM-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906445#comment-16906445 ]
Luke Cwik commented on BEAM-5164:
---------------------------------

In this specific case I think we could shade the parquet library. Your reasoning listed above is correct: _(1) we should shade to prevent transitive dependency collisions in runners when necessary, but (2) don't shade systematically by default "just in case", and (3) once a dependency has reached a certain threshold, like the extremely common guava and grpc jars, vendor them for reuse._

The downside to shading/vendoring is that it makes it more difficult for users to force a dependency version change without having the Apache Beam folks perform a release; getting the shading/vendoring done correctly is also quite annoying and very error prone. Vendoring requires two releases (the vendored artifact, and then the core Beam projects that are updated to consume it) while shading only needs one, but vendoring is much easier to reason about and builds faster.

The best option is typically to try to get all parts aligned to use the same version, but this is not always possible (such as when you are trying to use multiple versions of Spark and Spark itself is incompatible with the newer version of a library); then you're forced to shade/vendor.
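To illustrate what shading means in practice, here is a minimal sketch of a package relocation rule using the Gradle Shadow plugin. This is not Beam's actual build configuration; the plugin version, the parquet version, and the relocated package prefix are all assumptions for the example:

```groovy
// build.gradle -- hedged sketch of shading via relocation, not Beam's real build file.
plugins {
    id 'java'
    id 'com.github.johnrengelman.shadow' version '7.1.2'
}

dependencies {
    implementation 'org.apache.parquet:parquet-hadoop:1.10.0'
}

shadowJar {
    // Rewrite parquet's package names inside the fat jar so the bundled copy
    // cannot collide with whatever parquet version the runner (Spark/Flink)
    // already ships on its classpath.
    relocate 'org.apache.parquet', 'org.apache.beam.repackaged.parquet'
}
```

The alternative mentioned above, aligning everything on one version instead of shading, would correspond to forcing a single parquet version across all configurations (e.g. via Gradle's `resolutionStrategy.force`) rather than relocating packages.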
> ParquetIOIT fails on Spark and Flink
> ------------------------------------
>
>                 Key: BEAM-5164
>                 URL: https://issues.apache.org/jira/browse/BEAM-5164
>             Project: Beam
>          Issue Type: Bug
>          Components: testing
>            Reporter: Lukasz Gajowy
>            Priority: Minor
>
> When run on a Spark or Flink remote cluster, ParquetIOIT fails with the following stacktrace:
> {code:java}
> org.apache.beam.sdk.io.parquet.ParquetIOIT > writeThenReadAll FAILED
>     org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.NoSuchMethodError: org.apache.parquet.hadoop.ParquetWriter$Builder.<init>(Lorg/apache/parquet/io/OutputFile;)V
>         at org.apache.beam.runners.spark.SparkPipelineResult.beamExceptionFrom(SparkPipelineResult.java:66)
>         at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:99)
>         at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:87)
>         at org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:116)
>         at org.apache.beam.runners.spark.TestSparkRunner.run(TestSparkRunner.java:61)
>         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
>         at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
>         at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
>         at org.apache.beam.sdk.io.parquet.ParquetIOIT.writeThenReadAll(ParquetIOIT.java:133)
>
>         Caused by:
>         java.lang.NoSuchMethodError: org.apache.parquet.hadoop.ParquetWriter$Builder.<init>(Lorg/apache/parquet/io/OutputFile;)V
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)