Ryan Skraba created BEAM-7979:
---------------------------------

             Summary: Avro incompatibilities with Spark 2.2 and Spark 2.3
                 Key: BEAM-7979
                 URL: https://issues.apache.org/jira/browse/BEAM-7979
             Project: Beam
          Issue Type: Bug
          Components: io-java-gcp, io-java-parquet, sdk-java-core
            Reporter: Ryan Skraba


Much of the code that depends on Avro requires Avro 1.8.x or later. This
notably includes the wrappers built with
[BeamSQL|https://github.com/apache/beam/blob/ae83448597f64474c3f5754d7b8e3f6b02347a6b/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L34],
but also
[some|https://github.com/apache/beam/blob/ae83448597f64474c3f5754d7b8e3f6b02347a6b/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java]
[connectors|https://github.com/apache/beam/blob/ae83448597f64474c3f5754d7b8e3f6b02347a6b/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtils.java#L42].

That Avro version is not present on Spark 2.2 and Spark 2.3 clusters, which are
meant to be supported.  Pipelines that use this code will fail there with
ClassNotFoundException or NoSuchMethodError.

Spark 2.4+ should be unaffected.
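
To confirm which Avro a given cluster actually puts on the classpath, a small
reflective probe along the following lines could be run from a driver or
executor. This is only a sketch: AvroClasspathCheck is a hypothetical helper,
not part of Beam, and it assumes that org.apache.avro.LogicalTypes is a class
that only exists from Avro 1.8 onward. It uses reflection so that it compiles
and runs even when no Avro jar is present.

```java
import java.util.Optional;

/** Hypothetical helper (not part of Beam): probes the classpath for Avro. */
final class AvroClasspathCheck {

    private AvroClasspathCheck() {}

    /** Returns true if the named class can be loaded, without initializing it. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, AvroClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the jar manifest Implementation-Version for the class, if any. */
    static Optional<String> implementationVersion(String className) {
        try {
            Class<?> cls =
                Class.forName(className, false, AvroClasspathCheck.class.getClassLoader());
            Package pkg = cls.getPackage();
            return Optional.ofNullable(pkg == null ? null : pkg.getImplementationVersion());
        } catch (ClassNotFoundException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Any Avro at all?
        System.out.println("Avro present: " + isPresent("org.apache.avro.Schema"));
        // The manifest version may be absent, e.g. in shaded or repackaged jars.
        System.out.println("Avro version: "
            + implementationVersion("org.apache.avro.Schema").orElse("unknown"));
        // Assumption: LogicalTypes is only available from Avro 1.8 onward.
        System.out.println("Avro 1.8+ API present: "
            + isPresent("org.apache.avro.LogicalTypes"));
    }
}
```

The probe only reports what the class loader of the helper itself can see. On
a cluster with a custom class loader setup, the result may differ from what
the pipeline code sees.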

Relocating or vendoring is probably not appropriate, since Avro is frequently 
exposed in the API through parameters and potentially in generated specific 
records.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
