[ 
https://issues.apache.org/jira/browse/SPARK-24771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16583077#comment-16583077
 ] 

Steve Loughran commented on SPARK-24771:
----------------------------------------

All the wire stuff (e.g. to HDFS) is protobuf. Rummaging around for Avro 
records, I only see them being used as a persistence format for the event 
history of the MR client. 



h2. General Avro API use

Assume: not going to break with the version upgrade.

h3. @Stringable

Both Path and Text import org.apache.avro.reflect.Stringable and are tagged 
as @Stringable,
a runtime annotation which tells Avro that toString() can be used to marshal it.
Shouldn't even need Avro on the classpath.
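
To make the toString() contract concrete, here is a minimal sketch that does not use Avro at all. {{StringableLike}}, {{DemoPath}} and {{StringableSketch}} are hypothetical names invented for this illustration; the annotation is only a local stand-in for org.apache.avro.reflect.Stringable so the code compiles without Avro on the classpath. It assumes the documented Stringable convention: toString() produces the string form, and a single-String constructor rebuilds the object.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;

// Hypothetical stand-in for org.apache.avro.reflect.Stringable.
// Retained at runtime so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface StringableLike {}

// Hypothetical path-like class: toString() and the String constructor round-trip.
@StringableLike
final class DemoPath {
  private final String value;

  DemoPath(String value) {
    this.value = value;
  }

  @Override
  public String toString() {
    return value;
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof DemoPath && ((DemoPath) other).value.equals(value);
  }

  @Override
  public int hashCode() {
    return value.hashCode();
  }
}

final class StringableSketch {
  // Marshal a tagged object through toString(); reject untagged classes.
  static String marshal(Object o) {
    if (!o.getClass().isAnnotationPresent(StringableLike.class)) {
      throw new IllegalArgumentException(o.getClass() + " is not tagged as stringable");
    }
    return o.toString();
  }

  // Rebuild an instance through its single-String constructor.
  static <T> T unmarshal(Class<T> type, String text) {
    try {
      Constructor<T> ctor = type.getDeclaredConstructor(String.class);
      ctor.setAccessible(true);
      return ctor.newInstance(text);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("no usable String constructor on " + type, e);
    }
  }

  public static void main(String[] args) {
    DemoPath original = new DemoPath("hdfs://nn/user/alice");
    String wire = marshal(original);
    DemoPath copy = unmarshal(DemoPath.class, wire);
    System.out.println(wire + " round-trips: " + copy.equals(original));
  }
}
```

The real annotation is only consulted when Avro's reflect code inspects the class, which is why the tag alone should not drag Avro in for callers that never serialize through it.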

h3. Package org.apache.hadoop.io.serializer.avro

Lets you serialize/deserialize Avro records.

Declared as one of the default serializations in 
{{org.apache.hadoop.io.serializer.SerializationFactory}}
if not overridden in the {{io.serializations}} conf option.

{code}
<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization, 
org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization, 
org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value>
  <description>A list of serialization classes that can be used for
  obtaining serializers and deserializers.</description>
</property>
{code}

Don't think that'll be brittle to change; it just means that Avro gets handled 
as a wire format.

*I have no idea what would break here, or how*.
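
For readers unfamiliar with how such a list is consumed, here is a rough sketch of the lookup as I understand it: the configured value is split into trimmed class names, and the first serialization that accepts the requested type wins. Everything here is a stand-in for illustration: {{SerializationLookupSketch}}, {{parseNames}}, {{firstAccepting}} and the accept rules in {{main}} are hypothetical and are not Hadoop's code or its real acceptance logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class SerializationLookupSketch {
  // Split an io.serializations-style value into trimmed, non-empty class names.
  static List<String> parseNames(String value) {
    List<String> names = new ArrayList<>();
    for (String part : value.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        names.add(name);
      }
    }
    return names;
  }

  // Return the first configured name whose rule accepts the type, or null if none does.
  static String firstAccepting(List<String> names,
                               Map<String, Predicate<Class<?>>> acceptRules,
                               Class<?> type) {
    for (String name : names) {
      Predicate<Class<?>> rule = acceptRules.get(name);
      if (rule != null && rule.test(type)) {
        return name;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    String value = "org.apache.hadoop.io.serializer.WritableSerialization, \n"
        + "org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization, \n"
        + "org.apache.hadoop.io.serializer.avro.AvroReflectSerialization";
    List<String> names = parseNames(value);

    // Made-up accept rules, purely to show that list order decides the winner.
    Map<String, Predicate<Class<?>>> rules = new LinkedHashMap<>();
    rules.put(names.get(0), c -> Number.class.isAssignableFrom(c));
    rules.put(names.get(1), c -> false);
    rules.put(names.get(2), c -> true);

    System.out.println(firstAccepting(names, rules, Integer.class));
    System.out.println(firstAccepting(names, rules, String.class));
  }
}
```

Under that reading, the Avro entries only come into play for types the earlier entries decline, so an Avro version change would surface only for jobs that actually route records through them.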

h3. class org.apache.hadoop.fs.AvroFSInput

Lets Avro use FSDataInputStreams as {{org.apache.avro.file.SeekableInput}} 
sources.
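
The shape of such an adapter can be sketched without Hadoop or Avro. In the sketch below, {{SeekableInputLike}} is a hypothetical local interface that mirrors the seek/tell/length/read operations of a seekable input, and {{java.io.RandomAccessFile}} stands in for FSDataInputStream; {{RandomAccessInput}} and {{SeekableAdapterSketch}} are likewise invented names. It illustrates the adapter idea only and is not the AvroFSInput source.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical mirror of the operations a seekable input exposes.
interface SeekableInputLike extends Closeable {
  void seek(long position) throws IOException;
  long tell() throws IOException;
  long length() throws IOException;
  int read(byte[] buffer, int offset, int length) throws IOException;
}

// Adapter: each call is forwarded to the wrapped seekable stream.
final class RandomAccessInput implements SeekableInputLike {
  private final RandomAccessFile file;

  RandomAccessInput(RandomAccessFile file) {
    this.file = file;
  }

  @Override public void seek(long position) throws IOException { file.seek(position); }
  @Override public long tell() throws IOException { return file.getFilePointer(); }
  @Override public long length() throws IOException { return file.length(); }
  @Override public int read(byte[] buffer, int offset, int length) throws IOException {
    return file.read(buffer, offset, length);
  }
  @Override public void close() throws IOException { file.close(); }
}

final class SeekableAdapterSketch {
  // Write data to a temp file, seek to 'from', and return the remainder as UTF-8 text.
  // Assumes 0 <= from <= data.length.
  static String readTail(byte[] data, long from) {
    try {
      File tmp = File.createTempFile("seekable-sketch", ".bin");
      try {
        Files.write(tmp.toPath(), data);
        try (SeekableInputLike in = new RandomAccessInput(new RandomAccessFile(tmp, "r"))) {
          in.seek(from);
          byte[] rest = new byte[(int) (in.length() - in.tell())];
          int total = 0;
          while (total < rest.length) {
            int n = in.read(rest, total, rest.length - total);
            if (n < 0) {
              break; // end of file reached early
            }
            total += n;
          }
          return new String(rest, 0, total, StandardCharsets.UTF_8);
        }
      } finally {
        tmp.delete();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(readTail("hello avro".getBytes(StandardCharsets.UTF_8), 6));
  }
}
```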

h2. Avro schemas and record generation

(outside of tests of that serialization)

h3. org.apache.hadoop.mapreduce

* uses it for history events. 
* defines records in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/avro/Events.avpr

Unless Spark is using the hadoop-mapreduce-client code, this isn't going to be 
directly relevant.

If there's something downstream which needs to have Spark and MR coexist on the 
classpath, well, that'll be something for them to address.


> Upgrade AVRO version from 1.7.7 to 1.8
> --------------------------------------
>
>                 Key: SPARK-24771
>                 URL: https://issues.apache.org/jira/browse/SPARK-24771
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>              Labels: release-notes
>             Fix For: 2.4.0
>
>



