[ https://issues.apache.org/jira/browse/SPARK-35744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362804#comment-17362804 ]
Steven Aerts commented on SPARK-35744:
--------------------------------------

[~Gengliang.Wang] in the Avro Java/Scala world there are two ways of handling data.

You can use [GenericData|https://avro.apache.org/docs/1.8.1/api/java/org/apache/avro/generic/GenericData.html], which gives you a generic way to handle any Avro data. This is also what spark-avro uses internally.

The other option is to use [SpecificData|https://avro.apache.org/docs/1.8.1/api/java/org/apache/avro/specific/SpecificData.html], where you let the [Avro codegen generate|https://avro.apache.org/docs/1.10.2/gettingstartedjava.html#Serializing+and+deserializing+with+code+generation] classes specific to your Avro schema and then work with those generated classes.

If you use these generated classes in Spark, you will hit the issue mentioned.
I am not sure how common this issue is, and I would totally understand if you closed it as too exotic.

> Performance degradation in avro SpecificRecordBuilders
> ------------------------------------------------------
>
>                 Key: SPARK-35744
>                 URL: https://issues.apache.org/jira/browse/SPARK-35744
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: Steven Aerts
>            Priority: Minor
>
> Creating this bug to let you know that when we tested out Spark 3.2.0 we saw
> a significant performance degradation where our code was handling Avro
> Specific Record objects. This slowed down some of our jobs by a factor of 4.
> Spark 3.2.0 upgrades the Avro version from 1.8.2 to 1.10.2.
> The degradation was caused by a change introduced in Avro 1.9.0. This change
> degrades performance when creating Avro specific records in certain
> classloader topologies, like the ones used in Spark.
> We notified the Avro project and [proposed|https://github.com/apache/avro/pull/1253] a simple
> fix upstream.
> (Links contain more details)
> It is unclear to us how many other projects are using Avro specific records
> in a Spark context and will be impacted by this degradation.
> Feel free to close this issue if you think it is too much of a
> corner case.