Maxim Gekk created SPARK-24881:
----------------------------------

             Summary: New options - compression and compressionLevel
                 Key: SPARK-24881
                 URL: https://issues.apache.org/jira/browse/SPARK-24881
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 2.3.1
            Reporter: Maxim Gekk


Currently the Avro datasource takes the compression codec name from a SQL config 
(the config key is hard-coded in AvroFileFormat): 
https://github.com/apache/spark/blob/106880edcd67bc20e8610a16f8ce6aa250268eeb/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala#L121-L125
 . The obvious downside is that modifying the global config can affect multiple 
writes at once.
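
For illustration, this is roughly how the codec has to be chosen today: the 
session-wide config is set before writing and then applies to every subsequent 
Avro write in the session (a sketch of current usage, not the code inside 
AvroFileFormat; the output path is just an example):

    // current approach: the codec is controlled only by a global SQL config
    spark.conf.set("spark.sql.avro.compression.codec", "deflate")
    df.write.format("avro").save("/tmp/avro_out")
    // any other Avro write in the same session is now also affected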

The purpose of this ticket is to add a new Avro option, "compression", like the 
one we already have for other datasources such as JSON and CSV. If the new option 
is not set by the user, we take the setting from the SQL config 
spark.sql.avro.compression.codec. If that config is not set either, the default 
compression codec will be snappy (the current behavior in master).
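
A sketch of how the proposed option could look per write, assuming the option 
name stays "compression" as proposed here (codec names would follow the values 
accepted by the existing SQL config; the path is just an example):

    // proposed: override the codec for this write only, without touching the session config
    df.write
      .format("avro")
      .option("compression", "deflate")   // not set -> spark.sql.avro.compression.codec -> snappy
      .save("/tmp/avro_deflate")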

Besides the compression option, we need to add another option, compressionLevel, 
which should reflect the corresponding SQL config used by the Avro datasource: 
https://github.com/apache/spark/blob/106880edcd67bc20e8610a16f8ce6aa250268eeb/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala#L122
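
Similarly, the level could be passed per write, assuming the option is named 
compressionLevel as proposed and only applies to codecs that support a level 
(e.g. deflate); option names and path below are illustrative only:

    // proposed: set both codec and level for a single write
    df.write
      .format("avro")
      .option("compression", "deflate")
      .option("compressionLevel", "7")    // would mirror the deflate-level SQL config linked above
      .save("/tmp/avro_deflate_7")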


