[jira] [Commented] (SPARK-39743) Unable to set zstd compression level while writing parquet files

2023-01-08 Thread zzzzming95 (Jira)


[ https://issues.apache.org/jira/browse/SPARK-39743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655799#comment-17655799 ]

ming95 commented on SPARK-39743:


[~euigeun_chung] 

 

Currently, Spark uses the `aircompressor` library to support ORC writes, and it 
does not support setting the ZSTD compression level:

https://github.com/airlift/aircompressor/blob/79265477482a4594efcc87aaef92b589c6ec2277/src/main/java/io/airlift/compress/zstd/ZstdCompressor.java#L113

> Unable to set zstd compression level while writing parquet files
> 
>
> Key: SPARK-39743
> URL: https://issues.apache.org/jira/browse/SPARK-39743
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Yeachan Park
>Assignee: ming95
>Priority: Minor
> Fix For: 3.4.0
>
>
> While writing zstd-compressed Parquet files, the setting 
> `spark.io.compression.zstd.level` does not have any effect on the zstd 
> compression level.
> All files seem to be written with the default zstd compression level, and the 
> config option seems to be ignored.
> Using the zstd CLI tool, we confirmed that setting a higher compression level 
> for the same file tested in Spark resulted in a smaller file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39743) Unable to set zstd compression level while writing parquet files

2023-01-05 Thread zzzzming95 (Jira)


[ https://issues.apache.org/jira/browse/SPARK-39743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655244#comment-17655244 ]

ming95 commented on SPARK-39743:


[~euigeun_chung] 

 

See this Jira: https://issues.apache.org/jira/browse/SPARK-33978




[jira] [Commented] (SPARK-39743) Unable to set zstd compression level while writing parquet files

2023-01-05 Thread Eugene Chung (Jira)


[ https://issues.apache.org/jira/browse/SPARK-39743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655211#comment-17655211 ]

Eugene Chung commented on SPARK-39743:
--

I'm sorry to ask a question here, but I couldn't find the answer on the Internet.

Can I set the zstd compression level for ORC?




[jira] [Commented] (SPARK-39743) Unable to set zstd compression level while writing parquet files

2022-08-04 Thread Apache Spark (Jira)


[ https://issues.apache.org/jira/browse/SPARK-39743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575542#comment-17575542 ]

Apache Spark commented on SPARK-39743:
--

User 'ming95' has created a pull request for this issue:
https://github.com/apache/spark/pull/37416




[jira] [Commented] (SPARK-39743) Unable to set zstd compression level while writing parquet files

2022-07-28 Thread zhiming she (Jira)


[ https://issues.apache.org/jira/browse/SPARK-39743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572551#comment-17572551 ]

zhiming she commented on SPARK-39743:
-

[~hyukjin.kwon] 

As [~yeachan153] said, is it possible to add some clarification in the Spark 
docs, or link to external docs, about parameters that are not controlled by 
Spark (like the Parquet zstd level)? I'm not sure whether this is feasible, as 
I haven't found links to other libraries' documentation in the Spark docs.




[jira] [Commented] (SPARK-39743) Unable to set zstd compression level while writing parquet files

2022-07-28 Thread zhiming she (Jira)


[ https://issues.apache.org/jira/browse/SPARK-39743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572546#comment-17572546 ]

zhiming she commented on SPARK-39743:
-

[~yeachan153] 

Spark only supports the more important parameters, such as 
`avro.mapred.deflate.level`. Less important parameters, such as 
`avro.mapred.zstd.level` and `parquet.compression.codec.zstd.level`, are 
usually not supported in Spark unless there is a special reason.

In addition, the problem you mentioned does exist; I will update the 
corresponding parameter descriptions in the documentation later.




[jira] [Commented] (SPARK-39743) Unable to set zstd compression level while writing parquet files

2022-07-28 Thread Yeachan Park (Jira)


[ https://issues.apache.org/jira/browse/SPARK-39743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572308#comment-17572308 ]

Yeachan Park commented on SPARK-39743:
--

OK, thanks for the clarification. Should we look at improving the documentation 
for this?

For example, for Avro files there is a config option that determines the 
compression level in https://spark.apache.org/docs/latest/configuration.html. 
Is there a reason we don't do the same for Parquet? I don't see it in the 
config docs or the Parquet data source docs: 
https://spark.apache.org/docs/latest/sql-data-sources-parquet.html. Or maybe we 
should add a link to 
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/README.md?

The docs also say that `spark.io.compression.zstd.level` controls the 
compression level for the zstd codec. Based on this, I didn't realise that it 
only applies to internal data. Should we make that clearer in the config docs?








[jira] [Commented] (SPARK-39743) Unable to set zstd compression level while writing parquet files

2022-07-27 Thread zhiming she (Jira)


[ https://issues.apache.org/jira/browse/SPARK-39743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572169#comment-17572169 ]

zhiming she commented on SPARK-39743:
-

[~hyukjin.kwon]

 

Can you mark this issue as `Resolved`?




[jira] [Commented] (SPARK-39743) Unable to set zstd compression level while writing parquet files

2022-07-27 Thread shezm (Jira)


[ https://issues.apache.org/jira/browse/SPARK-39743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571995#comment-17571995 ]

shezm commented on SPARK-39743:
---

[~yeachan153] 

{{spark.io.compression.zstd.level}} applies to {{spark.io.compression.codec}}; 
it only affects Spark's internal data (e.g. shuffle blocks).

If you want to use a different zstd level when writing Parquet files, you can 
set `parquet.compression.codec.zstd.level` in the SparkConf, like:

 
{code:scala}
val spark = SparkSession
  .builder()
  .master("local")
  .appName("spark example")
  .config("spark.sql.parquet.compression.codec", "zstd")
  .config("parquet.compression.codec.zstd.level", 10) // passed through to parquet-mr
  .getOrCreate()

val csvfile = spark.read.csv("file:///home/test_data/Reviews.csv")
csvfile.coalesce(1).write.parquet("file:///home/test_data/nn_parq_10")
{code}
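The same pair of settings can also be supplied without touching code, e.g. in `spark-defaults.conf`. Note that `parquet.compression.codec.zstd.level` is a parquet-mr (Hadoop) property rather than a Spark one, so outside of the session builder it may need the `spark.hadoop.` prefix to be forwarded to the Hadoop configuration; treat this sketch as an assumption to verify against your Spark version:

```properties
# spark-defaults.conf (sketch; pass-through behaviour may vary by version)
spark.sql.parquet.compression.codec                zstd
spark.hadoop.parquet.compression.codec.zstd.level  10
```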
 

 




[jira] [Commented] (SPARK-39743) Unable to set zstd compression level while writing parquet files

2022-07-24 Thread Apache Spark (Jira)


[ https://issues.apache.org/jira/browse/SPARK-39743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570435#comment-17570435 ]

Apache Spark commented on SPARK-39743:
--

User 'ming95' has created a pull request for this issue:
https://github.com/apache/spark/pull/37263




[jira] [Commented] (SPARK-39743) Unable to set zstd compression level while writing parquet files

2022-07-24 Thread Apache Spark (Jira)


[ https://issues.apache.org/jira/browse/SPARK-39743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570434#comment-17570434 ]

Apache Spark commented on SPARK-39743:
--

User 'ming95' has created a pull request for this issue:
https://github.com/apache/spark/pull/37263




[jira] [Commented] (SPARK-39743) Unable to set zstd compression level while writing parquet files

2022-07-12 Thread Hyukjin Kwon (Jira)


[ https://issues.apache.org/jira/browse/SPARK-39743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566077#comment-17566077 ]

Hyukjin Kwon commented on SPARK-39743:
--

No need to assign somebody; you can start working on this directly.




[jira] [Commented] (SPARK-39743) Unable to set zstd compression level while writing parquet files

2022-07-12 Thread shezm (Jira)


[ https://issues.apache.org/jira/browse/SPARK-39743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17565923#comment-17565923 ]

shezm commented on SPARK-39743:
---

I'm interested in this, could you assign it to me?
