[ https://issues.apache.org/jira/browse/SPARK-25513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-25513:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                       3.1.0

> Read zipped CSV and JSON files
> ------------------------------
>
>                 Key: SPARK-25513
>                 URL: https://issues.apache.org/jira/browse/SPARK-25513
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Maxim Gekk
>            Priority: Minor
>
> Spark can read compressed files if a compression codec is available for them.
> By default, Hadoop provides compressors/decompressors for bzip2, deflate, gzip,
> lz4 and snappy, but they cannot be used directly for reading zip archives.
> In general, zip archives can contain multiple entries, but in practice users
> often use a zip archive to store only one file. This use case is quite common
> in the wild.
> The ticket aims to support reading zipped CSV and JSON files in multi-line
> mode when each zip archive contains only one file.
> For example, the current approach to reading zipped CSV files looks complicated:
> https://docs.azuredatabricks.net/spark/latest/data-sources/zip-files.html#zip-files
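
For illustration only, the single-file use case described in the ticket can be sketched outside Spark with Python's standard library. This is not the change proposed in the ticket, nor the approach from the linked page; the helper names read_single_entry_zip_csv and make_zip are hypothetical, and UTF-8 encoding is an assumption.

```python
import csv
import io
import zipfile


def read_single_entry_zip_csv(source):
    """Return the rows of the only CSV entry in a zip archive.

    `source` is a path or a binary file-like object. Raises ValueError
    if the archive does not contain exactly one file entry.
    """
    with zipfile.ZipFile(source) as archive:
        # Skip directory entries; only one-file archives are in scope here.
        entries = [info for info in archive.infolist() if not info.is_dir()]
        if len(entries) != 1:
            raise ValueError(f"expected exactly one file, found {len(entries)}")
        with archive.open(entries[0]) as raw:
            # newline="" lets the csv module parse quoted multi-line fields.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))


def make_zip(files):
    """Build an in-memory zip archive from a {name: text} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    buffer.seek(0)
    return buffer


# One archive, one CSV file, one record whose quoted field spans two lines.
rows = read_single_entry_zip_csv(make_zip({"data.csv": 'id,note\n1,"two\nlines"\n'}))
```

The sketch rejects archives with more than one file rather than guessing which entry to read, mirroring the one-file-per-archive restriction stated in the ticket.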