Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22528#discussion_r219686326

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CodecStreams.scala ---
    @@ -41,7 +42,12 @@ object CodecStreams {
         getDecompressionCodec(config, file)
           .map(codec => codec.createInputStream(inputStream))
    -      .getOrElse(inputStream)
    +      .orElse {
    +        if (file.getName.toLowerCase.endsWith(".zip")) {
    +          val zip = new ZipArchiveInputStream(inputStream)
    +          if (zip.getNextEntry != null) Some(zip) else None
    +        } else None
    +      }.getOrElse(inputStream)
    --- End diff --

    @MaxGekk, I understand that we can support zipped files, but isn't it difficult to extend this support to the non-multiline modes as well? Basically, deflate is the same codec, and I wonder whether we should really allow zip specifically, only in multiline mode for CSV / JSON, and with a clear restriction (single file). Please correct me if I misunderstood.
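
To make the single-file restriction concrete, here is a minimal sketch of what "position the stream on the first zip entry" implies. It is an illustration only, not the code in the pull request: it swaps in the JDK's `java.util.zip.ZipInputStream` for the `ZipArchiveInputStream` used in the diff so that it runs without extra dependencies, and the names `ZipFirstEntrySketch`, `makeZip`, `openFirstEntry`, and `readAll` are hypothetical helpers introduced for this example.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

object ZipFirstEntrySketch {
  // Hypothetical helper: build an in-memory zip archive from (name, content) pairs.
  def makeZip(entries: Seq[(String, String)]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val zip = new ZipOutputStream(buffer)
    entries.foreach { case (name, content) =>
      zip.putNextEntry(new ZipEntry(name))
      zip.write(content.getBytes(UTF_8))
      zip.closeEntry()
    }
    zip.close()
    buffer.toByteArray
  }

  // Same shape as the diff: wrap the raw stream and advance to the first entry.
  // Returns None when no entry is found. Assumption: JDK ZipInputStream stands
  // in for commons-compress ZipArchiveInputStream here.
  def openFirstEntry(raw: InputStream): Option[InputStream] = {
    val zip = new ZipInputStream(raw)
    if (zip.getNextEntry != null) Some(zip) else None
  }

  // Read until the stream reports end of data. For a ZipInputStream this is
  // the end of the current entry, not the end of the archive.
  def readAll(in: InputStream): String = {
    val out = new ByteArrayOutputStream()
    val chunk = new Array[Byte](4096)
    var n = in.read(chunk)
    while (n != -1) {
      out.write(chunk, 0, n)
      n = in.read(chunk)
    }
    new String(out.toByteArray, UTF_8)
  }
}
```

With an archive holding two entries, `readAll(openFirstEntry(...).get)` yields only the first entry's content; the second entry is never read unless the caller invokes `getNextEntry` again. That is the behavior behind the "single file" restriction raised in the comment.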