Github user scottcarey commented on the issue: https://github.com/apache/spark/pull/21070

I tested this with the addition of some changes to ParquetOptions.scala, but that alone does not allow writing or reading zstd-compressed Parquet files, because Parquet uses reflection to acquire Hadoop compression classes that are not in the supplied dependencies.

From what I can see, anyone who wants to use the new compression codecs will have to build a custom version of Spark, probably with modified Hadoop libraries as well, including changes to how the native bindings are built... because that would be easier than moving the whole thing to hadoop-common 3.0, where the required compressors exist.

Alternatively, Spark + Parquet should avoid the Hadoop dependencies like the plague for compression/decompression. They bring in a steaming heap of dependencies and possible library conflicts, and users often have versions (or CDH versions) that don't exactly match. In my mind, Parquet should handle compression itself, or via a lightweight dependency. Perhaps it could use the Hadoop flavor when present, fall back to another implementation when it is not, or even accept a user-supplied one, so that it works both stand-alone and inside Hadoop without issue. Right now it is bound together by reflection and an awkward stack of brittle dependencies, with no escape hatch.

Or am I missing something here, and it is possible to read/write with the new codecs if I configure things differently?
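For reference, if the codec were wired up end-to-end, enabling zstd for Parquet output would look like the sketch below. The option names (`spark.sql.parquet.compression.codec` and the per-write `compression` option) are real Spark SQL settings; "zstd" as a value is exactly what this PR proposes, and per the comment above it only works once a Hadoop-side `ZStandardCodec` (hadoop-common 3.0+) and its native library are actually on the classpath:

```scala
// Sketch, not currently functional per the comment above: assumes a
// SparkSession `spark`, and that org.apache.hadoop.io.compress.ZStandardCodec
// (from hadoop-common 3.0+) is on the classpath with native zstd loaded.

// Session-wide default codec for Parquet writes:
spark.conf.set("spark.sql.parquet.compression.codec", "zstd")
spark.range(1000).write.parquet("/tmp/zstd-test")

// Or per-write, overriding the session default:
spark.range(1000).write
  .option("compression", "zstd")
  .parquet("/tmp/zstd-test2")
```

Without the Hadoop codec present, Parquet's reflective codec lookup fails at write time, which is the dependency problem the comment describes.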