Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22453#discussion_r220200276

    --- Diff: docs/sql-programming-guide.md ---
    @@ -1002,6 +1002,21 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession
         </p>
         </td>
     </tr>
    +<tr>
    +  <td><code>spark.sql.parquet.writeLegacyFormat</code></td>
    +  <td>false</td>
    +  <td>
    +    This configuration indicates whether we should use legacy Parquet format adopted by Spark 1.4
    +    and prior versions or the standard format defined in parquet-format specification to write
    +    Parquet files. This is not only related to compatibility with old Spark ones, but also other
    +    systems like Hive, Impala, Presto, etc. This is especially important for decimals. If this
    +    configuration is not enabled, decimals will be written in int-based format in Spark 1.5 and
    +    above, other systems that only support legacy decimal format (fixed length byte array) will not
    +    be able to read what Spark has written. Note other systems may have added support for the
    +    standard format in more recent versions, which will make this configuration unnecessary. Please
    --- End diff --

It sounds like it isn't quite a legacy format, but one that is still used by Hive and even considered valid, if not current, by Parquet? I am not sure of this part, but I am basing it on Hyukjin's comment above. I suggest a somewhat shorter text like this; what do you think? Its length would be more suitable for a config doc below.

    If `true`, then decimal values will be written in Apache Parquet's
    fixed-length byte array format, which is used by Spark 1.4 and earlier
    and by systems such as Apache Hive and Apache Impala. If `false`,
    decimals will be written using the newer int-based format in Parquet.
    If Parquet output is intended for use with systems that do not support
    this newer format, set to `true`.
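For anyone trying the flag out, a minimal, self-contained Scala sketch of toggling it before a write. This assumes the Spark 2.x `SparkSession` API; the data, app name, and output path are illustrative, but the config key is the one documented in the diff above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

object WriteLegacyFormatExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("writeLegacyFormat example")
      .master("local[*]") // local master so the sketch runs standalone
      .getOrCreate()
    import spark.implicits._

    // Write decimals as fixed-length byte arrays so readers that lack
    // support for the int-based decimal encoding (e.g. older Hive or
    // Impala versions) can still read the output.
    spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")

    val df = Seq("123.45", "678.90").toDF("amount")
      .select(col("amount").cast(DecimalType(10, 2)).as("amount"))

    // "/tmp/decimals_legacy" is an illustrative output path.
    df.write.mode("overwrite").parquet("/tmp/decimals_legacy")

    spark.stop()
  }
}
```

Leaving the flag at its default of `false` would write the same column using the int-based decimal encoding from the parquet-format specification.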