Re: [SPARK-29898][SQL] Support Avro Custom Logical Types

2019-11-22 Thread Gengliang Wang
Hi Carlos, To write Avro files with a schema different from the default mapping, you can use the option "avroSchema": df.write.format("avro").option("avroSchema", avroSchemaAsJSONStringFormat)... See
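A minimal sketch of the "avroSchema" option described above, assuming PySpark and the built-in Avro data source. The record name, field names, and the helpers `build_avro_schema` and `write_avro` are hypothetical and only for illustration; `timestamp-millis` is a standard Avro logical type, not one of the custom types proposed in SPARK-29898.

```python
import json


def build_avro_schema() -> str:
    """Return an Avro schema as a JSON string (hypothetical example record)."""
    schema = {
        "type": "record",
        "name": "Event",            # hypothetical record name
        "namespace": "com.example",  # hypothetical namespace
        "fields": [
            {"name": "id", "type": "long"},
            # A long annotated with a standard Avro logical type.
            {
                "name": "created_at",
                "type": {"type": "long", "logicalType": "timestamp-millis"},
            },
        ],
    }
    return json.dumps(schema)


def write_avro(df, path: str, schema_json: str) -> None:
    """Write df as Avro using the supplied schema instead of the default mapping.

    df is expected to be a Spark DataFrame whose columns match the schema.
    """
    df.write.format("avro").option("avroSchema", schema_json).save(path)
```

With a real DataFrame this would be called as `write_avro(df, "/tmp/events", build_avro_schema())`; the output path here is also an assumption.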

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

2019-11-22 Thread Sean Owen
I haven't been following this closely, but I'm aware that there are some tricky compatibility problems between Avro and Parquet, both of which are used in Spark. That's made it pretty hard to update in 2.x. master/3.0 is on Parquet 1.10.1 and Avro 1.8.2. Just a general question: is that the best

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

2019-11-22 Thread Michael Heuer
Hello, I am sorry for asking a somewhat inappropriate question. For context, our projects depend on a fix that is in Parquet master but not yet released. Parquet 1.11.0 is in the release-candidate phase. It looks like we can't build against Parquet 1.11.0 RC to include the fix and run successfully on

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-22 Thread Dongjoon Hyun
Thank you, Steve and all. To conclude this thread, we will merge the following PR and move forward. [SPARK-29981][BUILD] Add hive-1.2/2.3 profiles https://github.com/apache/spark/pull/26619 Please leave a comment if you have any concerns. And, the following PRs and more will

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

2019-11-22 Thread Dongjoon Hyun
Hi, Michael. I'm not sure Apache Spark is close to the state you want. First, both Apache Spark 3.0.0-preview and Apache Spark 2.4 are using Avro 1.8.2, as are the `master` and `branch-2.4` branches. Cutting new releases will not give you what you want. Do we have a PR on the master

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

2019-11-22 Thread Ryan Blue
Just to clarify, I don't think that Parquet 1.10.1 to 1.11.0 is a runtime-incompatible change. The example mixed 1.11.0 and 1.10.1 in the same execution. Michael, please be more careful about announcing compatibility problems in other communities. If you've observed problems, let's find out the

Re: Spark 2.4.5 release for Parquet and Avro dependency updates?

2019-11-22 Thread Nan Zhu
I am not sure if it is a good practice to have breaking changes in dependencies for maintenance releases.
On Fri, Nov 22, 2019 at 8:56 AM Michael Heuer wrote:
> Hello,
>
> Avro 1.8.2 to 1.9.1 is a binary incompatible update, and it appears that
> Parquet 1.10.1 to 1.11 will be a

Spark 2.4.5 release for Parquet and Avro dependency updates?

2019-11-22 Thread Michael Heuer
Hello, Avro 1.8.2 to 1.9.1 is a binary-incompatible update, and it appears that Parquet 1.10.1 to 1.11 will be a runtime-incompatible update (see thread on dev@parquet).

[SPARK-29898][SQL] Support Avro Custom Logical Types

2019-11-22 Thread Carlos del Prado Mota
Hi there, I recently proposed a change to add support for custom logical types for Avro in Spark. This change makes it possible to build custom type conversions between StructType and Avro and is fully compatible with the current solution. This is the link for the solution and I would

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-22 Thread Steve Loughran
On Tue, Nov 19, 2019 at 10:40 PM Cheng Lian wrote:
> Hey Steve,
>
> In terms of Maven artifact, I don't think the default Hadoop version
> matters except for the spark-hadoop-cloud module, which is only meaningful
> under the hadoop-3.2 profile. All the other spark-* artifacts published to

Re: The Myth: the forked Hive 1.2.1 is stabler than XXX

2019-11-22 Thread Steve Loughran
On Thu, Nov 21, 2019 at 12:53 AM Dongjoon Hyun wrote:
> Thank you for the thoughtful clarification. I agree with all your options.
>
> Especially, for Hive Metastore connection, `Hive isolated client loader`
> is also important with Hive 2.3 because Hive 2.3 client cannot talk with
> Hive 2.1