Re: Hive using Spark engine vs native spark with hive integration.

2020-10-07 Thread Patrick McCarthy
I think a lot will depend on what the scripts do. I've seen some legacy Hive scripts which were written in an awkward way (e.g. lots of subqueries, nested explodes) because pre-Spark it was the only way to express certain logic. For fairly straightforward operations I expect Catalyst would reduce

Re: Hive using Spark engine vs native spark with hive integration.

2020-10-06 Thread Ricardo Martinelli de Oliveira
My 2 cents is that this is a complicated question, since I'm not confident that Spark is 100% compatible with Hive in terms of query language. I have an unanswered question on this list about this:

Hive using Spark engine vs native spark with hive integration.

2020-10-06 Thread Manu Jacob
Hi All, I'm not sure whether I should ask this question in the Spark community or the Hive community. We have a set of Hive scripts that run on EMR (Tez engine). We would like to experiment by moving some of them onto Spark. We are planning to experiment with two options. 1. Use the current code based on