Optimising multiple Hive table joins and queries in Spark

2020-03-14 Thread Manjunath Shetty H
Hi All, We have 10 tables in a data warehouse (HDFS/Hive) written in ORC format. We currently serve a use case on top of them by joining 4-5 tables using Hive, but it is not as fast as we want, so we are thinking of using Spark for this use case. Any suggestions on this? Is it

Re: FYI: The evolution on `CHAR` type behavior

2020-03-14 Thread Reynold Xin
I don’t understand this change. Wouldn’t this “ban” confuse the hell out of both new and old users? For old users, their old code that was working for char(3) would now stop working. For new users, depending on whether the underlying metastore char(3) is either supported but different from ANSI

FYI: The evolution on `CHAR` type behavior

2020-03-14 Thread Dongjoon Hyun
Hi, All. Apache Spark has suffered from a known consistency issue in `CHAR` type behavior across its usages and configurations. However, the behavior has gradually been evolving toward consistency within Apache Spark, because we don't officially support `CHAR`. The following is the

[PySpark] How to write HFiles as an 'append' to the same directory?

2020-03-14 Thread Gautham Acharya
I have a process in Apache Spark that writes HFiles to S3 in batches. I want the resulting HFiles in the same directory, as they belong to the same column family. However, I'm getting a 'directory already exists' error when I try to run this on AWS EMR. How can I write HFiles

sample syntax in spark-env.sh for env.

2020-03-14 Thread Zahid Rahman
WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable *I was chasing this warning when I found misinformation from Spark training companies such as Edureka, who offer weird and wonderful suggestions*