Re: Spark SQL reads all leaf directories on a partitioned Hive table

2019-08-12 Thread Subash Prabakar
I had the similar issue reading the external parquet table . In my case I had permission issue in one partition so I added filter to exclude that partition but still the spark didn’t prune it. Then I read that in order for spark to be aware of all the partitions it first read the folders and then

Continuous processing mode and python udf

2019-08-12 Thread zenglong chen
Does Spark 2.4.0 support Python UDFs with Continuous Processing mode? I try it and occur error like below: WARN scheduler.TaskSetManager: Lost task 4.0 in stage 0.0 (TID 4, 172.22.9.179, executor 1): java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:347) at

Custom aggregations: modular and lightweight solutions?

2019-08-12 Thread Andrew Leverentz
Hi All, I'm attempting to clean up some Spark code which performs groupByKey / mapGroups to compute custom aggregations, and I could use some help understanding the Spark API's necessary to make my code more modular and maintainable. In particular, my current approach is as follows: - Start

Re: Spark streaming dataframe extract message to new columns

2019-08-12 Thread Gourav Sengupta
Hi, I think that it should be possible to write a query on the streaming data frame and then write the output of the query to S3 or any other sink layer. Regards, Gourav Sengupta On Sat, Aug 10, 2019 at 9:24 AM zenglong chen wrote: > How to extract some message in streaming dataframe and make