Is there a way to do partial sort in update mode in Spark Structured Streaming?

2017-06-03 Thread kant kodali
Hi All, 1. Is there a way to do a partial sort on, say, a timestamp column in update mode? I am currently using Spark 2.1.1 and it looks like it is not possible; however, I am wondering if this is possible in 2.2? 2. Can we do a full sort in update mode with a specified watermark? Since after a specified

SparkAppHandle.Listener.infoChanged behaviour

2017-06-03 Thread Mohammad Tariq
Dear fellow Spark users, I am having a bit of difficulty understanding the exact behaviour of the *SparkAppHandle.Listener.infoChanged(SparkAppHandle handle)* method. The documentation says: *Callback for changes in any information that is not the handle's state.* What exactly is meant by *any

Re: Is there a way to do conditional group by in spark 2.1.1?

2017-06-03 Thread Bryan Jeffrey
You should be able to project a new column that is your group column. Then you can group on the projected column. On Sat, Jun 3, 2017 at 6:26 PM -0400, "upendra 1991" wrote: Use a function
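The "project a column, then group on it" suggestion can be sketched without Spark at all. The block below is a minimal plain-Python illustration, not Spark API code: the helper names `group_key` and `conditional_group_count` and the sample rows are hypothetical, and it assumes the condition is evaluated per row (the original question does not say whether it is). In Spark itself the projection step would typically be a derived column built with `when`/`otherwise`, followed by `groupBy` on that column.

```python
from collections import defaultdict


def group_key(row):
    """Pick the grouping value for one row, mirroring the if/else in the question.

    Returns a (column_name, value) pair so that keys coming from different
    columns cannot collide, or None when neither condition holds.
    """
    if row["field1"] == "foo":
        return ("field1", row["field1"])
    if row["field2"] == "bar":
        return ("field2", row["field2"])
    return None


def conditional_group_count(rows):
    """Project a group_key column onto every row, then count rows per key."""
    # Step 1: projection. Each row gains a derived "group_key" column.
    projected = [dict(row, group_key=group_key(row)) for row in rows]

    # Step 2: an ordinary group-by on the projected column.
    counts = defaultdict(int)
    for row in projected:
        counts[row["group_key"]] += 1
    return dict(counts)


# Hypothetical sample data, for illustration only.
rows = [
    {"field1": "foo", "field2": "x"},
    {"field1": "foo", "field2": "bar"},
    {"field1": "baz", "field2": "bar"},
    {"field1": "baz", "field2": "qux"},
]
result = conditional_group_count(rows)
```

Rows that match neither condition end up under the `None` key here; whether to keep or filter them is a design choice the thread does not settle.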

Re: Is there a way to do conditional group by in spark 2.1.1?

2017-06-03 Thread upendra 1991
Use a function. On Sat, Jun 3, 2017 at 5:01 PM, kant kodali wrote: Hi All, Is there a way to do a conditional group by in Spark 2.1.1? In other words, I want to do something like this: if (field1 == "foo") { df.groupBy(field1) } else

Is there a way to do conditional group by in spark 2.1.1?

2017-06-03 Thread kant kodali
Hi All, Is there a way to do a conditional group by in Spark 2.1.1? In other words, I want to do something like this: if (field1 == "foo") { df.groupBy(field1) } else if (field2 == "bar") { df.groupBy(field2) } Thanks

Parquet Read Speed: Spark SQL vs Parquet MR

2017-06-03 Thread Mike Wheeler
Hi Spark Users, I have run into a situation where Spark SQL is much slower than Parquet MR for processing Parquet files. Can you provide some guidance on optimization? Suppose I have a table "person" with columns gender, age, name, address, etc., which is stored in Parquet files. I tried two