Feature request: split dataset based on condition

2019-02-01 Thread Moein Hosseini
I've seen that many applications need to split a dataset into multiple datasets based on some conditions. As there is no method to do this in one place, developers call the *filter* method multiple times. I think it would be useful to have a method that splits a dataset based on a condition in one iteration, something like
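A minimal Scala sketch of the pattern described above, assuming a simple numeric Dataset; the splitBy helper is hypothetical and not an existing Spark API, shown only to illustrate the request next to the current two-pass *filter* approach:

import org.apache.spark.sql.{Dataset, SparkSession}

object SplitByCondition {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("split-example").getOrCreate()
    import spark.implicits._

    val ds: Dataset[Long] = spark.range(100).as[Long]

    // Current approach: the condition is evaluated in two separate filter passes.
    val evens = ds.filter(_ % 2 == 0)
    val odds  = ds.filter(_ % 2 != 0)

    // Hypothetical helper sketching what a single-call split could look like;
    // under the hood this sketch still issues two filters.
    def splitBy[T](data: Dataset[T])(cond: T => Boolean): (Dataset[T], Dataset[T]) =
      (data.filter(cond), data.filter((t: T) => !cond(t)))

    val (matched, rest) = splitBy(ds)(_ % 2 == 0)
    println(s"evens=${evens.count()}, odds=${odds.count()}, matched=${matched.count()}")
  }
}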

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-02-01 Thread Koert Kuipers
Introducing Hive serdes in sql/core sounds a bit like a step back to me. How can you build Spark without Hive support if there are imports for org.apache.hadoop.hive.serde2 in sql/core? Are these imports very limited in scope (and do they avoid pulling all of Hive in)? On Fri, Feb 1, 2019 at 3:03 PM

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-02-01 Thread Felix Cheung
What’s the update and next step on this? We have real users getting blocked by this issue.

Re: Structured streaming from Kafka by timestamp

2019-02-01 Thread Tomas Bartalos
Hello, sorry for my late answer. You're right, what I'm doing is a one-time query, not structured streaming. It's probably best to describe my use case: I'd like to expose live data residing in Kafka (via JDBC/ODBC) with the power of Spark's distributed SQL engine. As the JDBC server I use
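A minimal Scala sketch of such a one-time (batch) Kafka read exposed to Spark SQL; the broker address, topic name, and timestamp cut-off are illustrative assumptions, not details from the thread:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaBatchByTimestamp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-batch-by-timestamp").getOrCreate()

    // spark.read (not readStream) runs a one-time query over the topic.
    val df = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // assumed broker address
      .option("subscribe", "events")                    // assumed topic name
      .load()

    // The Kafka source exposes a `timestamp` column per record; filter to the
    // window of interest and register the result for SQL access.
    df.filter(col("timestamp") >= lit("2019-02-01 00:00:00").cast("timestamp"))
      .selectExpr("CAST(value AS STRING) AS value", "timestamp")
      .createOrReplaceTempView("kafka_events")

    spark.sql("SELECT count(*) FROM kafka_events").show()
  }
}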