Re: HashingTFModel/IDFModel in Structured Streaming

2017-11-01 Thread Davis Varghese
Sure. I will get one over the weekend.

Re: HashingTFModel/IDFModel in Structured Streaming

2017-11-01 Thread Bago Amirbekian
Davis, I'm looking into this. If you could include some code that I can use to reproduce the error & the stack trace, it would be really helpful. On Fri, Oct 20, 2017 at 11:01 AM Joseph Bradley wrote: > Hi Davis, > We've started tracking these issues under this umbrella: >
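For reference, a minimal sketch of the kind of repro being requested, assuming the failure mode under discussion (a HashingTF + IDF pipeline fitted on a static DataFrame and then applied to a streaming one); the input path and schema are illustrative, not taken from the thread:

    // Hedged repro sketch: fit TF-IDF on static data, transform a stream.
    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.master("local[*]").appName("tfidf-ss").getOrCreate()
    import spark.implicits._

    // Fit the pipeline on a static frame.
    val train = Seq("spark structured streaming", "hashing tf idf").toDF("text")
    val pipeline = new Pipeline().setStages(Array(
      new Tokenizer().setInputCol("text").setOutputCol("words"),
      new HashingTF().setInputCol("words").setOutputCol("tf"),
      new IDF().setInputCol("tf").setOutputCol("tfidf")))
    val model = pipeline.fit(train)

    // Apply the fitted model to a streaming frame; the transform below is
    // where the reported error would surface. /tmp/docs is a hypothetical
    // input directory of text files.
    val stream = spark.readStream.text("/tmp/docs").toDF("text")
    val query = model.transform(stream).writeStream.format("console").start()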

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-01 Thread Noman Khan
+1 for ultra-low latency. Thanks & Regards, Noman From: Reynold Xin Sent: Wednesday, 1 November, 21:07 Subject: [Vote] SPIP: Continuous Processing Mode for Structured Streaming To: dev@spark.apache.org Earlier I sent out a discussion thread for

SPARK-22211: Removing an incorrect FOJ optimization

2017-11-01 Thread Henry Robinson
Hi - I'm digging into some Spark SQL tickets, and wanted to ask a procedural question about SPARK-22211 and optimizer changes in general. To summarise the JIRA: Catalyst appears to be incorrectly pushing a limit down below a FULL OUTER JOIN, risking incorrect results. I don't believe
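To make the hazard concrete, here is a small sketch (with made-up tables) of why that rewrite is unsafe: limiting one side before a full outer join changes which rows on the other side get null-extended.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.master("local[*]").appName("foj-limit").getOrCreate()
    import spark.implicits._

    val a = Seq((1, "a1"), (2, "a2")).toDF("k", "va")
    val b = Seq((2, "b2"), (3, "b3")).toDF("k", "vb")

    // Correct plan: join first, then limit. Any single row returned is a
    // genuine row of the full outer join result.
    a.join(b, Seq("k"), "full_outer").limit(1).show()

    // The rewrite in question: limit pushed below the join. If the limit
    // happens to keep only (1, "a1"), then b's row for k = 2 comes back
    // null-extended even though it has a match in the unrestricted a,
    // producing a row that never appears in the true result.
    a.limit(1).join(b, Seq("k"), "full_outer").show()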

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-01 Thread Reynold Xin
I just replied. On Wed, Nov 1, 2017 at 5:50 PM, Cody Koeninger wrote: > Was there any answer to my question around the effect of changes to > the sink api regarding access to underlying offsets? > > On Wed, Nov 1, 2017 at 11:32 AM, Reynold Xin wrote: >

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-01 Thread Cody Koeninger
Was there any answer to my question around the effect of changes to the sink API regarding access to underlying offsets? On Wed, Nov 1, 2017 at 11:32 AM, Reynold Xin wrote: > Most of those should be answered by the attached design sketch in the JIRA > ticket. > > On Wed, Nov

Announcing Spark on Kubernetes release 0.5.0

2017-11-01 Thread Yinan Li
The Spark on Kubernetes development community is pleased to announce release 0.5.0 of Apache Spark with Kubernetes as a native scheduler back-end! This release includes a few bug fixes and the following features:
- Spark R support
- Kubernetes 1.8 support
- Mounts emptyDir volumes for

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-01 Thread Debasish Das
+1 Is there any design doc related to API/internal changes? Will CP be the default in structured streaming, or is it a mode that works in conjunction with the existing behavior? Thanks. Deb On Nov 1, 2017 8:37 AM, "Reynold Xin" wrote: Earlier I sent out a discussion thread for CP in

[Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-01 Thread Reynold Xin
Earlier I sent out a discussion thread for CP in Structured Streaming: https://issues.apache.org/jira/browse/SPARK-20928 It is meant to be a very small, surgical change to Structured Streaming to enable ultra-low latency. This is great timing because we are also designing and implementing data
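For readers skimming the thread, a sketch of how such a mode can surface in the user-facing API: as a new trigger rather than a new entry point. This is the shape that eventually shipped as Trigger.Continuous in Spark 2.3, not something specified verbatim in this vote thread; it assumes a SparkSession named spark is in scope.

    import org.apache.spark.sql.streaming.Trigger

    // Continuous mode expressed as a trigger: the interval is the
    // checkpoint interval, not a micro-batch interval.
    val query = spark.readStream
      .format("rate")
      .load()
      .writeStream
      .format("console")
      .trigger(Trigger.Continuous("1 second"))
      .start()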

Re: [SS] Custom Sinks

2017-11-01 Thread Reynold Xin
They will probably both change, but I wouldn't block on the change if you have an immediate need. On Wed, Nov 1, 2017 at 10:41 AM, Anton Okolnychyi < anton.okolnyc...@gmail.com> wrote: > Hi all, > > I have a question about the future of custom data sinks in Structured > Streaming. In

[SS] Custom Sinks

2017-11-01 Thread Anton Okolnychyi
Hi all, I have a question about the future of custom data sinks in Structured Streaming. In particular, I want to know how continuous processing and the Datasource API V2 will impact them. Right now, it is possible to have custom data sinks via the current Datasource API (V1) by implementing
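For concreteness, a minimal sketch of the V1 path described here: a StreamSinkProvider that returns a Sink whose addBatch receives each micro-batch. Note that Sink lives in an internal package (org.apache.spark.sql.execution.streaming), which is part of why the V2 question matters; the class name below is illustrative.

    import org.apache.spark.sql.{DataFrame, SQLContext}
    import org.apache.spark.sql.execution.streaming.Sink
    import org.apache.spark.sql.sources.StreamSinkProvider
    import org.apache.spark.sql.streaming.OutputMode

    class LoggingSinkProvider extends StreamSinkProvider {
      override def createSink(
          sqlContext: SQLContext,
          parameters: Map[String, String],
          partitionColumns: Seq[String],
          outputMode: OutputMode): Sink = new Sink {
        // addBatch is the whole contract: Spark hands the sink each
        // micro-batch along with its id, which can be used for dedup
        // when a batch is retried.
        override def addBatch(batchId: Long, data: DataFrame): Unit = {
          data.collect().foreach(row => println(s"batch $batchId: $row"))
        }
      }
    }

The provider is wired up by passing its fully qualified class name to writeStream.format(...).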

Jenkins upgrade/Test Parallelization & Containerization

2017-11-01 Thread Xin Lu
Hi everyone, I tried sending emails to this list and I'm not sure if they went through, so I'm trying again. Anyway, a couple of months ago, before I left Databricks, I was working on a proof of concept that parallelized Spark tests on Jenkins. The way it worked was that it basically built the Spark jars