Re: Time for 2.3.2?

2018-07-03 Thread Saisai Shao
FYI, currently we have one blocker issue (https://issues.apache.org/jira/browse/SPARK-24535); I will start the release after it is fixed. Also, please let me know if there are other blockers or fixes that need to land in the 2.3.2 release. Thanks Saisai Saisai Shao wrote on Monday, July 2, 2018 at 1:16 PM: > I will start

Re: Revisiting Online serving of Spark models?

2018-07-03 Thread Matei Zaharia
Just wondering, is there an update on this? I haven't seen a summary of the offline discussion, but maybe I've missed it. Matei > On Jun 11, 2018, at 8:51 PM, Holden Karau wrote: > > So I kicked off a thread on user@ to collect people's feedback there, but I'll > summarize the offline results

Re: [SPARK][SQL][CORE] Running sql-tests

2018-07-03 Thread Marco Gaido
Hi Daniel, please check sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala . You should find all your answers in the comments there. Thanks, Marco 2018-07-03 19:08 GMT+02:00 dmateusp : >

Use Gensim with PySpark

2018-07-03 Thread philipghu
I'm new to PySpark and Python in general. I'm trying to use PySpark to process a large text file with Gensim's preprocess_string function. What I did was simply put preprocess_string in PySpark's map function: rdd1 = text.map(lambda x: (x.preprocess_string(x, CUSTOM_FILTERS))) and it gave me
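One likely issue with the snippet as shown: preprocess_string is a module-level function in gensim.parsing.preprocessing, not a method on the string x, so x.preprocess_string(...) fails with an AttributeError once the job runs. A minimal sketch of the intended pattern (the input path and the CUSTOM_FILTERS list are placeholders, and gensim must be installed on every worker):

    # Minimal sketch, not the poster's actual job: preprocess_string is called
    # as a module-level gensim function and shipped to workers via the closure.
    from pyspark.sql import SparkSession
    from gensim.parsing.preprocessing import (
        preprocess_string, strip_punctuation, strip_numeric)

    spark = SparkSession.builder.appName("gensim-preprocess").getOrCreate()
    sc = spark.sparkContext

    # Placeholder filter list standing in for the poster's CUSTOM_FILTERS;
    # each filter takes a string and returns a string.
    CUSTOM_FILTERS = [lambda s: s.lower(), strip_punctuation, strip_numeric]

    text = sc.textFile("/path/to/large_text_file.txt")  # placeholder path
    # preprocess_string returns a list of tokens for each input line
    rdd1 = text.map(lambda x: preprocess_string(x, CUSTOM_FILTERS))
    print(rdd1.take(2))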

Re: Spark model serving

2018-07-03 Thread Saikat Kanjilal
Ping, would love to hear back on this. From: Saikat Kanjilal Sent: Tuesday, June 26, 2018 7:27 AM To: dev@spark.apache.org Subject: Spark model serving HoldenK and interested folks, I'm just following up on the Spark model serving discussions, as this is highly

[SPARK][SQL][CORE] Running sql-tests

2018-07-03 Thread dmateusp
Hey everyone! Newbie question: I'm trying to run the tests under spark/sql/core/src/test/resources/sql-tests/inputs/, but I've had no luck so far. How are they invoked? What is the format of those files? I've never seen tests organized as inputs/ and results/ directories; what is that style called?

[Spark Streaming] mapStateByKey Python SDK

2018-07-03 Thread tpawlowski
I am a user of the Spark Streaming Python SDK. In many cases, stateful operations are the only way to approach the problems I am solving. The updateStateByKey method, which I believe is the only way to introduce a stateful operator, requires some extra code to extract results from the states and to remove
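For reference, a minimal sketch of the updateStateByKey pattern being described (the socket source and word-count logic are illustrative only); as the poster notes, extracting results from the state and dropping expired keys both have to be coded inside the update function itself:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "stateful-wordcount")
    ssc = StreamingContext(sc, 5)          # 5-second batches
    ssc.checkpoint("/tmp/streaming-ckpt")  # stateful ops require checkpointing

    pairs = (ssc.socketTextStream("localhost", 9999)
                .flatMap(lambda line: line.split())
                .map(lambda w: (w, 1)))

    def update_count(new_values, running_count):
        # Returning None here would drop the key from the state;
        # otherwise keep accumulating a running total.
        return sum(new_values) + (running_count or 0)

    pairs.updateStateByKey(update_count).pprint()

    ssc.start()
    ssc.awaitTermination()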

Run Python User Defined Functions / code in Spark with Scala Codebase

2018-07-03 Thread Chetan Khatri
Hello dear Spark users / devs, I would like to pass a Python user-defined function to a Spark job developed in Scala, and have the return value of that function come back through the DataFrame / Dataset API. Can someone please guide me on the best approach to do this? The Python function would be mostly

Re: Spark data source resiliency

2018-07-03 Thread assaf.mendelson
You are correct, this solved it. Thanks

Re: Spark data source resiliency

2018-07-03 Thread Wenchen Fan
I believe you are using something like `local[8]` as your Spark master, which can't retry tasks. Please try `local[8, 3]`, which can retry failed tasks 3 times. On Tue, Jul 3, 2018 at 2:42 PM assaf.mendelson wrote: > That is what I expected; however, I did a very simple test (using println >
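A minimal sketch of this suggestion, shown in PySpark for brevity (the master URL works the same from Scala); the second number in the local master URL is the per-task attempt limit, which plain local mode otherwise fixes at a single attempt:

    from pyspark.sql import SparkSession

    # "local[8,3]": 8 worker threads, and each task may be attempted up to
    # 3 times before the job fails, mirroring cluster-mode retry behaviour.
    spark = (SparkSession.builder
             .master("local[8,3]")
             .appName("retry-demo")   # arbitrary app name
             .getOrCreate())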

Re: Spark data source resiliency

2018-07-03 Thread assaf.mendelson
That is what I expected; however, I did a very simple test (using println just to see when the exception is triggered in the iterator) with a local master, and I saw it fail once and cause the entire operation to fail. Is this something that may be unique to the local master (or some default

Re: Spark data source resiliency

2018-07-03 Thread Wenchen Fan
A failure in the data reader results in a task failure, and Spark will retry the task for you (IIRC, it retries 3 times before failing the job). Can you check your Spark log and see whether the task fails consistently? On Tue, Jul 3, 2018 at 2:17 PM assaf.mendelson wrote: > Hi all, > > I have implemented a

Spark data source resiliency

2018-07-03 Thread assaf.mendelson
Hi all, I have implemented a data source V2 which integrates with an internal system, and I need to make it resilient to errors in that internal data source. The issue is that currently, if there is an exception in the data reader, the exception seems to fail the entire task. I would prefer instead