Re: [DISCUSS] Enable blacklisting feature by default in 3.0

2019-04-01 Thread Ankur Gupta
Thanks for your thoughts Chris! Please find my response below:
- Rather than a fixed timeout, could we do some sort of exponential backoff? Start with a 10 or 20 second blacklist and increase from there? The nodes with catastrophic errors should quickly hit long blacklist intervals.
- +1 I like
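(For context, a minimal sketch of the exponential-backoff idea floated above, in Python. The constants and function name are illustrative assumptions, not Spark's actual blacklist implementation.)

```python
# Hypothetical sketch of exponential backoff for blacklist intervals
# (illustrative only; not Spark's actual blacklist code).

BASE_TIMEOUT_S = 20        # assumed initial blacklist interval (seconds)
MAX_TIMEOUT_S = 60 * 60    # assumed cap of one hour

def blacklist_timeout(failure_count: int) -> int:
    """Blacklist interval after `failure_count` consecutive failures.

    Doubles on every repeated failure, so nodes with catastrophic errors
    quickly reach the maximum interval.
    """
    return min(BASE_TIMEOUT_S * (2 ** max(failure_count - 1, 0)), MAX_TIMEOUT_S)

if __name__ == "__main__":
    for failures in range(1, 9):
        print(failures, blacklist_timeout(failures))
```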

Re: [DISCUSS] Spark Columnar Processing

2019-04-01 Thread Reynold Xin
I just realized I didn't make it very clear my stance here ... here's another try: I think it's a no brainer to have a good columnar UDF interface. This would facilitate a lot of high performance applications, e.g. GPU-based accelerations for machine learning algorithms. On rewriting the
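(As background for the columnar UDF discussion, here is what a vectorized, column-at-a-time UDF already looks like with PySpark 2.4's pandas_udf; this is the existing Arrow-backed API, shown only as context, not the new interface being proposed in the thread.)

```python
# Existing columnar-style UDF via pandas_udf (Arrow-backed, operates on whole
# columns); shown only as context for the interface being discussed.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.appName("columnar-udf-demo").getOrCreate()

@pandas_udf("double", PandasUDFType.SCALAR)
def times_two(col: pd.Series) -> pd.Series:
    # Receives a pandas Series for a whole batch of rows instead of one row at a time.
    return col * 2.0

spark.range(10).withColumn("doubled", times_two("id")).show()
```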

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-04-01 Thread shane knapp
well now! color me completely surprised... i decided to whip up a fresh python3.6.8 conda environment this morning to "see if things just worked". well, apparently they do! :) regardless, this is pretty awesome news as i will be able to easily update the 'py3k' python3.4 environment to a
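(For readers following SPARK-27276, a hedged sketch of the kind of minimum-version guard the upgrade implies; the helper name and version constant below are assumptions, not PySpark's exact code.)

```python
# Hypothetical minimum-version guard for PyArrow (illustrative of the
# SPARK-27276 upgrade; not the exact check PySpark performs).
from distutils.version import LooseVersion

MINIMUM_PYARROW_VERSION = "0.12.0"  # assumed new minimum

def require_minimum_pyarrow_version() -> None:
    import pyarrow
    if LooseVersion(pyarrow.__version__) < LooseVersion(MINIMUM_PYARROW_VERSION):
        raise ImportError(
            "PyArrow >= %s must be installed; found %s"
            % (MINIMUM_PYARROW_VERSION, pyarrow.__version__)
        )

require_minimum_pyarrow_version()
```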

Re: [DISCUSS] Enable blacklisting feature by default in 3.0

2019-04-01 Thread Chris Stevens
Hey Ankur, I think the significant decrease in "spark.blacklist.timeout" (1 hr down to 5 minutes) in your updated suggestion is the key here. Looking at a few *successful* runs of the application I was debugging, here are the error rates when I did *not* have blacklisting enabled: Run A: 8

Re: Unsubscribe

2019-04-01 Thread William Shen
Vinod, You can send an email to dev-unsubscr...@spark.apache.org to unsubscribe. You should receive an email with instructions to confirm the unsubscribe.
On Sun, Mar 31, 2019 at 7:42 AM Vinod V Rangayyan wrote:
> I wish to unsubscribe from dev@spark.apache.org
>
>
> On Mar 31, 2019, at

Re: [DISCUSS] Enable blacklisting feature by default in 3.0

2019-04-01 Thread Ankur Gupta
Hi Chris, Thanks for sending over the example. As far as I can understand, it seems that this would not have been a problem if "spark.blacklist.application.maxFailedTasksPerExecutor" was set to a higher threshold, as mentioned in my previous email. Though, with 8/7 executors and 2
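(For reference, a sketch of how the blacklist settings discussed in this thread can be set on a SparkSession. The property names are existing Spark configs; the values are illustrative assumptions, not recommendations from the thread.)

```python
# Illustrative blacklist configuration; conf names are real, values are examples.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("blacklist-config-demo")
    .config("spark.blacklist.enabled", "true")
    # shorter timeout, per the suggestion of dropping from 1 hour to 5 minutes
    .config("spark.blacklist.timeout", "5m")
    # raised per-application threshold (example value)
    .config("spark.blacklist.application.maxFailedTasksPerExecutor", "5")
    .getOrCreate()
)
```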

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-04-01 Thread shane knapp
i'd much prefer that we minimize the number of python versions that we test against... would 2.7 and 3.6 be sufficient?
On Fri, Mar 29, 2019 at 10:23 PM Felix Cheung wrote:
> I don’t take it as Sept 2019 is end of life for python 3.5 tho. It’s just saying the next release.
>
> In any case I