Re: Collections passed from driver to executors

2019-09-23 Thread Reynold Xin
A while ago we changed it so the task gets broadcasted too, so I think the two are fairly similar. On Mon, Sep 23, 2019 at 8:17 PM, Dhrubajyoti Hati < dhruba.w...@gmail.com > wrote: > > I was wondering if anyone could help with this question. > > On Fri, 20 Sep, 2019, 11:52 AM Dhrubajyoti

Re: Collections passed from driver to executors

2019-09-23 Thread Dhrubajyoti Hati
I was wondering if anyone could help with this question. On Fri, 20 Sep, 2019, 11:52 AM Dhrubajyoti Hati wrote: > Hi, > > I have a question regarding passing a dictionary from driver to executors > in Spark on YARN. This dictionary is needed in a UDF. I am using PySpark. > > As I understand

Re: [DISCUSS] Spark 2.5 release

2019-09-23 Thread Dongjoon Hyun
Hi, Ryan. This thread has many replies, as you can see. That is evidence that the community is very interested in your suggestion. > I'm offering to help build a stable release without breaking changes. But if there is no community interest in it, I'm happy to drop this. In this thread, the

[build system] our colo is having power issues again. there will be a few 'events' this week

2019-09-23 Thread Shane Knapp
the main transformer for our colo is experiencing major issues, and campus will be performing emergency work on it starting tomorrow morning (tuesday sept 24, 9am PDT). it's pretty dire. :( there's a lot going on, but please expect some sporadic jenkins downtime until monday. here's the abbreviated

Re: Spark 3.0 preview release on-going features discussion

2019-09-23 Thread Xingbo Jiang
Thanks everyone, let me first work on the feature list and major changes that have already been finished in the master branch. Cheers! Xingbo On Fri, Sep 20, 2019 at 10:56 AM, Ryan Blue wrote: > I’m not sure that DSv2 list is accurate. We discussed this in the DSv2 > sync this week (just sent out the notes)

Re: [DISCUSS] Spark 2.5 release

2019-09-23 Thread Holden Karau
I would personally love to see us provide a gentle migration path to Spark 3, especially if much of the work is already going to happen anyway. Maybe giving it a different name (e.g., something like Spark-2-to-3-transitional) would make its intended purpose clearer and encourage folks to

Re: [DISCUSS] Spark 2.5 release

2019-09-23 Thread Ryan Blue
My understanding is that 3.0-preview is not going to be a production-ready release. For those of us that have been using backports of DSv2 in production, that doesn't help. It also doesn't help as a stepping stone because users would need to handle all of the incompatible changes in 3.0. Using

RE: Spark Tasks Progress

2019-09-23 Thread Ilya Matiach
@Sultan Alamro great question. I had a similar scenario, where workers needed to aggregate host:port information for initializing an MPI ring, and I used direct socket communication between the workers and driver. This is where the driver accepts sockets from

Re: DataFrameReader bottleneck in DataSource#checkAndGlobPathIfNecessary when reading S3 files

2019-09-23 Thread Arwin Tio
Hi Steve, I filed a JIRA and opened a PR for this issue: https://issues.apache.org/jira/browse/SPARK-29089 https://github.com/apache/spark/pull/25899 Please lmk what you think Cheers, Arwin From: Steve Loughran Sent: September 7, 2019 9:22 AM To: Arwin Tio