Re: [ANNOUNCE] Apache Spark 3.4.1 released

2023-06-23 Thread Mridul Muralidharan
Thanks Dongjoon ! Regards, Mridul On Fri, Jun 23, 2023 at 6:58 PM Dongjoon Hyun wrote: > We are happy to announce the availability of Apache Spark 3.4.1! > > Spark 3.4.1 is a maintenance release containing stability fixes. This > release is based on the branch-3.4 maintenance branch of Spark.

Re: Slack for PySpark users

2023-03-30 Thread Mridul Muralidharan
Thanks for flagging the concern Dongjoon, I was not aware of the discussion - but I can understand the concern. Would be great if you or Matei could update the thread on the result of deliberations, once it reaches a logical consensus, before we set up official policy around it. Regards, Mridul

Re: Spark Push-Based Shuffle causing multiple stage failures

2022-05-24 Thread Mridul Muralidharan
+CC zhouye...@gmail.com On Mon, May 23, 2022 at 7:11 AM Han Altae-Tran wrote: > Hi, > > First of all, I am very thankful for all of the amazing work that goes > into this project! It has opened up so many doors for me! I am a long > time Spark user, and was very excited to start working with

Re: [ANNOUNCE] Apache Spark 3.2.0

2021-10-19 Thread Mridul Muralidharan
Congratulations everyone ! And thanks Gengliang for shepherding the release out :-) Regards, Mridul On Tue, Oct 19, 2021 at 9:25 AM Yuming Wang wrote: > Congrats and thanks! > > On Tue, Oct 19, 2021 at 10:17 PM Gengliang Wang wrote: > >> Hi all, >> >> Apache Spark 3.2.0 is the third release

Re: Mesos + Spark users going forward?

2021-04-07 Thread Mridul Muralidharan
Unfortunate about Mesos, +1 on deprecation of Mesos integration. Regards, Mridul On Wed, Apr 7, 2021 at 7:12 AM Sean Owen wrote: > I noted that Apache Mesos is moving to the attic, so won't be actively > developed soon: > >

Re: [ANNOUNCE] Announcing Apache Spark 3.1.1

2021-03-02 Thread Mridul Muralidharan
Thanks Hyukjin and congratulations everyone on the release ! Regards, Mridul On Tue, Mar 2, 2021 at 8:54 PM Yuming Wang wrote: > Great work, Hyukjin! > > On Wed, Mar 3, 2021 at 9:50 AM Hyukjin Kwon wrote: > >> We are excited to announce Spark 3.1.1 today. >> >> Apache Spark 3.1.1 is the

Re: Apache Spark 3.1 Preparation Status (Oct. 2020)

2020-10-04 Thread Mridul Muralidharan
+1 on pushing the branch cut for increased dev time to match previous releases. Regards, Mridul On Sat, Oct 3, 2020 at 10:22 PM Xiao Li wrote: > Thank you for your updates. > > Spark 3.0 got released on Jun 18, 2020. If Nov 1st is the target date of > the 3.1 branch cut, the feature

Re: LiveListenerBus is occupying most of the Driver Memory and frequent GC is degrading the performance

2020-08-11 Thread Mridul Muralidharan
Hi, 50% of driver time being spent in GC just for the listener bus sounds very high in a 30G heap. Did you try to take a heap dump and see what is occupying so much memory? This will help us eliminate if the memory usage is due to some user code/library holding references to large objects/graph of
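A heap dump like the one suggested above can be captured with the JDK's `jps` and `jmap` tools (the PID placeholder below is whatever `jps` reports for the driver JVM; `-dump:live` forces a full GC first so only reachable objects land in the dump):

```shell
# Locate the driver JVM's process id.
jps -lm

# Dump live objects in HPROF binary format for analysis in Eclipse MAT or jhat.
jmap -dump:live,format=b,file=driver-heap.hprof <driver-pid>
```

The resulting `.hprof` file can then be opened in a heap analyzer to see whether listener-bus events or user-held object graphs dominate retained memory.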

Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Mridul Muralidharan
Great job everyone ! Congratulations :-) Regards, Mridul On Thu, Jun 18, 2020 at 10:21 AM Reynold Xin wrote: > Hi all, > > Apache Spark 3.0.0 is the first release of the 3.x line. It builds on many > of the innovations from Spark 2.x, bringing new ideas as well as continuing > long-term

Re: Reading from and writing to different S3 buckets in spark

2016-10-12 Thread Mridul Muralidharan
If using RDDs, you can use saveAsHadoopFile or saveAsNewAPIHadoopFile with the conf passed in which overrides the keys you need. For example, you can do : val saveConf = new Configuration(sc.hadoopConfiguration) // configure saveConf with overridden s3 config rdd.saveAsNewAPIHadoopFile(..., conf
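A minimal sketch of that approach, filled out for illustration (the s3a property names are standard Hadoop keys, but the bucket name, credential values, and output types here are hypothetical; it assumes an active SparkContext `sc` and an `RDD[String]` named `rdd`):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

// Clone the driver's Hadoop configuration so the overrides stay local
// to this one write; reads from the source bucket are unaffected.
val saveConf = new Configuration(sc.hadoopConfiguration)

// Hypothetical credentials for the destination bucket.
saveConf.set("fs.s3a.access.key", "DEST_BUCKET_ACCESS_KEY")
saveConf.set("fs.s3a.secret.key", "DEST_BUCKET_SECRET_KEY")

rdd
  .map(line => (NullWritable.get(), new Text(line)))
  .saveAsNewAPIHadoopFile(
    "s3a://dest-bucket/output",
    classOf[NullWritable],
    classOf[Text],
    classOf[TextOutputFormat[NullWritable, Text]],
    saveConf)
```

Because the cloned Configuration is passed per call, two writes in the same job can target buckets with different credentials without touching the global `sc.hadoopConfiguration`.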

Re: [ANNOUNCE] Apache Bahir 2.0.0

2016-08-15 Thread Mridul Muralidharan
Congratulations, great job everyone ! Regards, Mridul On Mon, Aug 15, 2016 at 2:19 PM, Luciano Resende wrote: > The Apache Bahir PMC is pleased to announce the release of Apache Bahir > 2.0.0 which is our first major release and provides the following > extensions for

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Mridul Muralidharan
I think Reynold's suggestion of using ram disk would be a good way to test if these are the bottlenecks or something else is. For most practical purposes, pointing local dir to ramdisk should effectively give you 'similar' performance as shuffling from memory. Are there concerns with taking that
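As a sketch of that experiment (the mount point and size are illustrative, and mounting tmpfs requires root; the Spark property name is the standard `spark.local.dir`):

```shell
# Back Spark's shuffle/spill directory with RAM via tmpfs.
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=64g tmpfs /mnt/ramdisk

# Point Spark at it, e.g. in spark-defaults.conf:
#   spark.local.dir  /mnt/ramdisk
# or per job:
#   spark-submit --conf spark.local.dir=/mnt/ramdisk ...
```

If shuffle-heavy stages don't speed up meaningfully with local dirs on tmpfs, disk IO was likely not the bottleneck.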

Re: [discuss] making SparkEnv private in Spark 2.0

2016-03-19 Thread Mridul Muralidharan
We use it in executors to get to : a) spark conf (for getting to hadoop config in map doing custom writing of side-files) b) Shuffle manager (to get shuffle reader) Not sure if there are alternative ways to get to these. Regards, Mridul On Wed, Mar 16, 2016 at 2:52 PM, Reynold Xin

Re: [discuss] making SparkEnv private in Spark 2.0

2016-03-18 Thread Mridul Muralidharan
wrote: > > On Wed, Mar 16, 2016 at 3:29 PM, Mridul Muralidharan <mri...@gmail.com> > wrote: >> >> b) Shuffle manager (to get shuffle reader) > > > What's the use case for shuffle manager/reader? This seems like u

Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-14 Thread Mridul Muralidharan
What I understood from Imran's mail (and what was referenced in his mail) the RDD mentioned seems to be violating some basic contracts on how partitions are used in spark [1]. They cannot be arbitrarily numbered, have duplicates, etc. Extending RDD to add functionality is typically for niche

Re: Asked to remove non-existent executor exception

2015-07-26 Thread Mridul Muralidharan
Simply customize your log4j config instead of modifying code if you don't want messages from that class. Regards Mridul On Sunday, July 26, 2015, Sea 261810...@qq.com wrote: This exception is so ugly!!! The screen is full of these information when the program runs a long time, and they
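For instance, a per-logger override in `log4j.properties` can silence just the noisy class (the logger name below is illustrative; use the fully qualified class name that appears in your own log lines):

```properties
# Raise the threshold for one chatty class only; everything else
# keeps the default root level.
log4j.logger.org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend=ERROR
```

This keeps the suppression in deployment configuration rather than in code, so it can be reverted without a rebuild.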

Re: 2GB limit for partitions?

2015-02-04 Thread Mridul Muralidharan
, seems promising. thanks, Imran On Tue, Feb 3, 2015 at 7:32 PM, Mridul Muralidharan mri...@gmail.com wrote: That is fairly out of date (we used to run some of our jobs on it ... But that is forked off 1.1 actually). Regards Mridul

Re: 2GB limit for partitions?

2015-02-03 Thread Mridul Muralidharan
That is fairly out of date (we used to run some of our jobs on it ... But that is forked off 1.1 actually). Regards Mridul On Tuesday, February 3, 2015, Imran Rashid iras...@cloudera.com wrote: Thanks for the explanations, makes sense. For the record looks like this was worked on a while

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Mridul Muralidharan
Brilliant stuff ! Congrats all :-) This is indeed really heartening news ! Regards, Mridul On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hi folks, I interrupt your regularly scheduled user / dev list to bring you some pretty cool news for the project, which