Re: [VOTE] Update the committer guidelines to clarify when to commit changes.

2020-07-31 Thread Mridul Muralidharan
+1 Thanks, Mridul On Thu, Jul 30, 2020 at 4:49 PM Holden Karau wrote: > Hi Spark Developers, > > After the discussion of the proposal to amend Spark committer guidelines, > it appears folks are generally in agreement on policy clarifications. (See >

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-07-29 Thread Mridul Muralidharan
I agree, that would be a new feature; and unless there is a compelling reason (like security concerns), it would not qualify. Regards, Mridul On Wed, Jul 15, 2020 at 11:46 AM Wenchen Fan wrote: > Supporting Python 3.8.0 sounds like a new feature, and doesn't qualify for a > backport. But I'm open to other

Re: [DISCUSS] Amend the committer guidelines on the subject of -1s & how we expect PR discussion to be treated.

2020-07-23 Thread Mridul Muralidharan
Thanks Holden, this version looks good to me. +1 Regards, Mridul On Thu, Jul 23, 2020 at 3:56 PM Imran Rashid wrote: > Sure, that sounds good to me. +1 > > On Wed, Jul 22, 2020 at 1:50 PM Holden Karau wrote: > >> >> >> On Wed, Jul 22, 2020 at 7:39 AM Imran Rashid < iras...@apache.org > >>

Re: Welcoming some new Apache Spark committers

2020-07-15 Thread Mridul Muralidharan
Congratulations ! Regards, Mridul On Tue, Jul 14, 2020 at 12:37 PM Matei Zaharia wrote: > Hi all, > > The Spark PMC recently voted to add several new committers. Please join me > in welcoming them to their new roles! The new committers are: > > - Huaxin Gao > - Jungtaek Lim > - Dilip Biswal >

Re: [VOTE] Decommissioning SPIP

2020-07-01 Thread Mridul Muralidharan
+1 Thanks, Mridul On Wed, Jul 1, 2020 at 6:36 PM Hyukjin Kwon wrote: > +1 > > On Thu, Jul 2, 2020 at 10:08 AM Marcelo Vanzin wrote: > >> I reviewed the docs and PRs from way before an SPIP was explicitly >> asked, so I'm comfortable with giving a +1 even if I haven't really >> fully read the new

Re: [DISCUSS][SPIP] Graceful Decommissioning

2020-06-28 Thread Mridul Muralidharan
Thanks for shepherding this Holden ! I left a few comments, but overall it looks good to me. Regards, Mridul On Sat, Jun 27, 2020 at 9:34 PM Holden Karau wrote: > There’s been some comments & a few additions in the doc, but it seems like > the folks taking a look generally agree on the

Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Mridul Muralidharan
Great job everyone ! Congratulations :-) Regards, Mridul On Thu, Jun 18, 2020 at 10:21 AM Reynold Xin wrote: > Hi all, > > Apache Spark 3.0.0 is the first release of the 3.x line. It builds on many > of the innovations from Spark 2.x, bringing new ideas as well as continuing > long-term

Re: [vote] Apache Spark 3.0 RC3

2020-06-07 Thread Mridul Muralidharan
+1 Regards, Mridul On Sat, Jun 6, 2020 at 1:20 PM Reynold Xin wrote: > Apologies for the mistake. The vote is open till 11:59pm Pacific time on > Mon June 9th. > > On Sat, Jun 6, 2020 at 1:08 PM Reynold Xin wrote: > >> Please vote on releasing the following candidate as Apache Spark version

Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-03 Thread Mridul Muralidharan
Is this a behavior change in 2.4.x from an earlier version ? Or are we proposing to introduce functionality to help with adoption ? Regards, Mridul On Wed, Jun 3, 2020 at 10:32 AM Xiao Li wrote: > Yes. Spark 3.0 RC2 works well. > > I think the current behavior in Spark 2.4 affects the

Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-02 Thread Mridul Muralidharan
+1 (binding) Thanks, Mridul On Sun, May 31, 2020 at 4:47 PM Holden Karau wrote: > Please vote on releasing the following candidate as Apache Spark > version 2.4.6. > > The vote is open until June 5th at 9AM PST and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1 votes. >

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Mridul Muralidharan
I agree with what Sean detailed. The only place where I can see some amount of investigation being required would be for security issues or correctness issues. Knowing the affected versions, particularly if an earlier supported version does not have the bug, will help users understand the

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-06 Thread Mridul Muralidharan
I am in broad agreement with the proposal; like any developer, I prefer stable, well-designed APIs :-) Can we tie the proposal to stability guarantees given by spark and reasonable expectations from users ? In my opinion, an unstable or evolving api could change - while an experimental api which has been

Re: Is RDD thread safe?

2019-11-25 Thread Mridul Muralidharan
Very well put Imran. This is a variant of executor failure after an RDD has been computed (including caching). In general, non determinism in spark is going to lead to inconsistency. The only reasonable solution for us, at that time, was to make pseudo-randomness repeatable and checkpoint after so

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-20 Thread Mridul Muralidharan
Just for completeness sake, spark is not version neutral to hadoop; particularly in yarn mode, there is a minimum version requirement (though fairly generous I believe). I agree with Steve, it is a long standing pain that we are bundling a positively ancient version of hive. Having said that, we

Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

2019-10-01 Thread Mridul Muralidharan
Makes more sense to drop support for zstd assuming the fix is not something at spark end (configuration, etc). Does not make sense to try to detect deadlock in codec. Regards, Mridul On Tue, Oct 1, 2019 at 8:39 PM Jungtaek Lim wrote: > > Hi devs, > > I've discovered an issue with event logger,

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-29 Thread Mridul Muralidharan
Add a +1 from me as well. Just managed to finish going over it. Thanks Bobby for leading this effort ! Regards, Mridul On Wed, May 29, 2019 at 2:51 PM Tom Graves wrote: > > Ok, I'm going to call this vote and send the result email. We had 9 +1's (4 > binding) and 1 +0 and no -1's. > > Tom > >

Re: [DISCUSS][SPARK-25299] SPIP: Shuffle storage API

2019-05-08 Thread Mridul Muralidharan
Unfortunately I do not have bandwidth to do a detailed review, but a few things come to mind after a quick read: - While it might be tactically beneficial to align with existing implementation, a clean design which does not tie into existing shuffle implementation would be preferable (if it can

Re: [VOTE] Functional DataSourceV2 in Spark 3.0

2019-02-28 Thread Mridul Muralidharan
I am -1 on this vote for pretty much all the reasons that Mark mentioned. A major version change gives us an opportunity to remove deprecated interfaces, stabilize experimental/developer api, drop support for outdated functionality/platforms and evolve the project with a vision for foreseeable

Re: Automated formatting

2018-11-22 Thread Mridul Muralidharan
Is this handling only scala or java as well ? Regards, Mridul On Thu, Nov 22, 2018 at 9:11 AM Cody Koeninger wrote: > Plugin invocation is ./build/mvn mvn-scalafmt_2.12:format > > It takes about 5 seconds, and errors out on the first different file > that doesn't match formatting. > > I made a

Re: data source api v2 refactoring

2018-09-01 Thread Mridul Muralidharan
Is it only me or are all others getting Wenchen’s mails ? (Obviously Ryan did :-) ) I did not see it in the mail thread I received or in archives ... [1] Wondering which other senders were getting dropped (if yes). Regards Mridul [1]

Re: SPIP: Executor Plugin (SPARK-24918)

2018-08-29 Thread Mridul Muralidharan
+1 I left a couple of comments in NiharS's PR, but this is very useful to have in spark ! Regards, Mridul On Fri, Aug 3, 2018 at 10:00 AM Imran Rashid wrote: > > I'd like to propose adding a plugin api for Executors, primarily for > instrumentation and debugging >

Re: Set up Scala 2.12 test build in Jenkins

2018-08-06 Thread Mridul Muralidharan
o non-serializable objects etc. > In all these cases you know you are adding references you shouldn't. > If users were used to another UX we can try fix it, not sure how well this > worked in the past though and if covered all cases. > > Regards, > Stavros > > On Mon, Aug 6,

Re: Set up Scala 2.12 test build in Jenkins

2018-08-05 Thread Mridul Muralidharan
I agree, we should not work around the testcase but rather understand and fix the root cause. Closure cleaner should have null'ed out the references and allowed it to be serialized. Regards, Mridul On Sun, Aug 5, 2018 at 8:38 PM Wenchen Fan wrote: > > It seems to me that the closure cleaner

Re: time for Apache Spark 3.0?

2018-06-15 Thread Mridul Muralidharan
I agree, I don't see a pressing need for a major version bump either. Regards, Mridul On Fri, Jun 15, 2018 at 10:25 AM Mark Hamstra wrote: > Changing major version numbers is not about new features or a vague notion > that it is time to do something that will be seen to be a significant >

Re: Hadoop 3 support

2018-04-02 Thread Mridul Muralidharan
Specifically to run spark with hadoop 3 docker support, I have filed a few jira's tracked under [1]. Regards, Mridul [1] https://issues.apache.org/jira/browse/SPARK-23717 On Mon, Apr 2, 2018 at 1:00 PM, Reynold Xin wrote: > Does anybody know what needs to be done in order

Re: Welcoming some new committers

2018-03-03 Thread Mridul Muralidharan
Congratulations ! Regards, Mridul On Fri, Mar 2, 2018 at 2:41 PM, Matei Zaharia wrote: > Hi everyone, > > The Spark PMC has recently voted to add several new committers to the > project, based on their contributions to Spark 2.3 and other past work: > > - Anirudh

Re: [Core][Suggestion] sortWithinPartitions and aggregateWithinPartitions for RDD

2018-02-01 Thread Mridul Muralidharan
On Wed, Jan 31, 2018 at 1:15 AM, Ruifeng Zheng wrote: > HI all: > > > >1, Dataset API supports operation “sortWithinPartitions”, but in RDD > API there is no counterpart (I know there is > “repartitionAndSortWithinPartitions”, but I don’t want to repartition the >

Re: Kubernetes backend and docker images

2018-01-05 Thread Mridul Muralidharan
We should definitely clean this up and make it the default, nicely done Marcelo ! Thanks, Mridul On Fri, Jan 5, 2018 at 5:06 PM Marcelo Vanzin wrote: > Hey all, especially those working on the k8s stuff. > > Currently we have 3 docker images that need to be built and

Re: Publishing official docker images for KubernetesSchedulerBackend

2017-11-29 Thread Mridul Muralidharan
We do support running on Apache Mesos via docker images - so this would not be restricted to k8s. But unlike mesos support, which has other modes of running, I believe k8s support more heavily depends on availability of docker images. Regards, Mridul On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen

Re: Should Flume integration be behind a profile?

2017-10-01 Thread Mridul Muralidharan
I agree, proposal 1 sounds better among the options. Regards, Mridul On Sun, Oct 1, 2017 at 3:50 PM, Reynold Xin wrote: > Probably should do 1, and then it is an easier transition in 3.0. > > On Sun, Oct 1, 2017 at 1:28 AM Sean Owen wrote: >> >> I

Re: Welcoming Tejas Patil as a Spark committer

2017-09-29 Thread Mridul Muralidharan
Congratulations Tejas ! Regards, Mridul On Fri, Sep 29, 2017 at 12:58 PM, Matei Zaharia wrote: > Hi all, > > The Spark PMC recently added Tejas Patil as a committer on the > project. Tejas has been contributing across several areas of Spark for > a while, focusing

Re: Should Flume integration be behind a profile?

2017-09-26 Thread Mridul Muralidharan
Sounds good to me. +1 Regards, Mridul On Tue, Sep 26, 2017 at 2:36 AM, Sean Owen wrote: > Not a big deal, but I'm wondering whether Flume integration should at least > be opt-in and behind a profile? it still sees some use (at least on our end) > but not applicable to the

Re: Welcoming Saisai (Jerry) Shao as a committer

2017-08-28 Thread Mridul Muralidharan
Congratulations Jerry, well deserved ! Regards, Mridul On Mon, Aug 28, 2017 at 6:28 PM, Matei Zaharia wrote: > Hi everyone, > > The PMC recently voted to add Saisai (Jerry) Shao as a committer. Saisai has > been contributing to many areas of the project for a long

Re: SPIP: Spark on Kubernetes

2017-08-17 Thread Mridul Muralidharan
While I definitely support the idea of Apache Spark being able to leverage kubernetes, IMO it is better for long term evolution of spark to expose appropriate SPI such that this support need not necessarily live within Apache Spark code base. It will allow for multiple backends to evolve,

Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-07 Thread Mridul Muralidharan
Congratulations Hyukjin, Sameer ! Regards, Mridul On Mon, Aug 7, 2017 at 8:53 AM, Matei Zaharia wrote: > Hi everyone, > > The Spark PMC recently voted to add Hyukjin Kwon and Sameer Agarwal as > committers. Join me in congratulating both of them and thanking them for

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-04 Thread Mridul Muralidharan
Hi, https://issues.apache.org/jira/browse/SPARK-20202?jql=priority%20%3D%20Blocker%20AND%20affectedVersion%20%3D%20%222.1.1%22%20and%20project%3D%22spark%22 Indicates there is another blocker (SPARK-20197 should have come in the list too, but was marked major). Regards, Mridul On Tue, Apr 4,

Re: welcoming Burak and Holden as committers

2017-01-25 Thread Mridul Muralidharan
Congratulations and welcome Holden and Burak ! Regards, Mridul On Tue, Jan 24, 2017 at 10:13 AM, Reynold Xin wrote: > Hi all, > > Burak and Holden have recently been elected as Apache Spark committers. > > Burak has been very active in a large number of areas in Spark,

Re: What about removing TaskContext#getPartitionId?

2017-01-14 Thread Mridul Muralidharan
Since TaskContext.getPartitionId is part of the public api, it can't be removed as user code can be depending on it (unless we go through a deprecation process for it). Regards, Mridul On Sat, Jan 14, 2017 at 2:02 AM, Jacek Laskowski wrote: > Hi, > > Just noticed that
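The deprecation path described above can be sketched as follows. This is a hypothetical stand-in (the names are illustrative, not Spark's actual source): the public method is kept and marked @Deprecated, delegating to its replacement, rather than being removed outright.

```java
// Hypothetical sketch of deprecating rather than removing a public API
// method: keep the old entry point, annotate it, delegate to the new one.
public class TaskContextSketch {
    private static final ThreadLocal<Integer> PARTITION_ID =
        ThreadLocal.withInitial(() -> 0);

    /** @deprecated kept for source compatibility; use {@link #partitionId()}. */
    @Deprecated
    public static int getPartitionId() {
        return partitionId();
    }

    public static int partitionId() {
        return PARTITION_ID.get();
    }

    public static void main(String[] args) {
        // Old and new entry points stay behaviorally identical during
        // the deprecation window.
        System.out.println(getPartitionId() == partitionId());
    }
}
```

Existing callers keep compiling (with a deprecation warning) for at least one release cycle before the old method is dropped.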

Edit access for spark confluence wiki

2016-10-04 Thread Mridul Muralidharan
Can someone add me to the edit list for the spark wiki please ? Thanks, Mridul - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: [VOTE] Release Apache Spark 2.0.1 (RC4)

2016-09-29 Thread Mridul Muralidharan
+1 Regards, Mridul On Wed, Sep 28, 2016 at 7:14 PM, Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.0.1. The vote is open until Sat, Oct 1, 2016 at 20:00 PDT and passes if a > majority of at least 3 +1 PMC votes are cast. > >

Re: What's the meaning when the partitions is zero?

2016-09-16 Thread Mridul Muralidharan
When numPartitions is 0, there is no data in the rdd: so getPartition is never invoked. - Mridul On Friday, September 16, 2016, WangJianfei wrote: > if so, we will get exception when the numPartitions is 0. > def getPartition(key: Any): Int = key match { >
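The point above can be made concrete with a small sketch. This is not Spark's actual HashPartitioner, just a hypothetical stand-in: with numPartitions == 0 the RDD holds no data, so getPartition is never invoked, but a guard keeps the arithmetic safe should a caller ever misuse it.

```java
import java.util.Objects;

// Hypothetical hash partitioner illustrating the contract discussed above.
public class SimpleHashPartitioner {
    private final int numPartitions;

    public SimpleHashPartitioner(int numPartitions) {
        if (numPartitions < 0) {
            throw new IllegalArgumentException("numPartitions must be >= 0");
        }
        this.numPartitions = numPartitions;
    }

    public int numPartitions() {
        return numPartitions;
    }

    public int getPartition(Object key) {
        if (numPartitions == 0) {
            // Never reached for a well-formed empty RDD; guards the modulo.
            throw new IllegalStateException(
                "getPartition called on a 0-partition (empty) RDD");
        }
        // floorMod keeps the result in [0, numPartitions) for negative hashes.
        return Math.floorMod(Objects.hashCode(key), numPartitions);
    }

    public static void main(String[] args) {
        SimpleHashPartitioner p = new SimpleHashPartitioner(3);
        System.out.println(p.getPartition("some-key"));
    }
}
```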

Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Mridul Muralidharan
It is good to get clarification, but the way I read it, the issue is whether we publish it as official Apache artifacts (in maven, etc). Users can of course build it directly (and we can make it easy to do so) - as they are explicitly agreeing to additional licenses. Regards Mridul On

Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Mridul Muralidharan
I agree, we should not be publishing both of them. Thanks for bringing this up ! Regards, Mridul On Wed, Sep 7, 2016 at 1:29 AM, Sean Owen wrote: > It's worth calling attention to: > > https://issues.apache.org/jira/browse/SPARK-17418 >

Re: rdd.distinct with Partitioner

2016-06-08 Thread Mridul Muralidharan
The example violates the basic contract of a Partitioner. It does make sense to take Partitioner as a param to distinct - though it is fairly trivial to simulate that in user code as well ... Regards Mridul On Wednesday, June 8, 2016, 汪洋 wrote: > Hi Alexander, > > I
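The "simulate it in user code" suggestion above amounts to: map each element to a pair, reduce by key under the desired partitioning, keep the keys. A pure-Java stand-in for that pipeline (hypothetical, no Spark dependency; the partition computation plays the role of a Spark Partitioner):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of distinct-with-a-chosen-partitioning as user code: route each
// element to a partition, and let the per-partition Set collapse duplicates
// (the equivalent of reduceByKey((a, b) -> a)).
public class DistinctByPartition {
    public static List<Set<Integer>> distinctInto(List<Integer> data, int numPartitions) {
        List<Set<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new LinkedHashSet<>());
        }
        for (int x : data) {
            // All copies of a value hash to the same partition, so
            // deduplication can happen locally within each partition.
            int p = Math.floorMod(Integer.hashCode(x), numPartitions);
            partitions.get(p).add(x);
        }
        return partitions;
    }

    public static void main(String[] args) {
        System.out.println(distinctInto(List.of(1, 2, 2, 3, 4, 4), 2));
    }
}
```

The key property, as with the contract mentioned above, is that the partition function is deterministic per value; otherwise duplicates could land in different partitions and survive.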

Re: Welcoming Yanbo Liang as a committer

2016-06-03 Thread Mridul Muralidharan
Congratulations Yanbo ! Regards Mridul On Friday, June 3, 2016, Matei Zaharia wrote: > Hi all, > > The PMC recently voted to add Yanbo Liang as a committer. Yanbo has been a > super active contributor in many areas of MLlib. Please join me in > welcoming Yanbo! > >

Re: [DISCUSS] Removing or changing maintainer process

2016-05-19 Thread Mridul Muralidharan
+1 (binding) on removing the maintainer process. I agree with your opinion of "automatic" instead of a manual list. Regards Mridul On Thursday, May 19, 2016, Matei Zaharia wrote: > Hi folks, > > Around 1.5 years ago, Spark added a maintainer process for reviewing API >

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mridul Muralidharan
On Friday, April 15, 2016, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > Yeah in support of this statement I think that my primary interest in > this Spark Extras and the good work by Luciano here is that anytime we > take bits out of a code base and “move it to GitHub” I see

Re: Discuss: commit to Scala 2.10 support for Spark 2.x lifecycle

2016-04-06 Thread Mridul Muralidharan
In general, I agree - it is preferable to break backward compatibility (where unavoidable) only at major versions. Unfortunately, this usually is planned better - with earlier versions announcing intent of the change - deprecation across multiple releases, defaults changed, etc. From the thread,

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Mridul Muralidharan
I think Reynold's suggestion of using ram disk would be a good way to test if these are the bottlenecks or something else is. For most practical purposes, pointing local dir to ramdisk should effectively give you 'similar' performance as shuffling from memory. Are there concerns with taking that

Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Mridul Muralidharan
required (and this discussion is a sign that the process has not been > conducted properly as people have concerns, me included). > > Thanks Mridul! > > Regards, > Jacek Laskowski > > https://medium.com/@jaceklaskowski/ > >

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-25 Thread Mridul Muralidharan
ts to support scala 2.10 three years after they did the last > maintenance release? > > > On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mri...@gmail.com > <javascript:_e(%7B%7D,'cvml','mri...@gmail.com');>> wrote: > >> Removing compatibility (with jdk, etc

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Mridul Muralidharan
Removing compatibility (with jdk, etc) can be done with a major release- given that 7 has been EOLed a while back and is now unsupported, we have to decide if we drop support for it in 2.0 or 3.0 (2+ years from now). Given the functionality & performance benefits of going to jdk8, future

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Mridul Muralidharan
Container Java version can be different from yarn Java version : we run jobs with jdk8 on jdk7 cluster without issues. Regards Mridul On Thursday, March 24, 2016, Koert Kuipers wrote: > i guess what i am saying is that in a yarn world the only hard > restrictions left are

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Mridul Muralidharan
+1 Agree, dropping support for java 7 is long overdue - and 2.0 would be a logical release to do this on. Regards, Mridul On Thu, Mar 24, 2016 at 12:27 AM, Reynold Xin wrote: > About a year ago we decided to drop Java 6 support in Spark 1.5. I am > wondering if we should

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Mridul Muralidharan
I was not aware of a discussion in Dev list about this - agree with most of the observations. In addition, I did not see PMC signoff on moving (sub-)modules out. Regards Mridul On Thursday, March 17, 2016, Marcelo Vanzin wrote: > Hello all, > > Recently a lot of the

Re: [discuss] making SparkEnv private in Spark 2.0

2016-03-19 Thread Mridul Muralidharan
We use it in executors to get to : a) spark conf (for getting to hadoop config in map doing custom writing of side-files) b) Shuffle manager (to get shuffle reader) Not sure if there are alternative ways to get to these. Regards, Mridul On Wed, Mar 16, 2016 at 2:52 PM, Reynold Xin

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Mridul Muralidharan
t Kafka specifically > > https://issues.apache.org/jira/browse/SPARK-13877 > > > On Thu, Mar 17, 2016 at 2:49 PM, Mridul Muralidharan <mri...@gmail.com> wrote: >> >> I was not aware of a discussion in Dev list about this - agree with most of >> the observations. >> In add

Re: Automated close of PR's ?

2015-12-31 Thread Mridul Muralidharan
open by people out there anyway) > > On Thu, Dec 31, 2015 at 3:25 AM, Mridul Muralidharan <mri...@gmail.com> wrote: >> I am not sure of others, but I had a PR close from under me where >> ongoing discussion was as late as 2 weeks back. >> Given this, I assumed it was

Re: Automated close of PR's ?

2015-12-30 Thread Mridul Muralidharan
ividual ones. > > > On Wednesday, December 30, 2015, Mridul Muralidharan <mri...@gmail.com> > wrote: >> >> Is there a script running to close "old" PR's ? I was not aware of any >> discussion about this in dev list. >> >> - Mridul >> >> -

Automated close of PR's ?

2015-12-30 Thread Mridul Muralidharan
Is there a script running to close "old" PR's ? I was not aware of any discussion about this in dev list. - Mridul

Re: A proposal for Spark 2.0

2015-12-03 Thread Mridul Muralidharan
There was a proposal to make schedulers pluggable in the context of adding one which leverages Apache Tez : IIRC it was abandoned - but the jira might be a good starting point. Regards Mridul On Dec 3, 2015 2:59 PM, "Rad Gruchalski" wrote: > There was a talk in this thread

Re: A proposal for Spark 2.0

2015-11-10 Thread Mridul Muralidharan
It would also be good to fix api breakages introduced as part of 1.0 (where there is missing functionality now), overhaul & remove all deprecated config/features/combinations, and make the api changes to public api which have been deferred for minor releases. Regards, Mridul On Tue, Nov 10,

Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-14 Thread Mridul Muralidharan
What I understood from Imran's mail (and what was referenced in his mail) is that the RDD mentioned seems to be violating some basic contracts on how partitions are used in spark [1]. They cannot be arbitrarily numbered, have duplicates, etc. Extending RDD to add functionality is typically for niche
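The partition contract referenced above is simple to state: an RDD with n partitions must expose indices 0..n-1, each exactly once. A hypothetical validator makes the rule explicit:

```java
// Sketch of the RDD partition-index contract: n partitions must carry
// indices 0..n-1 with no gaps and no duplicates.
public class PartitionContract {
    public static void validate(int[] indices) {
        boolean[] seen = new boolean[indices.length];
        for (int idx : indices) {
            if (idx < 0 || idx >= indices.length) {
                throw new IllegalStateException("partition index out of range: " + idx);
            }
            if (seen[idx]) {
                throw new IllegalStateException("duplicate partition index: " + idx);
            }
            seen[idx] = true;
        }
        // Range check + no duplicates over exactly n entries implies
        // every index 0..n-1 appears exactly once.
    }

    public static void main(String[] args) {
        validate(new int[] {0, 1, 2}); // well-formed
        System.out.println("ok");
    }
}
```

An RDD whose partitions violate this (as in the infinite-loop report above) can confuse the scheduler, which assumes it can address every partition by a unique index in that range.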

Re: Data source aliasing

2015-07-30 Thread Mridul Muralidharan
Would be a good idea to generalize this for spark core - and allow for its use in serde, compression, etc. Regards, Mridul On Thu, Jul 30, 2015 at 11:33 AM, Joseph Batchik josephbatc...@gmail.com wrote: Yep I was looking into using the jar service loader. I pushed a rough draft to my fork of

Re: Asked to remove non-existent executor exception

2015-07-26 Thread Mridul Muralidharan
Simply customize your log4j config instead of modifying code if you don't want messages from that class. Regards Mridul On Sunday, July 26, 2015, Sea 261810...@qq.com wrote: This exception is so ugly!!! The screen is full of these information when the program runs a long time, and they
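As a concrete sketch of the log4j route suggested above: raise the level for just the noisy logger in log4j.properties. The logger name below is an assumption for illustration; use the class name that actually prefixes the "Asked to remove non-existent executor" message in your own logs.

```properties
# Hypothetical entry; substitute the logger name printed before the
# offending message in your logs.
log4j.logger.org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend=ERROR
```

This silences WARN-level chatter from that one class while leaving the rest of the application's logging untouched.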

Re: Should spark-ec2 get its own repo?

2015-07-21 Thread Mridul Muralidharan
the only thing that changed is the location of some scripts in mesos/ to amplab/). Thanks Shivaram On Mon, Jul 20, 2015 at 12:55 PM, Mridul Muralidharan mri...@gmail.com wrote: Might be a good idea to get the PMC's of both projects to sign off to prevent future issues with apache. Regards

Re: Should spark-ec2 get its own repo?

2015-07-21 Thread Mridul Muralidharan
of the Apache Mesos project. It was a remnant part of Spark from when Spark used to live at github.com/mesos/spark. Shivaram On Tue, Jul 21, 2015 at 11:03 AM, Mridul Muralidharan mri...@gmail.com wrote: If I am not wrong, since the code was hosted within mesos project repo, I assume (atleast part

Re: Should spark-ec2 get its own repo?

2015-07-20 Thread Mridul Muralidharan
Might be a good idea to get the PMC's of both projects to sign off to prevent future issues with apache. Regards, Mridul On Mon, Jul 20, 2015 at 12:01 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: I've created https://github.com/amplab/spark-ec2 and added an initial set of

Re: [discuss] Removing individual commit messages from the squash commit message

2015-07-18 Thread Mridul Muralidharan
Just to clarify, the proposal is to have a single commit msg giving the jira and pr id? That sounds like a good change to have. Regards Mridul On Saturday, July 18, 2015, Reynold Xin r...@databricks.com wrote: I took a look at the commit messages in git log -- it looks like the individual

If gmail, check sparm

2015-07-18 Thread Mridul Muralidharan
https://plus.google.com/+LinusTorvalds/posts/DiG9qANf5PA I have noticed a bunch of mails from dev@ and github going to spam - including the spark mailing list. Might be a good idea for dev, committers to check if they are missing things in their spam folder if on gmail. Regards, Mridul

Re: [discuss] Removing individual commit messages from the squash commit message

2015-07-18 Thread Mridul Muralidharan
description 3. List of authors contributing to the patch The main thing that changes is 3: we used to also include the individual commits to the pull request branch that are squashed. On Sat, Jul 18, 2015 at 3:45 PM, Mridul Muralidharan mri...@gmail.com javascript:_e(%7B%7D,'cvml','mri...@gmail.com

Re: Increase partition count (repartition) without shuffle

2015-06-18 Thread Mridul Muralidharan
If you can scan the input twice, you can of course do a per-partition count and build a custom RDD which can repartition without shuffle. But nothing off the shelf as Sandy mentioned. Regards Mridul On Thursday, June 18, 2015, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Alexander, There is currently
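The two-pass idea above works like this: a first pass records each original partition's element count; each partition can then be split locally into chunks of at most some target size, raising the partition count without moving data between original partitions. The local split-size computation can be sketched as a hypothetical helper (no Spark dependency):

```java
import java.util.ArrayList;
import java.util.List;

// Helper for a two-pass, shuffle-free repartition: given one original
// partition's element count, compute the sizes of the local sub-partitions
// it would be split into.
public class LocalSplit {
    public static List<Integer> splitSizes(int count, int maxPerPartition) {
        if (maxPerPartition <= 0) {
            throw new IllegalArgumentException("maxPerPartition must be > 0");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = count; remaining > 0; remaining -= maxPerPartition) {
            sizes.add(Math.min(maxPerPartition, remaining));
        }
        if (sizes.isEmpty()) {
            sizes.add(0); // keep an empty original partition representable
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(splitSizes(10, 4)); // prints [4, 4, 2]
    }
}
```

Note this can only increase the partition count, and the result is only as balanced as the original partitions were; that is the price of skipping the shuffle.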

OT: Key types which have potential issues

2015-05-19 Thread Mridul Muralidharan
Hi, I vaguely remember issues with using float/double as keys in MR (and spark ?). But can't seem to find documentation/analysis about the same. Does anyone have some resource/link I can refer to ? Thanks, Mridul

Re: Change for submitting to yarn in 1.3.1

2015-05-11 Thread Mridul Muralidharan
That works when it is launched from same process - which is unfortunately not our case :-) - Mridul On Sun, May 10, 2015 at 9:05 PM, Manku Timma manku.tim...@gmail.com wrote: sc.applicationId gives the yarn appid. On 11 May 2015 at 08:13, Mridul Muralidharan mri...@gmail.com wrote: We had

Re: YARN mode startup takes too long (10+ secs)

2015-05-11 Thread Mridul Muralidharan
For tiny/small clusters (particularly single-tenant), you can set it to a lower value. But for anything reasonably large or multi-tenant, the request storm can be bad if a large enough number of applications start aggressively polling RM. That is why the interval was made configurable. - Mridul On

Re: Change for submitting to yarn in 1.3.1

2015-05-10 Thread Mridul Muralidharan
We had a similar requirement, and as a stopgap, I currently use a suboptimal impl specific workaround - parsing it out of the stdout/stderr (based on log config). A better means to get to this is indeed required ! Regards, Mridul On Sun, May 10, 2015 at 7:33 PM, Ron's Yahoo!

Re: [discuss] ending support for Java 6?

2015-05-02 Thread Mridul Muralidharan
We could build with the minimum jdk we support when testing PRs - which will automatically cause build failures in case code uses newer api ? Regards, Mridul On Fri, May 1, 2015 at 2:46 PM, Reynold Xin r...@databricks.com wrote: It's really hard to inspect API calls since none of us have the Java

Re: [discuss] ending support for Java 6?

2015-05-02 Thread Mridul Muralidharan
... ;) On Sat, May 2, 2015 at 1:09 PM, Mridul Muralidharan mri...@gmail.com wrote: We could build on minimum jdk we support for testing pr's - which will automatically cause build failures in case code uses newer api ? Regards, Mridul On Fri, May 1, 2015 at 2:46 PM, Reynold Xin r

Re: Why does SortShuffleWriter write to disk always?

2015-05-02 Thread Mridul Muralidharan
I agree, this is better handled by the filesystem cache - not to mention, being able to do zero copy writes. Regards, Mridul On Sat, May 2, 2015 at 10:26 PM, Reynold Xin r...@databricks.com wrote: I've personally prototyped completely in-memory shuffle for Spark 3 times. However, it is unclear

Re: Should we let everyone set Assignee?

2015-04-24 Thread Mridul Muralidharan
This is a great suggestion - definitely makes sense to have it. Regards, Mridul On Fri, Apr 24, 2015 at 11:08 AM, Patrick Wendell pwend...@gmail.com wrote: It's a bit of a digression - but Steve's suggestion that we have a mailing list for new issues is a great idea and we can do it easily.

Re: broadcast hang out

2015-03-15 Thread Mridul Muralidharan
Cross region as in different data centers ? - Mridul On Sun, Mar 15, 2015 at 8:08 PM, lonely Feb lonely8...@gmail.com wrote: Hi all, i meet up with a problem that torrent broadcast hang out in my spark cluster (1.2, standalone) , particularly serious when driver and executors are

Re: Spark config option 'expression language' feedback request

2015-03-13 Thread Mridul Muralidharan
Let me try to rephrase my query. How can a user specify, for example, what the executor memory should be or what the number of cores should be. I don't want a situation where some variables can be specified using one set of idioms (from this PR for example) while another set cannot be. Regards, Mridul

Re: May we merge into branch-1.3 at this point?

2015-03-13 Thread Mridul Muralidharan
Who is managing 1.3 release ? You might want to coordinate with them before porting changes to branch. Regards Mridul On Friday, March 13, 2015, Sean Owen so...@cloudera.com wrote: Yeah, I'm guessing that is all happening quite literally as we speak. The Apache git tag is the one of

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-09 Thread Mridul Muralidharan
In an ideal situation, +1 on removing all vendor-specific builds and making them just hadoop-version specific - that is what we should depend on anyway. Though I hope Sean is correct in assuming that vendor specific builds for hadoop 2.4 are just that; and not 2.4- or 2.4+ which cause incompatibilities

Re: enum-like types in Spark

2015-03-05 Thread Mridul Muralidharan
While I don't have any strong opinions about how we handle enums either way in spark, I assume the discussion is targeted at (new) api being designed in spark. Rewiring what we already have exposed will lead to incompatible api changes (StorageLevel for example, is in 1.0). Regards, Mridul On

Re: enum-like types in Spark

2015-03-05 Thread Mridul Muralidharan
I have a strong dislike for java enums due to the fact that they are not stable across JVMs - if one undergoes serde, you end up with unpredictable results at times [1]. One of the reasons why we prevent enums from being keys : though it is highly possible users might depend on it internally
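The instability referred to above comes from Enum.hashCode(), which is Object's identity hash: it is not specified to be equal across JVMs, so any hash-based placement (such as choosing a shuffle partition) computed from it on one JVM need not match another JVM after serialization. By contrast, name() and ordinal() are stable. A hedged illustration:

```java
// Enum.hashCode() is an identity hash and may differ between JVMs, so
// derive keys from name() or ordinal() instead, which are specified and
// stable across JVMs (String.hashCode() is defined by the JLS).
public class EnumKeys {
    enum Color { RED, GREEN, BLUE }

    public static int stableKey(Color c) {
        return c.name().hashCode();
    }

    public static void main(String[] args) {
        // Same value on every conforming JVM.
        System.out.println(stableKey(Color.RED) == "RED".hashCode());
    }
}
```

This is why hash-partitioning directly on an enum key is risky in a distributed setting: the driver and each executor run separate JVMs.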

Re: 2GB limit for partitions?

2015-02-04 Thread Mridul Muralidharan
, seems promising. thanks, Imran On Tue, Feb 3, 2015 at 7:32 PM, Mridul Muralidharan mri...@gmail.com javascript:_e(%7B%7D,'cvml','mri...@gmail.com'); wrote: That is fairly out of date (we used to run some of our jobs on it ... But that is forked off 1.1 actually). Regards Mridul

Re: Welcoming three new committers

2015-02-03 Thread Mridul Muralidharan
Congratulations! Keep up the good work :-) Regards Mridul On Tuesday, February 3, 2015, Matei Zaharia matei.zaha...@gmail.com wrote: Hi all, The PMC recently voted to add three new committers: Cheng Lian, Joseph Bradley and Sean Owen. All three have been major contributors to Spark in

Re: 2GB limit for partitions?

2015-02-03 Thread Mridul Muralidharan
That is fairly out of date (we used to run some of our jobs on it ... But that is forked off 1.1 actually). Regards Mridul On Tuesday, February 3, 2015, Imran Rashid iras...@cloudera.com wrote: Thanks for the explanations, makes sense. For the record looks like this was worked on a while

Re: keeping PR titles / descriptions up to date

2014-12-02 Thread Mridul Muralidharan
I second that ! Would also be great if the JIRA was updated accordingly too. Regards, Mridul On Wed, Dec 3, 2014 at 1:53 AM, Kay Ousterhout kayousterh...@gmail.com wrote: Hi all, I've noticed a bunch of times lately where a pull request changes to be pretty different from the original pull

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Mridul Muralidharan
Brilliant stuff! Congrats all :-) This is indeed really heartening news! Regards, Mridul On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hi folks, I interrupt your regularly scheduled user / dev list to bring you some pretty cool news for the project, which

Re: [VOTE] Release Apache Spark 1.1.0 (RC1)

2014-08-28 Thread Mridul Muralidharan
Is SPARK-3277 applicable to 1.1? If yes, until it is fixed, I am -1 on the release (I am on break, so can't verify or help fix, sorry). Regards Mridul On 28-Aug-2014 9:33 pm, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version

Re: [VOTE] Release Apache Spark 1.1.0 (RC1)

2014-08-28 Thread Mridul Muralidharan
and we'll patch it and spin a new RC. We can also update the test coverage to cover LZ4. - Patrick On Thu, Aug 28, 2014 at 9:27 AM, Mridul Muralidharan mri...@gmail.com wrote: Is SPARK-3277 applicable to 1.1 ? If yes, until it is fixed, I am -1 on the release (I am on break, so can't

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-21 Thread Mridul Muralidharan
Weird that Patrick did not face this while creating the RC. Essentially the yarn alpha pom.xml has not been updated properly in the 1.1 branch. Just change version to '1.1.1-SNAPSHOT' for yarn/alpha/pom.xml (to make it same as any other pom). Regards, Mridul On Thu, Aug 21, 2014 at 5:09 AM,

Re: Unit tests in 5 minutes

2014-08-09 Thread Mridul Muralidharan
The issue with supporting this, imo, is the fact that ScalaTest uses the same VM for all the tests (the surefire plugin supports forking, but ScalaTest ignores it iirc). So different tests would initialize different Spark contexts, and can potentially step on each other's toes. Regards, Mridul On Fri, Aug
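A minimal, Spark-free sketch of the hazard described above: when every suite shares one JVM, a suite that forgets to stop its context blocks the next one from starting. `FakeContext` and `ContextRegistry` below are stand-ins modeling SparkContext's one-active-context-per-JVM rule, kept self-contained for illustration:

```scala
// Stand-in for SparkContext: only one may be active per JVM at a time.
class FakeContext(val name: String) {
  def stop(): Unit = ContextRegistry.release(this)
}

// Stand-in for the JVM-wide "active context" state that suites share.
object ContextRegistry {
  private var active: Option[FakeContext] = None

  // Fails if a previous suite's context was never stopped.
  def acquire(name: String): FakeContext = synchronized {
    require(active.isEmpty,
      s"context '${active.map(_.name).getOrElse("?")}' is still running in this JVM")
    val ctx = new FakeContext(name)
    active = Some(ctx)
    ctx
  }

  def release(ctx: FakeContext): Unit = synchronized {
    if (active.contains(ctx)) active = None
  }
}
```

This is why per-suite teardown (e.g. stopping the context in an after-all hook) matters when tests cannot be forked into separate VMs.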

Re: -1s on pull requests?

2014-08-05 Thread Mridul Muralidharan
Just came across this mail, thanks for initiating this discussion Kay. To add: another issue which recurs is very rapid commits, before most contributors have had a chance to even look at the changes proposed. There is not much prior discussion on the JIRA or PR, and the time between submitting

Re: better compression codecs for shuffle blocks?

2014-07-14 Thread Mridul Muralidharan
We tried with lower block size for lzf, but it barfed all over the place. Snappy was the way to go for our jobs. Regards, Mridul On Mon, Jul 14, 2014 at 12:31 PM, Reynold Xin r...@databricks.com wrote: Hi Spark devs, I was looking into the memory usage of shuffle and one annoying thing is
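For context, the knobs under discussion map to configuration properties like the following (property names as in later Spark releases; exact names and defaults vary by version, so treat this as a sketch rather than a reference):

```properties
# Which codec compresses shuffle blocks, broadcast variables, etc.
spark.io.compression.codec=snappy
# Per-block buffer size for the codec; smaller blocks reduce memory held
# per open stream, at some cost in compression ratio and throughput.
spark.io.compression.snappy.blockSize=32k
```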

Unresponsive to PR/jira changes

2014-07-09 Thread Mridul Muralidharan
Hi, I noticed today that Gmail has been marking most of the mails from Spark github/jira I was receiving to the spam folder; and I was assuming it was a lull in activity due to Spark Summit for the past few weeks! In case I have commented on specific PR/JIRA issues and not followed up, apologies for

Re: on shark, is tachyon less efficient than memory_only cache strategy ?

2014-07-08 Thread Mridul Muralidharan
You are ignoring serde costs :-) - Mridul On Tue, Jul 8, 2014 at 8:48 PM, Aaron Davidson ilike...@gmail.com wrote: Tachyon should only be marginally less performant than memory_only, because we mmap the data from Tachyon's ramdisk. We do not have to, say, transfer the data over a pipe from
