Re: Spark 0.9.1 release

2014-03-19 Thread Mridul Muralidharan
It would be great if the garbage collection PR were also committed; if not the whole thing, at least the part to explicitly unpersist broadcast variables. Currently we are running with a custom implementation that does something similar, and I would like to move to the standard distribution for that.

Re: Spark 0.9.1 release

2014-03-19 Thread Mridul Muralidharan
of April (not too far ;) ). TD On Wed, Mar 19, 2014 at 5:57 PM, Mridul Muralidharan mri...@gmail.com wrote: Would be great if the garbage collection PR is also committed - if not the whole thing, at least the part to unpersist broadcast variables explicitly would be great. Currently we

Re: Spark 0.9.1 release

2014-03-25 Thread Mridul Muralidharan
reasonably long-running job (30 mins+) working on a non-trivial dataset will fail due to accumulated failures in Spark. Regards, Mridul TD On Tue, Mar 25, 2014 at 8:44 PM, Mridul Muralidharan mri...@gmail.com wrote: Forgot to mention this in the earlier request for PRs

JIRA. github and asf updates

2014-03-29 Thread Mridul Muralidharan
Hi, we are now receiving updates from three sources for each change to a PR. While each of them handles a corner case the others might miss, it would be great if we could reduce the volume of duplicated communication. Regards, Mridul

Re: JIRA. github and asf updates

2014-03-29 Thread Mridul Muralidharan
unsubscribe yourself from any of these sources, right? - Patrick On Sat, Mar 29, 2014 at 11:05 AM, Mridul Muralidharan mri...@gmail.com wrote: Hi, So we are now receiving updates from three sources for each change to the PR. While each of them handles a corner case which others might miss

ephemeral storage level in spark ?

2014-04-05 Thread Mridul Muralidharan
Hi, we have a requirement to use (potentially) ephemeral storage that is outside the VM but strongly tied to a worker node. The source of truth for a block would still be within Spark, but to actually do the computation we would need to copy data to an external device (where it might lie

Re: ephemeral storage level in spark ?

2014-04-05 Thread Mridul Muralidharan
is stored in a remote cluster or machines. And the goal is to load the remote raw data only once? Haoyuan On Sat, Apr 5, 2014 at 4:30 PM, Mridul Muralidharan mri...@gmail.com wrote: Hi, We have a requirement to use a (potential) ephemeral storage, which is not within the VM, which

Re: all values for a key must fit in memory

2014-04-20 Thread Mridul Muralidharan
An iterator does not imply that the data has to be memory-resident: think of merge-sort output exposed as a (disk-backed) iterator. Tom is actually planning to work on something similar with me, hopefully this month or next. Regards, Mridul On Sun, Apr 20, 2014 at 11:46 PM, Sandy Ryza
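
A minimal sketch of that idea in Python, assuming integer records and a local scratch directory (`external_sorted` and its helpers are hypothetical names, not a Spark API): fixed-size runs are sorted and spilled to disk, and `heapq.merge` merges the spill files lazily, so the consumer sees an ordinary iterator. Memory use is bounded by `run_size` while spilling and by roughly one buffered record per run while merging.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _spill(sorted_run: List[int], directory: str) -> str:
    """Write one sorted run to a temporary file, one value per line."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{value}\n" for value in sorted_run)
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream a spilled run back from disk, one line at a time."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sorted(values: Iterable[int], run_size: int, directory: str) -> Iterator[int]:
    """Return a lazily merged, disk-backed iterator over sorted(values)."""
    paths: List[str] = []
    run: List[int] = []
    for value in values:
        run.append(value)
        if len(run) >= run_size:
            paths.append(_spill(sorted(run), directory))
            run = []
    if run:
        paths.append(_spill(sorted(run), directory))
    # heapq.merge pulls from each run on demand; nothing is materialized here.
    return heapq.merge(*(_read_run(p) for p in paths))


with tempfile.TemporaryDirectory() as scratch:
    result = list(external_sorted([5, 3, 9, 1, 7, 2, 8], run_size=3, directory=scratch))
print(result)  # [1, 2, 3, 5, 7, 8, 9]
```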

Re: bug using kryo as closure serializer

2014-05-04 Thread Mridul Muralidharan
On a slightly related note (apologies, Soren, for hijacking the thread): Reynold, how much better is Kryo, from Spark's usage point of view, compared to the default Java serialization (in general, not for closures)? The numbers on the Kryo site are interesting, but since you have played the most with Kryo

Re: [jira] [Created] (SPARK-1767) Prefer HDFS-cached replicas when scheduling data-local tasks

2014-05-15 Thread Mridul Muralidharan
Hi Sandy, I assume you are referring to the caching added to datanodes through the new caching API via the NN (to preemptively mmap blocks)? I have not looked in detail, but does the NN tell us about this in block locations? If yes, we can simply make those process local instead of node local for executors on

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-16 Thread Mridul Muralidharan
Effectively, this is persist without fault tolerance: once lineage is truncated, the failure of any node loses the checkpointed partitions on it with no way to recompute them. I would be very skeptical of truncating lineage if it is not reliable. On 17-May-2014 3:49 am, Xiangrui Meng (JIRA) j...@apache.org wrote: Xiangrui Meng created SPARK-1855:

Re: [VOTE] Release Apache Spark 1.0.0 (rc6)

2014-05-16 Thread Mridul Muralidharan
So was rc5 cancelled? I did not see a note indicating that, or why [1]. - Mridul [1] I could easily have missed it in the email storm, though! On Thu, May 15, 2014 at 1:32 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
I had echoed similar sentiments a while back when there was a discussion around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the API changes, add missing functionality, and go through a hardening release before 1.0. But the community preferred a 1.0 :-) Regards, Mridul On 17-May-2014

Re: [jira] [Created] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data

2014-05-17 Thread Mridul Muralidharan
I suspect this is an issue we have fixed internally as part of a larger change; the issue we fixed was not a config issue but bugs in Spark. Unfortunately, we plan to contribute this only as part of 1.1. Regards, Mridul On 17-May-2014 4:09 pm, sam (JIRA) j...@apache.org wrote: sam created

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
. On Sat, May 17, 2014 at 4:26 AM, Mridul Muralidharan mri...@gmail.com wrote: I had echoed similar sentiments a while back when there was a discussion around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the api changes, add missing functionality, go through a hardening release

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
the discussion. Regards Mridul issue, and what I am asking, is which pending bug fixes does anyone anticipate will require breaking the public API guaranteed in rc9 On Sat, May 17, 2014 at 9:44 AM, Mridul Muralidharan mri...@gmail.com wrote: We made incompatible api changes whose impact

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
Mridul If you can tell me about specific changes in the current release candidate that occasion new arguments for why a 1.0 release is an unacceptable idea, then I'm listening. On Sat, May 17, 2014 at 11:59 AM, Mridul Muralidharan mri...@gmail.com wrote: On 17-May-2014 11:40 pm, Mark Hamstra m

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-17 Thread Mridul Muralidharan
, Andrew Ash and...@andrewash.com wrote: +1 on the next release feeling more like a 0.10 than a 1.0 On May 17, 2014 4:38 AM, Mridul Muralidharan mri...@gmail.com wrote: I had echoed similar sentiments a while back when there was a discussion around 0.10 vs 1.0 ... I would have preferred

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-18 Thread Mridul Muralidharan
:38 AM, Mridul Muralidharan mri...@gmail.com wrote: I had echoed similar sentiments a while back when there was a discussion around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the api changes, add missing functionality, go through a hardening release before

Re: [VOTE] Release Apache Spark 1.0.0 (rc5)

2014-05-18 Thread Mridul Muralidharan
guaranteed 1.0.0 baseline. On Sat, May 17, 2014 at 2:05 PM, Mridul Muralidharan mri...@gmail.com wrote: I would make the case for interface stability, not just API stability. Particularly given that we have significantly changed some of our interfaces, I want to ensure developers/users

Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

2014-05-18 Thread Mridul Muralidharan
avoid hitting disk if we have enough memory to use. We need to investigate more to find a good solution. -Xiangrui On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan mri...@gmail.com wrote: Effectively this is persist without fault tolerance. Failure of any node means complete lack of fault

Re: Java IO Stream Corrupted - Invalid Type AC?

2014-06-18 Thread Mridul Muralidharan
On Wed, Jun 18, 2014 at 6:19 PM, Surendranauth Hiraman suren.hira...@velos.io wrote: Patrick, My team is using shuffle consolidation but not speculation. We are also using persist(DISK_ONLY) for caching. Use of shuffle consolidation is probably what is causing the issue. It would be a good idea

Re: [jira] [Created] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data

2014-06-23 Thread Mridul Muralidharan
, Can you comment a little bit more on this issue? We are running into the same stack trace but not sure whether it is just different Spark versions on each cluster (doesn't seem likely) or a bug in Spark. Thanks. On Sat, May 17, 2014 at 4:41 AM, Mridul Muralidharan mri...@gmail.com wrote

Re: Eliminate copy while sending data : any Akka experts here ?

2014-07-01 Thread Mridul Muralidharan
the executor returns the result of a task when it's too big for akka. We were thinking of refactoring this too, as using the block manager has much higher latency than a direct TCP send. On Mon, Jun 30, 2014 at 12:13 PM, Mridul Muralidharan mri...@gmail.com wrote: Our current hack is to use

Re: Eliminate copy while sending data : any Akka experts here ?

2014-07-02 Thread Mridul Muralidharan
Hi Patrick, Please see inline. Regards, Mridul On Wed, Jul 2, 2014 at 10:52 AM, Patrick Wendell pwend...@gmail.com wrote: b) Instead of pulling this information, push it to executors as part of task submission. (What Patrick mentioned ?) (1) a.1 from above is still an issue for this. I

Re: Eliminate copy while sending data : any Akka experts here ?

2014-07-02 Thread Mridul Muralidharan
, Mridul On Tue, Jul 1, 2014 at 2:51 AM, Mridul Muralidharan mri...@gmail.com wrote: We had considered both approaches (if I understood the suggestions right) : a) Pulling only map output states for tasks which run on the reducer by modifying the Actor. (Probably along lines of what Aaron

Re: Eliminate copy while sending data : any Akka experts here ?

2014-07-03 Thread Mridul Muralidharan
On Thu, Jul 3, 2014 at 11:32 AM, Reynold Xin r...@databricks.com wrote: On Wed, Jul 2, 2014 at 3:44 AM, Mridul Muralidharan mri...@gmail.com wrote: The other thing we do need is the location of blocks. This is actually just O(n) because we just need to know where the map was run

Re: Eliminate copy while sending data : any Akka experts here ?

2014-07-04 Thread Mridul Muralidharan
= 0 using a compressed bitmap. That way we can still avoid requests for zero-sized blocks. On Thu, Jul 3, 2014 at 3:12 PM, Reynold Xin r...@databricks.com wrote: Yes, that number is likely == 0 in any real workload ... On Thu, Jul 3, 2014 at 8:01 AM, Mridul Muralidharan mri...@gmail.com

Re: on shark, is tachyon less efficient than memory_only cache strategy ?

2014-07-08 Thread Mridul Muralidharan
You are ignoring serde costs :-) - Mridul On Tue, Jul 8, 2014 at 8:48 PM, Aaron Davidson ilike...@gmail.com wrote: Tachyon should only be marginally less performant than memory_only, because we mmap the data from Tachyon's ramdisk. We do not have to, say, transfer the data over a pipe from

Unresponsive to PR/jira changes

2014-07-09 Thread Mridul Muralidharan
Hi, I noticed today that Gmail has been moving most of the mails I was receiving from Spark GitHub/JIRA to my spam folder; I had assumed it was a lull in activity due to Spark Summit over the past few weeks! In case I have commented on specific PR/JIRA issues and not followed up, apologies for

Re: better compression codecs for shuffle blocks?

2014-07-14 Thread Mridul Muralidharan
We tried a lower block size for LZF, but it barfed all over the place. Snappy was the way to go for our jobs. Regards, Mridul On Mon, Jul 14, 2014 at 12:31 PM, Reynold Xin r...@databricks.com wrote: Hi Spark devs, I was looking into the memory usage of shuffle and one annoying thing is

Re: -1s on pull requests?

2014-08-05 Thread Mridul Muralidharan
I just came across this mail; thanks for initiating this discussion, Kay. To add: another recurring issue is very rapid commits, made before most contributors have had a chance to even look at the proposed changes. There is not much prior discussion on the JIRA or PR, and the time between submitting

Re: Unit tests in 5 minutes

2014-08-09 Thread Mridul Muralidharan
The issue with supporting this, IMO, is that scalatest uses the same VM for all the tests (the surefire plugin supports fork, but scalatest ignores it, IIRC). So different tests would initialize different Spark contexts and could potentially step on each other's toes. Regards, Mridul On Fri, Aug

Re: is Branch-1.1 SBT build broken for yarn-alpha ?

2014-08-21 Thread Mridul Muralidharan
It is weird that Patrick did not run into this while creating the RC. Essentially, the yarn-alpha pom.xml has not been updated properly in the 1.1 branch. Just change the version to '1.1.1-SNAPSHOT' in yarn/alpha/pom.xml (to make it the same as every other pom). Regards, Mridul On Thu, Aug 21, 2014 at 5:09 AM,

Re: [VOTE] Release Apache Spark 1.1.0 (RC1)

2014-08-28 Thread Mridul Muralidharan
Is SPARK-3277 applicable to 1.1? If yes, I am -1 on the release until it is fixed (I am on a break, so I can't verify or help fix it, sorry). Regards Mridul On 28-Aug-2014 9:33 pm, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version

Re: [VOTE] Release Apache Spark 1.1.0 (RC1)

2014-08-28 Thread Mridul Muralidharan
and we'll patch it and spin a new RC. We can also update the test coverage to cover LZ4. - Patrick On Thu, Aug 28, 2014 at 9:27 AM, Mridul Muralidharan mri...@gmail.com wrote: Is SPARK-3277 applicable to 1.1 ? If yes, until it is fixed, I am -1 on the release (I am on break, so can't

Re: Breaking the previous large-scale sort record with Spark

2014-10-10 Thread Mridul Muralidharan
Brilliant stuff! Congrats, all :-) This is indeed really heartening news! Regards, Mridul On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Hi folks, I interrupt your regularly scheduled user / dev list to bring you some pretty cool news for the project, which

Re: keeping PR titles / descriptions up to date

2014-12-02 Thread Mridul Muralidharan
I second that! It would also be great if the JIRA were updated accordingly. Regards, Mridul On Wed, Dec 3, 2014 at 1:53 AM, Kay Ousterhout kayousterh...@gmail.com wrote: Hi all, I've noticed a bunch of times lately where a pull request changes to be pretty different from the original pull

Re: 2GB limit for partitions?

2015-02-04 Thread Mridul Muralidharan
, seems promising. thanks, Imran On Tue, Feb 3, 2015 at 7:32 PM, Mridul Muralidharan mri...@gmail.com wrote: That is fairly out of date (we used to run some of our jobs on it ... But that is forked off 1.1 actually). Regards Mridul

Re: Welcoming three new committers

2015-02-03 Thread Mridul Muralidharan
Congratulations! Keep up the good work :-) Regards Mridul On Tuesday, February 3, 2015, Matei Zaharia matei.zaha...@gmail.com wrote: Hi all, The PMC recently voted to add three new committers: Cheng Lian, Joseph Bradley and Sean Owen. All three have been major contributors to Spark in

Re: 2GB limit for partitions?

2015-02-03 Thread Mridul Muralidharan
That is fairly out of date (we used to run some of our jobs on it, but that was forked off 1.1, actually). Regards Mridul On Tuesday, February 3, 2015, Imran Rashid iras...@cloudera.com wrote: Thanks for the explanations, makes sense. For the record looks like this was worked on a while

Re: broadcast hang out

2015-03-15 Thread Mridul Muralidharan
Cross-region, as in different data centers? - Mridul On Sun, Mar 15, 2015 at 8:08 PM, lonely Feb lonely8...@gmail.com wrote: Hi all, i meet up with a problem that torrent broadcast hang out in my spark cluster (1.2, standalone) , particularly serious when driver and executors are

Re: Release Scala version vs Hadoop version (was: [VOTE] Release Apache Spark 1.3.0 (RC3))

2015-03-09 Thread Mridul Muralidharan
In an ideal situation, +1 on removing all vendor-specific builds and making them just Hadoop-version-specific; that is what we should depend on anyway. Though I hope Sean is correct in assuming that vendor-specific builds for Hadoop 2.4 are just that, and not 2.4- or 2.4+, which cause incompatibilities

Re: Spark config option 'expression language' feedback request

2015-03-13 Thread Mridul Muralidharan
Let me try to rephrase my query. How can a user specify, for example, what the executor memory or the number of cores should be? I don't want a situation where some variables can be specified using one set of idioms (from this PR, for example) and another set cannot. Regards, Mridul

Re: May we merge into branch-1.3 at this point?

2015-03-13 Thread Mridul Muralidharan
Who is managing the 1.3 release? You might want to coordinate with them before porting changes to the branch. Regards Mridul On Friday, March 13, 2015, Sean Owen so...@cloudera.com wrote: Yeah, I'm guessing that is all happening quite literally as we speak. The Apache git tag is the one of

Re: enum-like types in Spark

2015-03-05 Thread Mridul Muralidharan
While I don't have strong opinions either way about how we handle enums in Spark, I assume the discussion is targeted at (new) API being designed in Spark. Rewiring what we have already exposed will lead to incompatible API changes (StorageLevel, for example, is in 1.0). Regards, Mridul On

Re: enum-like types in Spark

2015-03-05 Thread Mridul Muralidharan
I have a strong dislike for Java enums because they are not stable across JVMs: if one undergoes serde, you end up with unpredictable results at times [1]. This is one of the reasons why we prevent enums from being keys, though it is highly possible users might depend on it internally
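
The JVM issue itself (`java.lang.Enum.hashCode()` is identity-based, so it can differ from one JVM to the next) is hard to show without a cluster, but Python has an analogous trap that makes the same point. A minimal sketch with a hypothetical `Level` enum: partitioning by the built-in hash is only consistent within a single process, while hashing a stable representation such as the member name gives the same partition everywhere.

```python
import zlib
from enum import Enum


class Level(Enum):
    MEMORY = 1
    DISK = 2
    OFF_HEAP = 3


def process_local_partition(key: Level, num_partitions: int) -> int:
    # Enum.__hash__ hashes the member name, and CPython randomizes str hashes per
    # process (unless PYTHONHASHSEED is fixed), so two workers can disagree here.
    return hash(key) % num_partitions


def stable_partition(key: Level, num_partitions: int) -> int:
    # CRC-32 of the member name gives the same answer in every process and machine.
    return zlib.crc32(key.name.encode("utf-8")) % num_partitions


print([stable_partition(level, 4) for level in Level])
```

Running the script under two different `PYTHONHASHSEED` values will usually change what `process_local_partition` returns, but never what `stable_partition` returns.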

Re: Should we let everyone set Assignee?

2015-04-24 Thread Mridul Muralidharan
This is a great suggestion - definitely makes sense to have it. Regards, Mridul On Fri, Apr 24, 2015 at 11:08 AM, Patrick Wendell pwend...@gmail.com wrote: It's a bit of a digression - but Steve's suggestion that we have a mailing list for new issues is a great idea and we can do it easily.

Re: [discuss] ending support for Java 6?

2015-05-02 Thread Mridul Muralidharan
Could we build PRs for testing on the minimum JDK we support? That would automatically cause build failures in case code uses a newer API. Regards, Mridul On Fri, May 1, 2015 at 2:46 PM, Reynold Xin r...@databricks.com wrote: It's really hard to inspect API calls since none of us have the Java

Re: [discuss] ending support for Java 6?

2015-05-02 Thread Mridul Muralidharan
... ;) On Sat, May 2, 2015 at 1:09 PM, Mridul Muralidharan mri...@gmail.com wrote: We could build on minimum jdk we support for testing pr's - which will automatically cause build failures in case code uses newer api ? Regards, Mridul On Fri, May 1, 2015 at 2:46 PM, Reynold Xin r

Re: Why does SortShuffleWriter write to disk always?

2015-05-02 Thread Mridul Muralidharan
I agree; this is better handled by the filesystem cache, not to mention being able to do zero-copy writes. Regards, Mridul On Sat, May 2, 2015 at 10:26 PM, Reynold Xin r...@databricks.com wrote: I've personally prototyped completely in-memory shuffle for Spark 3 times. However, it is unclear
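
A small Python sketch of the zero-copy point, assuming a POSIX-like OS (the file and function names are illustrative only, not Spark's): block data is written to an ordinary file, where it sits in the filesystem cache, and `socket.sendfile()` hands it to the socket without copying it through user space.

```python
import os
import socket
import tempfile


def serve_block(conn: socket.socket, path: str) -> int:
    """Send a block file to a peer; returns the number of bytes sent."""
    with open(path, "rb") as f:
        # socket.sendfile() uses os.sendfile() when the platform supports it, so
        # the kernel copies straight from the page cache to the socket; otherwise
        # it falls back to an ordinary send() loop.
        return conn.sendfile(f)


with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "block.data")
    with open(path, "wb") as f:
        f.write(b"block-bytes")  # the write lands in the OS page cache first
    sender, receiver = socket.socketpair()
    with sender, receiver:
        sent = serve_block(sender, path)
        received = receiver.recv(64)

print(sent, received)  # 11 b'block-bytes'
```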

Re: Change for submitting to yarn in 1.3.1

2015-05-11 Thread Mridul Muralidharan
That works when it is launched from the same process, which unfortunately is not our case :-) - Mridul On Sun, May 10, 2015 at 9:05 PM, Manku Timma manku.tim...@gmail.com wrote: sc.applicationId gives the yarn appid. On 11 May 2015 at 08:13, Mridul Muralidharan mri...@gmail.com wrote: We had

Re: Change for submitting to yarn in 1.3.1

2015-05-10 Thread Mridul Muralidharan
We had a similar requirement, and as a stopgap I currently use a suboptimal, implementation-specific workaround: parsing it out of stdout/stderr (based on the log config). A better way to get at this is indeed required! Regards, Mridul On Sun, May 10, 2015 at 7:33 PM, Ron's Yahoo!

Re: YARN mode startup takes too long (10+ secs)

2015-05-11 Thread Mridul Muralidharan
For tiny/small clusters (particularly single-tenant ones), you can set it to a lower value. But for anything reasonably large or multi-tenant, the request storm can be bad if a large enough number of applications start aggressively polling the RM. That is why the interval is configurable. - Mridul On

OT: Key types which have potential issues

2015-05-19 Thread Mridul Muralidharan
Hi, I vaguely remember issues with using float/double as keys in MR (and Spark?), but can't seem to find documentation or analysis about it. Does anyone have a resource/link I can refer to? Thanks, Mridul
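
A short Python illustration of the kind of issue meant here (the `group_by_key` helper is hypothetical). JVM semantics differ in the details: boxed `java.lang.Double.equals` treats NaN as equal to itself and `0.0` and `-0.0` as different, while Python's `==` does the opposite on both counts. The rounding problem, however, is the same everywhere.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def group_by_key(pairs: Iterable[Tuple[Hashable, str]]) -> Dict[Hashable, List[str]]:
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


# 0.1 + 0.2 is 0.30000000000000004 in binary floating point: two groups, not one.
rounding = group_by_key([(0.1 + 0.2, "a"), (0.3, "b")])

# Two separately created NaNs are unequal, so each one starts its own group ...
fresh_nans = group_by_key([(float("nan"), "x"), (float("nan"), "y")])

# ... while reusing the *same* NaN object collapses them, because dict lookup
# checks identity before equality.
shared_nan = group_by_key([(math.nan, "x"), (math.nan, "y")])

print(len(rounding), len(fresh_nans), len(shared_nan))  # 2 2 1
```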

Re: Increase partition count (repartition) without shuffle

2015-06-18 Thread Mridul Muralidharan
If you can scan the input twice, you can of course do a per-partition count and build a custom RDD which can repartition without a shuffle. But there is nothing off the shelf, as Sandy mentioned. Regards Mridul On Thursday, June 18, 2015, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Alexander, There is currently
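
A rough Python sketch of that two-scan approach, assuming each input partition yields its records in the same order on every scan (all names are hypothetical): the first scan only counts records, then every new partition is defined as a contiguous slice of exactly one parent partition, so no data moves between partitions. In a Spark custom RDD these slices would presumably become the RDD's partitions, each depending on a single parent partition (a narrow dependency).

```python
import math
from itertools import islice
from typing import List, Sequence, Tuple

# (parent partition index, start offset, end offset); the end offset is exclusive.
Split = Tuple[int, int, int]


def plan_splits(counts: Sequence[int], target_size: int) -> List[Split]:
    """After the counting scan, cut each parent into slices of at most target_size records."""
    splits: List[Split] = []
    for parent, count in enumerate(counts):
        pieces = max(1, math.ceil(count / target_size))
        for piece in range(pieces):
            start = piece * target_size
            splits.append((parent, start, min(count, start + target_size)))
    return splits


def read_split(parents: Sequence[Sequence[int]], split: Split) -> List[int]:
    """Second scan: re-read only the one parent partition a split depends on."""
    parent, start, end = split
    return list(islice(parents[parent], start, end))


parents = [[1, 2, 3, 4, 5], [6, 7], []]
counts = [sum(1 for _ in p) for p in parents]  # the first scan of the input
splits = plan_splits(counts, target_size=2)
print([read_split(parents, s) for s in splits])  # [[1, 2], [3, 4], [5], [6, 7], []]
```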

Re: Data source aliasing

2015-07-30 Thread Mridul Muralidharan
It would be a good idea to generalize this for Spark core, and allow its use for serde, compression, etc. Regards, Mridul On Thu, Jul 30, 2015 at 11:33 AM, Joseph Batchik josephbatc...@gmail.com wrote: Yep I was looking into using the jar service loader. I pushed a rough draft to my fork of

Re: Spark runs into an Infinite loop even if the tasks are completed successfully

2015-08-14 Thread Mridul Muralidharan
From what I understood from Imran's mail (and what was referenced in it), the RDD mentioned seems to be violating some basic contracts on how partitions are used in Spark [1]. They cannot be arbitrarily numbered, have duplicates, etc. Extending RDD to add functionality is typically for niche
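
To make the contract concrete, a minimal Python sketch (the `Partition` class and checker are hypothetical): requiring that the partition at position i reports index i rules out gaps, arbitrary numbering and duplicates in a single check. As far as I recall, this mirrors the property stated in the scaladoc of `RDD.getPartitions`: `rdd.partitions.zipWithIndex.forall { case (partition, index) => partition.index == index }`.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Partition:
    index: int


def check_partitions(partitions: Sequence[Partition]) -> None:
    """Raise ValueError unless the partition at position i reports index i."""
    for position, partition in enumerate(partitions):
        if partition.index != position:
            raise ValueError(
                f"partition at position {position} reports index {partition.index}"
            )


check_partitions([Partition(0), Partition(1), Partition(2)])  # passes silently
```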

Re: Asked to remove non-existent executor exception

2015-07-26 Thread Mridul Muralidharan
Simply customize your log4j config instead of modifying code if you don't want messages from that class. Regards Mridul On Sunday, July 26, 2015, Sea 261810...@qq.com wrote: This exception is so ugly!!! The screen is full of these information when the program runs a long time, and they

Re: Should spark-ec2 get its own repo?

2015-07-21 Thread Mridul Muralidharan
the only thing that changed is the location of some scripts in mesos/ to amplab/). Thanks Shivaram On Mon, Jul 20, 2015 at 12:55 PM, Mridul Muralidharan mri...@gmail.com wrote: Might be a good idea to get the PMC's of both projects to sign off to prevent future issues with apache. Regards

Re: Should spark-ec2 get its own repo?

2015-07-21 Thread Mridul Muralidharan
of the Apache Mesos project. It was a remnant part of Spark from when Spark used to live at github.com/mesos/spark. Shivaram On Tue, Jul 21, 2015 at 11:03 AM, Mridul Muralidharan mri...@gmail.com wrote: If I am not wrong, since the code was hosted within the Mesos project repo, I assume (at least part

Re: [discuss] Removing individual commit messages from the squash commit message

2015-07-18 Thread Mridul Muralidharan
Just to clarify: the proposal is to have a single commit message giving the JIRA and PR ID? That sounds like a good change to have. Regards Mridul On Saturday, July 18, 2015, Reynold Xin r...@databricks.com wrote: I took a look at the commit messages in git log -- it looks like the individual

If gmail, check spam

2015-07-18 Thread Mridul Muralidharan
https://plus.google.com/+LinusTorvalds/posts/DiG9qANf5PA I have noticed a bunch of mails from dev@ and GitHub going to spam, including the Spark mailing list. It might be a good idea for devs and committers on Gmail to check whether they are missing things in their spam folder. Regards, Mridul

Re: [discuss] Removing individual commit messages from the squash commit message

2015-07-18 Thread Mridul Muralidharan
description 3. List of authors contributing to the patch The main thing that changes is 3: we used to also include the individual commits to the pull request branch that are squashed. On Sat, Jul 18, 2015 at 3:45 PM, Mridul Muralidharan mri...@gmail.com

Re: Should spark-ec2 get its own repo?

2015-07-20 Thread Mridul Muralidharan
It might be a good idea to get the PMCs of both projects to sign off, to prevent future issues with Apache. Regards, Mridul On Mon, Jul 20, 2015 at 12:01 PM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: I've created https://github.com/amplab/spark-ec2 and added an initial set of

Re: A proposal for Spark 2.0

2015-11-10 Thread Mridul Muralidharan
It would also be good to fix API breakages introduced as part of 1.0 (where there is missing functionality now), overhaul and remove all deprecated config/features/combinations, and make the changes to the public API that have been deferred in minor releases. Regards, Mridul On Tue, Nov 10,

Re: A proposal for Spark 2.0

2015-12-03 Thread Mridul Muralidharan
There was a proposal to make schedulers pluggable, in the context of adding one which leverages Apache Tez. IIRC it was abandoned, but the JIRA might be a good starting point. Regards Mridul On Dec 3, 2015 2:59 PM, "Rad Gruchalski" wrote: > There was a talk in this thread

Re: Automated close of PR's ?

2015-12-30 Thread Mridul Muralidharan
ividual ones. > > > On Wednesday, December 30, 2015, Mridul Muralidharan <mri...@gmail.com> > wrote: >> >> Is there a script running to close "old" PR's ? I was not aware of any >> discussion about this in dev list. >> >> - Mridul >> >> -

Re: Automated close of PR's ?

2015-12-31 Thread Mridul Muralidharan
open by people out there anyway) > > On Thu, Dec 31, 2015 at 3:25 AM, Mridul Muralidharan <mri...@gmail.com> wrote: >> I am not sure of others, but I had a PR close from under me where >> ongoing discussion was as late as 2 weeks back. >> Given this, I assumed it was

Automated close of PR's ?

2015-12-30 Thread Mridul Muralidharan
Is there a script running to close "old" PRs? I was not aware of any discussion about this on the dev list. - Mridul

Re: Welcoming Yanbo Liang as a committer

2016-06-03 Thread Mridul Muralidharan
Congratulations Yanbo! Regards Mridul On Friday, June 3, 2016, Matei Zaharia wrote: > Hi all, > > The PMC recently voted to add Yanbo Liang as a committer. Yanbo has been a > super active contributor in many areas of MLlib. Please join me in > welcoming Yanbo! > >

Re: rdd.distinct with Partitioner

2016-06-08 Thread Mridul Muralidharan
The example violates the basic contract of a Partitioner. It does make sense to take a Partitioner as a param to distinct, though it is fairly trivial to simulate that in user code as well ... Regards Mridul On Wednesday, June 8, 2016, 汪洋 wrote: > Hi Alexander, > > I
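
For illustration, a Python sketch of simulating `distinct` with a caller-supplied partitioner (hypothetical names; in Scala the equivalent would presumably be along the lines of `rdd.map(x => (x, null)).reduceByKey(partitioner, (x, _) => x).map(_._1)`). It relies on exactly the contract mentioned above: because the partitioner is a deterministic function of the record, duplicates always land in the same bucket, so de-duplicating each bucket independently yields a globally distinct result. With a partitioner that breaks the contract, duplicates could end up in different buckets and survive.

```python
from typing import Callable, Dict, Hashable, Iterable, List


def distinct_with_partitioner(
    records: Iterable[Hashable],
    num_partitions: int,
    get_partition: Callable[[Hashable], int],
) -> List[List[Hashable]]:
    """Route each record with the partitioner, then de-duplicate within each bucket."""
    buckets: List[Dict[Hashable, None]] = [{} for _ in range(num_partitions)]
    for record in records:
        p = get_partition(record)
        if not 0 <= p < num_partitions:
            raise ValueError(f"partitioner returned {p}, outside [0, {num_partitions})")
        # A dict keeps first-seen order and silently drops later duplicates.
        buckets[p].setdefault(record, None)
    return [list(bucket) for bucket in buckets]


parts = distinct_with_partitioner([3, 1, 3, 4, 1, 6], 2, lambda x: x % 2)
print(parts)  # [[4, 6], [3, 1]]
```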

Re: [discuss] making SparkEnv private in Spark 2.0

2016-03-19 Thread Mridul Muralidharan
We use it in executors to get to: a) the Spark conf (to get to the Hadoop config in maps doing custom writing of side-files); b) the shuffle manager (to get the shuffle reader). Not sure if there are alternative ways to get to these. Regards, Mridul On Wed, Mar 16, 2016 at 2:52 PM, Reynold Xin

Re: Discuss: commit to Scala 2.10 support for Spark 2.x lifecycle

2016-04-06 Thread Mridul Muralidharan
In general, I agree: it is preferable to break backward compatibility (where unavoidable) only at major versions. Unfortunately, this is usually planned better, with earlier versions announcing the intent of the change: deprecation across multiple releases, changed defaults, etc. From the thread,

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Mridul Muralidharan
t Kafka specifically > > https://issues.apache.org/jira/browse/SPARK-13877 > > > On Thu, Mar 17, 2016 at 2:49 PM, Mridul Muralidharan <mri...@gmail.com> wrote: >> >> I was not aware of a discussion in Dev list about this - agree with most of >> the observations. >> In add

Re: SPARK-13843 and future of streaming backends

2016-03-19 Thread Mridul Muralidharan
I was not aware of a discussion on the dev list about this; I agree with most of the observations. In addition, I did not see a PMC sign-off on moving (sub-)modules out. Regards Mridul On Thursday, March 17, 2016, Marcelo Vanzin wrote: > Hello all, > > Recently a lot of the

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-25 Thread Mridul Muralidharan
ts to support scala 2.10 three years after they did the last > maintenance release? > > > On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mri...@gmail.com> wrote: > >> Removing compatibility (with jdk, etc

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Mridul Muralidharan
ts to support scala 2.10 three years after they did the last > maintenance release? > > > On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mri...@gmail.com> wrote: > >> Removing compatibility (with jdk, etc

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Mridul Muralidharan
The container Java version can be different from the YARN Java version: we run jobs with JDK 8 on a JDK 7 cluster without issues. Regards Mridul On Thursday, March 24, 2016, Koert Kuipers wrote: > i guess what i am saying is that in a yarn world the only hard > restrictions left are

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Mridul Muralidharan
Removing compatibility (with the JDK, etc.) can be done in a major release. Given that 7 was EOLed a while back and is now unsupported, we have to decide whether we drop support for it in 2.0 or 3.0 (2+ years from now). Given the functionality and performance benefits of going to JDK 8, future

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-24 Thread Mridul Muralidharan
+1. Agreed, dropping support for Java 7 is long overdue, and 2.0 would be a logical release to do this in. Regards, Mridul On Thu, Mar 24, 2016 at 12:27 AM, Reynold Xin wrote: > About a year ago we decided to drop Java 6 support in Spark 1.5. I am > wondering if we should

Re: SPARK-13843 and future of streaming backends

2016-03-26 Thread Mridul Muralidharan
required (and this discussion is a sign that the process has not been > > conducted properly as people have concerns, me including). > > > > Thanks Mridul! > > > > Pozdrawiam, > > Jacek Laskowski > > > > https://medium.com/@jaceklaskowski/ > >

Re: Eliminating shuffle write and spill disk IO reads/writes in Spark

2016-04-01 Thread Mridul Muralidharan
I think Reynold's suggestion of using a RAM disk would be a good way to test whether these are the bottlenecks or something else is. For most practical purposes, pointing the local dir to a RAM disk should effectively give you performance 'similar' to shuffling from memory. Are there concerns with taking that

Re: [DISCUSS] Removing or changing maintainer process

2016-05-19 Thread Mridul Muralidharan
+1 (binding) on removing the maintainer process. I agree with your opinion of "automatic" instead of a manual list. Regards Mridul On Thursday, May 19, 2016, Matei Zaharia wrote: > Hi folks, > > Around 1.5 years ago, Spark added a maintainer process for reviewing API >

Re: Creating Spark Extras project, was Re: SPARK-13843 and future of streaming backends

2016-04-15 Thread Mridul Muralidharan
On Friday, April 15, 2016, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > Yeah in support of this statement I think that my primary interest in > this Spark Extras and the good work by Luciano here is that anytime we > take bits out of a code base and “move it to GitHub” I see

Re: welcoming Burak and Holden as committers

2017-01-25 Thread Mridul Muralidharan
Congratulations and welcome Holden and Burak! Regards, Mridul On Tue, Jan 24, 2017 at 10:13 AM, Reynold Xin wrote: > Hi all, > > Burak and Holden have recently been elected as Apache Spark committers. > > Burak has been very active in a large number of areas in Spark,

Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Mridul Muralidharan
I agree, we should not be publishing both of them. Thanks for bringing this up ! Regards, Mridul On Wed, Sep 7, 2016 at 1:29 AM, Sean Owen wrote: > It's worth calling attention to: > > https://issues.apache.org/jira/browse/SPARK-17418 >

Re: [VOTE] Release Apache Spark 2.0.1 (RC4)

2016-09-29 Thread Mridul Muralidharan
+1 Regards, Mridul On Wed, Sep 28, 2016 at 7:14 PM, Reynold Xin wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.0.1. The vote is open until Sat, Oct 1, 2016 at 20:00 PDT and passes if a > majority of at least 3 +1 PMC votes are cast. > >

Edit access for spark confluence wiki

2016-10-04 Thread Mridul Muralidharan
Can someone add me to the edit list for the Spark wiki, please? Thanks, Mridul

Re: What's the meaning when the partitions is zero?

2016-09-16 Thread Mridul Muralidharan
When numPartitions is 0, there is no data in the RDD, so getPartition is never invoked. - Mridul On Friday, September 16, 2016, WangJianfei wrote: > if so, we will get exception when the numPartitions is 0. > def getPartition(key: Any): Int = key match { >
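To see why an empty RDD never reaches `getPartition`, here is a minimal Python sketch of hash partitioning. The names are hypothetical and this is not Spark's code; it only mirrors the general shape of a hash partitioner, which takes the key's hash modulo the partition count. The partition lookup runs once per record, so with zero records the lookup (and its modulo by zero) never executes.

```python
class HashPartitionerSketch:
    """Hypothetical hash partitioner; not Spark's HashPartitioner."""

    def __init__(self, num_partitions: int):
        if num_partitions < 0:
            raise ValueError("number of partitions cannot be negative")
        self.num_partitions = num_partitions

    def get_partition(self, key) -> int:
        if key is None:
            return 0
        # Python's % is non-negative for a positive divisor; with zero
        # partitions this raises ZeroDivisionError, so it must never run.
        return hash(key) % self.num_partitions


def partition_records(records, partitioner):
    """Bucket (key, value) pairs; get_partition is called once per record."""
    buckets = [[] for _ in range(partitioner.num_partitions)]
    for key, value in records:
        buckets[partitioner.get_partition(key)].append((key, value))
    return buckets
```

`partition_records([], HashPartitionerSketch(0))` returns an empty list without ever calling `get_partition`. Calling `get_partition` directly on a zero-partition instance is what would fail.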

Re: Removing published kinesis, ganglia artifacts due to license issues?

2016-09-07 Thread Mridul Muralidharan
It is good to get clarification, but the way I read it, the issue is whether we publish it as official Apache artifacts (in Maven, etc.). Users can of course build it directly (and we can make it easy to do so), since they are explicitly agreeing to additional licenses. Regards Mridul On

Re: What about removing TaskContext#getPartitionId?

2017-01-14 Thread Mridul Muralidharan
Since TaskContext.getPartitionId is part of the public API, it can't be removed, as user code may depend on it (unless we go through a deprecation process for it). Regards, Mridul On Sat, Jan 14, 2017 at 2:02 AM, Jacek Laskowski wrote: > Hi, > > Just noticed that
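The deprecation process mentioned above means the old public method keeps working for a while but warns its callers, and is removed only in a later release. Spark itself is Scala and would use Scala's `@deprecated(message, since)` annotation. The Python sketch below only illustrates the general pattern; `deprecated`, `TaskContextSketch` and the `"X.Y"` version placeholder are all hypothetical.

```python
import functools
import warnings


def deprecated(since: str, replacement: str):
    """Mark a public function as deprecated without removing it (hypothetical helper)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Warn at the caller's line (stacklevel=2) but keep the old behavior intact.
            warnings.warn(
                f"{func.__name__} is deprecated since {since}; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorate


class TaskContextSketch:
    """Hypothetical stand-in; not Spark's TaskContext."""

    def __init__(self, partition_id: int):
        self._partition_id = partition_id

    def partition_id(self) -> int:
        return self._partition_id

    @deprecated(since="X.Y", replacement="partition_id()")
    def get_partition_id(self) -> int:
        # The old public name keeps working during the deprecation window.
        return self.partition_id()
```

Existing callers of `get_partition_id()` still get the right answer. They also see a `DeprecationWarning` that points them at the replacement before the method is finally removed.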

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-04 Thread Mridul Muralidharan
Hi, https://issues.apache.org/jira/browse/SPARK-20202?jql=priority%20%3D%20Blocker%20AND%20affectedVersion%20%3D%20%222.1.1%22%20and%20project%3D%22spark%22 Indicates there is another blocker (SPARK-20197 should have come in the list too, but was marked major). Regards, Mridul On Tue, Apr 4,
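The link in this message is a JIRA search whose `jql` parameter is percent-encoded. The Python sketch below shows the readable JQL behind it and rebuilds the exact link, which makes the query easier to adapt (for example, to a different affected version). It uses only `urllib.parse.quote` from the standard library; `jql_link` is a hypothetical helper name.

```python
from urllib.parse import quote

# The readable JQL: Blocker issues that affect 2.1.1 in the Spark project.
JQL = 'priority = Blocker AND affectedVersion = "2.1.1" and project="spark"'

# Base taken verbatim from the link in the message above.
BASE = "https://issues.apache.org/jira/browse/SPARK-20202"


def jql_link(base: str, jql: str) -> str:
    """Percent-encode a JQL string and attach it as the jql query parameter."""
    # quote() keeps letters, digits and '.' as-is and encodes spaces, '=' and '"'.
    return f"{base}?jql={quote(jql)}"
```

`jql_link(BASE, JQL)` reproduces the URL quoted in the message character for character.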

Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-07 Thread Mridul Muralidharan
Congratulations Hyukjin, Sameer ! Regards, Mridul On Mon, Aug 7, 2017 at 8:53 AM, Matei Zaharia wrote: > Hi everyone, > > The Spark PMC recently voted to add Hyukjin Kwon and Sameer Agarwal as > committers. Join me in congratulating both of them and thanking them for

Re: SPIP: Spark on Kubernetes

2017-08-17 Thread Mridul Muralidharan
While I definitely support the idea of Apache Spark being able to leverage Kubernetes, IMO it is better for the long-term evolution of Spark to expose an appropriate SPI so that this support need not necessarily live within the Apache Spark code base. It will allow for multiple backends to evolve,
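The SPI idea can be illustrated with a small, hypothetical Python sketch. Backends implement a common contract, and the core picks the single backend that claims a given master URL. All names here (`ClusterBackend`, `can_create`, `select_backend`) are illustrative assumptions. The shape is loosely similar to Spark 2.x's internal `ExternalClusterManager` trait, which is discovered through `java.util.ServiceLoader`, but this is not Spark's API.

```python
from abc import ABC, abstractmethod


class ClusterBackend(ABC):
    """Hypothetical SPI that an external cluster-manager integration would implement."""

    @abstractmethod
    def can_create(self, master_url: str) -> bool:
        """Return True if this backend handles the given master URL."""

    @abstractmethod
    def create_scheduler(self, master_url: str) -> str:
        """Build a scheduler; a description string stands in for a real object here."""


class KubernetesBackend(ClusterBackend):
    PREFIX = "k8s://"

    def can_create(self, master_url: str) -> bool:
        return master_url.startswith(self.PREFIX)

    def create_scheduler(self, master_url: str) -> str:
        return f"kubernetes scheduler for {master_url[len(self.PREFIX):]}"


class LocalBackend(ClusterBackend):
    def can_create(self, master_url: str) -> bool:
        return master_url == "local" or master_url.startswith("local[")

    def create_scheduler(self, master_url: str) -> str:
        return f"local scheduler ({master_url})"


def select_backend(master_url: str, backends: list) -> ClusterBackend:
    """Pick exactly one backend that claims the URL; the core never names backends directly."""
    matches = [b for b in backends if b.can_create(master_url)]
    if not matches:
        raise ValueError(f"no backend can handle master URL {master_url!r}")
    if len(matches) > 1:
        raise ValueError(f"multiple backends claim master URL {master_url!r}")
    return matches[0]
```

The core depends only on the `ClusterBackend` contract, so a backend such as the Kubernetes one could live and release outside the main code base.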

Re: Should Flume integration be behind a profile?

2017-09-26 Thread Mridul Muralidharan
Sounds good to me. +1 Regards, Mridul On Tue, Sep 26, 2017 at 2:36 AM, Sean Owen wrote: > Not a big deal, but I'm wondering whether Flume integration should at least > be opt-in and behind a profile? it still sees some use (at least on our end) > but not applicable to the

Re: Should Flume integration be behind a profile?

2017-10-01 Thread Mridul Muralidharan
I agree, proposal 1 sounds better among the options. Regards, Mridul On Sun, Oct 1, 2017 at 3:50 PM, Reynold Xin wrote: > Probably should do 1, and then it is an easier transition in 3.0. > > On Sun, Oct 1, 2017 at 1:28 AM Sean Owen wrote: >> >> I

Re: Welcoming Tejas Patil as a Spark committer

2017-09-29 Thread Mridul Muralidharan
Congratulations Tejas ! Regards, Mridul On Fri, Sep 29, 2017 at 12:58 PM, Matei Zaharia wrote: > Hi all, > > The Spark PMC recently added Tejas Patil as a committer on the > project. Tejas has been contributing across several areas of Spark for > a while, focusing

Re: Welcoming Saisai (Jerry) Shao as a committer

2017-08-28 Thread Mridul Muralidharan
Congratulations Jerry, well deserved ! Regards, Mridul On Mon, Aug 28, 2017 at 6:28 PM, Matei Zaharia wrote: > Hi everyone, > > The PMC recently voted to add Saisai (Jerry) Shao as a committer. Saisai has > been contributing to many areas of the project for a long

Re: Publishing official docker images for KubernetesSchedulerBackend

2017-11-29 Thread Mridul Muralidharan
We do support running on Apache Mesos via Docker images, so this would not be restricted to k8s. But unlike Mesos support, which has other modes of running, I believe k8s support depends more heavily on the availability of Docker images. Regards, Mridul On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen
