Would be great if the garbage collection PR is also committed - if not
the whole thing, at least the part to unpersist broadcast variables
explicitly would be great.
Currently we are running with a custom impl which does something
similar, and I would like to move to standard distribution for that.
of April (not too far
;) ).
TD
On Wed, Mar 19, 2014 at 5:57 PM, Mridul Muralidharan mri...@gmail.com wrote:
Would be great if the garbage collection PR is also committed - if not
the whole thing, at least the part to unpersist broadcast variables
explicitly would be great.
Currently we
reasonably long-running job (30 mins+) working on a non-trivial
dataset will fail due to accumulated failures in spark.
Regards,
Mridul
TD
On Tue, Mar 25, 2014 at 8:44 PM, Mridul Muralidharan mri...@gmail.com wrote:
Forgot to mention this in the earlier request for PR's
Hi,
So we are now receiving updates from three sources for each change to the PR.
While each of them handles a corner case which others might miss,
would be great if we could minimize the volume of duplicated
communication.
Regards,
Mridul
unsubscribe yourself from any of these sources, right?
- Patrick
On Sat, Mar 29, 2014 at 11:05 AM, Mridul Muralidharan
mri...@gmail.com wrote:
Hi,
So we are now receiving updates from three sources for each change to
the PR.
While each of them handles a corner case which others might miss
Hi,
We have a requirement to use a (potential) ephemeral storage, which
is not within the VM, which is strongly tied to a worker node. So
source of truth for a block would still be within spark; but to
actually do computation, we would need to copy data to external device
(where it might lie
is stored in a remote cluster or machines. And the
goal is to load the remote raw data only once?
Haoyuan
On Sat, Apr 5, 2014 at 4:30 PM, Mridul Muralidharan mri...@gmail.com
wrote:
Hi,
We have a requirement to use a (potential) ephemeral storage, which
is not within the VM, which
An iterator does not imply data has to be memory resident.
Think merge sort output as an iterator (disk backed).
Tom is actually planning to work on something similar with me on this
hopefully this or next month.
Regards,
Mridul
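To illustrate the point above, here is a minimal sketch (hypothetical names, not the implementation being planned): an Iterator can lazily stream records from a file, so only the current record is memory resident while the backing data stays on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.List;

// An Iterator<String> backed by a file on disk: one line resident at a
// time, illustrating that exposing data as an iterator says nothing
// about where the data actually lives.
public class DiskBackedIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String nextLine;

    public DiskBackedIterator(Path file) throws IOException {
        this.reader = Files.newBufferedReader(file);
        advance();
    }

    private void advance() {
        try {
            nextLine = reader.readLine();
            if (nextLine == null) reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override public boolean hasNext() { return nextLine != null; }

    @Override public String next() {
        String current = nextLine;
        advance();
        return current;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, List.of("a", "b", "c"));
        Iterator<String> it = new DiskBackedIterator(tmp);
        StringBuilder sb = new StringBuilder();
        while (it.hasNext()) sb.append(it.next());
        System.out.println(sb);  // abc
        Files.deleteIfExists(tmp);
    }
}
```

A disk-backed merge-sort output would expose exactly this interface: each `next()` pulls the next record from spill files, never materializing the whole dataset.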
On Sun, Apr 20, 2014 at 11:46 PM, Sandy Ryza
On a slightly related note (apologies Soren for hijacking the thread),
Reynold how much better is kryo from spark's usage point of view
compared to the default java serialization (in general, not for
closures) ?
The numbers on the Kryo site are interesting, but since you have played
the most with kryo
Hi Sandy,
I assume you are referring to caching added to datanodes via the new
caching API in the NN ? (To preemptively mmap blocks).
I have not looked in detail, but does NN tell us about this in block
locations?
If yes, we can simply make those process local instead of node local for
executors on
Effectively this is persist without fault tolerance.
Failure of any node means complete lack of fault tolerance.
I would be very skeptical of truncating lineage if it is not reliable.
On 17-May-2014 3:49 am, Xiangrui Meng (JIRA) j...@apache.org wrote:
Xiangrui Meng created SPARK-1855:
So was rc5 cancelled ? Did not see a note indicating that or why ... [1]
- Mridul
[1] could have easily missed it in the email storm though !
On Thu, May 15, 2014 at 1:32 AM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
I had echoed similar sentiments a while back when there was a discussion
around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the api
changes, add missing functionality, go through a hardening release before
1.0
But the community preferred a 1.0 :-)
Regards,
Mridul
On 17-May-2014
I suspect this is an issue we have fixed internally here as part of a
larger change - the issue we fixed was not a config issue but bugs in spark.
Unfortunately we plan to contribute this as part of 1.1
Regards,
Mridul
On 17-May-2014 4:09 pm, sam (JIRA) j...@apache.org wrote:
sam created
.
On Sat, May 17, 2014 at 4:26 AM, Mridul Muralidharan mri...@gmail.com
wrote:
I had echoed similar sentiments a while back when there was a discussion
around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the api
changes, add missing functionality, go through a hardening release
the discussion.
Regards
Mridul
issue, and what I am asking, is which pending bug fixes does anyone
anticipate will require breaking the public API guaranteed in rc9
On Sat, May 17, 2014 at 9:44 AM, Mridul Muralidharan mri...@gmail.com
wrote:
We made incompatible api changes whose impact
Mridul
If you can tell me about specific changes in the current release
candidate
that occasion new arguments for why a 1.0 release is an unacceptable idea,
then I'm listening.
On Sat, May 17, 2014 at 11:59 AM, Mridul Muralidharan mri...@gmail.com
wrote:
On 17-May-2014 11:40 pm, Mark Hamstra m
, Andrew Ash and...@andrewash.com
wrote:
+1 on the next release feeling more like a 0.10 than a 1.0
On May 17, 2014 4:38 AM, Mridul Muralidharan mri...@gmail.com
wrote:
I had echoed similar sentiments a while back when there was a
discussion
around 0.10 vs 1.0 ... I would have preferred
:38 AM, Mridul Muralidharan mri...@gmail.com
wrote:
I had echoed similar sentiments a while back when there was a
discussion
around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize
the
api
changes, add missing functionality, go through a hardening release
before
guaranteed 1.0.0 baseline.
On Sat, May 17, 2014 at 2:05 PM, Mridul Muralidharan
mri...@gmail.com wrote:
I would make the case for interface stability not just api stability.
Particularly given that we have significantly changed some of our
interfaces, I want to ensure developers/users
avoid hitting disk if we have
enough memory to use. We need to investigate more to find a good
solution. -Xiangrui
On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan mri...@gmail.com
wrote:
Effectively this is persist without fault tolerance.
Failure of any node means complete lack of fault
On Wed, Jun 18, 2014 at 6:19 PM, Surendranauth Hiraman
suren.hira...@velos.io wrote:
Patrick,
My team is using shuffle consolidation but not speculation. We are also
using persist(DISK_ONLY) for caching.
Use of shuffle consolidation is probably what is causing the issue.
Would be good idea
,
Can you comment a little bit more on this issue? We are running into the
same stack trace but not sure whether it is just different Spark versions
on each cluster (doesn't seem likely) or a bug in Spark.
Thanks.
On Sat, May 17, 2014 at 4:41 AM, Mridul Muralidharan mri...@gmail.com
wrote
the executor returns the result of a task when it's too big
for akka. We were thinking of refactoring this too, as using the block
manager has much higher latency than a direct TCP send.
On Mon, Jun 30, 2014 at 12:13 PM, Mridul Muralidharan mri...@gmail.com
wrote:
Our current hack is to use
Hi Patrick,
Please see inline.
Regards,
Mridul
On Wed, Jul 2, 2014 at 10:52 AM, Patrick Wendell pwend...@gmail.com wrote:
b) Instead of pulling this information, push it to executors as part
of task submission. (What Patrick mentioned ?)
(1) a.1 from above is still an issue for this.
I
,
Mridul
On Tue, Jul 1, 2014 at 2:51 AM, Mridul Muralidharan mri...@gmail.com
wrote:
We had considered both approaches (if I understood the suggestions right) :
a) Pulling only map output states for tasks which run on the reducer
by modifying the Actor. (Probably along lines of what Aaron
On Thu, Jul 3, 2014 at 11:32 AM, Reynold Xin r...@databricks.com wrote:
On Wed, Jul 2, 2014 at 3:44 AM, Mridul Muralidharan mri...@gmail.com
wrote:
The other thing we do need is the location of blocks. This is actually
just
O(n) because we just need to know where the map was run
= 0 using a compressed bitmap. That way we can still avoid
requests for zero-sized blocks.
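The idea reads roughly like the following sketch (hypothetical names, not Spark's actual implementation; `java.util.BitSet` stands in for a compressed bitmap such as RoaringBitmap): record which map outputs are non-empty, and have the reducer issue fetch requests only for those.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Track which map outputs are non-empty in a bitmap; a reducer consults
// it and only issues fetch requests for blocks that actually hold data,
// avoiding round trips for zero-sized blocks.
public class NonEmptyBlockTracker {
    private final BitSet nonEmpty = new BitSet();

    public void recordMapOutput(int mapId, long blockSize) {
        if (blockSize > 0) nonEmpty.set(mapId);
    }

    // Map ids whose blocks are worth fetching.
    public List<Integer> blocksToFetch(int numMaps) {
        List<Integer> ids = new ArrayList<>();
        for (int i = nonEmpty.nextSetBit(0);
             i >= 0 && i < numMaps;
             i = nonEmpty.nextSetBit(i + 1)) {
            ids.add(i);
        }
        return ids;
    }

    public static void main(String[] args) {
        NonEmptyBlockTracker tracker = new NonEmptyBlockTracker();
        long[] sizes = {128, 0, 0, 64, 0, 32};
        for (int i = 0; i < sizes.length; i++) tracker.recordMapOutput(i, sizes[i]);
        System.out.println(tracker.blocksToFetch(sizes.length));  // [0, 3, 5]
    }
}
```

Since most real workloads have few truly empty blocks, the bitmap compresses well and the per-status cost stays small.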
On Thu, Jul 3, 2014 at 3:12 PM, Reynold Xin r...@databricks.com wrote:
Yes, that number is likely == 0 in any real workload ...
On Thu, Jul 3, 2014 at 8:01 AM, Mridul Muralidharan mri...@gmail.com
You are ignoring serde costs :-)
- Mridul
On Tue, Jul 8, 2014 at 8:48 PM, Aaron Davidson ilike...@gmail.com wrote:
Tachyon should only be marginally less performant than memory_only, because
we mmap the data from Tachyon's ramdisk. We do not have to, say, transfer
the data over a pipe from
Hi,
I noticed today that gmail has been marking most of the mails from
spark github/jira I was receiving to the spam folder; and I was assuming
it was a lull in activity due to spark summit for the past few weeks !
In case I have commented on specific PR/JIRA issues and not followed
up, apologies for
We tried with lower block size for lzf, but it barfed all over the place.
Snappy was the way to go for our jobs.
Regards,
Mridul
On Mon, Jul 14, 2014 at 12:31 PM, Reynold Xin r...@databricks.com wrote:
Hi Spark devs,
I was looking into the memory usage of shuffle and one annoying thing is
Just came across this mail, thanks for initiating this discussion Kay.
To add; another issue which recurs is very rapid commits: before most
contributors have had a chance to even look at the changes proposed.
There is not much prior discussion on the jira or pr, and the time
between submitting
Issue with supporting this imo is the fact that ScalaTest uses the
same VM for all the tests (the surefire plugin supports fork, but
ScalaTest ignores it iirc).
So different tests would initialize different spark contexts, and can
potentially step on each other's toes.
Regards,
Mridul
On Fri, Aug
Weird that Patrick did not face this while creating the RC.
Essentially the yarn alpha pom.xml has not been updated properly in
the 1.1 branch.
Just change the version to '1.1.1-SNAPSHOT' for yarn/alpha/pom.xml (to
make it the same as any other pom).
Regards,
Mridul
On Thu, Aug 21, 2014 at 5:09 AM,
Is SPARK-3277 applicable to 1.1 ?
If yes, until it is fixed, I am -1 on the release (I am on break, so can't
verify or help fix, sorry).
Regards
Mridul
On 28-Aug-2014 9:33 pm, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
and we'll patch it
and spin a new RC. We can also update the test coverage to cover LZ4.
- Patrick
On Thu, Aug 28, 2014 at 9:27 AM, Mridul Muralidharan mri...@gmail.com
wrote:
Is SPARK-3277 applicable to 1.1 ?
If yes, until it is fixed, I am -1 on the release (I am on break, so
can't
Brilliant stuff ! Congrats all :-)
This is indeed really heartening news !
Regards,
Mridul
On Fri, Oct 10, 2014 at 8:24 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi folks,
I interrupt your regularly scheduled user / dev list to bring you some pretty
cool news for the project, which
I second that !
Would also be great if the JIRA was updated accordingly too.
Regards,
Mridul
On Wed, Dec 3, 2014 at 1:53 AM, Kay Ousterhout kayousterh...@gmail.com wrote:
Hi all,
I've noticed a bunch of times lately where a pull request changes to be
pretty different from the original pull
, seems
promising.
thanks,
Imran
On Tue, Feb 3, 2015 at 7:32 PM, Mridul Muralidharan mri...@gmail.com
wrote:
That is fairly out of date (we used to run some of our jobs on it ... But
that is forked off 1.1 actually).
Regards
Mridul
Congratulations !
Keep up the good work :-)
Regards
Mridul
On Tuesday, February 3, 2015, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi all,
The PMC recently voted to add three new committers: Cheng Lian, Joseph
Bradley and Sean Owen. All three have been major contributors to Spark in
That is fairly out of date (we used to run some of our jobs on it ... But
that is forked off 1.1 actually).
Regards
Mridul
On Tuesday, February 3, 2015, Imran Rashid iras...@cloudera.com wrote:
Thanks for the explanations, makes sense. For the record looks like this
was worked on a while
Cross region as in different data centers ?
- Mridul
On Sun, Mar 15, 2015 at 8:08 PM, lonely Feb lonely8...@gmail.com wrote:
Hi all, i met a problem where torrent broadcast hangs in my
spark cluster (1.2, standalone), particularly serious when driver and
executors are
In an ideal situation, +1 on removing all vendor-specific builds and
making them just hadoop-version specific - that is what we should depend on
anyway.
Though I hope Sean is correct in assuming that vendor-specific builds
for hadoop 2.4 are just that; and not 2.4- or 2.4+ which cause
incompatibilities
Let me try to rephrase my query.
How can a user specify, for example, what the executor memory should
be or what the number of cores should be ?
I don't want a situation where some variables can be specified using
one set of idioms (from this PR for example) and another set cannot
be.
Regards,
Mridul
Who is managing the 1.3 release ? You might want to coordinate with them
before porting changes to the branch.
Regards
Mridul
On Friday, March 13, 2015, Sean Owen so...@cloudera.com wrote:
Yeah, I'm guessing that is all happening quite literally as we speak.
The Apache git tag is the one of
While I don't have any strong opinions about how we handle enums
either way in spark, I assume the discussion is targeted at (new) api
being designed in spark.
Rewiring what we already have exposed will lead to incompatible api
change (StorageLevel for example, is in 1.0).
Regards,
Mridul
On
I have a strong dislike for java enums due to the fact that they
are not stable across JVMs - if it undergoes serde, you end up with
unpredictable results at times [1].
One of the reasons why we prevent enums from being keys: though it is
highly possible users might depend on it internally
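A minimal sketch of the pitfall and the workaround (hypothetical names; this is not Spark code): `Enum.hashCode()` is Object's identity hash, so it varies from JVM to JVM, while hashing the enum's `name()` is a stable String hash everywhere.

```java
// Enum.hashCode() is identity-based and therefore NOT stable across JVM
// runs: any hash-partitioned structure keyed directly on an enum can
// place the same key in different partitions on different JVMs. Keying
// on the stable name() avoids this.
public class EnumKeyDemo {
    enum Level { LOW, MEDIUM, HIGH }

    static int partitionUnstable(Level key, int numPartitions) {
        // key.hashCode() differs between JVM runs for the same constant.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    static int partitionStable(Level key, int numPartitions) {
        // name().hashCode() is defined by String and identical everywhere.
        return Math.floorMod(key.name().hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        // Same input, same JVM: both agree. Across JVMs, only the
        // stable variant keeps mapping HIGH to the same partition.
        System.out.println(partitionStable(Level.HIGH, 4));
    }
}
```

This is why keying a shuffle on raw enum values gives "unpredictable results": the map side and reduce side may live in different JVMs and disagree on the hash.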
This is a great suggestion - definitely makes sense to have it.
Regards,
Mridul
On Fri, Apr 24, 2015 at 11:08 AM, Patrick Wendell pwend...@gmail.com wrote:
It's a bit of a digression - but Steve's suggestion that we have a
mailing list for new issues is a great idea and we can do it easily.
We could build with the minimum jdk we support for testing PRs - which will
automatically cause build failures in case code uses a newer api ?
Regards,
Mridul
On Fri, May 1, 2015 at 2:46 PM, Reynold Xin r...@databricks.com wrote:
It's really hard to inspect API calls since none of us have the Java
... ;)
On Sat, May 2, 2015 at 1:09 PM, Mridul Muralidharan mri...@gmail.com
wrote:
We could build with the minimum jdk we support for testing PRs - which will
automatically cause build failures in case code uses a newer api ?
Regards,
Mridul
On Fri, May 1, 2015 at 2:46 PM, Reynold Xin r
I agree, this is better handled by the filesystem cache - not to
mention, being able to do zero copy writes.
Regards,
Mridul
On Sat, May 2, 2015 at 10:26 PM, Reynold Xin r...@databricks.com wrote:
I've personally prototyped completely in-memory shuffle for Spark 3 times.
However, it is unclear
That works when it is launched from same process - which is
unfortunately not our case :-)
- Mridul
On Sun, May 10, 2015 at 9:05 PM, Manku Timma manku.tim...@gmail.com wrote:
sc.applicationId gives the yarn appid.
On 11 May 2015 at 08:13, Mridul Muralidharan mri...@gmail.com wrote:
We had
We had a similar requirement, and as a stopgap, I currently use a
suboptimal impl-specific workaround - parsing it out of the
stdout/stderr (based on log config).
A better means to get to this is indeed required !
Regards,
Mridul
On Sun, May 10, 2015 at 7:33 PM, Ron's Yahoo!
For tiny/small clusters (particularly single-tenant), you can set it to a
lower value.
But for anything reasonably large or multi-tenant, the request storm
can be bad if a large enough number of applications start aggressively
polling the RM.
That is why the interval is configurable.
- Mridul
On
Hi,
I vaguely remember issues with using float/double as keys in MR (and spark ?).
But can't seem to find documentation/analysis about the same.
Does anyone have some resource/link I can refer to ?
Thanks,
Mridul
If you can scan the input twice, you can of course do a per-partition count
and build a custom RDD which can repartition without shuffle.
But nothing off the shelf, as Sandy mentioned.
Regards
Mridul
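The two-pass idea can be sketched outside of Spark (hypothetical names; a real version would collect the per-partition counts in the first scan of the RDD): the counts yield a starting global offset per partition, so the second pass can assign every record a global index, and hence a target partition, with no shuffle.

```java
import java.util.Arrays;

// Pass 1 counts records per partition; exclusive prefix sums over those
// counts give each partition the global index of its first record, so
// pass 2 can number (and route) records without moving any data.
public class TwoPassIndexing {
    // offsets[i] = global index of the first record in partition i.
    static long[] startingOffsets(long[] perPartitionCounts) {
        long[] offsets = new long[perPartitionCounts.length];
        long running = 0;
        for (int i = 0; i < perPartitionCounts.length; i++) {
            offsets[i] = running;
            running += perPartitionCounts[i];
        }
        return offsets;
    }

    public static void main(String[] args) {
        long[] counts = {3, 0, 5, 2};  // pass 1 result per partition
        System.out.println(Arrays.toString(startingOffsets(counts)));
        // [0, 3, 3, 8]
    }
}
```

In Spark terms, pass 2 would be a `mapPartitionsWithIndex` that adds `offsets[partitionIndex]` to a local counter, giving each record its global position.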
On Thursday, June 18, 2015, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Alexander,
There is currently
Would be a good idea to generalize this for spark core - and allow for
its use in serde, compression, etc.
Regards,
Mridul
On Thu, Jul 30, 2015 at 11:33 AM, Joseph Batchik
josephbatc...@gmail.com wrote:
Yep I was looking into using the jar service loader.
I pushed a rough draft to my fork of
From what I understood of Imran's mail (and what was referenced in his
mail), the RDD mentioned seems to be violating some basic contracts on
how partitions are used in spark [1].
They cannot be arbitrarily numbered, have duplicates, etc.
Extending RDD to add functionality is typically for niche
Simply customize your log4j config instead of modifying code if you don't
want messages from that class.
Regards
Mridul
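For example, a log4j.properties fragment along these lines (the class name below is a placeholder for whichever class emits the message) silences one logger without any code change:

```properties
# Hypothetical: replace with the fully qualified name of the noisy class.
log4j.logger.org.apache.spark.SomeNoisyClass=OFF
```

Setting the level to WARN or ERROR instead of OFF keeps genuinely important messages from that class while dropping the noise.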
On Sunday, July 26, 2015, Sea 261810...@qq.com wrote:
This exception is so ugly!!! The screen is full of this information when
the program runs for a long time, and they
the only thing that changed is
the location of some scripts in mesos/ to amplab/).
Thanks
Shivaram
On Mon, Jul 20, 2015 at 12:55 PM, Mridul Muralidharan mri...@gmail.com
wrote:
Might be a good idea to get the PMC's of both projects to sign off to
prevent future issues with apache.
Regards
of the Apache Mesos
project. It was a remnant part of Spark from when Spark used to live at
github.com/mesos/spark.
Shivaram
On Tue, Jul 21, 2015 at 11:03 AM, Mridul Muralidharan mri...@gmail.com
wrote:
If I am not wrong, since the code was hosted within the mesos project
repo, I assume (at least part
Just to clarify, the proposal is to have a single commit msg giving the
jira and pr id?
That sounds like a good change to have.
Regards
Mridul
On Saturday, July 18, 2015, Reynold Xin r...@databricks.com wrote:
I took a look at the commit messages in git log -- it looks like the
individual
https://plus.google.com/+LinusTorvalds/posts/DiG9qANf5PA
I have noticed a bunch of mails from dev@ and github going to spam -
including the spark mailing list.
Might be a good idea for dev, committers to check if they are missing
things in their spam folder if on gmail.
Regards,
Mridul
description
3. List of authors contributing to the patch
The main thing that changes is 3: we used to also include the individual
commits to the pull request branch that are squashed.
On Sat, Jul 18, 2015 at 3:45 PM, Mridul Muralidharan mri...@gmail.com
Might be a good idea to get the PMC's of both projects to sign off to
prevent future issues with apache.
Regards,
Mridul
On Mon, Jul 20, 2015 at 12:01 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
I've created https://github.com/amplab/spark-ec2 and added an initial set of
Would also be good to fix api breakages introduced as part of 1.0
(where there is missing functionality now), overhaul & remove all
deprecated config/features/combinations, and api changes that we need to
make to the public api which have been deferred for minor releases.
Regards,
Mridul
On Tue, Nov 10,
There was a proposal to make schedulers pluggable in the context of adding one
which leverages Apache Tez: IIRC it was abandoned - but the jira might
be a good starting point.
Regards
Mridul
On Dec 3, 2015 2:59 PM, "Rad Gruchalski" wrote:
> There was a talk in this thread
individual ones.
>
>
> On Wednesday, December 30, 2015, Mridul Muralidharan <mri...@gmail.com>
> wrote:
>>
>> Is there a script running to close "old" PR's ? I was not aware of any
>> discussion about this in dev list.
>>
>> - Mridul
>>
>> -
open by people out there anyway)
>
> On Thu, Dec 31, 2015 at 3:25 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
>> I am not sure of others, but I had a PR closed from under me where
>> ongoing discussion was as late as 2 weeks back.
>> Given this, I assumed it was
Is there a script running to close "old" PR's ? I was not aware of any
discussion about this in dev list.
- Mridul
-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail:
Congratulations Yanbo !
Regards
Mridul
On Friday, June 3, 2016, Matei Zaharia wrote:
> Hi all,
>
> The PMC recently voted to add Yanbo Liang as a committer. Yanbo has been a
> super active contributor in many areas of MLlib. Please join me in
> welcoming Yanbo!
>
>
The example violates the basic contract of a Partitioner.
It does make sense to take Partitioner as a param to distinct - though it
is fairly trivial to simulate that in user code as well ...
Regards
Mridul
On Wednesday, June 8, 2016, 汪洋 wrote:
> Hi Alexander,
>
> I
We use it in executors to get to:
a) spark conf (for getting to the hadoop config in a map task doing custom
writing of side-files)
b) Shuffle manager (to get shuffle reader)
Not sure if there are alternative ways to get to these.
Regards,
Mridul
On Wed, Mar 16, 2016 at 2:52 PM, Reynold Xin
In general, I agree - it is preferable to break backward compatibility
(where unavoidable) only at major versions.
Unfortunately, this is usually planned better - with earlier versions
announcing intent of the change - deprecation across multiple
releases, defaults changed, etc.
From the thread,
t Kafka specifically
>
> https://issues.apache.org/jira/browse/SPARK-13877
>
>
> On Thu, Mar 17, 2016 at 2:49 PM, Mridul Muralidharan <mri...@gmail.com> wrote:
>>
>> I was not aware of a discussion in Dev list about this - agree with most of
>> the observations.
>> In add
I was not aware of a discussion in Dev list about this - agree with most of
the observations.
In addition, I did not see PMC signoff on moving (sub-)modules out.
Regards
Mridul
On Thursday, March 17, 2016, Marcelo Vanzin wrote:
> Hello all,
>
> Recently a lot of the
ts to support scala 2.10 three years after they did the last
> maintenance release?
>
>
> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mri...@gmail.com
> wrote:
>
>> Removing compatibility (with jdk, etc
Container Java version can be different from the yarn Java version: we run
jobs with jdk8 on a jdk7 cluster without issues.
Regards
Mridul
On Thursday, March 24, 2016, Koert Kuipers wrote:
> i guess what i am saying is that in a yarn world the only hard
> restrictions left are
Removing compatibility (with jdk, etc) can be done with a major release -
given that 7 has been EOLed a while back and is now unsupported, we have to
decide if we drop support for it in 2.0 or 3.0 (2+ years from now).
Given the functionality & performance benefits of going to jdk8, future
+1
Agree, dropping support for java 7 is long overdue - and 2.0 would be
a logical release to do this on.
Regards,
Mridul
On Thu, Mar 24, 2016 at 12:27 AM, Reynold Xin wrote:
> About a year ago we decided to drop Java 6 support in Spark 1.5. I am
> wondering if we should
required (and this discussion is a sign that the process has not been
> > conducted properly as people have concerns, me including).
> >
> > Thanks Mridul!
> >
> > Pozdrawiam,
> > Jacek Laskowski
> >
> > https://medium.com/@jaceklaskowski/
> >
I think Reynold's suggestion of using ram disk would be a good way to
test if these are the bottlenecks or something else is.
For most practical purposes, pointing local dir to ramdisk should
effectively give you 'similar' performance as shuffling from memory.
Are there concerns with taking that
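For example (a sketch; the mount point is an assumption, any tmpfs mount works), pointing Spark's scratch space at a ramdisk via spark-defaults.conf:

```properties
# Shuffle files and spills land on tmpfs, i.e. are served from memory.
spark.local.dir  /dev/shm/spark-local
```

If shuffle throughput barely changes with this in place, disk I/O was not the bottleneck, which is exactly the experiment being suggested.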
+1 (binding) on removing maintainer process.
I agree with your opinion of "automatic" instead of a manual list.
Regards
Mridul
On Thursday, May 19, 2016, Matei Zaharia wrote:
> Hi folks,
>
> Around 1.5 years ago, Spark added a maintainer process for reviewing API
>
On Friday, April 15, 2016, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:
> Yeah in support of this statement I think that my primary interest in
> this Spark Extras and the good work by Luciano here is that anytime we
> take bits out of a code base and “move it to GitHub” I see
Congratulations and welcome Holden and Burak !
Regards,
Mridul
On Tue, Jan 24, 2017 at 10:13 AM, Reynold Xin wrote:
> Hi all,
>
> Burak and Holden have recently been elected as Apache Spark committers.
>
> Burak has been very active in a large number of areas in Spark,
I agree, we should not be publishing both of them.
Thanks for bringing this up !
Regards,
Mridul
On Wed, Sep 7, 2016 at 1:29 AM, Sean Owen wrote:
> It's worth calling attention to:
>
> https://issues.apache.org/jira/browse/SPARK-17418
>
+1
Regards,
Mridul
On Wed, Sep 28, 2016 at 7:14 PM, Reynold Xin wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.1. The vote is open until Sat, Oct 1, 2016 at 20:00 PDT and passes if a
> majority of at least 3+1 PMC votes are cast.
>
>
Can someone add me to the edit list for the spark wiki please ?
Thanks,
Mridul
-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
When numPartitions is 0, there is no data in the RDD, so getPartition is
never invoked.
- Mridul
On Friday, September 16, 2016, WangJianfei
wrote:
> if so, we will get exception when the numPartitions is 0.
> def getPartition(key: Any): Int = key match {
>
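A minimal hash-partitioner sketch (hypothetical names, mirroring the quoted getPartition) shows why the numPartitions == 0 case is moot: the modulo is only ever evaluated when getPartition is called, which requires data to exist.

```java
// Maps a key's hashCode into [0, numPartitions). With numPartitions == 0
// the RDD holds no records, so getPartition is never called and the
// would-be division by zero never happens.
public class SimpleHashPartitioner {
    private final int numPartitions;

    public SimpleHashPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    public int getPartition(Object key) {
        if (key == null) return 0;
        // floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        SimpleHashPartitioner p = new SimpleHashPartitioner(4);
        System.out.println(p.getPartition("spark"));
    }
}
```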
It is good to get clarification, but the way I read it, the issue is
whether we publish it as official Apache artifacts (in maven, etc).
Users can of course build it directly (and we can make it easy to do so) -
as they are explicitly agreeing to additional licenses.
Regards
Mridul
On
Since TaskContext.getPartitionId is part of the public api, it can't be
removed as user code can be depending on it (unless we go through a
deprecation process for it).
Regards,
Mridul
On Sat, Jan 14, 2017 at 2:02 AM, Jacek Laskowski wrote:
> Hi,
>
> Just noticed that
Hi,
https://issues.apache.org/jira/browse/SPARK-20202?jql=priority%20%3D%20Blocker%20AND%20affectedVersion%20%3D%20%222.1.1%22%20and%20project%3D%22spark%22
Indicates there is another blocker (SPARK-20197 should have come in
the list too, but was marked major).
Regards,
Mridul
On Tue, Apr 4,
Congratulations Hyukjin, Sameer !
Regards,
Mridul
On Mon, Aug 7, 2017 at 8:53 AM, Matei Zaharia wrote:
> Hi everyone,
>
> The Spark PMC recently voted to add Hyukjin Kwon and Sameer Agarwal as
> committers. Join me in congratulating both of them and thanking them for
While I definitely support the idea of Apache Spark being able to
leverage kubernetes, IMO it is better for long term evolution of spark
to expose appropriate SPI such that this support need not necessarily
live within Apache Spark code base.
It will allow for multiple backends to evolve,
Sounds good to me.
+1
Regards,
Mridul
On Tue, Sep 26, 2017 at 2:36 AM, Sean Owen wrote:
> Not a big deal, but I'm wondering whether Flume integration should at least
> be opt-in and behind a profile? it still sees some use (at least on our end)
> but not applicable to the
I agree, proposal 1 sounds better among the options.
Regards,
Mridul
On Sun, Oct 1, 2017 at 3:50 PM, Reynold Xin wrote:
> Probably should do 1, and then it is an easier transition in 3.0.
>
> On Sun, Oct 1, 2017 at 1:28 AM Sean Owen wrote:
>>
>> I
Congratulations Tejas !
Regards,
Mridul
On Fri, Sep 29, 2017 at 12:58 PM, Matei Zaharia wrote:
> Hi all,
>
> The Spark PMC recently added Tejas Patil as a committer on the
> project. Tejas has been contributing across several areas of Spark for
> a while, focusing
Congratulations Jerry, well deserved !
Regards,
Mridul
On Mon, Aug 28, 2017 at 6:28 PM, Matei Zaharia wrote:
> Hi everyone,
>
> The PMC recently voted to add Saisai (Jerry) Shao as a committer. Saisai has
> been contributing to many areas of the project for a long
We do support running on Apache Mesos via docker images - so this
would not be restricted to k8s.
But unlike mesos support, which has other modes of running, I believe
k8s support more heavily depends on availability of docker images.
Regards,
Mridul
On Wed, Nov 29, 2017 at 8:56 AM, Sean Owen