= 0 using a compressed bitmap. That way we can still avoid
requests for zero-sized blocks.
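A minimal sketch of the idea above, with java.util.BitSet standing in for a compressed bitmap (RoaringBitmap would be a realistic choice); class and method names here are illustrative, not Spark's actual map-output tracking code:

```java
import java.util.BitSet;

// Sketch: track which map-output blocks are non-empty so a reducer can
// skip fetch requests for zero-sized blocks entirely. java.util.BitSet
// stands in for a compressed bitmap; names are illustrative only.
public class NonEmptyBlocks {
    // One bit per reduce partition: set iff the block has size > 0.
    public static BitSet fromSizes(long[] blockSizes) {
        BitSet nonEmpty = new BitSet(blockSizes.length);
        for (int i = 0; i < blockSizes.length; i++) {
            if (blockSizes[i] > 0) nonEmpty.set(i);
        }
        return nonEmpty;
    }

    public static void main(String[] args) {
        long[] sizes = {0L, 1024L, 0L, 7L};
        BitSet nonEmpty = fromSizes(sizes);
        // A reducer would only issue fetches for the set bits.
        System.out.println(nonEmpty); // {1, 3}
    }
}
```

A reducer consulting the bitmap before fetching never issues a request for an empty block, which is the point of the optimization being discussed.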
On Thu, Jul 3, 2014 at 3:12 PM, Reynold Xin r...@databricks.com wrote:
Yes, that number is likely == 0 in any real workload ...
On Thu, Jul 3, 2014 at 8:01 AM, Mridul Muralidharan mri...@gmail.com
On Thu, Jul 3, 2014 at 11:32 AM, Reynold Xin r...@databricks.com wrote:
On Wed, Jul 2, 2014 at 3:44 AM, Mridul Muralidharan mri...@gmail.com
wrote:
The other thing we do need is the location of blocks. This is actually just O(n) because we just need to know where the map was run
Hi Patrick,
Please see inline.
Regards,
Mridul
On Wed, Jul 2, 2014 at 10:52 AM, Patrick Wendell pwend...@gmail.com wrote:
b) Instead of pulling this information, push it to executors as part
of task submission. (What Patrick mentioned?)
(1) a.1 from above is still an issue for this.
Mridul
On Tue, Jul 1, 2014 at 2:51 AM, Mridul Muralidharan mri...@gmail.com
wrote:
We had considered both approaches (if I understood the suggestions right) :
a) Pulling only map output states for tasks which run on the reducer by modifying the Actor. (Probably along the lines of what Aaron
the executor returns the result of a task when it's too big
for akka. We were thinking of refactoring this too, as using the block
manager has much higher latency than a direct TCP send.
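For context, a minimal loopback sketch of a length-prefixed direct TCP send, the kind of transfer being contrasted with staging results in the block manager; this assumes nothing about Spark's actual transport and all names are illustrative:

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

// Sketch: push a task result over a direct TCP connection instead of
// writing it to an intermediate block store. Loopback-only toy.
public class DirectSend {
    public static byte[] roundTrip(byte[] result) {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket accepted = server.accept()) {
            // Sender: one length-prefixed write, no intermediate storage.
            DataOutputStream out = new DataOutputStream(client.getOutputStream());
            out.writeInt(result.length);
            out.write(result);
            out.flush();
            // Receiver: read the length, then the payload.
            DataInputStream in = new DataInputStream(accepted.getInputStream());
            byte[] received = new byte[in.readInt()];
            in.readFully(received);
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = "task result".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(roundTrip(payload), StandardCharsets.UTF_8));
    }
}
```

The appeal of a direct send is exactly what the mail notes: one hop from sender to receiver, versus a write-then-fetch through the block manager.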
On Mon, Jun 30, 2014 at 12:13 PM, Mridul Muralidharan mri...@gmail.com
wrote:
Our current hack is to use ...
Can you comment a little bit more on this issue? We are running into the
same stack trace but not sure whether it is just different Spark versions
on each cluster (doesn't seem likely) or a bug in Spark.
Thanks.
On Sat, May 17, 2014 at 4:41 AM, Mridul Muralidharan mri...@gmail.com
wrote
On Wed, Jun 18, 2014 at 6:19 PM, Surendranauth Hiraman
suren.hira...@velos.io wrote:
Patrick,
My team is using shuffle consolidation but not speculation. We are also
using persist(DISK_ONLY) for caching.
Use of shuffle consolidation is probably what is causing the issue.
Would be a good idea
:38 AM, Mridul Muralidharan mri...@gmail.com
wrote:
I had echoed similar sentiments a while back when there was a discussion around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the api changes, add missing functionality, go through a hardening release before
guaranteed 1.0.0 baseline.
On Sat, May 17, 2014 at 2:05 PM, Mridul Muralidharan
mri...@gmail.com wrote:
I would make the case for interface stability not just api stability.
Particularly given that we have significantly changed some of our
interfaces, I want to ensure developers/users
avoid hitting disk if we have enough memory to use. We need to investigate more to find a good solution. -Xiangrui
On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan mri...@gmail.com
wrote:
Effectively this is persist without fault tolerance.
Failure of any node means complete lack of fault
I had echoed similar sentiments a while back when there was a discussion
around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the api
changes, add missing functionality, go through a hardening release before
1.0
But the community preferred a 1.0 :-)
Regards,
Mridul
On 17-May-2014
I suspect this is an issue we have fixed internally here as part of a
larger change - the issue we fixed was not a config issue but bugs in spark.
Unfortunately we plan to contribute this as part of 1.1
Regards,
Mridul
On 17-May-2014 4:09 pm, sam (JIRA) j...@apache.org wrote:
sam created ...
On Sat, May 17, 2014 at 4:26 AM, Mridul Muralidharan mri...@gmail.com
wrote:
I had echoed similar sentiments a while back when there was a discussion around 0.10 vs 1.0 ... I would have preferred 0.10 to stabilize the api changes, add missing functionality, go through a hardening release
the discussion.
Regards
Mridul
issue, and what I am asking, is which pending bug fixes does anyone
anticipate will require breaking the public API guaranteed in rc9
On Sat, May 17, 2014 at 9:44 AM, Mridul Muralidharan mri...@gmail.com
wrote:
We made incompatible api changes whose impact
Mridul
If you can tell me about specific changes in the current release candidate that occasion new arguments for why a 1.0 release is an unacceptable idea, then I'm listening.
On Sat, May 17, 2014 at 11:59 AM, Mridul Muralidharan mri...@gmail.com
wrote:
On 17-May-2014 11:40 pm, Mark Hamstra m
, Andrew Ash and...@andrewash.com
wrote:
+1 on the next release feeling more like a 0.10 than a 1.0
On May 17, 2014 4:38 AM, Mridul Muralidharan mri...@gmail.com
wrote:
I had echoed similar sentiments a while back when there was a discussion around 0.10 vs 1.0 ... I would have preferred
Effectively this is persist without fault tolerance.
Failure of any node means complete lack of fault tolerance.
I would be very skeptical of truncating lineage if it is not reliable.
On 17-May-2014 3:49 am, Xiangrui Meng (JIRA) j...@apache.org wrote:
Xiangrui Meng created SPARK-1855:
So was rc5 cancelled? Did not see a note indicating that or why ... [1]
- Mridul
[1] could have easily missed it in the email storm though !
On Thu, May 15, 2014 at 1:32 AM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
Hi Sandy,
I assume you are referring to caching added to datanodes via the new caching api through the NN? (To preemptively mmap blocks.)
I have not looked in detail, but does NN tell us about this in block
locations?
If yes, we can simply make those process local instead of node local for
executors on
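A toy sketch of the scheduling idea in this thread, assuming the NameNode can report which replicas of a block are cached on a datanode; the class and method names are hypothetical, not Spark's scheduler types:

```java
import java.util.*;

// Sketch: rank candidate hosts for a task so that hosts whose replica is
// cached (mmap'ed) by the datanode come first, like a "process local"
// preference, ahead of plain node-local replicas. Illustrative only.
public class LocalityPreference {
    public static List<String> rankHosts(Map<String, Boolean> hostToCached) {
        List<String> ranked = new ArrayList<>(hostToCached.keySet());
        // Cached replicas first (false sorts before true, so negate),
        // then alphabetical for a stable order.
        ranked.sort(Comparator.comparing((String h) -> !hostToCached.get(h))
                              .thenComparing(Comparator.<String>naturalOrder()));
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, Boolean> replicas = new HashMap<>();
        replicas.put("dn1", false);
        replicas.put("dn2", true);  // NN reports this replica as cached
        System.out.println(rankHosts(replicas)); // [dn2, dn1]
    }
}
```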
On a slightly related note (apologies Soren for hijacking the thread),
Reynold, how much better is kryo from spark's usage point of view compared to the default java serialization (in general, not for closures)?
The numbers on the kryo site are interesting, but since you have played
the most with kryo
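As a point of reference for the question above, here is a minimal way to measure the default Java serialization baseline that kryo is typically compared against; kryo itself is a third-party dependency and is not shown here:

```java
import java.io.*;

// Sketch: measure how many bytes default Java serialization produces for
// a small value. The gap between payload size and serialized size is the
// per-object overhead that alternative serializers aim to shrink.
public class SerializedSize {
    public static int javaSerializedSize(Serializable value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Four ints are 16 bytes of payload; the default serializer adds
        // stream header and class metadata on top of that.
        System.out.println(javaSerializedSize(new int[]{1, 2, 3, 4}));
    }
}
```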
An iterator does not imply data has to be memory resident.
Think merge sort output as an iterator (disk backed).
Tom is actually planning to work on something similar with me on this
hopefully this or next month.
Regards,
Mridul
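A minimal sketch of the disk-backed iterator idea above, streaming one line at a time from a spill file; illustrative only, not Spark's actual external-sort code:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Sketch: an Iterator whose elements live on disk, not in memory --
// analogous to merge-sort output streamed from a spill file. Only one
// element is resident at a time.
public class DiskBackedIterator implements Iterator<String>, Closeable {
    private final BufferedReader reader;
    private String next;

    public DiskBackedIterator(Path file) {
        try {
            this.reader = Files.newBufferedReader(file);
            this.next = reader.readLine();  // read one element ahead
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience for demos/tests: spill the lines to a temp file first.
    public static DiskBackedIterator of(String... lines) {
        try {
            Path tmp = Files.createTempFile("spill", ".txt");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, Arrays.asList(lines));
            return new DiskBackedIterator(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override public boolean hasNext() { return next != null; }

    @Override public String next() {
        if (next == null) throw new NoSuchElementException();
        String current = next;
        try { next = reader.readLine(); }
        catch (IOException e) { throw new UncheckedIOException(e); }
        return current;
    }

    @Override public void close() throws IOException { reader.close(); }

    public static void main(String[] args) throws IOException {
        try (DiskBackedIterator it = DiskBackedIterator.of("a", "b", "c")) {
            while (it.hasNext()) System.out.println(it.next());
        }
    }
}
```

The caller sees a plain Iterator; nothing about the interface requires the backing data to be memory resident, which is the point being made.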
On Sun, Apr 20, 2014 at 11:46 PM, Sandy Ryza
Hi,
We have a requirement to use a (potentially) ephemeral storage, which
is not within the VM, which is strongly tied to a worker node. So
source of truth for a block would still be within spark; but to
actually do computation, we would need to copy data to external device
(where it might lie
is stored in a remote cluster or machines. And the
goal is to load the remote raw data only once?
Haoyuan
On Sat, Apr 5, 2014 at 4:30 PM, Mridul Muralidharan mri...@gmail.com
wrote:
Hi,
We have a requirement to use a (potentially) ephemeral storage, which
is not within the VM, which
Hi,
So we are now receiving updates from three sources for each change to the PR.
While each of them handles a corner case which others might miss, it would be great if we could minimize the volume of duplicated communication.
Regards,
Mridul
unsubscribe yourself from any of these sources, right?
- Patrick
On Sat, Mar 29, 2014 at 11:05 AM, Mridul Muralidharan
mri...@gmail.com wrote:
Hi,
So we are now receiving updates from three sources for each change to
the PR.
While each of them handles a corner case which others might miss
reasonably long running job (30 mins+) working on non-trivial dataset will fail due to accumulated failures in spark.
Regards,
Mridul
TD
On Tue, Mar 25, 2014 at 8:44 PM, Mridul Muralidharan mri...@gmail.com wrote:
Forgot to mention this in the earlier request for PR's
Would be great if the garbage collection PR is also committed - if not
the whole thing, at least the part to unpersist broadcast variables
explicitly would be great.
Currently we are running with a custom impl which does something
similar, and I would like to move to standard distribution for that.
of April (not too far ;) ).
TD
On Wed, Mar 19, 2014 at 5:57 PM, Mridul Muralidharan mri...@gmail.com wrote:
Would be great if the garbage collection PR is also committed - if not
the whole thing, at least the part to unpersist broadcast variables
explicitly would be great.
Currently we