ok so let me try again ;-)
I don't think that the page size calculation matters apart from hitting the
allocation limit earlier if the page size is too large.
If a task is going to need X bytes, it is going to need X bytes. In this
case, for at least one of the tasks, X >
I see what you are saying. Full stack trace:
java.io.IOException: Unable to acquire 4194304 bytes of memory
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
at
Hi,
Sorry to insist, but does anyone have any thoughts on this? Or can someone at
least point me to documentation of DStream.compute() so I can understand when
I should return None for a batch?
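For what it's worth, the Option contract being asked about can be sketched in a few lines: a compute-style method that returns an empty Option when there is no data for the requested batch time. This is a plain-Java illustration with hypothetical names (BatchSource is not the actual DStream API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class BatchSource {
    private final Map<Long, List<String>> batches = new HashMap<>();

    public void add(long time, List<String> data) {
        batches.put(time, data);
    }

    // Mirrors the DStream.compute(validTime) contract conceptually:
    // an empty Optional signals "no RDD to produce for this batch".
    public Optional<List<String>> compute(long validTime) {
        return Optional.ofNullable(batches.get(validTime));
    }
}
```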
Thanks
Juan
2015-09-14 20:51 GMT+02:00 Juan Rodríguez Hortalá <
juan.rodriguez.hort...@gmail.com>:
> Hi,
>
I'm not sure the problem is quite as bad as you state. Both sampleByKey and
sampleByKeyExact are implemented using a function from
StratifiedSamplingUtils which does one of two things depending on whether
the exact implementation is needed. The exact version requires double the
number of lines of
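For readers following along, here is a rough, self-contained illustration of what the non-exact path does conceptually: per-stratum Bernoulli sampling, keeping each record with its key's configured fraction. StratifiedSample is a hypothetical sketch, not the StratifiedSamplingUtils implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class StratifiedSample {
    // Keep each (key, value) pair with probability fractions.get(key).
    // Keys with no configured fraction are dropped.
    public static <K, V> List<Map.Entry<K, V>> sampleByKey(
            List<Map.Entry<K, V>> data, Map<K, Double> fractions, long seed) {
        Random rng = new Random(seed);
        List<Map.Entry<K, V>> out = new ArrayList<>();
        for (Map.Entry<K, V> e : data) {
            double f = fractions.getOrDefault(e.getKey(), 0.0);
            if (rng.nextDouble() < f) {
                out.add(e);
            }
        }
        return out;
    }
}
```

The exact variant additionally has to track per-stratum counts and adjust acceptance thresholds, which is where the extra code comes from.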
good morning, denizens of the aether!
your hard working build system (and some associated infrastructure)
has been in need of some updates and housecleaning for quite a while
now. we will be splitting the maintenance over two mornings to
minimize impact.
here's the plan:
7am-9am wednesday,
So forcing the ShuffleMemoryManager to assume 32 cores, and therefore
calculate a page size of 1MB, makes the tests pass.
How can we determine the correct value to use in getPageSize rather than
Runtime.getRuntime.availableProcessors()?
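For context, the kind of heuristic under discussion can be sketched as: divide the memory budget by the core count and a safety factor, round up to a power of two, and clamp between a 1MB floor and a 64MB ceiling. The constants and names below are assumptions for illustration, not the real getPageSize; with an assumed 512MB budget, 32 cores yields a 1MB page where 8 cores would yield the 4MB page seen in the stack trace above.

```java
public class PageSizeSketch {
    static long nextPowerOf2(long n) {
        long highBit = Long.highestOneBit(n);
        return highBit == n ? n : highBit << 1;
    }

    static long getPageSize(long maxMemory, int cores) {
        long minPageSize = 1L << 20;          // 1 MB floor
        long maxPageSize = minPageSize << 6;  // 64 MB ceiling
        int safetyFactor = 16;                // headroom so tasks don't starve
        long size = nextPowerOf2(maxMemory / cores / safetyFactor);
        return Math.min(maxPageSize, Math.max(minPageSize, size));
    }
}
```

The failure mode in this thread follows directly: if the divisor (cores) is wrong for the actual task concurrency, each page is too large and tasks hit the allocation limit sooner.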
On 16 September 2015 at 10:17, Pete Robbins
You should reach out to the speakers directly.
On Wed, Sep 16, 2015 at 9:52 AM, Renyi Xiong wrote:
> SparkR streaming is mentioned around page 17 in the PDF below; can anyone
> share the source code? (I could not find it on GitHub)
>
Thanks Shane and Jon for the heads up.
On Wednesday, September 16, 2015, shane knapp wrote:
> good morning, denizens of the aether!
>
> your hard working build system (and some associated infrastructure)
> has been in need of some updates and housecleaning for quite a while
How do executors communicate with the driver in Spark? I understand that
it's done using Akka actors and that messages are exchanged as
CoarseGrainedSchedulerMessage, but I'd really appreciate it if someone could
explain the entire process in a bit more detail.
Hi,
I want to do a temporal join operation on a DStream across RDDs. My question
is: are RDDs from the same DStream always computed on the same worker (except
on failover)?
thanks,
Renyi.
> 630am-10am thursday, 9-24-15:
> * jenkins update to 1.629 (we're a few months behind in versions, and
> some big bugs have been fixed)
> * jenkins master and worker system package updates
> * all systems get a reboot (lots of hanging java processes have been
> building up over the months)
> *
got it, thanks a lot!
On Wed, Sep 16, 2015 at 10:14 AM, Shivaram Venkataraman <
shiva...@eecs.berkeley.edu> wrote:
> I think Hao posted a link to the source code in the description of
> https://issues.apache.org/jira/browse/SPARK-6803
>
> On Wed, Sep 16, 2015 at 10:06 AM, Reynold Xin
There was a long thread about enums initiated by Xiangrui several months
back in which the final consensus was to use Java enums. Is that
discussion (/decision) applicable here?
2015-09-16 17:43 GMT-07:00 Ulanov, Alexander :
> Hi Joseph,
>
>
>
> Strings sounds
I've tended to use Strings. Params can be created with a validator
(isValid) which can ensure users get an immediate error if they try to pass
an unsupported String. Not as nice as compile-time errors, but easier on
the APIs.
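As an illustration of the validator idea, here is a minimal sketch of a string parameter that fails fast on unsupported values and also exposes the supported set (which addresses the discoverability question raised in this thread). ValidatedStringParam is hypothetical, not Spark's ml.param API:

```java
import java.util.Set;

public class ValidatedStringParam {
    private final String name;
    private final Set<String> supportedValues;
    private String value;

    public ValidatedStringParam(String name, Set<String> supportedValues) {
        this.name = name;
        this.supportedValues = supportedValues;
    }

    // Fail fast on unsupported values, mirroring Param's isValid check:
    // the user gets an immediate error rather than a late runtime failure.
    public void set(String v) {
        if (!supportedValues.contains(v)) {
            throw new IllegalArgumentException(
                name + " must be one of " + supportedValues + ", got: " + v);
        }
        value = v;
    }

    public String get() { return value; }

    // Expose the supported values so users can discover them programmatically.
    public Set<String> supported() { return supportedValues; }
}
```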
On Mon, Sep 14, 2015 at 6:07 PM, Feynman Liang
Hi Joseph,
Strings sound reasonable. However, there is no StringParam (only
StringArrayParam). Should I create a new param type? Also, how can the user
get all possible values of a String parameter?
Best regards, Alexander
From: Joseph Bradley [mailto:jos...@databricks.com]
Sent: Wednesday,
@Alexander It's worked for us to use Param[String] directly. (I think
it's b/c String is exactly java.lang.String, rather than a Scala version of
it, so it's still Java-friendly.) In other classes, I've added a static
list (e.g., NaiveBayes.supportedModelTypes), though there isn't consistent
Just wanted to bring this email up again in case there were any thoughts.
Having all the information from the web UI accessible through a supported
JSON API is very important to us; are there any objections to us adding a v2
API to Spark?
Thanks!
From: Kevin Chen
Date:
We actually hit a similar problem in a real case; see
https://issues.apache.org/jira/browse/SPARK-10474
After checking the source code, the external sort memory management strategy
seems to be the root cause of the issue.
Currently, we allocate the 4MB (page size) buffer as initial in the
Do we need to increment the version number if the changes are strictly additive?
On Wed, Sep 16, 2015 at 7:10 PM, Kevin Chen wrote:
> Just wanted to bring this email up again in case there were any thoughts.
> Having all the information from the web UI accessible through a
I think Hao posted a link to the source code in the description of
https://issues.apache.org/jira/browse/SPARK-6803
On Wed, Sep 16, 2015 at 10:06 AM, Reynold Xin wrote:
> You should reach out to the speakers directly.
>
>
> On Wed, Sep 16, 2015 at 9:52 AM, Renyi Xiong