Re: Unable to acquire memory errors in HiveCompatibilitySuite

2015-09-16 Thread Pete Robbins
OK, so let me try again ;-) I don't think the page size calculation matters, apart from hitting the allocation limit earlier if the page size is too large. If a task is going to need X bytes, it is going to need X bytes. In this case, for at least one of the tasks, X >

Re: Unable to acquire memory errors in HiveCompatibilitySuite

2015-09-16 Thread Pete Robbins
I see what you are saying. Full stack trace:
java.io.IOException: Unable to acquire 4194304 bytes of memory
  at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
  at
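The failure mode in that stack trace can be modeled with a short sketch (illustrative Python, not Spark source; the pool limit is a made-up demo value): each acquireNewPage call must fit whole inside what remains of the task's memory pool, so a 4 MB request fails as soon as less than 4 MB is left, regardless of how full earlier pages are.

```python
# Illustrative model of repeated page acquisition against a fixed pool.
# PAGE_SIZE matches the 4194304 bytes in the stack trace; POOL_LIMIT is
# a hypothetical per-task budget chosen for the demo.

PAGE_SIZE = 4 * 1024 * 1024
POOL_LIMIT = 10 * 1024 * 1024


def acquire_pages(pool_limit, page_size):
    """Acquire fixed-size pages until the pool cannot satisfy another request."""
    acquired = 0
    pages = []
    while acquired + page_size <= pool_limit:
        pages.append(page_size)
        acquired += page_size
    remaining = pool_limit - acquired  # too small for one more page
    return pages, remaining


pages, remaining = acquire_pages(POOL_LIMIT, PAGE_SIZE)
print(len(pages), remaining)  # 2 2097152 -- 2 MB left cannot hold a 3rd page
```

Note how a coarser page size wastes more of the tail of the pool, which is why the discussion above keeps returning to how the page size is chosen.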

JobScheduler: Error generating jobs for time for custom InputDStream

2015-09-16 Thread Juan Rodríguez Hortalá
Hi, Sorry to insist, but does anyone have any thoughts on this? Or can someone at least point me to documentation of DStream.compute() so I can understand when I should return None for a batch? Thanks, Juan 2015-09-14 20:51 GMT+02:00 Juan Rodríguez Hortalá < juan.rodriguez.hort...@gmail.com>: > Hi, >
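The contract being asked about can be sketched with a toy model (plain Python stand-ins, not Spark's API): compute(validTime) returns the batch's RDD, or None when the stream has no RDD at all for that time, in which case the scheduler generates no job for the interval. Returning an empty RDD is different: it still generates a (trivial) job.

```python
# Toy model of the DStream.compute()/JobScheduler interaction.
# None => no RDD for this batch, so no job is generated.
# An empty collection => a real (if trivial) job over zero records.

def compute(valid_time, data_by_time):
    """Return the batch's records, or None when no batch exists at this time."""
    return data_by_time.get(valid_time)


def generate_job(valid_time, data_by_time):
    batch = compute(valid_time, data_by_time)
    if batch is None:
        return "skipped"                    # scheduler skips the interval
    return f"job over {len(batch)} records"  # empty batch still runs a job


data = {0: ["a", "b"], 2: []}  # no entry for t=1
print([generate_job(t, data) for t in range(3)])
# ['job over 2 records', 'skipped', 'job over 0 records']
```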

Re: RDD API patterns

2015-09-16 Thread robineast
I'm not sure the problem is quite as bad as you state. Both sampleByKey and sampleByKeyExact are implemented using a function from StratifiedSamplingUtils which does one of two things depending on whether the exact implementation is needed. The exact version requires double the number of lines of
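The two strategies contrasted above can be sketched as follows (hedged: function names and structure are illustrative, not lifted from StratifiedSamplingUtils). The approximate path flips one coin per element, so per-key counts only match fraction * n in expectation; the exact path does extra per-key bookkeeping to guarantee the count.

```python
import random


def sample_by_key(pairs, fractions, seed=7):
    """Per-element Bernoulli sampling: fast, approximate per-key counts."""
    rng = random.Random(seed)
    return [(k, v) for k, v in pairs if rng.random() < fractions[k]]


def sample_by_key_exact(pairs, fractions, seed=7):
    """Exact variant: group by key, then draw exactly round(fraction * n)."""
    rng = random.Random(seed)
    by_key = {}
    for k, v in pairs:
        by_key.setdefault(k, []).append(v)
    out = []
    for k, vs in by_key.items():
        take = round(fractions[k] * len(vs))
        out.extend((k, v) for v in rng.sample(vs, take))
    return out


data = [("a", i) for i in range(100)] + [("b", i) for i in range(50)]
approx = sample_by_key(data, {"a": 0.1, "b": 0.2})
exact = sample_by_key_exact(data, {"a": 0.1, "b": 0.2})
print(len(approx))  # varies around 20
print(sum(1 for k, _ in exact if k == "a"))  # exactly 10
```

The doubled line count in the exact implementation that the post mentions corresponds to that extra grouping-and-counting pass.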

JENKINS: downtime next week, wed and thurs mornings (9-23 and 9-24)

2015-09-16 Thread shane knapp
good morning, denizens of the aether! your hard working build system (and some associated infrastructure) has been in need of some updates and housecleaning for quite a while now. we will be splitting the maintenance over two mornings to minimize impact. here's the plan: 7am-9am wednesday,

Re: Unable to acquire memory errors in HiveCompatibilitySuite

2015-09-16 Thread Pete Robbins
So forcing the ShuffleMemoryManager to assume 32 cores, and therefore calculate a page size of 1MB, passes the tests. How can we determine the correct value to use in getPageSize rather than Runtime.getRuntime.availableProcessors()? On 16 September 2015 at 10:17, Pete Robbins
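The heuristic being forced here looks roughly like the following (a reconstruction from my reading of Spark 1.5's ShuffleMemoryManager.getPageSize; verify the constants against your branch): divide the pool by cores and a safety factor, round to a power of two, and clamp between 1 MB and 64 MB. With 32 cores assumed, a modest pool lands on the 1 MB floor, which matches the workaround reported above.

```python
# Approximate reconstruction of the page-size heuristic under discussion.
MIN_PAGE = 1 << 20    # 1 MB floor
MAX_PAGE = 64 << 20   # 64 MB ceiling
SAFETY = 16           # safety factor


def next_power_of_2(n):
    """Smallest power of two >= n."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def get_page_size(max_memory, cores):
    size = next_power_of_2(max_memory // cores // SAFETY)
    return min(MAX_PAGE, max(MIN_PAGE, size))


pool = 512 << 20  # hypothetical 512 MB shuffle pool for the demo
print(get_page_size(pool, 8) >> 20)   # 4  (MB) with 8 detected cores
print(get_page_size(pool, 32) >> 20)  # 1  (MB) when 32 cores are assumed
```

Smaller pages mean less waste per concurrent task, which is why overstating the core count makes the suite pass; the open question in the thread is what the principled divisor should be.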

Re: SparkR streaming source code

2015-09-16 Thread Reynold Xin
You should reach out to the speakers directly. On Wed, Sep 16, 2015 at 9:52 AM, Renyi Xiong wrote: > SparkR streaming is mentioned at about page 17 in below pdf, can anyone > share source code? (could not find it on GitHub)

Re: JENKINS: downtime next week, wed and thurs mornings (9-23 and 9-24)

2015-09-16 Thread Reynold Xin
Thanks Shane and Jon for the heads up. On Wednesday, September 16, 2015, shane knapp wrote: > good morning, denizens of the aether! > > your hard working build system (and some associated infrastructure) > has been in need of some updates and housecleaning for quite a while

Communication between executors and drivers

2015-09-16 Thread Muhammad Haseeb Javed
How do executors communicate with the driver in Spark? I understand that it's done using Akka actors, with messages exchanged as CoarseGrainedSchedulerMessages, but I'd really appreciate it if someone could explain the entire process in a bit more detail.
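A very rough model of the exchange being asked about (Python stand-in for illustration; Spark 1.x really does this with Akka actors in Scala, and the message names below mirror CoarseGrainedClusterMessages, but the fields and reply strings here are simplified assumptions): the executor registers with the driver's scheduler backend, the driver acknowledges and later launches tasks, and the executor streams StatusUpdate messages back as tasks run and finish.

```python
# Toy sketch of the driver-side message handling, not Spark internals.
from dataclasses import dataclass, field


@dataclass
class RegisterExecutor:
    executor_id: str
    cores: int


@dataclass
class StatusUpdate:
    executor_id: str
    task_id: int
    state: str  # e.g. "RUNNING", "FINISHED"


@dataclass
class Driver:
    executors: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def receive(self, msg):
        if isinstance(msg, RegisterExecutor):
            self.executors[msg.executor_id] = msg.cores
            return "RegisteredExecutor"  # driver's acknowledgement
        if isinstance(msg, StatusUpdate):
            self.log.append((msg.task_id, msg.state))
            return "ack"


driver = Driver()
print(driver.receive(RegisterExecutor("exec-1", cores=4)))
driver.receive(StatusUpdate("exec-1", task_id=0, state="RUNNING"))
driver.receive(StatusUpdate("exec-1", task_id=0, state="FINISHED"))
print(driver.log)  # [(0, 'RUNNING'), (0, 'FINISHED')]
```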

Spark streaming DStream state on worker

2015-09-16 Thread Renyi Xiong
Hi, I want to do a temporal join operation on a DStream across RDDs. My question is: are RDDs from the same DStream always computed on the same worker (except on failover)? thanks, Renyi.

Re: JENKINS: downtime next week, wed and thurs mornings (9-23 and 9-24)

2015-09-16 Thread shane knapp
> 630am-10am thursday, 9-24-15:
> * jenkins update to 1.629 (we're a few months behind in versions, and some big bugs have been fixed)
> * jenkins master and worker system package updates
> * all systems get a reboot (lots of hanging java processes have been building up over the months)
> *

Re: SparkR streaming source code

2015-09-16 Thread Renyi Xiong
got it, thanks a lot! On Wed, Sep 16, 2015 at 10:14 AM, Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote: > I think Hao posted a link to the source code in the description of > https://issues.apache.org/jira/browse/SPARK-6803 > > On Wed, Sep 16, 2015 at 10:06 AM, Reynold Xin

Re: Enum parameter in ML

2015-09-16 Thread Stephen Boesch
There was a long thread about enums initiated by Xiangrui several months back, in which the final consensus was to use Java enums. Is that discussion (/decision) applicable here? 2015-09-16 17:43 GMT-07:00 Ulanov, Alexander : > Hi Joseph, > > > > Strings sounds

Re: Enum parameter in ML

2015-09-16 Thread Joseph Bradley
I've tended to use Strings. Params can be created with a validator (isValid) which can ensure users get an immediate error if they try to pass an unsupported String. Not as nice as compile-time errors, but easier on the APIs. On Mon, Sep 14, 2015 at 6:07 PM, Feynman Liang

RE: Enum parameter in ML

2015-09-16 Thread Ulanov, Alexander
Hi Joseph, Strings sound reasonable. However, there is no StringParam (only StringArrayParam). Should I create a new param type? Also, how can the user get all possible values of a String parameter? Best regards, Alexander From: Joseph Bradley [mailto:jos...@databricks.com] Sent: Wednesday,

Re: Enum parameter in ML

2015-09-16 Thread Joseph Bradley
@Alexander It's worked for us to use Param[String] directly. (I think it's b/c String is exactly java.lang.String, rather than a Scala version of it, so it's still Java-friendly.) In other classes, I've added a static list (e.g., NaiveBayes.supportedModelTypes), though there isn't consistent
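The pattern described in this thread can be sketched as follows (transposed to Python for illustration; in Spark ML the real pieces are Param[String] with an isValid function and a static list such as NaiveBayes.supportedModelTypes, and the class below is a hypothetical stand-in): validate the string the moment it is set, so the user fails fast rather than at fit time.

```python
# Illustrative string-valued param with an isValid check, echoing the
# Spark ML pattern discussed above. Names here are made up for the demo.

SUPPORTED_MODEL_TYPES = ("multinomial", "bernoulli")  # static list of legal values


class StringParam:
    def __init__(self, name, is_valid):
        self.name = name
        self.is_valid = is_valid
        self.value = None

    def set(self, value):
        # Immediate error on an unsupported string, instead of a late failure.
        if not self.is_valid(value):
            raise ValueError(
                f"{self.name} must be one of {SUPPORTED_MODEL_TYPES}, got {value!r}")
        self.value = value
        return self


model_type = StringParam("modelType", lambda v: v in SUPPORTED_MODEL_TYPES)
model_type.set("multinomial")       # accepted
try:
    model_type.set("gaussian")      # rejected at set time
except ValueError as e:
    print("rejected:", e)
```

The static list also answers Alexander's second question: it gives users one discoverable place to see every legal value, at the cost of not being a compile-time check.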

Re: New Spark json endpoints

2015-09-16 Thread Kevin Chen
Just wanted to bring this email up again in case there were any thoughts. Having all the information from the web UI accessible through a supported json API is very important to us; are there any objections to us adding a v2 API to Spark? Thanks! From: Kevin Chen Date:

RE: Unable to acquire memory errors in HiveCompatibilitySuite

2015-09-16 Thread Cheng, Hao
We actually met a similar problem in a real case, see https://issues.apache.org/jira/browse/SPARK-10474 After checking the source code, the external sort memory management strategy seems to be the root cause of the issue. Currently, we allocate the 4MB (page size) buffer as initial in the
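The failure mode described for SPARK-10474 reduces to simple arithmetic (sketch below; the pool size is a made-up demo value): if every external-sort task begins by grabbing a fixed 4 MB initial page, a shared pool supports only pool / 4 MB concurrent tasks before the next acquisition fails, even though no task has sorted a single row yet.

```python
# Back-of-envelope model of concurrent tasks each taking a fixed
# initial page from a shared pool.

INITIAL_PAGE = 4 * 1024 * 1024  # the 4 MB initial buffer mentioned above


def tasks_that_fit(pool_bytes, initial_page=INITIAL_PAGE):
    """How many tasks can claim their initial page before acquisition fails."""
    return pool_bytes // initial_page


pool = 100 * 1024 * 1024  # hypothetical 100 MB shuffle pool
print(tasks_that_fit(pool))  # 25 -- the 26th task sees "Unable to acquire 4194304 bytes"
```

A smaller initial page (or lazy initial allocation) raises that ceiling, which is the direction the JIRA discussion points at.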

Re: New Spark json endpoints

2015-09-16 Thread Reynold Xin
Do we need to increment the version number if it is just strict additions? On Wed, Sep 16, 2015 at 7:10 PM, Kevin Chen wrote: > Just wanted to bring this email up again in case there were any thoughts. > Having all the information from the web UI accessible through a

Re: SparkR streaming source code

2015-09-16 Thread Shivaram Venkataraman
I think Hao posted a link to the source code in the description of https://issues.apache.org/jira/browse/SPARK-6803 On Wed, Sep 16, 2015 at 10:06 AM, Reynold Xin wrote: > You should reach out to the speakers directly. > > > On Wed, Sep 16, 2015 at 9:52 AM, Renyi Xiong