Re: Unable to acquire memory errors in HiveCompatibilitySuite

2015-09-16 Thread Pete Robbins
OK, so let me try again ;-) I don't think the page size calculation matters, apart from hitting the allocation limit earlier if the page size is too large. If a task is going to need X bytes, it is going to need X bytes. In this case, for at least one of the tasks, X >

Re: Unable to acquire memory errors in HiveCompatibilitySuite

2015-09-16 Thread Pete Robbins
I see what you are saying. Full stack trace:
java.io.IOException: Unable to acquire 4194304 bytes of memory
  at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPage(UnsafeExternalSorter.java:368)
  at
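The failure mode in that stack trace can be modeled with a short sketch (illustrative Python, not Spark source; the pool limit is a made-up demo value): each acquireNewPage call must fit whole inside what remains of the task's memory pool, so a 4 MB request fails as soon as less than 4 MB is left, regardless of how full earlier pages are.

```python
# Illustrative model of repeated page acquisition against a fixed pool.
# PAGE_SIZE matches the 4194304 bytes in the stack trace; POOL_LIMIT is
# a hypothetical per-task budget chosen for the demo.

PAGE_SIZE = 4 * 1024 * 1024
POOL_LIMIT = 10 * 1024 * 1024


def acquire_pages(pool_limit, page_size):
    """Acquire fixed-size pages until the pool cannot satisfy another request."""
    acquired = 0
    pages = []
    while acquired + page_size <= pool_limit:
        pages.append(page_size)
        acquired += page_size
    remaining = pool_limit - acquired  # too small for one more page
    return pages, remaining


pages, remaining = acquire_pages(POOL_LIMIT, PAGE_SIZE)
print(len(pages), remaining)  # 2 2097152 -- 2 MB left cannot hold a 3rd page
```

Note how a coarser page size wastes more of the tail of the pool, which is why the discussion above keeps returning to how the page size is chosen.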

JobScheduler: Error generating jobs for time for custom InputDStream

2015-09-16 Thread Juan Rodríguez Hortalá
Hi, Sorry to insist, but does anyone have any thoughts on this? Or can someone at least point me to documentation of DStream.compute() so I can understand when I should return None for a batch? Thanks, Juan 2015-09-14 20:51 GMT+02:00 Juan Rodríguez Hortalá < juan.rodriguez.hort...@gmail.com>: > Hi, >
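The contract being asked about can be sketched with a toy model (plain Python stand-ins, not Spark's API): compute(validTime) returns the batch's RDD, or None when the stream has no RDD at all for that time, in which case the scheduler generates no job for the interval. Returning an empty RDD is different: it still generates a (trivial) job.

```python
# Toy model of the DStream.compute()/JobScheduler interaction.
# None => no RDD for this batch, so no job is generated.
# An empty collection => a real (if trivial) job over zero records.

def compute(valid_time, data_by_time):
    """Return the batch's records, or None when no batch exists at this time."""
    return data_by_time.get(valid_time)


def generate_job(valid_time, data_by_time):
    batch = compute(valid_time, data_by_time)
    if batch is None:
        return "skipped"                    # scheduler skips the interval
    return f"job over {len(batch)} records"  # empty batch still runs a job


data = {0: ["a", "b"], 2: []}  # no entry for t=1
print([generate_job(t, data) for t in range(3)])
# ['job over 2 records', 'skipped', 'job over 0 records']
```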

Re: RDD API patterns

2015-09-16 Thread robineast
I'm not sure the problem is quite as bad as you state. Both sampleByKey and sampleByKeyExact are implemented using a function from StratifiedSamplingUtils which does one of two things depending on whether the exact implementation is needed. The exact version requires double the number of lines of
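The two strategies contrasted above can be sketched as follows (hedged: function names and structure are illustrative, not lifted from StratifiedSamplingUtils). The approximate path flips one coin per element, so per-key counts only match fraction * n in expectation; the exact path does extra per-key bookkeeping to guarantee the count.

```python
import random


def sample_by_key(pairs, fractions, seed=7):
    """Per-element Bernoulli sampling: fast, approximate per-key counts."""
    rng = random.Random(seed)
    return [(k, v) for k, v in pairs if rng.random() < fractions[k]]


def sample_by_key_exact(pairs, fractions, seed=7):
    """Exact variant: group by key, then draw exactly round(fraction * n)."""
    rng = random.Random(seed)
    by_key = {}
    for k, v in pairs:
        by_key.setdefault(k, []).append(v)
    out = []
    for k, vs in by_key.items():
        take = round(fractions[k] * len(vs))
        out.extend((k, v) for v in rng.sample(vs, take))
    return out


data = [("a", i) for i in range(100)] + [("b", i) for i in range(50)]
approx = sample_by_key(data, {"a": 0.1, "b": 0.2})
exact = sample_by_key_exact(data, {"a": 0.1, "b": 0.2})
print(len(approx))  # varies around 20
print(sum(1 for k, _ in exact if k == "a"))  # exactly 10
```

The doubled line count in the exact implementation that the post mentions corresponds to that extra grouping-and-counting pass.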

JENKINS: downtime next week, wed and thurs mornings (9-23 and 9-24)

2015-09-16 Thread shane knapp
good morning, denizens of the aether! your hard working build system (and some associated infrastructure) has been in need of some updates and housecleaning for quite a while now. we will be splitting the maintenance over two mornings to minimize impact. here's the plan: 7am-9am wednesday,

Re: Unable to acquire memory errors in HiveCompatibilitySuite

2015-09-16 Thread Pete Robbins
So forcing the ShuffleMemoryManager to assume 32 cores, and therefore calculate a page size of 1MB, passes the tests. How can we determine the correct value to use in getPageSize rather than Runtime.getRuntime.availableProcessors()? On 16 September 2015 at 10:17, Pete Robbins
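The heuristic being forced here looks roughly like the following (a reconstruction from my reading of Spark 1.5's ShuffleMemoryManager.getPageSize; verify the constants against your branch): divide the pool by cores and a safety factor, round to a power of two, and clamp between 1 MB and 64 MB. With 32 cores assumed, a modest pool lands on the 1 MB floor, which matches the workaround reported above.

```python
# Approximate reconstruction of the page-size heuristic under discussion.
MIN_PAGE = 1 << 20    # 1 MB floor
MAX_PAGE = 64 << 20   # 64 MB ceiling
SAFETY = 16           # safety factor


def next_power_of_2(n):
    """Smallest power of two >= n."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def get_page_size(max_memory, cores):
    size = next_power_of_2(max_memory // cores // SAFETY)
    return min(MAX_PAGE, max(MIN_PAGE, size))


pool = 512 << 20  # hypothetical 512 MB shuffle pool for the demo
print(get_page_size(pool, 8) >> 20)   # 4  (MB) with 8 detected cores
print(get_page_size(pool, 32) >> 20)  # 1  (MB) when 32 cores are assumed
```

Smaller pages mean less waste per concurrent task, which is why overstating the core count makes the suite pass; the open question in the thread is what the principled divisor should be.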

Re: SparkR streaming source code

2015-09-16 Thread Reynold Xin
You should reach out to the speakers directly. On Wed, Sep 16, 2015 at 9:52 AM, Renyi Xiong wrote: > SparkR streaming is mentioned at about page 17 in below pdf, can anyone > share source code? (could not find it on GitHub)

Re: JENKINS: downtime next week, wed and thurs mornings (9-23 and 9-24)

2015-09-16 Thread Reynold Xin
Thanks Shane and Jon for the heads up. On Wednesday, September 16, 2015, shane knapp wrote: > good morning, denizens of the aether! > > your hard working build system (and some associated infrastructure) > has been in need of some updates and housecleaning for quite a while

Communication between executors and drivers

2015-09-16 Thread Muhammad Haseeb Javed
How do executors communicate with the driver in Spark? I understand that it's done using Akka actors, with messages exchanged as CoarseGrainedSchedulerMessages, but I'd really appreciate it if someone could explain the entire process in a bit more detail.
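A very rough model of the exchange being asked about (Python stand-in for illustration; Spark 1.x really does this with Akka actors in Scala, and the message names below mirror CoarseGrainedClusterMessages, but the fields and reply strings here are simplified assumptions): the executor registers with the driver's scheduler backend, the driver acknowledges and later launches tasks, and the executor streams StatusUpdate messages back as tasks run and finish.

```python
# Toy sketch of the driver-side message handling, not Spark internals.
from dataclasses import dataclass, field


@dataclass
class RegisterExecutor:
    executor_id: str
    cores: int


@dataclass
class StatusUpdate:
    executor_id: str
    task_id: int
    state: str  # e.g. "RUNNING", "FINISHED"


@dataclass
class Driver:
    executors: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def receive(self, msg):
        if isinstance(msg, RegisterExecutor):
            self.executors[msg.executor_id] = msg.cores
            return "RegisteredExecutor"  # driver's acknowledgement
        if isinstance(msg, StatusUpdate):
            self.log.append((msg.task_id, msg.state))
            return "ack"


driver = Driver()
print(driver.receive(RegisterExecutor("exec-1", cores=4)))
driver.receive(StatusUpdate("exec-1", task_id=0, state="RUNNING"))
driver.receive(StatusUpdate("exec-1", task_id=0, state="FINISHED"))
print(driver.log)  # [(0, 'RUNNING'), (0, 'FINISHED')]
```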

Spark streaming DStream state on worker

2015-09-16 Thread Renyi Xiong
Hi, I want to do a temporal join operation on a DStream across RDDs. My question is: are RDDs from the same DStream always computed on the same worker (except on failover)? thanks, Renyi.

Re: JENKINS: downtime next week, wed and thurs mornings (9-23 and 9-24)

2015-09-16 Thread shane knapp
> 630am-10am thursday, 9-24-15:
> * jenkins update to 1.629 (we're a few months behind in versions, and some big bugs have been fixed)
> * jenkins master and worker system package updates
> * all systems get a reboot (lots of hanging java processes have been building up over the months)
> *

Re: SparkR streaming source code

2015-09-16 Thread Renyi Xiong
got it, thanks a lot! On Wed, Sep 16, 2015 at 10:14 AM, Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote: > I think Hao posted a link to the source code in the description of > https://issues.apache.org/jira/browse/SPARK-6803 > > On Wed, Sep 16, 2015 at 10:06 AM, Reynold Xin

Re: Enum parameter in ML

2015-09-16 Thread Stephen Boesch
There was a long thread about enums initiated by Xiangrui several months back, in which the final consensus was to use Java enums. Is that discussion (/decision) applicable here? 2015-09-16 17:43 GMT-07:00 Ulanov, Alexander : > Hi Joseph, > > > > Strings sounds

Re: Enum parameter in ML

2015-09-16 Thread Joseph Bradley
I've tended to use Strings. Params can be created with a validator (isValid) which can ensure users get an immediate error if they try to pass an unsupported String. Not as nice as compile-time errors, but easier on the APIs. On Mon, Sep 14, 2015 at 6:07 PM, Feynman Liang

RE: Enum parameter in ML

2015-09-16 Thread Ulanov, Alexander
Hi Joseph, Strings sound reasonable. However, there is no StringParam (only StringArrayParam). Should I create a new param type? Also, how can the user get all possible values of a String parameter? Best regards, Alexander From: Joseph Bradley [mailto:jos...@databricks.com] Sent: Wednesday,

Re: Enum parameter in ML

2015-09-16 Thread Joseph Bradley
@Alexander It's worked for us to use Param[String] directly. (I think it's b/c String is exactly java.lang.String, rather than a Scala version of it, so it's still Java-friendly.) In other classes, I've added a static list (e.g., NaiveBayes.supportedModelTypes), though there isn't consistent
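The pattern described in this thread can be sketched as follows (transposed to Python for illustration; in Spark ML the real pieces are Param[String] with an isValid function and a static list such as NaiveBayes.supportedModelTypes, and the class below is a hypothetical stand-in): validate the string the moment it is set, so the user fails fast rather than at fit time.

```python
# Illustrative string-valued param with an isValid check, echoing the
# Spark ML pattern discussed above. Names here are made up for the demo.

SUPPORTED_MODEL_TYPES = ("multinomial", "bernoulli")  # static list of legal values


class StringParam:
    def __init__(self, name, is_valid):
        self.name = name
        self.is_valid = is_valid
        self.value = None

    def set(self, value):
        # Immediate error on an unsupported string, instead of a late failure.
        if not self.is_valid(value):
            raise ValueError(
                f"{self.name} must be one of {SUPPORTED_MODEL_TYPES}, got {value!r}")
        self.value = value
        return self


model_type = StringParam("modelType", lambda v: v in SUPPORTED_MODEL_TYPES)
model_type.set("multinomial")       # accepted
try:
    model_type.set("gaussian")      # rejected at set time
except ValueError as e:
    print("rejected:", e)
```

The static list also answers Alexander's second question: it gives users one discoverable place to see every legal value, at the cost of not being a compile-time check.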

Re: New Spark json endpoints

2015-09-16 Thread Kevin Chen
Just wanted to bring this email up again in case there were any thoughts. Having all the information from the web UI accessible through a supported json API is very important to us; are there any objections to us adding a v2 API to Spark? Thanks! From: Kevin Chen Date:

RE: Unable to acquire memory errors in HiveCompatibilitySuite

2015-09-16 Thread Cheng, Hao
We actually met a similar problem in a real case, see https://issues.apache.org/jira/browse/SPARK-10474 After checking the source code, the external sort memory management strategy seems to be the root cause of the issue. Currently, we allocate the 4MB (page size) buffer as initial in the
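The failure mode described for SPARK-10474 reduces to simple arithmetic (sketch below; the pool size is a made-up demo value): if every external-sort task begins by grabbing a fixed 4 MB initial page, a shared pool supports only pool / 4 MB concurrent tasks before the next acquisition fails, even though no task has sorted a single row yet.

```python
# Back-of-envelope model of concurrent tasks each taking a fixed
# initial page from a shared pool.

INITIAL_PAGE = 4 * 1024 * 1024  # the 4 MB initial buffer mentioned above


def tasks_that_fit(pool_bytes, initial_page=INITIAL_PAGE):
    """How many tasks can claim their initial page before acquisition fails."""
    return pool_bytes // initial_page


pool = 100 * 1024 * 1024  # hypothetical 100 MB shuffle pool
print(tasks_that_fit(pool))  # 25 -- the 26th task sees "Unable to acquire 4194304 bytes"
```

A smaller initial page (or lazy initial allocation) raises that ceiling, which is the direction the JIRA discussion points at.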

Re: New Spark json endpoints

2015-09-16 Thread Reynold Xin
Do we need to increment the version number if it is just strict additions? On Wed, Sep 16, 2015 at 7:10 PM, Kevin Chen wrote: > Just wanted to bring this email up again in case there were any thoughts. > Having all the information from the web UI accessible through a

Re: SparkR streaming source code

2015-09-16 Thread Shivaram Venkataraman
I think Hao posted a link to the source code in the description of https://issues.apache.org/jira/browse/SPARK-6803 On Wed, Sep 16, 2015 at 10:06 AM, Reynold Xin wrote: > You should reach out to the speakers directly. > > > On Wed, Sep 16, 2015 at 9:52 AM, Renyi Xiong