Re: PCA OutOfMemoryError

2016-01-17 Thread Bharath Ravi Kumar
…the SVD of the input matrix to the first; EOF is another name for PCA). This takes about 30 minutes to compute the top 20 PCs of a 46.7K-by-6.3M dense matrix of doubles (~2 TB), with most of the time spent on the distributed matrix-vector multiplies. Best, …

Re: PCA OutOfMemoryError

2016-01-12 Thread Bharath Ravi Kumar
Any suggestion/opinion? On 12-Jan-2016 2:06 pm, "Bharath Ravi Kumar" <reachb...@gmail.com> wrote: We're running PCA (selecting 100 principal components) on a dataset that has ~29K columns and is 70G in size, stored in ~600 parts on HDFS. The matrix in question is…

PCA OutOfMemoryError

2016-01-12 Thread Bharath Ravi Kumar
We're running PCA (selecting 100 principal components) on a dataset that has ~29K columns and is 70G in size, stored in ~600 parts on HDFS. The matrix in question is mostly sparse, with tens of columns populated in most rows but a few rows with thousands of columns populated. We're running Spark on…
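
For reference, a minimal sketch of the kind of mllib call this thread concerns (Spark 1.x Java API; the class name, sample data, and settings are illustrative assumptions, not from the thread):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.linalg.Matrix;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;
    import org.apache.spark.mllib.linalg.distributed.RowMatrix;

    public class PcaSketch {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("pca-sketch"));
        // Stand-in for the real ~29K-column, mostly sparse dataset.
        JavaRDD<Vector> rows = sc.parallelize(Arrays.asList(
            Vectors.sparse(29000, new int[]{0, 5, 17}, new double[]{1.0, 2.0, 3.0}),
            Vectors.sparse(29000, new int[]{3, 9}, new double[]{4.0, 5.0})));
        RowMatrix mat = new RowMatrix(rows.rdd());
        // computePrincipalComponents assembles an n-by-n Gramian on the driver
        // (n = number of columns), so driver memory grows with n^2 -- a likely
        // source of the OutOfMemoryError for a ~29K-column matrix.
        Matrix pc = mat.computePrincipalComponents(100);
        RowMatrix projected = mat.multiply(pc); // rows projected onto the 100 PCs
        sc.stop();
      }
    }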

Re: Spark on Mesos / Executor Memory

2015-10-17 Thread Bharath Ravi Kumar
…higher-level tool that can run your Spark jobs through one Mesos framework, and then you can let Spark distribute the resources more effectively. I hope that helps! Tom. On 17 Oct 2015, at 06:47, Bharath Ravi Kumar <reachb...@gmail.com> wrote: …

Re: Spark on Mesos / Executor Memory

2015-10-17 Thread Bharath Ravi Kumar
To be precise, the MesosExecutorBackend's -Xms and -Xmx equal spark.executor.memory, so there's no question of expanding or contracting the memory held by the executor. On Sat, Oct 17, 2015 at 5:38 PM, Bharath Ravi Kumar <reachb...@gmail.com> wrote: David, Tom, Thanks…
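
A hedged illustration of the point above (the value and job class are illustrative, not from the thread):

    # spark.executor.memory is handed to the MesosExecutorBackend JVM as both
    # -Xms and -Xmx, so the executor heap is fixed for its lifetime:
    spark-submit --master mesos://master:5050 \
      --conf spark.executor.memory=8g \
      --class com.example.Job job.jar
    # resulting executor JVM, conceptually:
    #   java -Xms8g -Xmx8g ... org.apache.spark.executor.MesosExecutorBackend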

Re: Spark on Mesos / Executor Memory

2015-10-16 Thread Bharath Ravi Kumar
Could someone who's aware of the reason for such a memory footprint respond? It seems unintuitive and hard to reason about. Thanks, Bharath On Thu, Oct 15, 2015 at 12:29 PM, Bharath Ravi Kumar <reachb...@gmail.com> wrote: Resending since user@mesos bounced earlier. My apologies. …

Re: Spark on Mesos / Executor Memory

2015-10-15 Thread Bharath Ravi Kumar
Resending since user@mesos bounced earlier. My apologies. On Thu, Oct 15, 2015 at 12:19 PM, Bharath Ravi Kumar <reachb...@gmail.com> wrote: (Reviving this thread since I ran into similar issues...) I'm running two Spark jobs (in Mesos fine-grained mode), each belonging to…

Re: Spark on Mesos / Executor Memory

2015-10-15 Thread Bharath Ravi Kumar
(Reviving this thread since I ran into similar issues...) I'm running two Spark jobs (in Mesos fine-grained mode), each belonging to a different Mesos role, say low and high. The low:high Mesos weights are 1:10. As expected, I see that the low-priority job occupies cluster resources to the…
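
As a sketch of the setup being described: the role names and 1:10 weights are from the thread, but the flag syntax varies across Mesos versions, spark.mesos.role only exists in Spark 1.5+, and the job class and hosts are illustrative assumptions.

    # register the two roles with 1:10 weights on the Mesos master
    mesos-master --work_dir=/var/lib/mesos --weights="low=1,high=10"

    # submit each Spark job under its role
    spark-submit --master mesos://master:5050 \
      --conf spark.mesos.role=low \
      --class com.example.LowPriorityJob low-job.jar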

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-19 Thread Bharath Ravi Kumar
…http://spark.apache.org/docs/latest/running-on-yarn.html Then I can see exactly what's in the directory. Doug P.S. Sorry for the dup message, Bharath and Todd; used the wrong email address. On Mar 19, 2015, at 1:19 AM, Bharath Ravi Kumar reachb...@gmail.com wrote: Thanks for clarifying, Todd. This may…

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-18 Thread Bharath Ravi Kumar
…but that was for a Cloudera installation. I am not sure what the HDP version would be to put here. -Todd On Wed, Mar 18, 2015 at 12:49 AM, Bharath Ravi Kumar reachb...@gmail.com wrote: Hi Todd, Yes, those entries were present in the conf under the same SPARK_HOME that was used to run spark-submit…

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-17 Thread Bharath Ravi Kumar
…conf/spark-defaults.conf file? spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041 spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041 On Tue, Mar 17, 2015 at 1:04 AM, Bharath Ravi Kumar reachb...@gmail.com wrote: Still no luck running purpose-built 1.3 against HDP 2.2 after…
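
For readability, the conf/spark-defaults.conf entries quoted above are:

    spark.driver.extraJavaOptions    -Dhdp.version=2.2.0.0-2041
    spark.yarn.am.extraJavaOptions   -Dhdp.version=2.2.0.0-2041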

HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-16 Thread Bharath Ravi Kumar
Hi, Trying to run Spark (1.2.1 built for HDP 2.2) against a YARN cluster results in the AM failing to start with the following error on stderr: Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher. An application ID was assigned to the job, but there were no logs.

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-16 Thread Bharath Ravi Kumar
2a and 2b are not required. HTH -Todd On Mon, Mar 16, 2015 at 10:13 AM, Bharath Ravi Kumar reachb...@gmail.com wrote: Hi, Trying to run Spark (1.2.1 built for HDP 2.2) against a YARN cluster results in the AM failing to start with the following error on stderr: Error: Could not find…

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-16 Thread Bharath Ravi Kumar
Still no luck running purpose-built 1.3 against HDP 2.2 after following all the instructions. Has anyone else faced this issue? On Mon, Mar 16, 2015 at 8:53 PM, Bharath Ravi Kumar reachb...@gmail.com wrote: Hi Todd, Thanks for the help. I'll try again after building a distribution with the 1.3…

Re: ALS failure with size Integer.MAX_VALUE

2014-12-15 Thread Bharath Ravi Kumar
OK. We'll try using it in a test cluster running 1.2. On 16-Dec-2014 1:36 am, Xiangrui Meng men...@gmail.com wrote: Unfortunately, it will depend on the Sorter API in 1.2. -Xiangrui On Mon, Dec 15, 2014 at 11:48 AM, Bharath Ravi Kumar reachb...@gmail.com wrote: Hi Xiangrui, The block size…

Re: ALS failure with size Integer.MAX_VALUE

2014-12-14 Thread Bharath Ravi Kumar
On Wed, Dec 3, 2014 at 10:10 PM, Bharath Ravi Kumar reachb...@gmail.com wrote: Thanks, Xiangrui. I'll try out setting a smaller number of item blocks. And yes, I've been following the JIRA for the new ALS implementation; I'll try it out when it's ready for testing. On Wed, Dec 3, 2014 at 4…

Re: ALS failure with size Integer.MAX_VALUE

2014-12-03 Thread Bharath Ravi Kumar
…will try to implement in 1.3. I'll ping you when it is ready. Best, Xiangrui On Tue, Dec 2, 2014 at 10:40 AM, Bharath Ravi Kumar reachb...@gmail.com wrote: Yes, the issue appears to be due to the 2GB block size limitation. I am hence looking for (user, product) block sizing suggestions…

Re: ALS failure with size Integer.MAX_VALUE

2014-12-01 Thread Bharath Ravi Kumar
…a very similar use case to yours (with more constrained hardware resources), and I haven’t seen this exact problem, but I’m sure we’ve seen similar issues. Please let me know if you have other questions. From: Bharath Ravi Kumar reachb...@gmail.com Date: Thursday, November 27, 2014 at 1:30 PM…

Re: ALS failure with size Integer.MAX_VALUE

2014-11-28 Thread Bharath Ravi Kumar
…Thanks, Bharath On Fri, Nov 28, 2014 at 12:00 AM, Bharath Ravi Kumar reachb...@gmail.com wrote: We're training a recommender with ALS in mllib 1.1 against a dataset of 150M users and 4.5K items, with the total number of training records being 1.2 billion (~30GB of data). The input data is spread…

ALS failure with size Integer.MAX_VALUE

2014-11-27 Thread Bharath Ravi Kumar
We're training a recommender with ALS in mllib 1.1 against a dataset of 150M users and 4.5K items, with the total number of training records being 1.2 billion (~30GB of data). The input data is spread across 1200 partitions on HDFS. For the training, rank=10, and we've configured {number of user data…
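
A minimal sketch of the training call described (mllib 1.x Java API; rank=10 is from the thread, while the input path, record format, iteration count, lambda, and block count are illustrative assumptions):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.recommendation.ALS;
    import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
    import org.apache.spark.mllib.recommendation.Rating;

    public class AlsSketch {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("als-sketch"));
        // Parse "user,item,rating" lines from the (illustrative) input path.
        JavaRDD<Rating> ratings = sc.textFile("hdfs:///ratings").map(line -> {
          String[] f = line.split(",");
          return new Rating(Integer.parseInt(f[0]), Integer.parseInt(f[1]),
              Double.parseDouble(f[2]));
        });
        // The final argument is the number of (user, product) blocks -- the
        // knob the later messages in this thread discuss in the context of
        // the 2GB block-size limitation.
        MatrixFactorizationModel model = ALS.train(ratings.rdd(), 10, 10, 0.01, 300);
        sc.stop();
      }
    }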

Re: OOM with groupBy + saveAsTextFile

2014-11-03 Thread Bharath Ravi Kumar
…save every element of the RDD as one line of text. It works like TextOutputFormat in Hadoop MapReduce, since that's what it uses. So you are causing it to create one big string out of each Iterable this way. On Sun, Nov 2, 2014 at 4:48 PM, Bharath Ravi Kumar reachb...@gmail.com wrote: Thanks…
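
A sketch of the fix this explanation points toward: emit one output line per value rather than letting saveAsTextFile stringify each whole Iterable (Spark 1.x Java API; the sample data and output path are illustrative):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class GroupSaveSketch {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("group-save"));
        JavaPairRDD<String, String> pairs = JavaPairRDD.fromJavaRDD(sc.parallelize(
            Arrays.asList(new Tuple2<>("k1", "a"), new Tuple2<>("k1", "b"),
                          new Tuple2<>("k2", "c"))));
        // saveAsTextFile on groupByKey's output would call toString on each
        // whole Iterable, producing one enormous line per key. Flattening the
        // groups first keeps every value on its own line.
        pairs.groupByKey()
             .flatMapValues(values -> values)
             .map(kv -> kv._1() + "\t" + kv._2())
             .saveAsTextFile("hdfs:///out");
        sc.stop();
      }
    }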

Re: OOM with groupBy + saveAsTextFile

2014-11-03 Thread Bharath Ravi Kumar
…approach. My bad. On Mon, Nov 3, 2014 at 3:38 PM, Bharath Ravi Kumar reachb...@gmail.com wrote: The result was no different with saveAsHadoopFile. In both cases, I can see that I've misinterpreted the API docs. I'll explore the APIs a bit further for ways to save the iterable as chunks rather than…

Re: OOM with groupBy + saveAsTextFile

2014-11-02 Thread Bharath Ravi Kumar
…attempting to create a huge array, for example, when the number of elements in the array is computed using an algorithm that computes an incorrect size.” On 2 Nov, 2014, at 12:25 pm, Bharath Ravi Kumar reachb...@gmail.com wrote: Resurfacing the thread. OOM shouldn't be the norm…

OOM with groupBy + saveAsTextFile

2014-11-01 Thread Bharath Ravi Kumar
Hi, I'm trying to run groupBy(function) followed by saveAsTextFile on an RDD of count ~100 million. The data size is 20GB, and groupBy results in an RDD of 1061 keys, with the values being Iterable<Tuple4<String, Integer, Double, String>>. The job runs on 3 hosts in a standalone setup, with each host's…

Re: OOM with groupBy + saveAsTextFile

2014-11-01 Thread Bharath Ravi Kumar
Minor clarification: I'm running Spark 1.1.0 on JDK 1.8, 64-bit Linux. On Sun, Nov 2, 2014 at 1:06 AM, Bharath Ravi Kumar reachb...@gmail.com wrote: Hi, I'm trying to run groupBy(function) followed by saveAsTextFile on an RDD of count ~100 million. The data size is 20GB, and groupBy results…

Re: OOM with groupBy + saveAsTextFile

2014-11-01 Thread Bharath Ravi Kumar
Resurfacing the thread. OOM shouldn't be the norm for a common groupBy / sort use case in a framework that leads sorting benchmarks, should it? Or is there something fundamentally wrong in the usage? On 02-Nov-2014 1:06 am, Bharath Ravi Kumar reachb...@gmail.com wrote: Hi, I'm trying to run…

Re: OOM writing out sorted RDD

2014-08-10 Thread Bharath Ravi Kumar
Update: as expected, switching to Kryo merely delays the inevitable. Does anyone have experience controlling memory consumption while processing (e.g. writing out) imbalanced partitions? On 09-Aug-2014 10:41 am, Bharath Ravi Kumar reachb...@gmail.com wrote: Our prototype application reads a 20GB…

OOM writing out sorted RDD

2014-08-08 Thread Bharath Ravi Kumar
Our prototype application reads a 20GB dataset from HDFS (nearly 180 partitions), groups it by key, sorts by rank, and writes out to HDFS in that order. The job runs against two nodes (16G and 24 cores per node available to the job). I noticed that the execution plan results in two sortByKey stages…
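
A compressed sketch of the pipeline described (Spark 1.x Java API; the rank key, sample data, and paths are illustrative stand-ins):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SortedWriteSketch {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("sorted-write"));
        // (rank, record) pairs standing in for the real 20GB dataset.
        JavaPairRDD<Double, String> byRank = JavaPairRDD.fromJavaRDD(sc.parallelize(
            Arrays.asList(new Tuple2<>(0.9, "a"), new Tuple2<>(0.1, "b"))));
        // sortByKey range-partitions the data, so the output part files are
        // globally ordered; a heavily skewed key can still concentrate most
        // of the data in one task -- the imbalance the update above mentions.
        byRank.sortByKey(false).saveAsTextFile("hdfs:///ranked");
        sc.stop();
      }
    }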

Implementing percentile through top Vs take

2014-07-30 Thread Bharath Ravi Kumar
I'm looking to select the top n records (by rank) from a dataset of a few hundred GB. My understanding is that JavaRDD.top(n, comparator) is entirely a driver-side operation, in that all records are sorted in the driver's memory. I'd prefer an approach where the records are sorted on the cluster…
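
A hedged sketch contrasting the two approaches the mail weighs (Spark 1.x Java API; the data and n are illustrative). Note that top() is not a full driver-side sort: each partition keeps only its own top n, and just those candidates are merged on the driver.

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class TopNSketch {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("topn-sketch"));
        JavaRDD<Double> ranks = sc.parallelize(Arrays.asList(3.0, 1.0, 4.0, 1.5, 9.0));
        int n = 2;
        // (a) top(): per-partition top-n, then a merge of at most
        //     n * numPartitions candidates on the driver.
        List<Double> topN = ranks.top(n);
        // (b) a fully cluster-side sort, taking the first n afterwards.
        List<Double> viaSort =
            ranks.sortBy(x -> x, false, ranks.partitions().size()).take(n);
        System.out.println(topN + " " + viaSort);
        sc.stop();
      }
    }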

Re: Hadoop client protocol mismatch with spark 1.0.1, cdh3u5

2014-07-25 Thread Bharath Ravi Kumar
Any suggestions to work around this issue? The pre-built Spark binaries don't appear to work against CDH as documented, unless there's a build issue, which seems unlikely. On 25-Jul-2014 3:42 pm, Bharath Ravi Kumar reachb...@gmail.com wrote: I'm encountering a Hadoop client protocol mismatch…

Re: Hadoop client protocol mismatch with spark 1.0.1, cdh3u5

2014-07-25 Thread Bharath Ravi Kumar
…to your build in your app? On Fri, Jul 25, 2014 at 4:32 PM, Bharath Ravi Kumar reachb...@gmail.com wrote: Any suggestions to work around this issue? The pre-built Spark binaries don't appear to work against CDH as documented, unless there's a build issue, which seems unlikely. On 25-Jul…

Re: Hadoop client protocol mismatch with spark 1.0.1, cdh3u5

2014-07-25 Thread Bharath Ravi Kumar
…custom Spark and depending on it is a different thing from depending on plain Spark and changing its deps. I think you want the latter. On Fri, Jul 25, 2014 at 5:46 PM, Bharath Ravi Kumar reachb...@gmail.com wrote: Thanks for responding. I used the pre-built Spark binaries meant…
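
A sketch of the suggested arrangement in Maven terms: depend on plain Spark and pin hadoop-client to the cluster's version. The cdh3u5 coordinate matches the Spark docs of that era; the Spark version shown is an assumption to adapt.

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.0.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>0.20.2-cdh3u5</version>
    </dependency>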

Re: Execution stalls in LogisticRegressionWithSGD

2014-07-02 Thread Bharath Ravi Kumar
…PROCESS_LOCAL slave2 2014/07/02 16:01:28 33 s 99 ms. Any pointers / diagnosis, please? On Thu, Jun 19, 2014 at 10:03 AM, Bharath Ravi Kumar reachb...@gmail.com wrote: Thanks. I'll await the fix to re-run my test. On Thu, Jun 19, 2014 at 8:28 AM, Xiangrui Meng men...@gmail.com…

Re: Execution stalls in LogisticRegressionWithSGD

2014-06-18 Thread Bharath Ravi Kumar
…Bharath Ravi Kumar reachb...@gmail.com wrote: A couple more points: 1) The inexplicable stalling of execution with large feature sets appears similar to that reported with the news-20 dataset: http://mail-archives.apache.org/mod_mbox/spark-user/201406.mbox/%3c53a03542.1010...@gmail.com%3E

Execution stalls in LogisticRegressionWithSGD

2014-06-17 Thread Bharath Ravi Kumar
Hi, (Apologies for the long mail, but it's necessary to provide sufficient detail considering the number of issues faced.) I'm running into issues testing LogisticRegressionWithSGD on a two-node cluster (each node with 24 cores and 16G available to slaves, out of 24G on the system). Here's a…
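
A minimal sketch of the call under test (mllib 1.0 Java API; the input path, libSVM format, and iteration count are illustrative assumptions, since the thread's dataset isn't shown):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.classification.LogisticRegressionModel;
    import org.apache.spark.mllib.classification.LogisticRegressionWithSGD;
    import org.apache.spark.mllib.regression.LabeledPoint;
    import org.apache.spark.mllib.util.MLUtils;

    public class LrSketch {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("lr-sketch"));
        // Load labeled training points; caching matters for the iterative SGD.
        JavaRDD<LabeledPoint> points =
            MLUtils.loadLibSVMFile(sc.sc(), "hdfs:///train.libsvm").toJavaRDD();
        points.cache();
        LogisticRegressionModel model =
            LogisticRegressionWithSGD.train(points.rdd(), 100);
        sc.stop();
      }
    }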

Re: Execution stalls in LogisticRegressionWithSGD

2014-06-17 Thread Bharath Ravi Kumar
Hi Xiangrui, I'm using 1.0.0. Thanks, Bharath On 18-Jun-2014 1:43 am, Xiangrui Meng men...@gmail.com wrote: Hi Bharath, Thanks for posting the details! Which Spark version are you using? Best, Xiangrui On Tue, Jun 17, 2014 at 6:48 AM, Bharath Ravi Kumar reachb...@gmail.com wrote…

Re: Execution stalls in LogisticRegressionWithSGD

2014-06-17 Thread Bharath Ravi Kumar
…Long, Integer, Integer> into a JavaPairRDD<Tuple2<Long, Long>, Tuple2<Integer, Integer>> is unrelated to mllib. Thanks, Bharath On Wed, Jun 18, 2014 at 7:14 AM, Bharath Ravi Kumar reachb...@gmail.com wrote: Hi Xiangrui, I'm using 1.0.0. Thanks, Bharath On 18-Jun-2014 1:43 am, Xiangrui Meng men…

Standalone client failing with docker deployed cluster

2014-05-16 Thread Bharath Ravi Kumar
Hi, I'm running the Spark server with a single worker on a laptop, using the Docker images. The spark-shell examples run fine with this setup. However, a standalone Java client that tries to run wordcount on a local file (1 MB in size) fails with the following error on stdout…
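
A minimal sketch of the kind of standalone Java client being described (Spark 1.x Java API; the master URL, file path, and class name are illustrative, not from the thread):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class WordCount {
      public static void main(String[] args) {
        // Connect to the standalone master the docker images expose.
        SparkConf conf = new SparkConf().setAppName("wordcount")
            .setMaster("spark://master:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.textFile("/path/to/local/file.txt")
          .flatMap(line -> Arrays.asList(line.split("\\s+")))
          .mapToPair(word -> new Tuple2<>(word, 1))
          .reduceByKey((a, b) -> a + b)
          .collect()
          .forEach(t -> System.out.println(t._1() + ": " + t._2()));
        sc.stop();
      }
    }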