Re: Problem embedding GaussianMixtureModel in a closure

2016-01-04 Thread Tomasz Fruboes
Tomasz Fruboes <tomasz.frub...@ncbj.gov.pl>: Dear All, I'm trying to implement a procedure that iteratively updates an RDD using results from GaussianMixtureModel.predictSoft. In order to avoid problems with the local variable (the obtained

Problem embedding GaussianMixtureModel in a closure

2015-12-31 Thread Tomasz Fruboes
Dear All, I'm trying to implement a procedure that iteratively updates an RDD using results from GaussianMixtureModel.predictSoft. In order to avoid problems with the local variable (the obtained GMM) being overwritten in each pass of the loop, I'm doing the following:
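Independently of the author's own code, which is cut off in this excerpt, the pitfall the loop is guarding against can be shown in plain Python. A closure created inside a loop looks up the loop variable when it is called, not when it is created. With Spark's lazy evaluation, a `map` built in one iteration may only be serialized and run later, after the variable has been rebound. In the minimal sketch below, plain callables stand in for the fitted GMMs. The helper names `make_predictors_late` and `make_predictors_bound` are hypothetical. This sketch does not claim to be the cause of the problem reported in this thread.

```python
def make_predictors_late(models):
    # Each lambda refers to the enclosing variable `m`, which is looked up
    # when the lambda runs, so every closure sees the last model of the loop.
    funcs = []
    for m in models:
        funcs.append(lambda x: m(x))
    return funcs


def make_predictors_bound(models):
    # A factory function gives each closure its own `model` binding, fixed
    # at the moment the closure is created.
    def bind(model):
        return lambda x: model(x)

    return [bind(m) for m in models]
```

With two stand-in "models" `lambda x: x + 1` and `lambda x: x + 2`, the late-binding version returns `2` from both closures. The bound version returns `1` and `2`. Passing the model through a default argument (`lambda x, model=m: model(x)`) is an equivalent idiom.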

Re: Union of many RDDs taking a long time

2015-06-29 Thread Tomasz Fruboes
Hi Matt, is there a reason you need to call coalesce in every loop iteration? Most likely it forces Spark to do lots of unnecessary shuffles. Also, for a really large number of inputs this approach can lead to problems due to too many nested RDD.union calls. A safer approach is to call union from
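The nesting problem can be made concrete with a small pure-Python model of lineage depth. Each pairwise `union` adds one level, so folding N inputs one at a time produces a chain N-1 levels deep. A single n-ary union stays one level deep. `Lineage`, `union_pair`, `union_chained`, `union_all` and `depth` are hypothetical stand-ins and not Spark classes. In PySpark the flat form corresponds to `SparkContext.union`, which accepts a list of RDDs.

```python
from functools import reduce


class Lineage:
    """Hypothetical stand-in for an RDD: it records only its parents."""

    def __init__(self, parents=()):
        self.parents = list(parents)


def union_pair(a, b):
    # Mimics rdd_a.union(rdd_b): a new node with two parents.
    return Lineage([a, b])


def union_chained(items):
    # Mimics `acc = acc.union(next_rdd)` repeated inside a loop.
    return reduce(union_pair, items)


def union_all(items):
    # Mimics one n-ary union over the whole list, e.g. sc.union(rdds).
    return Lineage(items)


def depth(node):
    # Longest parent chain below `node`, computed iteratively so that
    # measuring a deep chain does not hit Python's recursion limit.
    best = 0
    stack = [(node, 0)]
    while stack:
        current, d = stack.pop()
        best = max(best, d)
        stack.extend((p, d + 1) for p in current.parents)
    return best
```

For 1000 inputs, the chained version reaches a depth of 999 and the flat version a depth of 1. In a real job this would look roughly like `combined = sc.union(rdds)`. If coalescing is needed at all, it would then be applied once after the loop rather than in every iteration.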

Re: Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-21 Thread Tomasz Fruboes
file a JIRA for this? The executor should run under the user who submits a job, I think. On Wed, May 20, 2015 at 2:40 AM, Tomasz Fruboes tomasz.frub...@fuw.edu.pl wrote: Thanks for the suggestion. I have tried playing with it; sc.sparkUser() gives me the expected user name, but it doesn't solve

Re: saveAsTextFile() part- files are missing

2015-05-21 Thread Tomasz Fruboes
Hi, it looks like you are writing to a local filesystem. Could you try writing to a location visible to all nodes (master and workers), e.g. an NFS share? HTH, Tomasz On 21.05.2015 at 17:16, rroxanaioana wrote: Hello! I just started with Spark. I have an application which counts words in a
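When the output path lives on each node's local disk, every executor typically writes only the part files for the partitions it computed. The directory on any single machine therefore looks incomplete. The small helper below is hypothetical and can confirm this by reporting which `part-NNNNN` indices are absent from a directory. It assumes the default Hadoop naming (`part-00000`, `part-00001`, ..., optionally with a compression suffix) and takes the expected count as an argument. In PySpark, that count could come from `rdd.getNumPartitions()`.

```python
import os
import re

# Default Hadoop output naming: part-00000, part-00001, ... with an
# optional suffix such as ".gz" when compression is enabled.
PART_RE = re.compile(r"^part-(\d{5})(?:\..+)?$")


def missing_parts(output_dir, num_partitions):
    """Return the sorted partition indices with no part file in output_dir."""
    present = set()
    for name in os.listdir(output_dir):
        match = PART_RE.match(name)
        if match:
            present.add(int(match.group(1)))
    return [i for i in range(num_partitions) if i not in present]
```

Running it on each node against the same path would show the partitions scattered across machines. An empty list on one node means that node's copy of the directory is complete.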

Re: Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-20 Thread Tomasz Fruboes
Cheers, Tomasz On 19.05.2015 at 23:56, Davies Liu wrote: It surprises me. Could you list the owner information of /mnt/lustre/bigdata/med_home/tmp/test19EE/ ? On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes tomasz.frub...@fuw.edu.pl wrote: Dear Experts, we have a Spark cluster
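Davies' request to list the owner information can also be scripted. The following is a minimal sketch using only the standard library (`os.stat` plus the Unix-only `pwd` module). It maps each entry in a directory to the account that owns it. `list_owners` is a hypothetical helper name. Pointed at the output directory, it would show whether the files were written as root (the account the workers were started from) or as the submitting user.

```python
import os
import pwd


def list_owners(path):
    """Map each entry in `path` to its owner's user name (or numeric uid)."""
    owners = {}
    for name in sorted(os.listdir(path)):
        st = os.stat(os.path.join(path, name))
        try:
            owners[name] = pwd.getpwuid(st.st_uid).pw_name
        except KeyError:
            # The uid has no passwd entry on this machine; report it raw.
            owners[name] = str(st.st_uid)
    return owners
```

For example, `print(list_owners("/mnt/lustre/bigdata/med_home/tmp/test19EE/"))` would list each file and its owner. Comparing the result with `sc.sparkUser()` on the driver shows whether the executors wrote under a different account.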

Re: Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-20 Thread Tomasz Fruboes
are running. I couldn't find many references to this variable, but at least YARN and Mesos take it into account when spawning executors. Chances are that standalone mode also does it. iulian On Wed, May 20, 2015 at 9:29 AM, Tomasz Fruboes tomasz.frub...@fuw.edu.pl wrote

Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-19 Thread Tomasz Fruboes
Dear Experts, we have a Spark cluster (standalone mode) in which the master and workers are started from the root account. Everything runs correctly up to the point where we try operations such as dataFrame.select(name, age).save(ofile, parquet) or rdd.saveAsPickleFile(ofile), where