from:"Tomasz Fruboes"

Re: Problem embedding GaussianMixtureModel in a closure

2016-01-04 Thread Tomasz Fruboes

Tomasz Fruboes <tomasz.frub...@ncbj.gov.pl>: Dear All, I'm trying to implement a procedure that iteratively updates an RDD using results from GaussianMixtureModel.predictSoft. In order to avoid problems with local variable (the obtained

Problem embedding GaussianMixtureModel in a closure

2015-12-31 Thread Tomasz Fruboes

Dear All, I'm trying to implement a procedure that iteratively updates an RDD using results from GaussianMixtureModel.predictSoft. To avoid problems with the local variable (the obtained GMM) being overwritten in each pass of the loop, I'm doing the following:
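The snippet above is cut off before the author's own code, so the following is only a minimal sketch of the general Python behaviour the message alludes to: a closure created inside a loop looks up the loop's local variable when it is called, not when it is defined. `ToyModel` is a hypothetical stand-in for a fitted model and is not pyspark's GaussianMixtureModel; no Spark is involved.

```python
# Hypothetical stand-in for a fitted model (NOT pyspark's GaussianMixtureModel).
class ToyModel:
    def __init__(self, shift):
        self.shift = shift

    def predict(self, x):
        return x + self.shift


def make_late_bound():
    funcs = []
    for shift in (1, 2, 3):
        model = ToyModel(shift)
        # `model` is looked up when the lambda is called, i.e. after the loop
        # has finished, so every lambda sees the last model.
        funcs.append(lambda x: model.predict(x))
    return funcs


def make_early_bound():
    funcs = []
    for shift in (1, 2, 3):
        model = ToyModel(shift)
        # A default argument is evaluated at definition time, so each lambda
        # keeps the model from its own loop pass.
        funcs.append(lambda x, m=model: m.predict(x))
    return funcs


late = [f(10) for f in make_late_bound()]
early = [f(10) for f in make_early_bound()]
print(late)   # [13, 13, 13]
print(early)  # [11, 12, 13]
```

Binding through a default argument (or a small factory function) is one common way to freeze the per-iteration value. Whether this matches the approach in the truncated message is an assumption.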

Re: Union of many RDDs taking a long time

2015-06-29 Thread Tomasz Fruboes

Hi Matt, is there a reason you need to call coalesce in every loop iteration? Most likely it forces Spark to do lots of unnecessary shuffles. Also, for a really large number of inputs this approach can cause problems due to too many nested RDD.union calls. A safer approach is to call union from
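To illustrate why many nested union calls differ from a single flat union, here is a toy model in plain Python. `Node`, `pairwise_union` and `flat_union` are hypothetical names made up for this sketch and are not Spark APIs; the model only counts how deep the resulting tree of unions becomes.

```python
from functools import reduce


class Node:
    """Toy lineage node: a leaf has no children, a union has several."""

    def __init__(self, children=()):
        self.children = list(children)

    def depth(self):
        # A leaf has depth 1; a union is one level above its deepest child.
        return 1 + max((c.depth() for c in self.children), default=0)


def pairwise_union(nodes):
    # Mimics a loop of a = a.union(b): every step wraps the previous result.
    return reduce(lambda a, b: Node([a, b]), nodes)


def flat_union(nodes):
    # Mimics one union over the whole list of inputs.
    return Node(nodes)


leaves = [Node() for _ in range(100)]
nested_depth = pairwise_union(leaves).depth()
flat_depth = flat_union(leaves).depth()
print(nested_depth, flat_depth)  # 100 2
```

In PySpark, `SparkContext.union` accepts a list of RDDs and builds one union over all of them; whether that is the call the truncated sentence goes on to recommend is an assumption.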

Re: Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-21 Thread Tomasz Fruboes

file a JIRA for this? The executor should run under the user who submits the job, I think. On Wed, May 20, 2015 at 2:40 AM, Tomasz Fruboes tomasz.frub...@fuw.edu.pl wrote: Thanks for the suggestion. I have tried playing with it; sc.sparkUser() gives me the expected user name, but it doesn't solve

Re: saveAsTextFile() part- files are missing

2015-05-21 Thread Tomasz Fruboes

Hi, it looks like you are writing to a local filesystem. Could you try writing to a location visible to all nodes (master and workers), e.g. an NFS share? HTH, Tomasz On 21.05.2015 at 17:16, rroxanaioana wrote: Hello! I just started with Spark. I have an application which counts words in a

Re: Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-20 Thread Tomasz Fruboes

Cheers, Tomasz On 19.05.2015 at 23:56, Davies Liu wrote: It surprises me, could you list the owner information of /mnt/lustre/bigdata/med_home/tmp/test19EE/ ? On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes tomasz.frub...@fuw.edu.pl wrote: Dear Experts, we have a Spark cluster

Re: Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-20 Thread Tomasz Fruboes

are running. I couldn't find many references to this variable, but at least YARN and Mesos take it into account when spawning executors. Chances are that standalone mode also does it. iulian On Wed, May 20, 2015 at 9:29 AM, Tomasz Fruboes tomasz.frub...@fuw.edu.pl wrote

Multi user setup and saving a DataFrame / RDD to a network exported file system

2015-05-19 Thread Tomasz Fruboes

Dear Experts, we have a Spark cluster (standalone mode) in which the master and workers are started from the root account. Everything runs correctly up to the point where we try operations such as dataFrame.select("name", "age").save(ofile, "parquet") or rdd.saveAsPickleFile(ofile), where

8 matches
