Tomasz Fruboes <tomasz.frub...@ncbj.gov.pl>:
Dear All,
I'm trying to implement a procedure that iteratively updates an RDD
using results from GaussianMixtureModel.predictSoft. In order to avoid
problems with the local variable (the obtained GMM) being overwritten in
each pass of the loop, I'm doing the following:
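The overwriting problem described here is the classic late-binding-closure pitfall: functions created in a loop capture the loop variable itself, not its value at creation time. A minimal plain-Python sketch of the bug and the usual fix (no Spark needed; the same issue bites when a loop variable holding the latest GMM is captured by an RDD transformation):

```python
# Closures created in a loop all close over the *same* variable `i`,
# so each one sees its final value once the loop has finished.
broken = []
for i in range(3):
    broken.append(lambda x: x + i)      # every lambda shares `i`
print([f(0) for f in broken])           # [2, 2, 2]

# Binding the current value through a default argument snapshots it
# per iteration, so each closure keeps the value it was created with.
fixed = []
for i in range(3):
    fixed.append(lambda x, i=i: x + i)  # `i=i` freezes the value now
print([f(0) for f in fixed])            # [0, 1, 2]
```

The `i=i` default-argument trick (or an explicit factory function) is the standard remedy when a per-iteration value must survive into a deferred computation.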
Hi Matt,
is there a reason you need to call coalesce in every loop iteration? Most
likely it forces Spark to do lots of unnecessary shuffles. Also, for a
really large number of inputs this approach can lead to problems due to
too many nested RDD.union calls. A safer approach is to call union from
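A plain-Python sketch of why folding binary RDD.union calls is risky for many inputs: each pairwise union wraps the previous result, so the lineage chain grows one level per input, whereas a single n-ary union (as SparkContext.union provides) stays flat. The `Node` class below is a hypothetical stand-in that only tracks lineage depth:

```python
from functools import reduce

class Node:
    """Tiny stand-in for an RDD: records how deep its lineage chain is."""
    def __init__(self, parents=()):
        self.depth = 1 + max((p.depth for p in parents), default=0)

    def union(self, other):
        # Binary union, like RDD.union: wraps exactly two parents.
        return Node([self, other])

def flat_union(nodes):
    # N-ary union, like SparkContext.union: wraps all inputs at once.
    return Node(nodes)

leaves = [Node() for _ in range(1000)]
chained = reduce(lambda a, b: a.union(b), leaves)  # left-deep chain
flat = flat_union(leaves)                          # single shallow node
print(chained.depth, flat.depth)                   # 1000 2
```

With real RDDs, a lineage a thousand levels deep is exactly the kind of structure that makes recursive traversal of the dependency graph expensive or fragile, while the flat form keeps it constant.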
file a JIRA for this?
The executor should run under the user who submitted the job, I think.
On Wed, May 20, 2015 at 2:40 AM, Tomasz Fruboes
tomasz.frub...@fuw.edu.pl wrote:
Thanks for the suggestion. I have tried playing with it; sc.sparkUser() gives
me the expected user name, but it doesn't solve
Hi,
it looks like you are writing to a local filesystem. Could you try writing
to a location visible to all nodes (master and workers), e.g. an NFS share?
HTH,
Tomasz
W dniu 21.05.2015 o 17:16, rroxanaioana pisze:
Hello!
I just started with Spark. I have an application which counts words in a
Cheers,
Tomasz
W dniu 19.05.2015 o 23:56, Davies Liu pisze:
That surprises me. Could you list the owner information of
/mnt/lustre/bigdata/med_home/tmp/test19EE/ ?
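One way to gather that information is with a short plain-Python snippet; `/tmp` stands in below for the directory above, only so the snippet runs anywhere:

```python
import grp
import os
import pwd
import stat

def describe(path):
    """Owner, group, and octal mode of `path` -- what `ls -ld` would show."""
    st = os.stat(path)
    return (pwd.getpwuid(st.st_uid).pw_name,
            grp.getgrgid(st.st_gid).gr_name,
            oct(stat.S_IMODE(st.st_mode)))

# On the affected node, substitute the real output directory for /tmp.
print(describe("/tmp"))
```

If the directory turns out to be owned by root with a restrictive mode, that would line up with the save failures discussed in this thread.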
On Tue, May 19, 2015 at 8:15 AM, Tomasz Fruboes
tomasz.frub...@fuw.edu.pl wrote:
Dear Experts,
we have a spark cluster
are running. I couldn't find many references to this variable, but at
least YARN and Mesos take it into account when spawning executors.
Chances are that standalone mode also does it.
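For reference, the account a process actually runs under is easy to check from plain Python; in a real job one would run this inside the executors (e.g. via a mapPartitions call) rather than on the driver to see who spawned them:

```python
import getpass
import os

# Print the account name and numeric uid of the current process; executed
# inside a Spark executor, this reveals the user it was spawned under.
print(getpass.getuser(), os.getuid())
```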
iulian
On Wed, May 20, 2015 at 9:29 AM, Tomasz Fruboes
tomasz.frub...@fuw.edu.pl wrote:
Dear Experts,
we have a spark cluster (standalone mode) in which master and workers
are started from the root account. Everything runs correctly until we
try operations such as
dataFrame.select("name", "age").save(ofile, "parquet")
or
rdd.saveAsPickleFile(ofile)
, where