Re: Standalone cluster node utilization

2016-07-14 Thread Jakub Stransky
"SELECT statement ... Condition = '$Condition'""".stripMargin) } else { df_init }).repartition(Configuration.appPartitioning) df.persist() Seems that none of those actually work as expected. It seems that I cannot distribute the data across the cluster. Could someone
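To make the flattened Scala above easier to follow, here is the same conditional-read / repartition / persist pattern as a self-contained sketch. It assumes a Spark 1.6 spark-shell (sc and sqlContext in scope); the table name, condition value and partition count are hypothetical stand-ins, not values from the thread.

    // Minimal sketch of the pattern in the snippet (Spark 1.6-era API).
    // "events", the condition value and 24 are illustrative placeholders.
    val appPartitioning = 24                   // stand-in for Configuration.appPartitioning
    val condition: Option[String] = Some("A")  // stand-in for $Condition

    val dfInit = sqlContext.table("events")    // stand-in for df_init

    val df = (condition match {
      case Some(c) => sqlContext.sql(s"SELECT * FROM events WHERE condition = '$c'")
      case None    => dfInit
    }).repartition(appPartitioning)

    df.persist()
    df.count()  // persist is lazy; an action makes the cached blocks show up in the UI

Note that repartition only shuffles data that has already been read; if the source read itself happens in a single partition, that read still runs on one executor.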

Re: Standalone cluster node utilization

2016-07-14 Thread Jakub Stransky
0.021t 6836 S 676.7 79.4 40:08.61 java Thanks Jakub On 14 July 2016 at 19:22, Jakub Stransky <stransky...@gmail.com> wrote: > Hi Talebzadeh, > > we are using 6 worker machines - running. > > We are reading the data through sqlContext (data frame) as it is suggested > i

Re: Standalone cluster node utilization

2016-07-14 Thread Jakub Stransky
On 14 July 2016 at 17:18, Jakub Stransky <stransky...@gmail.com> wrote: > >> Hello, >> >> I have a spark cluster running in standalone mode, master

Standalone cluster node utilization

2016-07-14 Thread Jakub Stransky
Hello, I have a Spark cluster running in standalone mode, master + 6 executors. My application reads data from a database via DataFrame.read, then filters rows. After that I repartition the data, and I wonder why on the Executors page of the driver UI I see RDD blocks all
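The symptom described here (everything on one executor even after a repartition) is the classic signature of a JDBC DataFrame read, which uses a single partition unless partitioning options are supplied. A minimal sketch against the Spark 1.6 API, with an illustrative URL, table and column names:

    // Spread a JDBC read across executors (Spark 1.6 API).
    // Without partitionColumn/lowerBound/upperBound/numPartitions the JDBC
    // source reads the whole table into ONE partition on one executor.
    val df = sqlContext.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")  // illustrative URL
      .option("dbtable", "events")
      .option("partitionColumn", "id")   // numeric column to split the read on
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "24")     // 24 parallel reads instead of 1
      .load()
      .filter("status = 'ACTIVE'")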

Re: Spark application doesn't scale to worker nodes

2016-07-05 Thread Jakub Stransky
> > Regards, > Jacek Laskowski > > https://medium.com/@jaceklaskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > On Tue, Jul 5, 2016 at 12:04 PM, Jakub Stransky <stransky...@gmail.com> > wro

Standalone mode resource allocation questions

2016-07-05 Thread Jakub Stransky
Hello, I went through the Spark documentation and several posts from Cloudera etc., and as my background is heavily in Hadoop/YARN, there is still a little confusion. Could someone more experienced please clarify? What I am trying to achieve: - running a cluster in standalone mode, version 1.6.1
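Since the question is about standalone-mode resource allocation, the relevant knobs differ from YARN's. Below is a sketch of spark-defaults.conf for a 1.6.x standalone cluster with illustrative numbers (not from the thread): by default a standalone application claims every core in the cluster, so spark.cores.max caps the application's total, while spark.executor.cores sizes each executor and, per the 1.6 docs, allows more than one executor per worker.

    # spark-defaults.conf sketch for Spark 1.6.x standalone; values are placeholders
    spark.executor.memory   4g
    spark.executor.cores    2     # cores per executor
    spark.cores.max         12    # total cores this application may claim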

Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Jakub Stransky
On 4 July 2016 at 17:05, Jakub Stransky <stransky...@gmail.com> wrote: > >> Hi Mich, >> >> I have set up the Spark default configuration in the conf directory >> spark-defau

Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Jakub Stransky

Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Jakub Stransky

Spark application doesn't scale to worker nodes

2016-07-04 Thread Jakub Stransky
Hello, I have a Spark cluster consisting of 4 nodes in standalone mode, a master + 3 worker nodes with configured available memory and CPUs etc. I have a Spark application which is essentially an MLlib pipeline for training a classifier, in this case RandomForest but it could be a DecisionTree
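For context, the pipeline being described would look roughly like the following; a minimal sketch against the Spark 1.6 spark.ml API, where the column names, feature list and tree count are hypothetical and trainingDf is an assumed DataFrame with those columns:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.RandomForestClassifier
    import org.apache.spark.ml.feature.VectorAssembler

    // Assemble the raw columns into the single vector column MLlib expects;
    // "f1".."f3" and "label" are hypothetical column names.
    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2", "f3"))
      .setOutputCol("features")

    val rf = new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setNumTrees(50)

    val model = new Pipeline().setStages(Array(assembler, rf)).fit(trainingDf)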

Write RDD to Elasticsearch

2016-03-24 Thread Jakub Stransky
Hi, I am trying to write a JavaPairRDD into Elasticsearch 1.7 using Spark 1.2.1 and elasticsearch-hadoop 2.0.2. JavaPairRDD output = ... final JobConf jc = new JobConf(output.context().hadoopConfiguration()); jc.set("mapred.output.format.class",
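The message is cut off at the jc.set("mapred.output.format.class", ... call. For reference, a hedged Scala sketch of the same old mapred-API route with elasticsearch-hadoop 2.0.x: EsOutputFormat is presumably what the truncated line was setting, the node address and index/type are illustrative, and the exact requirements (output committer, dummy output path) should be checked against the es-hadoop docs for that version.

    import org.apache.hadoop.mapred.JobConf
    import org.elasticsearch.hadoop.mr.EsOutputFormat

    // output: an assumed pair RDD whose values es-hadoop can serialize
    // (e.g. MapWritable); sc is the SparkContext.
    val jc = new JobConf(sc.hadoopConfiguration)
    jc.setOutputFormat(classOf[EsOutputFormat])
    jc.set("es.nodes", "localhost:9200")      // illustrative ES address
    jc.set("es.resource", "myindex/mytype")   // illustrative index/type

    output.saveAsHadoopDataset(jc)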