Re: Spark on Mesos - Weird behavior

2018-07-11 Thread Pavel Plotnikov
; allocation. Am I wrong? > > - Thodoris > > On 11 Jul 2018, at 17:09, Pavel Plotnikov > wrote: > > Hi, Thodoris > You can configure resources per executor and manipulate the number of > executors instead of using spark.cores.max. I think
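Pavel's suggestion, sizing each executor explicitly rather than only capping the application's total cores, could look roughly like this in spark-defaults.conf. The property names are real Spark settings, but the values are illustrative assumptions, not taken from the thread:

```properties
# Size each executor explicitly (illustrative values, tune per cluster).
spark.executor.cores                   4
spark.executor.memory                  8g
# With dynamic allocation, bound the executor count instead of total cores.
spark.dynamicAllocation.enabled        true
spark.dynamicAllocation.maxExecutors   10
```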

Re: Spark on Mesos - Weird behavior

2018-07-11 Thread Pavel Plotnikov
that seems that we can’t control the resource usage of an application. By > the way, we are not using dynamic allocation. > > - Thodoris > > > On 10 Jul 2018, at 14:35, Pavel Plotnikov > wrote: > > Hello Thodoris! > Have you checked this: > - does the Mesos cluster have available resources?

Re: Spark on Mesos - Weird behavior

2018-07-10 Thread Pavel Plotnikov
Hello Thodoris! Have you checked this: - does the Mesos cluster have available resources? - does Spark have tasks waiting in the queue for longer than the spark.dynamicAllocation.schedulerBacklogTimeout configuration value? - and then, have you checked that Mesos sends offers to the Spark application's Mesos framework at least
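The backlog-timeout check above refers to Spark's dynamic allocation settings; a minimal spark-defaults.conf sketch (the values shown are the documented defaults, included only for illustration):

```properties
spark.dynamicAllocation.enabled                           true
spark.dynamicAllocation.schedulerBacklogTimeout           1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout  1s
```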

Re: Spark declines Mesos offers

2017-04-26 Thread Pavel Plotnikov
a/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala#L316 > > On Mon, Apr 24, 2017 at 4:53 AM, Pavel Plotnikov < > pavel.plotni...@team.wrike.com> wrote: > >> Hi, everyone! I run Spark 2.1.0 jobs on top of a Mesos cluster in >>

Spark declines Mesos offers

2017-04-24 Thread Pavel Plotnikov
Hi, everyone! I run Spark 2.1.0 jobs on top of a Mesos cluster in coarse-grained mode with dynamic resource allocation. Sometimes the Spark Mesos scheduler declines Mesos offers even though not all available resources are used (I have fewer workers than the possible maximum) and the

Re: Spark runs out of memory with small file

2017-02-26 Thread Pavel Plotnikov
Hi, Henry. In the first example the dict d always contains only one value because the_Id is the same; in the second case the dict grows very quickly. So I can suggest first applying a map function to split your file string into rows, then repartitioning, and then applying the custom logic. Example: def
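Pavel's point about the two cases can be illustrated in plain Python. This is a hypothetical reconstruction (the names `rows` and `the_id` are assumptions, since Henry's original code is truncated above):

```python
def build_dict_same_id(rows):
    """Case 1: every row maps to the same key, so the dict holds a single
    entry and each iteration just overwrites it -- memory stays flat."""
    d = {}
    for row in rows:
        the_id = "constant"   # the_Id is the same for every row
        d[the_id] = row
    return d

def build_dict_unique_ids(rows):
    """Case 2: every row gets its own key, so the dict grows linearly with
    the input and can exhaust memory on a large file."""
    d = {}
    for i, row in enumerate(rows):
        d[i] = row
    return d

rows = ["line-%d" % i for i in range(1000)]
print(len(build_dict_same_id(rows)))     # 1 entry
print(len(build_dict_unique_ids(rows)))  # 1000 entries
```

This is why the second variant runs out of memory on a large file while the first does not, independent of anything Spark does.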

Re: Launching an Spark application in a subset of machines

2017-02-07 Thread Pavel Plotnikov
Hi, Alvaro You can create different clusters using the standalone cluster manager, and then manage subsets of machines by submitting applications to different masters. Or you can use Mesos attributes to mark a subset of workers and specify them in spark.mesos.constraints On Tue, Feb 7, 2017 at 1:21
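The Mesos-attribute approach can be sketched as follows. spark.mesos.constraints is a real Spark-on-Mesos property, but the attribute name and value (sparkrole:batch) are assumptions for illustration:

```properties
# Agents are assumed to be started with: --attributes="sparkrole:batch"
# The driver then only accepts offers from agents matching the constraint.
spark.mesos.constraints   sparkrole:batch
```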

Re: physical memory usage keep increasing for spark app on Yarn

2017-01-23 Thread Pavel Plotnikov
hen dropDF.repartition(1).write.mode(SaveMode.ErrorIfExists).parquet(targetpath) Best, On Sun, Jan 22, 2017 at 12:31 PM Yang Cao <cybea...@gmail.com> wrote: > Also, do you know why this happens? > > On 20 January 2017, at 18:23, Pavel Plotnikov <pavel.plotni...@team.

Re: physical memory usage keep increasing for spark app on Yarn

2017-01-20 Thread Pavel Plotnikov
Hi Yang, I have faced the same problem on Mesos, and to circumvent this issue I usually increase the partition number. In the last step of your code you reduce the number of partitions to 1; try setting a bigger value, maybe it will solve this problem. Cheers, Pavel On Fri, Jan 20, 2017 at 12:35 PM Yang Cao
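Pavel's advice is to avoid collapsing to a single partition before the write. Where changing the explicit repartition(1) call is not practical, the default parallelism can also be raised in configuration. The property names are real Spark settings; the value 200 is an assumed starting point, not a recommendation from the thread:

```properties
# Illustrative values: keep partition counts up through shuffles and writes.
spark.default.parallelism     200
spark.sql.shuffle.partitions  200
```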

Re: Spark partition size tuning

2016-01-26 Thread Pavel Plotnikov
Hi, Maybe *sc.hadoopConfiguration.setInt("dfs.blocksize", blockSize)* helps you. Best Regards, Pavel On Tue, Jan 26, 2016 at 7:13 AM Jia Zou wrote: > Dear all, > > First to update that the local file system data partition size can be > tuned by: >

Re: Parquet write optimization by row group size config

2016-01-21 Thread Pavel Plotnikov
Franke <jornfra...@gmail.com> wrote: > What is your data size, the algorithm and the expected time? > Depending on this, the group can recommend optimizations or tell you > that the expectations are wrong. > > On 20 Jan 2016, at 18:24, Pavel Plotnikov <pavel.plotni...

Re: Parquet write optimization by row group size config

2016-01-20 Thread Pavel Plotnikov
> On Tue, Jan 19, 2016 at 6:13 PM, Pavel Plotnikov < > pavel.plotni...@team.wrike.com> wrote: > >> Hello, >> I'm using Spark on some machines in standalone mode; data storage is >> mounted on these machines via NFS. I have an input data stream and when I'm >> trying

Parquet write optimization by row group size config

2016-01-19 Thread Pavel Plotnikov
Hello, I'm using Spark on some machines in standalone mode; data storage is mounted on these machines via NFS. I have an input data stream, and when I try to store all the data for an hour in Parquet, the job executes mostly on one core and the hourly data takes 40-50 minutes to store. It is very slow!
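The thread subject concerns the Parquet row group size. One way to set it (and the matching filesystem block size) per job is via Spark's spark.hadoop.* configuration passthrough; the keys are real Hadoop/Parquet settings, while 128 MB is only an illustrative value:

```properties
# 134217728 bytes = 128 MB (illustrative; align row group and block size)
spark.hadoop.parquet.block.size   134217728
spark.hadoop.dfs.blocksize        134217728
```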

Re: Can I configure Spark on multiple nodes using local filesystem on each node?

2016-01-19 Thread Pavel Plotnikov
Hi, I'm using Spark in standalone mode without HDFS, and a shared folder is mounted on the nodes via NFS. It looks like each node writes data as it would to a local file system. Regards, Pavel On Tue, Jan 19, 2016 at 5:39 PM Jia Zou wrote: > Dear all, > > Can I configure Spark on