Regression of external shuffle service spark 2.3 vs spark 2.2

2018-11-19 Thread igor.berman
Hi, any input regarding the following would be welcome. We are running with the external shuffle service on a Mesos cluster (1.5.1). After upgrading our production workload to Spark 2.3 we started to see OOM failures of the external shuffle services (running on each node). Has anybody experienced the same problem? Any

Driver doesn't respect the request to abort itself by Mesos

2018-06-24 Thread igor.berman
Hi, any input regarding the following situation would be appreciated: We are running with dynamic allocation (Spark v2.2.0), i.e. with the external shuffle service, on a Mesos cluster (1.1.0). Sometimes, due to network failures and/or the order of offers accepted by different frameworks, the application framework

Re: Driver aborts on Mesos when unable to connect to one of external shuffle services

2018-04-16 Thread igor.berman
Hi Szuromi, we manage the external shuffle service with Marathon, not manually. Sometimes though, e.g. when adding a new node to the cluster, there is some delay between Mesos scheduling tasks on a slave and Marathon scheduling the external shuffle service task on that node.

Driver aborts on Mesos when unable to connect to one of external shuffle services

2018-04-12 Thread igor.berman
Hi, any input on whether the following is expected: the driver starts and is unable to connect to the external shuffle service on one of the nodes (no matter the reason). This makes the framework go to Inactive mode in the Mesos UI. However, it seems that the driver doesn't exit and continues to execute tasks (or tries to).

Re: external shuffle service in mesos

2018-01-23 Thread igor.berman
Hi Susan, yes, I agree with you regarding resource accounting. Imho, in this case the shuffle service must run on the node no matter what resources are available (same as we don't account for resources that the "system" takes - the Mesos agent, the OS itself and any other process running on the same machine). One

Re: external shuffle service in mesos

2018-01-21 Thread igor.berman
Hi Susan, in general I can get what I need without Marathon, by configuring the external shuffle service with puppet/ansible/chef, plus maybe some alerts for checks. I mean that in companies that don't have strong DevOps teams and want to install services as simply as possible, just by config - Marathon

external shuffle service in mesos

2018-01-20 Thread igor.berman
Hi, I wanted to get some advice regarding managing the external shuffle service in Mesos environments. The Spark documentation mentions Marathon, but the documentation is very limited. I've tried to search for some documentation and it seems not too difficult to configure it under

Hive api vs Dataset api

2016-09-16 Thread igor.berman
Hi, I wanted to understand if there is any advantage besides API syntax when using the Hive/table API vs. the Dataset API in Spark SQL (v2.0). Any additional optimizations, maybe? I'm most interested in partitioned Parquet tables stored on S3. Is there any difference if I'm comfortable with dataset

DirectFileOutputCommitter

2016-02-22 Thread igor.berman
Hi, I wanted to understand if anybody uses DirectFileOutputCommitter or the like, especially when working with S3. I know that there is one implementation in the Spark distro for the Parquet format, but not for files - why? Imho, it can bring a huge performance boost. Using the default FileOutputCommitter with S3 has big

our spark gotchas report while creating batch pipeline

2015-10-18 Thread igor.berman
Maybe somebody will find it useful: goo.gl/0yfvBd -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/our-spark-gotchas-report-while-creating-batch-pipeline-tp25112.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

log4j.xml bundled in jar vs log4j.properties in spark/conf

2015-07-21 Thread igor.berman
Hi, I have log4j.xml in my jar. From 1.4.1 it seems that log4j.properties in spark/conf comes first in the classpath, so spark/conf/log4j.properties wins. Before that (in v1.3.0), the log4j.xml bundled in the jar defined the configuration. If I manually add my jar to be strictly first in the classpath (by

1.4.1 in production

2015-07-20 Thread igor.berman
Hi, does anybody already use version 1.4.1 in production? Any problems? Thanks in advance -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-4-1-in-production-tp23909.html

upload to s3, UI Total Duration and Sum of Job Durations

2015-07-01 Thread igor.berman
Hi, our job reads files from S3, transforms/aggregates them and writes them back to S3. While investigating performance problems I noticed a big difference between the sum of job durations and the Total Duration that appears in the UI. After investigating it a bit, the difference

spilling in-memory map of 5.1 MB to disk (272 times so far)

2015-06-26 Thread igor.berman
Hi, I wanted to get some advice regarding tuning a Spark application. For some of the tasks I see many log entries like this: Executor task launch worker-38 ExternalAppendOnlyMap: Thread 239 spilling in-memory map of 5.1 MB to disk (272 times so far) (especially when inputs are considerable) I

missing part of the file while using newHadoopApi

2015-06-15 Thread igor.berman
Hi, has anyone experienced a problem uploading to S3 over the s3n protocol with Spark's newHadoopApi, where the job completes successfully (there is a _SUCCESS marker) but in reality one of the parts of the file is missing? Thanks in advance. PS: we are trying s3a now (which needs an upgrade to Hadoop 2.7)

Jobs aborted due to EventLoggingListener Filesystem closed

2015-06-08 Thread igor.berman
I sometimes get errors like the one below (Spark 1.3.1, history enabled to HDFS). I've found a few JIRAs, but they seem to be resolved, e.g. https://issues.apache.org/jira/browse/SPARK-1475 Any ideas? 2015-06-08 08:33:06.426 ERROR LiveListenerBus: Listener EventLoggingListener threw an exception

Re: Jobs aborted due to EventLoggingListener Filesystem closed

2015-06-08 Thread igor.berman
For the record: DON'T call System.exit within Spark code -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Jobs-aborted-due-to-EventLoggingListener-Filesystem-closed-tp23202p23205.html

Re: spark java.io.FileNotFoundException: /user/spark/applicationHistory/application

2015-05-29 Thread igor.berman
In YARN your executors might run on any node in your cluster, so you need to configure Spark history to be on HDFS (so it will be accessible to every executor). You probably switched from local to YARN mode when submitting.

Batch aggregation by sliding window + join

2015-05-28 Thread igor.berman
Hi, I have a daily batch job that computes a daily aggregate of several counters represented by some object. After the daily aggregation is done, I want to compute a block aggregation over several days (3, 7, 30 etc.). To do so I need to add the new daily aggregation to the current block and then subtract from current
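
A rough sketch of the add-then-subtract idea in the snippet above, in plain Python rather than Spark. The snippet is cut off, so this assumes that what gets subtracted is the daily aggregate that has just fallen out of the window. The class name `SlidingAggregate`, its methods, and the counter names are hypothetical and only illustrate the bookkeeping; they are not from the original message.

```python
from collections import Counter, deque


class SlidingAggregate:
    """Rolling sum of daily counter aggregates over the last `window` days.

    Hypothetical helper: assumes each daily aggregate is a mapping of
    counter name -> number, and that the oldest day is the one subtracted.
    """

    def __init__(self, window):
        self.window = window
        self.days = deque()     # daily aggregates currently inside the block
        self.block = Counter()  # running sum of those daily aggregates

    def add_day(self, daily):
        daily = Counter(daily)
        self.days.append(daily)
        self.block.update(daily)          # add the new day to the block
        if len(self.days) > self.window:
            expired = self.days.popleft()
            self.block.subtract(expired)  # remove the day that left the window
        return dict(self.block)


if __name__ == "__main__":
    agg = SlidingAggregate(window=3)
    for day in [{"clicks": 1, "views": 10}, {"clicks": 2},
                {"clicks": 3, "views": 5}, {"clicks": 4}]:
        print(agg.add_day(day))
```

Each update touches only the new day and the expired day, so the cost does not grow with the block length (3, 7 or 30 days). In a Spark job the same two steps would presumably become joins between the stored block and the two daily aggregates, which is an assumption about how the thread continues.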