Unsubscribe

2023-08-01 Thread Alex Landa
Unsubscribe

Re: Monitor Spark Applications

2019-09-15 Thread Alex Landa
o assign fixed JMX ports to driver and executors. > > @Alex, > Is there any difference in fetching data via JMX or using banzaicloud jar. > > > On Fri, 13 Sep 2019 at 10:47, Alex Landa wrote: > >> Hi, >> We are starting to use https://github.com/banzaicloud/
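The fix mentioned in this reply, fixing the JMX ports, can be sketched with standard JVM JMX flags passed through Spark's extra-Java-options confs. This is a sketch, not the exact command from the thread; the port numbers and the trailing `...` are placeholders, and the JMX flags themselves are standard JVM options:

```shell
# Sketch: expose JMX on the driver at a fixed port, and on executors at an
# ephemeral port (a fixed executor port would clash when two executors share a host).
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dcom.sun.management.jmxremote \
    -Dcom.sun.management.jmxremote.port=9999 \
    -Dcom.sun.management.jmxremote.authenticate=false \
    -Dcom.sun.management.jmxremote.ssl=false" \
  --conf "spark.executor.extraJavaOptions=-Dcom.sun.management.jmxremote \
    -Dcom.sun.management.jmxremote.port=0 \
    -Dcom.sun.management.jmxremote.authenticate=false \
    -Dcom.sun.management.jmxremote.ssl=false" \
  ...
```

Disabling authentication and SSL as above is only reasonable on a trusted network.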

Re: Monitor Spark Applications

2019-09-12 Thread Alex Landa
Hi, We are starting to use https://github.com/banzaicloud/spark-metrics . Keep in mind that their solution is built for Spark on K8s; to make it work for Spark on YARN you have to copy the dependencies of spark-metrics into the Spark jars folder on all the Spark machines (took me a while to figure out).
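Wiring up a metrics sink like this one is done through Spark's `conf/metrics.properties`. A minimal sketch, assuming the sink class name and options from the banzaicloud/spark-metrics README (verify both against the version you actually deploy; the Pushgateway address below is a placeholder):

```properties
# conf/metrics.properties -- enable the Prometheus sink for all instances
*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
*.sink.prometheus.pushgateway-address=pushgateway.example.com:9091
```

On YARN this file, like the sink's jars, must be present on every node that runs a driver or executor.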

Re: How to combine all rows into a single row in DataFrame

2019-08-19 Thread Alex Landa
Hi, It sounds similar to what we do in our application. We don't serialize every row; instead we first group the rows into the wanted representation and then apply protobuf serialization using map and a lambda. I suggest not serializing the entire DataFrame into a single protobuf message since
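The group-first, serialize-second idea can be sketched in plain Python. This is illustrative only: `json` stands in for protobuf, the row shape and `serialize_group` are made up for the example, and in Spark the grouping would be a `groupBy`/`map` over the DataFrame rather than `itertools.groupby`:

```python
import json
from itertools import groupby

# Sketch: group rows into the wanted representation first, then serialize each
# group as one message -- rather than one message per row, or one giant message
# for the whole dataset.
rows = [
    {"user": "a", "value": 1},
    {"user": "a", "value": 2},
    {"user": "b", "value": 3},
]

def serialize_group(user, values):
    # In the real app this would build and serialize a protobuf message.
    return json.dumps({"user": user, "values": values})

rows.sort(key=lambda r: r["user"])  # groupby needs sorted input
messages = [
    serialize_group(user, [r["value"] for r in group])
    for user, group in groupby(rows, key=lambda r: r["user"])
]
```

One message per group keeps each protobuf small enough to build in memory while still amortizing the per-message overhead across rows.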

Re: Spark Standalone - Failing to pass extra java options to the driver in cluster mode

2019-08-19 Thread Alex Landa
ks, > Jungtaek Lim (HeartSaVioR) > > 1. https://issues.apache.org/jira/browse/SPARK-26606 > > On Tue, Aug 20, 2019 at 3:43 AM Alex Landa wrote: > >> Hi, >> >> We are using Spark Standalone 2.4.0 in production and publishing our >> Scala app using cluster mode.

Spark Standalone - Failing to pass extra java options to the driver in cluster mode

2019-08-19 Thread Alex Landa
Hi, We are using Spark Standalone 2.4.0 in production and publishing our Scala app using cluster mode. I saw that extra Java options passed to the driver don't actually get through. A submit example: spark-submit --deploy-mode cluster --master spark://:7077 --driver-memory 512mb --conf
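The intended submit looks roughly like the following. This is a sketch, not the poster's exact command: the master host, class name, jar, and option values are placeholders. SPARK-26606 (linked in the reply above) tracks cases where `spark.driver.extraJavaOptions` is not propagated to the driver in standalone cluster mode, so it is worth verifying on the driver host (e.g. via `ps` or by logging system properties) that the options actually took effect:

```shell
# Sketch of a standalone cluster-mode submit with driver Java options.
spark-submit \
  --deploy-mode cluster \
  --master spark://<master-host>:7077 \
  --driver-memory 512m \
  --conf "spark.driver.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -Dapp.env=prod" \
  --class com.example.Main \
  app.jar
```

Quoting matters here: in cluster mode the submit is relayed to a worker that re-launches the driver JVM, which is where unquoted or re-split options tend to get lost.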

Re: Long-Running Spark application doesn't clean old shuffle data correctly

2019-07-23 Thread Alex Landa
apman.com > > > On Sun, Jul 21, 2019 at 12:19 AM Alex Landa wrote: > >> Thanks, >> I looked into these options, the cleaner periodic interval is set to 30 >> min by default. >> The block option for shuffle - >> *spark.cleaner.referenceTracking.blockin

Re: Long-Running Spark application doesn't clean old shuffle data correctly

2019-07-21 Thread Alex Landa
.referenceTracking > spark.cleaner.referenceTracking.blocking.shuffle > > Regards > Prathmesh Ranaut > > On Jul 21, 2019, at 11:31 AM, Alex Landa wrote: > > Hi, > > We are running a long running Spark application ( which executes lots of > quick jobs using our scheduler ) on Spark stand-a

Long-Running Spark application doesn't clean old shuffle data correctly

2019-07-21 Thread Alex Landa
Hi, We are running a long-running Spark application (which executes lots of quick jobs using our scheduler) on a Spark stand-alone 2.4.0 cluster. We see that old shuffle files (a week old, for example) are not deleted during the execution of the application, which leads to out of disk space
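The two cleaner settings discussed in the replies above can be set at submit time. The property names are real Spark configs (`spark.cleaner.periodicGC.interval` defaults to 30min, as noted in the thread; `spark.cleaner.referenceTracking.blocking.shuffle` defaults to false); the values below are illustrative, and the trailing `...` stands for the rest of the submit command:

```shell
# Sketch: make the context cleaner run GC-driven cleanup more often, and make
# shuffle cleanup blocking so it cannot silently fall behind.
spark-submit \
  --conf spark.cleaner.periodicGC.interval=10min \
  --conf spark.cleaner.referenceTracking.blocking.shuffle=true \
  ...
```

Shuffle files are only removed once the driver garbage-collects the corresponding RDD/shuffle references, so a long-running driver with a large, rarely-collected heap is exactly the case where shortening the periodic GC interval helps.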