Re: Spark with External Shuffle Service - using saved shuffle files in the event of executor failure

2021-05-12 Thread Attila Zsolt Piros
Hello, I have answered it on Stack Overflow. Best Regards, Attila On Wed, May 12, 2021 at 4:57 PM Chris Thomas wrote: > Hi, > > I am pretty confident I have observed Spark configured with the Shuffle > Service continuing to fetch shuffle files on a node in the event of > executor
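As an illustrative sketch only (not from the thread), the setup where shuffle files are served by the node-local external shuffle service, so that they stay fetchable after an executor is lost, is usually enabled like this; the values are assumptions:
```
# spark-defaults.conf (sketch)
spark.shuffle.service.enabled    true
# with dynamic allocation, executors can be released while their shuffle
# output remains readable through the shuffle service on the node
spark.dynamicAllocation.enabled  true
```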

Re: [Spark in Kubernetes] Question about running in client mode

2021-04-26 Thread Attila Zsolt Piros
Hi Shiqi, In case of client mode the driver runs locally: on the same machine, even in the same process, as spark-submit. So if the application was submitted from a running POD then the driver will be running in that POD, and when submitted outside of K8s it will be running outside. This is why there
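For illustration only (the master URL, image and addresses are placeholders), a client-mode submission against Kubernetes looks roughly like this; the driver JVM is the spark-submit process itself:
```
./bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode client \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.driver.host=<routable-address-of-this-pod-or-host> \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar
```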

Re: Dynamic Allocation Backlog Property in Spark on Kubernetes

2021-04-10 Thread Attila Zsolt Piros
> running task count (24), but each task took around 7 min to complete. > > So here again, if resources are available in quota, more parallelism can > be achieved using schedulerbacklogtimeout (say 15 mins) and speeds up the > job. > > > [image: image.png] > > Best Rega

Re: possible bug

2021-04-09 Thread Attila Zsolt Piros
e a theory on what's going on > there? I don't! > > On Fri, Apr 9, 2021 at 10:43 AM Attila Zsolt Piros < > piros.attila.zs...@gmail.com> wrote: > >> Hi! >> >> I looked into the code and found a way to improve it.

Re: possible bug

2021-04-09 Thread Attila Zsolt Piros
Hi! I looked into the code and found a way to improve it. With the improvement your test runs just fine: [Spark welcome banner] version 3.2.0-SNAPSHOT, Using Python version 3.8.1

Re: Dynamic Allocation Backlog Property in Spark on Kubernetes

2021-04-08 Thread Attila Zsolt Piros
> So this is all about when I want to scale executors dynamically for spark > job. Is that understanding correct? > > > > In the below statement I don’t understand much about available partitions > :-( > > *pending tasks (which kinda related to the available partitions)* > > > >

Re: Dynamic Allocation Backlog Property in Spark on Kubernetes

2021-04-08 Thread Attila Zsolt Piros
Hi! For dynamic allocation you do not need to run the Spark jobs in parallel. Dynamic allocation simply means Spark scales up by requesting more executors when there are pending tasks (which are kind of related to the available partitions) and scales down when an executor is idle (as within one job
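A sketch of the settings discussed in this thread (the values are only examples; on Kubernetes, shuffle tracking is assumed since there is no external shuffle service):
```
# spark-defaults.conf (sketch)
spark.dynamicAllocation.enabled                   true
# request more executors once tasks have been pending this long
spark.dynamicAllocation.schedulerBacklogTimeout   15min
# release an executor after it has been idle this long
spark.dynamicAllocation.executorIdleTimeout       60s
spark.dynamicAllocation.minExecutors              1
spark.dynamicAllocation.maxExecutors              24
# needed on K8s where there is no external shuffle service
spark.dynamicAllocation.shuffleTracking.enabled   true
```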

Re: unit testing for spark code

2021-03-22 Thread Attila Zsolt Piros
Hi! Let me draw your attention to Holden's *spark-testing-base* project. The documentation is at https://github.com/holdenk/spark-testing-base/wiki. As I usually write tests for Spark internal features, I haven't needed to test at such a high level. But I am interested in your experiences. Best
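Not part of the original message: a minimal sketch of a test built on spark-testing-base, assuming its SharedSparkContext trait and ScalaTest; names are illustrative:
```scala
import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.funsuite.AnyFunSuite

// SharedSparkContext provides a SparkContext (sc) shared by all tests in the suite
class WordCountSuite extends AnyFunSuite with SharedSparkContext {
  test("word counting on a tiny RDD") {
    val counts = sc.parallelize(Seq("a", "b", "a"))
      .map(w => (w, 1))
      .reduceByKey(_ + _)
      .collectAsMap()
    assert(counts("a") === 2)
  }
}
```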

Re: Can JVisual VM monitoring tool be used to Monitor Spark Executor Memory and CPU

2021-03-21 Thread Attila Zsolt Piros
onents of a Web application. > > spark.executor.memory= Peak Execution Memory + Storage Mem + Reserved > Mem + User Memory > > = 1 Kb + Storage Mem + > 300 Mb + (4g *0.25) > > > > > > @Att
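For reference, a hedged sketch of the usual on-heap breakdown with default settings (not from the thread; the numbers are only an example for a 4g executor):
```scala
// reserved memory is a fixed 300 MB; spark.memory.fraction defaults to 0.6
val executorMemoryMb = 4096                           // spark.executor.memory = 4g
val reservedMb       = 300
val usableMb         = executorMemoryMb - reservedMb  // 3796 MB
val unifiedMb        = usableMb * 0.6                 // execution + storage ≈ 2277.6 MB
val userMb           = usableMb * (1 - 0.6)           // user memory ≈ 1518.4 MB
```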

Re: Spark version verification

2021-03-21 Thread Attila Zsolt Piros
d to 3.1.1, >>>> occasionally I see this error >>>> >>>> 21/03/18 16:53:38 ERROR org.apache.spark.scheduler.AsyncEventQueue: >>>> Listener EventLoggingListener threw an exception >>>> >>>> java.util.ConcurrentModificationException >>

Re: Spark saveAsTextFile Disk Recommendation

2021-03-21 Thread Attila Zsolt Piros
Hi! I would like to reflect only on the first part of your mail: I have a large RDD dataset of around 60-70 GB which I cannot send to the driver > using *collect*, so first writing that to disk using *saveAsTextFile* and > then this data gets saved in the form of multiple part files on each node >
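A sketch of the pattern described above (paths are placeholders): write directly to shared storage instead of collecting to the driver, and read the part files back as an RDD later:
```scala
// each partition is written as its own part-* file on the distributed file system
rdd.saveAsTextFile("hdfs:///tmp/my-output")

// later: read the part files back without ever collecting 60-70 GB to the driver
val restored = sc.textFile("hdfs:///tmp/my-output")
```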

Re: Spark version verification

2021-03-20 Thread Attila Zsolt Piros
Hi! I would check out the Spark source and then diff those two RCs (first just take a look at the list of changed files): $ git diff v3.1.1-rc1..v3.1.1-rc2 --stat ... The shell scripts in the release can be checked very easily: $ git diff v3.1.1-rc1..v3.1.1-rc2 --stat | grep ".sh "
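Spelled out as a sketch (the tag names are the ones mentioned above), the comparison can also be narrowed down to a single file of interest:
```
# fetch the release tags, list changed files, then inspect one file
git fetch --tags https://github.com/apache/spark.git
git diff v3.1.1-rc1..v3.1.1-rc2 --stat
git diff v3.1.1-rc1..v3.1.1-rc2 -- bin/spark-submit
```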

Re: Can JVisual VM monitoring tool be used to Monitor Spark Executor Memory and CPU

2021-03-20 Thread Attila Zsolt Piros
Hi Ranju! You can configure Spark's metrics system. Check the *memoryMetrics.** of executor-metrics and the component-instance-executor
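A sketch of one way to wire this up, assuming a CSV sink in conf/metrics.properties (the period and directory are placeholders):
```
# executor instances report their gauges, including the memoryMetrics.*
# of the executor metrics, to CSV files
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=10
*.sink.csv.unit=seconds
*.sink.csv.directory=/tmp/spark-metrics
```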

Re: Coalesce vs reduce operation parameter

2021-03-20 Thread Attila Zsolt Piros
Hi! Actually *coalesce()* is usually a cheap operation as it moves some existing partitions from one node to another. So it is not a (full) shuffle. See the documentation

Re: Coalesce vs reduce operation parameter

2021-03-20 Thread Attila Zsolt Piros
Hi! Actually *coalesce()* is usually a cheap operation as it moves some existing partitions from one node to another. So it is not a (full) shuffle. See the documentation coalesce is a cheap operation as it moves some existing partitions from one node to another. So it is not a full shuffle. See
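A small sketch (not from the original message) contrasting the two operations:
```scala
val df = spark.range(0, 1000000, 1, 100)   // 100 partitions

// coalesce: narrow dependency, existing partitions are merged locally, no full shuffle
val merged = df.coalesce(10)

// repartition: full shuffle, every row may move to a different node
val reshuffled = df.repartition(10)
```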

Re: How default partitioning in spark is deployed

2021-03-16 Thread Attila Zsolt Piros
titionsWithIndex( (index,itr) => { itr.foreach( e => >> println("Index : " +index +" " + e)); itr}, true).collect()* >> >> Prints the below... (index: ? the ? is actually the partition number) >> *Index : 9 Animal(4,Tiger) Index : 4 Animal(2,Elephan
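For readability, a cleaned-up sketch of the quoted check (class and data are only examples); it prints which partition each element landed in:
```scala
case class Animal(id: Int, name: String)

val rdd = sc.parallelize(
  Seq(Animal(1, "Lion"), Animal(2, "Elephant"), Animal(4, "Tiger")), 10)

rdd.mapPartitionsWithIndex(
  (index, itr) => itr.map { e => println(s"Index : $index $e"); e },
  preservesPartitioning = true
).collect()
```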

Re: Spark streaming giving error for version 2.4

2021-03-16 Thread Attila Zsolt Piros
Hi! I am just guessing here (as Gabor said before, we need more information / logs): But is it possible, Renu, that you just upgraded one single jar? Best Regards, Attila On Tue, Mar 16, 2021 at 11:31 AM Gabor Somogyi wrote: > Well, this is not much. Please provide driver and executor logs... >

Re: How default partitioning in spark is deployed

2021-03-16 Thread Attila Zsolt Piros
Hi! This is weird. The code of foreachPartition leads to ParallelCollectionRDD

Re: spark on k8s driver pod exception

2021-03-15 Thread Attila Zsolt Piros
fig to false to > change that. As the doc says: > > "spark.kubernetes.submission.waitAppCompletion" : In cluster mode, whether > to wait for the application to finish before exiting the launcher process. > When changed to false, the launcher has a "fire-and-forget" be
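For illustration (the image and master URL are placeholders), the fire-and-forget behavior quoted above comes down to one setting on the submit command:
```
# cluster mode: spark-submit returns right after the driver POD is created
# instead of waiting for the application to finish
./bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --conf spark.kubernetes.submission.waitAppCompletion=false \
  --conf spark.kubernetes.container.image=<spark-image> \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar
```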

Fwd: compile spark 3.1.1 error

2021-03-12 Thread Attila Zsolt Piros
li wrote on Thu, Mar 11, 2021 at 12:04 PM: > >> Maybe it is caused by my environment >> >> jiahong li wrote on Thu, Mar 11, 2021 at 11:14 AM: >> >>> it is not the cause: when I set -Phadoop-2.7 instead of >>> -Dhadoop.version=2.6.0-cdh5.13.1, the same errors come out. >>> >>

Re: spark on k8s driver pod exception

2021-03-11 Thread Attila Zsolt Piros
ng the launcher process. When changed to false, the launcher has a "fire-and-forget" behavior when launching the Spark job. On Thu, Mar 11, 2021 at 10:05 PM Attila Zsolt Piros < piros.attila.zs...@gmail.com> wrote: > > For getting the logs please read Accessing Logs

Re: spark on k8s driver pod exception

2021-03-11 Thread Attila Zsolt Piros
For getting the logs please read the Accessing Logs part of the *Running Spark on Kubernetes* page. For stopping and generic management of the Spark application please read the Spark Application Management
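A sketch of the commands those two documentation sections describe (namespace, pod name and master URL are placeholders):
```
# Accessing Logs: follow the driver log of a running application
kubectl -n <namespace> logs -f <driver-pod-name>

# Spark Application Management: query the status of, or kill, a submission
./bin/spark-submit --status <namespace>:<driver-pod-name> \
  --master k8s://https://<k8s-apiserver-host>:<port>
./bin/spark-submit --kill <namespace>:<driver-pod-name> \
  --master k8s://https://<k8s-apiserver-host>:<port>
```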

Re: [spark-core] docker-image-tool.sh question...

2021-03-10 Thread Attila Zsolt Piros
Hi Muthu! I tried it and on my side it works just fine: $ ./bin/docker-image-tool.sh -r docker.io/sample-spark -b java_image_tag=8-jre-slim -t 3.1.1 build Sending build context to Docker daemon 228.3MB Step 1/18 : ARG java_image_tag=11-jre-slim Step 2/18 : FROM openjdk:${java_image_tag}
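As a follow-up sketch (repository and tag are the example values used above), the same script can also push the built images:
```
./bin/docker-image-tool.sh -r docker.io/sample-spark -b java_image_tag=8-jre-slim -t 3.1.1 build
./bin/docker-image-tool.sh -r docker.io/sample-spark -t 3.1.1 push
```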

Re: compile spark 3.1.1 error

2021-03-10 Thread Attila Zsolt Piros
cdh5.13.1 -DskipTests > the same error appears. > and executing command: ps -ef |grep zinc, there is nothing containing zinc > > Attila Zsolt Piros wrote on Wed, Mar 10, 2021 at 6:55 PM: > >> hi! >> >> Are you compiling Spark itself? >> Do you use "./build/mvn" from the project
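For reference, a typical invocation along the lines discussed in this thread (a sketch, with the hadoop-2.7 profile mentioned above; adjust to your environment):
```
# build from the project root with the bundled maven, skipping tests
./build/mvn -DskipTests -Phadoop-2.7 clean package
```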

Re: compile spark 3.1.1 error

2021-03-10 Thread Attila Zsolt Piros
hi! Are you compiling Spark itself? Do you use "./build/mvn" from the project root? If you compiled another version of Spark before and the Scala version there was different, then zinc/nailgun could have cached the old classes, which can cause similar troubles. In that case this could help:

RE: Spark Version 3.0.1 Gui Display Query

2021-03-04 Thread Attila Zsolt Piros
Hi Ranju! I meant the event log would be very helpful for analyzing the problem at your side. The three logs together (driver, executors, event), from the same run, are the best of course. I know you want to check the Executors tab while the job is running. And for this you do not need to
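A sketch of the settings that produce that event log (the directory is a placeholder):
```
# spark-defaults.conf (sketch): events are written here and can be replayed
# later, e.g. by the History Server, to rebuild the UI state
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs:///tmp/spark-events
```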

RE: Spark Version 3.0.1 Gui Display Query

2021-03-03 Thread Attila Zsolt Piros
Hi Ranju! The UI is built up from events. This is why the history server is able to show the state of a finished app, as those events are replayed to rebuild the state. For details you can check the Web UI page and the following section too

Re: How to control count / size of output files for

2021-02-24 Thread Attila Zsolt Piros
hi! It is because of "spark.sql.shuffle.partitions". See the value 200 in the physical plan at the rangepartitioning: scala> val df = sc.parallelize(1 to 1000, 10).toDF("v").sort("v") df: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [v: int] scala> df.explain() == Physical Plan ==
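A sketch of influencing the number of output partitions through that setting (the value 10 and the path are just examples):
```scala
// with 10 shuffle partitions the range partitioning of the sort produces
// 10 partitions instead of the default 200, hence 10 output part files
spark.conf.set("spark.sql.shuffle.partitions", "10")
val df = sc.parallelize(1 to 1000, 10).toDF("v").sort("v")
df.write.csv("/tmp/sorted-output")
```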

Re: K8S spark-submit Loses Successful Driver Completion

2021-02-15 Thread Attila Zsolt Piros
Hi, I am not using Airflow but I assume your application is deployed in cluster mode and in this case the class you are looking for is *org.apache.spark.deploy.k8s.submit.Client* [1]. If we are talking about the first "spark-submit" used to start the application and not "spark-submit --status"

Re: Spark Kubernetes 3.0.1 | podcreationTimeout not working

2021-02-12 Thread Attila Zsolt Piros
I believe this problem led to opening SPARK-34389 where the problem is discussed further. [1] https://issues.apache.org/jira/browse/SPARK-34389 -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: understanding spark shuffle file re-use better

2021-02-12 Thread Attila Zsolt Piros
A much better one-liner (easier to understand in the UI because it will be one simple job with 2 stages): ``` spark.read.text("README.md").repartition(2).take(1) ``` Attila Zsolt Piros wrote > No, it won't be reused. > You should reuse the dataframe for reusing the shuffle blocks (a

Re: understanding spark shuffle file re-use better

2021-02-11 Thread Attila Zsolt Piros
No, it won't be reused. You should reuse the dataframe for reusing the shuffle blocks (and cached data). I know this because the two actions will lead to building two separate DAGs, but I will show you a way to check this on your own (with a small, simple Spark application). For
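A small sketch of that check, written with the RDD API for simplicity (the same principle applies to a reused dataframe); the file name is just an example:
```scala
val rdd = sc.textFile("README.md").repartition(2)

rdd.count()   // 1st action: runs the shuffle map stage, shuffle files are written
rdd.count()   // 2nd action on the same RDD: the map stage shows up as "skipped" in the UI

// building the lineage twice gives two independent DAGs, so nothing is reused:
sc.textFile("README.md").repartition(2).count()
sc.textFile("README.md").repartition(2).count()
```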

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-22 Thread Attila Zsolt Piros
The new issue is https://issues.apache.org/jira/browse/SPARK-26688. On Tue, Jan 22, 2019 at 11:30 AM Attila Zsolt Piros wrote: > Hi, > > >> Is it this one: https://github.com/apache/spark/pull/23223 ? > > No. My old development was https://github.com/apache/spark/pull/210
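For context, assuming the configuration eventually introduced by SPARK-26688 is spark.yarn.exclude.nodes (available from Spark 3.1 on), excluding nodes up front would look roughly like this (node names are placeholders):
```
./bin/spark-submit \
  --master yarn \
  --conf spark.yarn.exclude.nodes=node1.example.com,node2.example.com \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples.jar
```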

Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-22 Thread Attila Zsolt Piros
Hi, >> Is it this one: https://github.com/apache/spark/pull/23223 ? No. My old development was https://github.com/apache/spark/pull/21068, which is closed. This would be a new improvement with a new Apache JIRA issue ( https://issues.apache.org) and with a new Github pull request. >> Can I try