Hi Jayesh,

It's not the executor processes. The application (the job itself) is
getting launched multiple times, like a recursion. The problem seems to
be mainly in zipWithIndex.
Thanks,
Sachit

On Tue, 13 Oct 2020, 22:40 Lalwani, Jayesh, <jlalw...@amazon.com> wrote:

> Where are you running your Spark cluster? Can you post the command line
> that you are using to run your application?
>
> Spark is designed to process a lot of data by distributing work to a
> cluster of machines. When you submit a job, it starts executor processes
> on the cluster. So, what you are seeing is somewhat expected (although 25
> processes on a single node seems too high).
>
> *From:* Sachit Murarka <connectsac...@gmail.com>
> *Date:* Tuesday, October 13, 2020 at 8:15 AM
> *To:* spark users <user@spark.apache.org>
> *Subject:* RE: [EXTERNAL] Multiple applications being spawned
>
> *CAUTION*: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
>
> Adding logs.
>
> When it launches the multiple applications, the following logs are
> generated on the terminal. It also always retries the task:
>
> 20/10/13 12:04:30 WARN TaskSetManager: Lost task XX in stage XX (TID XX,
> executor 5): java.net.SocketException: Broken pipe (Write failed)
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>         at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>         at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:212)
>         at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:224)
>         at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:224)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>         at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
>         at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:561)
>         at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:346)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
>         at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:195)
>
> Kind Regards,
> Sachit Murarka
>
> On Tue, Oct 13, 2020 at 4:02 PM Sachit Murarka <connectsac...@gmail.com>
> wrote:
>
> Hi Users,
>
> When an action gets executed in my Spark job (I am using count and
> write), it launches many more application instances (around 25 more
> apps).
>
> In my Spark code, I run the transformations through DataFrames, convert
> the DataFrame to an RDD, apply zipWithIndex, convert it back to a
> DataFrame, and then apply two actions (count and write).
>
> Please note: this was working fine until last week; it started showing
> this issue yesterday.
>
> Could you please tell me what the reason for this behavior could be?
>
> Kind Regards,
> Sachit Murarka
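
For reference, a minimal PySpark sketch of the pipeline described above
(DataFrame transformations, then df.rdd with zipWithIndex, then back to a
DataFrame, then the two actions). The session setup, the "value" column
name, and the output path are illustrative assumptions, not details taken
from the thread:

    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.appName("zipwithindex-sketch").getOrCreate()

    # Stand-in for the real transformations (column name "value" is assumed).
    df = spark.createDataFrame([("a",), ("b",), ("c",)], ["value"])

    # DataFrame -> RDD -> zipWithIndex -> DataFrame, as described in the thread.
    # zipWithIndex pairs each Row with a unique, consecutive long index.
    indexed_rdd = df.rdd.zipWithIndex().map(
        lambda pair: Row(**pair[0].asDict(), idx=pair[1])
    )
    indexed_df = spark.createDataFrame(indexed_rdd)

    # The two actions from the original job: count, then write.
    print(indexed_df.count())
    indexed_df.write.mode("overwrite").parquet("/tmp/indexed_output")  # path assumed

Worth noting: RDD.zipWithIndex itself triggers an extra Spark job when the
RDD has more than one partition (it first counts elements per partition to
compute the index offsets), so additional jobs around this step are normal;
entirely new application instances, as reported above, are not.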