Hi Jayesh,

It's not the executor processes. The application (the job itself) is getting
launched multiple times, like a recursion. The problem seems to be mainly in
zipWithIndex, roughly at the step marked in the sketch below.
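
This is a minimal PySpark sketch of that flow, not the actual job (the column
names and the output path are placeholders):

    from pyspark.sql import SparkSession, Row

    spark = SparkSession.builder.appName("zipwithindex-sketch").getOrCreate()

    # Placeholder input; the real job builds this DataFrame through transformations
    df = spark.createDataFrame([Row(value="a"), Row(value="b")])

    # DataFrame -> RDD -> zipWithIndex -> back to DataFrame
    indexed = (
        df.rdd
        .zipWithIndex()
        .map(lambda pair: Row(**pair[0].asDict(), index=pair[1]))
        .toDF()
    )

    # The two actions that trigger the behavior
    indexed.count()
    indexed.write.mode("overwrite").parquet("/tmp/output")  # placeholder path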

Thanks
Sachit

On Tue, 13 Oct 2020, 22:40 Lalwani, Jayesh, <jlalw...@amazon.com> wrote:

> Where are you running your Spark cluster? Can you post the command line
> that you are using to run your application?
>
>
>
> Spark is designed to process a lot of data by distributing work to a
> cluster of machines. When you submit a job, it starts executor processes
> on the cluster. So, what you are seeing is somewhat expected (although 25
> processes on a single node seems too high).
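>
> For reference, the executor count and size are usually driven by settings
> like these (an illustrative sketch only, not your configuration; the values
> are made up):
>
>     from pyspark.sql import SparkSession
>
>     spark = (
>         SparkSession.builder
>         .appName("my-job")                        # placeholder name
>         .config("spark.executor.instances", "4")  # executors to request
>         .config("spark.executor.cores", "2")      # cores per executor
>         .config("spark.executor.memory", "4g")    # memory per executor
>         .getOrCreate()
>     )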
>
>
>
> *From: *Sachit Murarka <connectsac...@gmail.com>
> *Date: *Tuesday, October 13, 2020 at 8:15 AM
> *To: *spark users <user@spark.apache.org>
> *Subject: *RE: [EXTERNAL] Multiple applications being spawned
>
>
>
> Adding Logs.
>
>
>
> When it launches the multiple applications, the following logs get generated
> on the terminal. It also keeps retrying the task:
>
> 20/10/13 12:04:30 WARN TaskSetManager: Lost task XX in stage XX (TID XX, executor 5): java.net.SocketException: Broken pipe (Write failed)
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
>         at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
>         at java.io.DataOutputStream.write(DataOutputStream.java:107)
>         at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
>         at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:212)
>         at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:224)
>         at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:224)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>         at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
>         at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:561)
>         at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:346)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
>         at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:195)
>
>
>
> Kind Regards,
> Sachit Murarka
>
>
>
>
>
> On Tue, Oct 13, 2020 at 4:02 PM Sachit Murarka <connectsac...@gmail.com>
> wrote:
>
> Hi Users,
>
> When an action (I am using count and write) gets executed in my Spark job, it
> launches many more application instances (around 25 more apps).
>
> In my Spark code, I run the transformations through DataFrames, then convert
> the DataFrame to an RDD, apply zipWithIndex, convert it back to a DataFrame,
> and then apply 2 actions (count & write).
>
>
>
> Please note: this was working fine until last week; it has started giving
> this issue since yesterday.
>
>
> Could you please tell me what could be the reason for this behavior?
>
>
> Kind Regards,
> Sachit Murarka
>
>
