Re: how to disable replace HDFS checkpoint location in structured streaming in spark3.0.1

2020-10-13 Thread lec ssmi
sorry, the mail title is a little problematic. It should be "How to disable or replace ..". lec ssmi wrote on Wed, Oct 14, 2020 at 9:27 AM: > I have written a demo using spark3.0.0, and the location where the > checkpoint file is saved has been explicitly specified like

how to disable replace HDFS checkpoint location in structured streaming in spark3.0.1

2020-10-13 Thread lec ssmi
I have written a demo using spark3.0.0, and the location where the checkpoint file is saved has been explicitly specified like > stream.option("checkpointLocation", "file:///C:\\Users\\Administrator\\Desktop\\test") But the app still throws an exception about the HDFS file system. Is
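One workaround often suggested for this kind of error (a sketch, not the list's confirmed fix) is to write the local path as a forward-slash `file:` URI and to set Spark's session-wide default checkpoint configuration, since backslashes inside the URI can be misparsed and the path then falls through to the HDFS-flavoured filesystem code. On Windows, a missing HADOOP_HOME/winutils setup can also surface as HDFS-related exceptions. The path below is the one from the post rewritten with forward slashes; the application filename is hypothetical.

```shell
# Illustrative only: set the default checkpoint location to a local file: URI
# (forward slashes avoid Windows backslash-escaping problems).
spark-submit \
  --conf spark.sql.streaming.checkpointLocation=file:///C:/Users/Administrator/Desktop/test \
  my_streaming_app.py
```

The per-query `.option("checkpointLocation", ...)` takes precedence over this conf when both are set.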

The equivalent of Scala mapping in Pyspark

2020-10-13 Thread Mich Talebzadeh
Hi, I generate an array of random data and create a DF in Spark Scala as follows: val end = start + numRows - 1 println(" starting at ID = " + start + " , ending on = " + end) val usedFunctions = new UsedFunctions val text = (start to end).map(i => (
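Scala's `(start to end).map(i => ...)` applies a function to every index in an inclusive range. The closest plain-Python analog is a comprehension over `range(start, end + 1)` (Python's `range` excludes its upper bound, hence the `+ 1`). A minimal sketch, assuming the goal is to build (id, random_value) rows to feed to `spark.createDataFrame` afterwards; `generate_rows` and its random column are illustrative stand-ins for the `UsedFunctions` helpers, which are not shown in the post.

```python
import random

def generate_rows(start, num_rows, seed=42):
    """Build (id, random_value) tuples, mirroring the Scala map over a range.

    The random-value column is a placeholder for whatever UsedFunctions
    produces in the original Scala snippet.
    """
    rng = random.Random(seed)
    end = start + num_rows - 1  # same inclusive end as the Scala code
    return [(i, rng.random()) for i in range(start, end + 1)]

rows = generate_rows(start=1, num_rows=5)
```

In PySpark the resulting list could then be turned into a DataFrame with `spark.createDataFrame(rows, ["id", "value"])`.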

Re: Multiple applications being spawned

2020-10-13 Thread Sachit Murarka
Hi Jayesh, It's not the executor process. The application (the job itself) is getting called multiple times, like a recursion. The problem seems to be mainly in zipWithIndex. Thanks Sachit On Tue, 13 Oct 2020, 22:40 Lalwani, Jayesh, wrote: > Where are you running your Spark cluster? Can you post the command line

Re: Multiple applications being spawned

2020-10-13 Thread Lalwani, Jayesh
Where are you running your Spark cluster? Can you post the command line that you are using to run your application? Spark is designed to process a lot of data by distributing work to a cluster of machines. When you submit a job, it starts executor processes on the cluster. So, what you are

Re: Multiple applications being spawned

2020-10-13 Thread Sachit Murarka
Adding logs. When it launches the multiple applications, the following logs get generated on the terminal. It also always retries the task: 20/10/13 12:04:30 WARN TaskSetManager: Lost task XX in stage XX (TID XX, executor 5): java.net.SocketException: Broken pipe (Write failed) at

Multiple applications being spawned

2020-10-13 Thread Sachit Murarka
Hi Users, When an action (I am using count and write) gets executed in my Spark job, it launches many more application instances (around 25 more apps). In my Spark code, I am running the transformations through DataFrames, then converting the DataFrame to an RDD, then applying zipWithIndex, then
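One relevant detail about zipWithIndex (an assumption about what is being observed here, not a confirmed diagnosis of the 25 extra apps): on an RDD with more than one partition it must first run an extra pass to count the records in each partition before it can assign globally consecutive indices, so extra work appears before the user's own count/write action. This pure-Python sketch mimics that two-phase behaviour; `zip_with_index` and the list-of-lists partition model are illustrative, not Spark's actual implementation.

```python
from itertools import accumulate

def zip_with_index(partitions):
    """partitions: a list of lists, one inner list per 'partition'.

    Returns (element, global_index) pairs, like RDD.zipWithIndex.
    """
    # Phase 1: count records per partition. In Spark this is a separate
    # job, so it runs before the indices can be assigned.
    sizes = [len(p) for p in partitions]
    offsets = [0] + list(accumulate(sizes))[:-1]
    # Phase 2: each element gets its partition's offset plus its local
    # position within that partition.
    return [(x, off + i)
            for p, off in zip(partitions, offsets)
            for i, x in enumerate(p)]
```

For example, `zip_with_index([['a', 'b'], ['c'], ['d', 'e']])` numbers the elements 0 through 4 across the partition boundaries.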

Re: [UPDATE] Apache Spark 3.1.0 Release Window

2020-10-13 Thread Michel Sumbul
I think you put Jan 2020 instead of 2021 :-) On Tue, Oct 13, 2020 at 00:51, Xiao Li wrote: > Thank you, Dongjoon > > Xiao > > On Mon, Oct 12, 2020 at 4:19 PM Dongjoon Hyun > wrote: >> Hi, All. >> >> The Apache Spark 3.1.0 Release Window was adjusted as follows today. >> Please check the