Dynamic value as the offset of lag() function

2023-05-23 Thread Nipuna Shantha
Hi all, This is the sample set of data that I used for this task: [image: image.png] My need is to pass count as the offset of the lag() function: *[ lag(col(), lag(count)).over(windowspec) ]* But since the lag function expects lag(Column, Int), the above code does not work. So can you guys suggest a
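Spark's lag() indeed only takes a literal integer offset, so a per-row offset needs a workaround. One minimal PySpark sketch (column names "id", "value", "count" are illustrative, not from the thread): collect the preceding values into an array, then index it with the count column, since element_at accepts a column index in recent Spark versions.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, "a", 1), (2, "b", 1), (3, "c", 2)], ["id", "value", "count"]
)

# Collect everything before the current row into an array.
# (In practice you would add a partitionBy; ordering the whole
# DataFrame in one window pulls all rows into a single partition.)
w = Window.orderBy("id").rowsBetween(Window.unboundedPreceding, -1)

# Index `count` elements back from the end; element_at is 1-based,
# and the when() guard avoids an out-of-range index near the start.
idx = F.size("prev_values") - F.col("count") + 1
result = (
    df.withColumn("prev_values", F.collect_list("value").over(w))
      .withColumn("lagged", F.when(idx >= 1, F.element_at("prev_values", idx)))
)
result.show()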

Re: Incremental Value dependents on another column of Data frame Spark

2023-05-23 Thread Raghavendra Ganesh
Given that you are already stating the above can be imagined as a partition, I can think of a mapPartitions iterator.

val inputSchema = inputDf.schema
val outputRdd = inputDf.rdd.mapPartitions(rows => new SomeClass(rows))
val outputDf = sparkSession.createDataFrame(outputRdd,
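A hedged PySpark rendering of the same mapPartitions idea, applied to the M01/M02 counter described in the question below; the "Type" column follows the thread, everything else is illustrative.

from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType, StructField, StructType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("M01",), ("M02",), ("M02",)], ["Type"])

def add_count(rows):
    # Running counter, reset to 0 at every M01; this assumes each
    # logical sequence sits inside one partition, as stated above.
    count = 0
    for row in rows:
        count = 0 if row["Type"] == "M01" else count + 1
        yield tuple(row) + (count,)

out_schema = StructType(df.schema.fields + [StructField("count", IntegerType())])
out_df = spark.createDataFrame(df.rdd.mapPartitions(add_count), out_schema)
out_df.show()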

Incremental Value dependents on another column of Data frame Spark

2023-05-23 Thread Nipuna Shantha
Hi all, This is the sample set of data that I used for this task: [image: image.png] My expected output is as below: [image: image.png] My scenario is: if Type is M01 the count should be 0, and if Type is M02 it should be incremented from 1 or 0 until the sequence of M02 is finished. Imagine this
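For reference, a window-only sketch of the same logic, under the assumption that each M02 run is opened by an M01 row and that some ordering column exists (the "id" column here is hypothetical): a running sum over an M01 indicator assigns a group id, and row_number() within each group yields the incrementing count.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, "M01"), (2, "M02"), (3, "M02"), (4, "M01"), (5, "M02")],
    ["id", "Type"],
)

# Each M01 starts a new group; following M02 rows inherit that group.
grp = F.sum((F.col("Type") == "M01").cast("int")).over(Window.orderBy("id"))
df = df.withColumn("grp", grp)

# Within a group the M01 row gets 0 and the M02 rows get 1, 2, 3, ...
wg = Window.partitionBy("grp").orderBy("id")
df.withColumn("count", F.row_number().over(wg) - 1).drop("grp").show()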

Re: Shuffle with Window().partitionBy()

2023-05-23 Thread ashok34...@yahoo.com.INVALID
Thanks great Rauf. Regards On Tuesday, 23 May 2023 at 13:18:55 BST, Rauf Khan wrote: Hi, PartitionBy() is analogous to group by: all rows that have the same value in the specified column will form one window. The data will be shuffled to form the groups. Regards Raouf On Fri, May 12,

Re: Shuffle with Window().partitionBy()

2023-05-23 Thread Rauf Khan
Hi, PartitionBy() is analogous to group by: all rows that have the same value in the specified column will form one window. The data will be shuffled to form the groups. Regards Raouf On Fri, May 12, 2023, 18:48 ashok34...@yahoo.com.INVALID wrote: > Hello, > > In Spark windowing does call
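A minimal sketch illustrating the point, with illustrative column names: the physical plan for a partitionBy window contains the Exchange (shuffle) that brings rows sharing the same key together.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["dept", "salary"])

w = Window.partitionBy("dept")
out = df.withColumn("dept_total", F.sum("salary").over(w))
# explain() prints the physical plan; expect an
# "Exchange hashpartitioning(dept, ...)" node, i.e. the shuffle.
out.explain()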

cannot load model using pyspark

2023-05-23 Thread second_co...@yahoo.com.INVALID
spark.sparkContext.textFile("s3a://a_bucket/models/random_forest_zepp/bestModel/metadata", 1).getNumPartitions() When I run the above code, I get the error below. Can you advise how to troubleshoot? I'm using Spark 3.3.0. The above file path exists.
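Without the error text it is hard to say, but a common first check for s3a reads is that the hadoop-aws jar and credentials are wired up. A hedged sketch (the credentials provider shown is one standard option, not necessarily what this cluster needs):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Requires a hadoop-aws jar matching your Hadoop version on the classpath.
    .config("spark.hadoop.fs.s3a.aws.credentials.provider",
            "com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
    .getOrCreate()
)
rdd = spark.sparkContext.textFile(
    "s3a://a_bucket/models/random_forest_zepp/bestModel/metadata", 1
)
print(rdd.getNumPartitions())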