You can use zipWithIndex() to pair each record with its index, and then
add 1 to each index. Note that zipWithIndex() returns (element, index)
tuples, so the index is the second field:

val tf = sc.textFile(test).zipWithIndex()
tf.map(s => (s._2 + 1, s._1))
The above should serve your purpose.
--
How can we implement a BroadcastHashJoin for spark with python?
My Spark SQL inner joins are taking a long time because they are
performed as a ShuffledHashJoin.
The tables being joined are stored as Parquet files.
Please help.
Thanks and regards,
Jitesh
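The idea behind the broadcast hash join asked about above can be sketched in plain Python (the function and data names below are illustrative, not Spark API): the small table is built into an in-memory hash map and shipped ("broadcast") to every worker, so the large table can be joined in a single pass with no shuffle.

```python
# Toy sketch of a broadcast hash join; assumes the smaller relation
# fits in memory on every worker. Data and names are hypothetical.

def broadcast_hash_join(big_rows, small_rows, key):
    # Build a hash table from the small side once (the "broadcast").
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)
    # Stream the big side; the big relation is never shuffled.
    joined = []
    for row in big_rows:
        for match in lookup.get(row[key], []):
            joined.append({**row, **match})
    return joined

big = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
small = [{"id": 1, "country": "IN"}]
print(broadcast_hash_join(big, small, "id"))
# → [{'id': 1, 'amount': 10, 'country': 'IN'}]
```

In Spark SQL itself you can hint the same behaviour from Python with `from pyspark.sql.functions import broadcast` and `df_big.join(broadcast(df_small), "id")`; Spark also chooses a broadcast join automatically when one side's estimated size is below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default).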