
Re: When doing map, how to get the line number?

2015-04-01, jitesh129

You can use zipWithIndex() to get an index for each record. The indices start at 0, so add 1 to each index to get line numbers that start at 1:

    val tf = sc.textFile(test).zipWithIndex()
    tf.map(s => (s._2 + 1, s._1))

This gives an RDD of (lineNumber, line) pairs, which should serve your purpose.
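To show why the numbering follows file order, here is a local, pure-Python sketch of what zipWithIndex() does. It assumes the file has already been split into a list of partitions (the partition layout and the function names are hypothetical, not Spark API). Indices are assigned first by partition and then by position inside each partition, which is why Spark has to count the partition sizes before it can number anything when there is more than one partition.

```python
from typing import List, Tuple


def zip_with_index(partitions: List[List[str]]) -> List[Tuple[str, int]]:
    """Local simulation of RDD.zipWithIndex() over a list of partitions."""
    # Step 1: count the items in each partition (Spark runs a job for this
    # when the RDD has more than one partition).
    counts = [len(p) for p in partitions]
    # Step 2: each partition starts at the total size of the earlier partitions.
    offsets = [sum(counts[:i]) for i in range(len(partitions))]
    # Step 3: number items inside each partition, starting at its offset.
    result = []
    for offset, part in zip(offsets, partitions):
        for pos, item in enumerate(part):
            result.append((item, offset + pos))
    return result


def number_lines(partitions: List[List[str]]) -> List[Tuple[int, str]]:
    """Mirror of tf.map(s => (s._2 + 1, s._1)): 1-based (lineNumber, line) pairs."""
    return [(idx + 1, line) for line, idx in zip_with_index(partitions)]


if __name__ == "__main__":
    # Hypothetical file split into two partitions.
    parts = [["alpha", "beta"], ["gamma"]]
    for number, line in number_lines(parts):
        print(number, line)
```

Because the offsets come from the earlier partitions' sizes, the last line of the last partition always gets the largest number, matching the line order of the file.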

Broadcasting a parquet file using spark and python

2015-03-31, jitesh129

How can we implement a BroadcastHashJoin for Spark with Python? My Spark SQL inner joins are taking a long time because they are executed as a ShuffledHashJoin. The tables being joined are stored as Parquet files. Please help.

Thanks and regards,
Jitesh
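For context, the Spark SQL planner picks a broadcast join when it estimates one side of the join to be smaller than spark.sql.autoBroadcastJoinThreshold (10 MB by default), and later PySpark releases also offer pyspark.sql.functions.broadcast as an explicit hint; whether the size of a given Parquet table can be estimated depends on the Spark version in use. The pure-Python sketch below only illustrates the idea behind a broadcast hash join, under the assumption that one table is small: the small table is turned into a hash table once, every partition of the large table probes its own copy locally, and so the large table is never shuffled. All names here are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]  # a row is a plain tuple in this sketch


def build_hash_table(small: List[Row], key_idx: int) -> Dict[Any, List[Row]]:
    """Build phase: index the small table by its join key (done once)."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in small:
        table[row[key_idx]].append(row)
    return dict(table)


def broadcast_hash_join(
    large_partitions: List[List[Row]],
    small: List[Row],
    large_key: int,
    small_key: int,
) -> List[List[Row]]:
    """Inner join where each large-table partition probes the same hash table."""
    # Stands in for the copy of the small table shipped to every executor.
    broadcast_table = build_hash_table(small, small_key)
    output_partitions = []
    for part in large_partitions:
        # Probe phase: purely local work, no data moves between partitions.
        joined = [
            row + match
            for row in part
            for match in broadcast_table.get(row[large_key], [])
        ]
        output_partitions.append(joined)
    return output_partitions


if __name__ == "__main__":
    # Hypothetical data: orders are (customer_id, order_id), customers are (id, name).
    orders = [[(1, "o1"), (2, "o2")], [(1, "o3"), (3, "o4")]]
    customers = [(1, "Ann"), (2, "Bob")]
    print(broadcast_hash_join(orders, customers, 0, 0))
```

A shuffled hash join would instead repartition both tables by key so matching rows meet on the same executor; broadcasting avoids that shuffle of the large side, which is why it only pays off when the broadcast side fits comfortably in memory.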