Environment: Apache Spark 1.6.2, Scala 2.10

I am currently using the spark-csv package from Databricks, and I would like to have a (pre-processing?) stage when reading a CSV file that also adds a row number to each row of data read from the file. This would allow for better traceability and data lineage in case of validation or data-processing issues downstream.
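Concretely, the kind of thing I have in mind is the rough sketch below (untested; `sqlContext` is assumed to be in scope, and "input.csv" is a placeholder path):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Read the CSV via the databricks spark-csv data source (Spark 1.6.x style).
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("input.csv") // placeholder path

// zipWithIndex assigns a 0-based Long index to every row. It runs an extra
// Spark job to count records per partition, but the resulting indices are
// unique across partitions (no collisions) and deterministic for a given
// partitioning.
val rowsWithIndex = df.rdd.zipWithIndex.map {
  case (row, idx) => Row.fromSeq(idx +: row.toSeq)
}

// Prepend a row_number column to the original schema and rebuild a DataFrame.
val schemaWithIndex = StructType(
  StructField("row_number", LongType, nullable = false) +: df.schema.fields)
val dfWithIndex = sqlContext.createDataFrame(rowsWithIndex, schemaWithIndex)
```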
From the research I have done, the zipWithIndex API seems like the right (or only) way to implement this pattern. Would this be the preferred route? Is it safe under parallel operations, i.e., guaranteed not to produce colliding row numbers? Has anybody had a similar requirement and found a better solution you can point me to? I appreciate any help and responses anyone can offer.

Thanks
-a

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/CSV-Reader-with-row-numbers-tp18946.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.