Hi Rahul, I'll just copy & paste your question here to aid with context, and reply afterwards.
-----
Can I write the RDD data to an Excel file, along with the mapping, in Apache Spark? Is that a correct way to do it? Wouldn't the writing be a local function that can't be shipped out to the cluster?

Below is the Python code (it's just an example to clarify my question; I understand that this implementation may not actually be required):

    import xlsxwriter
    import sys
    import math
    from pyspark import SparkContext

    # get the spark context in sc.

    workbook = xlsxwriter.Workbook('output_excel.xlsx')
    worksheet = workbook.add_worksheet()

    # xyz.txt is a file in which each line contains strings delimited by <SPACE>
    data = sc.textFile("xyz.txt")

    row = 0

    def mapperFunc(x):
        for i in range(0, 4):
            worksheet.write(row, i, x.split(" ")[i])
        row += 1
        return len(x.split())

    data2 = data.map(mapperFunc)

    workbook.close()

There are 2 questions:

1. Is using row in 'mapperFunc' like this a correct way? Will it increment row each time?
2. Is writing to the Excel file with worksheet.write() inside the mapper function a correct way?

Also, if #2 is correct, please clarify one doubt: since the worksheet is created on the local machine, how does this work?
-----

On Fri, May 30, 2014 at 6:55 AM, Rahul Bhojwani <rahulbhojwani2...@gmail.com> wrote:
> Hi,
>
> I recently posted a question on stackoverflow but didn't get any reply. I
> joined the mailing list now. Can any one of you guide me to a solution for
> the problem mentioned in
>
> http://stackoverflow.com/questions/23923966/writing-the-rdd-data-in-excel-file-along-mapping-in-apache-spark
>
> Thanks in advance
>
> --
> Rahul K Bhojwani
> 3rd Year B.Tech
> Computer Science and Engineering
> National Institute of Technology, Karnataka

--
Marcelo