Hi Rahul,

I'll just copy & paste your question here to aid with context, and
reply afterwards.

-----

Can I write the RDD data to an Excel file, along with the mapping, in
apache-spark? Is that a correct approach? Won't the write be a local
operation that can't be shipped across the cluster?

Below is the Python code (it's just an example to clarify my
question; I understand that this implementation may not actually be
required):

import xlsxwriter
import sys
import math
from pyspark import SparkContext

# assume the SparkContext is already available as sc.

workbook = xlsxwriter.Workbook('output_excel.xlsx')
worksheet = workbook.add_worksheet()

data = sc.textFile("xyz.txt")
# each line of xyz.txt contains fields delimited by <SPACE>

row=0

def mapperFunc(x):
    global row  # without this, Python treats row as a local and raises UnboundLocalError
    for i in range(0, 4):
        worksheet.write(row, i, x.split(" ")[i])
    row += 1  # Python has no ++ operator
    return len(x.split())

data2 = data.map(mapperFunc)

workbook.close()

There are 2 questions:

1. Is using 'row' in mapperFunc like this the correct way? Will it
   increment row each time?
2. Is writing to the Excel file using worksheet.write() inside the
   mapper function a correct approach?

Also, if #2 is correct, then please clarify this doubt: the worksheet
is created on the local machine, so how would this work?
-----
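
To answer both questions: no, this won't work as written. Spark
serializes mapperFunc and ships it to the executor processes, so each
task gets its own copy of 'row' and of the workbook object (if the
workbook can be pickled at all); the increments happen on those copies
and are never sent back to the driver, and nothing is written to the
local output_excel.xlsx. Also note that map() is lazy, so mapperFunc
doesn't even run until an action such as count() or collect() is
called.

The usual pattern is to do the distributed work in the mapper and
bring the results back to the driver before writing the file. Here's a
minimal sketch, assuming the data fits in driver memory and reusing
the file and variable names from your example:

import xlsxwriter
from pyspark import SparkContext

# Assumes a local context; adjust master/appName for your setup.
sc = SparkContext(appName="excel-writer")

data = sc.textFile("xyz.txt")

# Distributed part: split each line into fields on the cluster.
rows = data.map(lambda line: line.split(" ")).collect()

# Local part: xlsxwriter runs on the driver and only sees its own
# filesystem, so all write() calls happen here.
workbook = xlsxwriter.Workbook('output_excel.xlsx')
worksheet = workbook.add_worksheet()

for row, fields in enumerate(rows):
    for col, value in enumerate(fields[:4]):
        worksheet.write(row, col, value)

workbook.close()

Since collect() pulls everything onto one machine, this only makes
sense for output small enough to fit in a single .xlsx file anyway;
for larger data you'd use saveAsTextFile() and a distributed-friendly
format instead.

On question 1 specifically: if all you need is a counter incremented
across tasks, an Accumulator is the supported mechanism. A sketch,
with a hypothetical countingMapper standing in for your function:

line_count = sc.accumulator(0)

def countingMapper(x):
    line_count.add(1)             # tasks may add to an accumulator
    return len(x.split())

data.map(countingMapper).count()  # an action forces the mappers to run
print(line_count.value)           # the value is readable only on the driver

But accumulators are write-only from within tasks, so they can't give
you a per-row index for worksheet.write(); that's another reason to do
the writing on the driver.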


On Fri, May 30, 2014 at 6:55 AM, Rahul Bhojwani
<rahulbhojwani2...@gmail.com> wrote:
> Hi,
>
> I recently posted a question on Stack Overflow but didn't get any reply. I
> have joined the mailing list now. Can any of you guide me on the
> problem mentioned in
>
> http://stackoverflow.com/questions/23923966/writing-the-rdd-data-in-excel-file-along-mapping-in-apache-spark
>
> Thanks in advance
>
> --
> Rahul K Bhojwani
> 3rd Year B.Tech
> Computer Science and Engineering
> National Institute of Technology, Karnataka



-- 
Marcelo
