Make a function (or lambda) that reads a text file. Make an RDD from the list of X/Y paths, then map that RDD through the file-reading function. Do the same with your X/Y/Z directory. You then have RDDs with the content of each file as a record. Work with those as needed.
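The approach above can be sketched in plain Python. The two reader functions below are hypothetical names for illustration; in Spark they would be passed to `map` over an RDD of file paths (e.g. `sc.parallelize(paths).map(read_serial)`), but they are shown here running locally so the parsing logic is clear.

```python
import os
import tempfile

def read_serial(path):
    # Parse an a.txt: a single serial number per file.
    with open(path) as f:
        return f.read().strip()

def read_options(path):
    # Parse a b.txt: one "key,value" pair per line.
    # A trailing comma (as in "b,1,") is tolerated.
    pairs = {}
    with open(path) as f:
        for line in f:
            line = line.strip().rstrip(",")
            if not line:
                continue
            key, value = line.split(",", 1)
            pairs[key] = value
    return pairs

# In Spark these would be mapped over RDDs of paths, for example:
#   serials = sc.parallelize(a_paths).map(read_serial)
#   options = sc.parallelize(b_paths).map(read_options)

# Local demonstration with temporary files:
tmp = tempfile.mkdtemp()
a_path = os.path.join(tmp, "a.txt")
b_path = os.path.join(tmp, "b.txt")
with open(a_path, "w") as f:
    f.write("12345\n")
with open(b_path, "w") as f:
    f.write("a,1\nb,1,\nc,0\n")

print(read_serial(a_path))        # 12345
print(read_options(b_path)["a"])  # 1
```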
On Wed, May 11, 2016 at 2:36 PM Pradeep Nayak <pradeep1...@gmail.com> wrote:

> Hi -
>
> I have a very unique problem which I am trying to solve, and I am not sure
> if Spark would help here.
>
> I have a directory: /X/Y/a.txt and, in the same structure, /X/Y/Z/b.txt.
>
> a.txt contains a unique serial number, say:
> 12345
>
> and b.txt contains key-value pairs:
> a,1
> b,1,
> c,0 etc.
>
> Every day you receive data for a system Y, so there are multiple a.txt and
> b.txt files per serial number. The serial number doesn't change, and it is
> the key. There are multiple systems, a whole year of data is available,
> and it's huge.
>
> I am trying to generate a report of unique serial numbers where the value
> of the option a has changed to 1 over the last few months. Let's say the
> default is 0. Also, figure out how many times it was toggled.
>
> I am not sure how to read two text files in Spark at the same time and
> associate them with the serial number. Is there a way of doing this in
> place, given that we know the directory structure? Or should we be
> transforming the data anyway to solve this?

-- 
Mathieu Longtin
1-514-803-8977
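Once the per-serial daily values of option a are collected (in Spark, e.g. via `groupByKey` on the serial number), the report logic itself is simple. The following is a minimal plain-Python sketch with hypothetical function names; it assumes the values for one serial arrive as a time-ordered list of strings, starting from a default of "0" as described in the question.

```python
def count_toggles(values, default="0"):
    # Count how many times the value changes across a time-ordered
    # sequence of readings, starting from the assumed default.
    toggles = 0
    previous = default
    for v in values:
        if v != previous:
            toggles += 1
            previous = v
    return toggles

def changed_to_one(values, default="0"):
    # True if the most recent reading is "1" and differs from the default,
    # i.e. the serial belongs in the report.
    latest = values[-1] if values else default
    return latest == "1" and default != "1"

# Hypothetical daily readings of option a for one serial number:
daily_a = ["0", "0", "1", "0", "1"]
print(count_toggles(daily_a))   # 3
print(changed_to_one(daily_a))  # True
```

In Spark, these functions would run per key after grouping the (serial, value) records, e.g. `pairs.groupByKey().mapValues(lambda vs: count_toggles(list(vs)))`.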