Thanks for the pointer to that thread. It looks like there is some demand for 
this capability, but not a lot yet. It also doesn't look like there is an easy 
answer right now.

Thanks,
Ron


From: Victor Tso-Guillen [mailto:v...@paxata.com]
Sent: Wednesday, September 03, 2014 10:40 AM
To: Daniel, Ronald (ELS-SDG)
Cc: user@spark.apache.org
Subject: Re: Accessing neighboring elements in an RDD

Interestingly, there was an almost identical question posed on Aug 22 by 
cjwang. Here's the link to the archive: 
http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-td12621.html#a12664

On Wed, Sep 3, 2014 at 10:33 AM, Daniel, Ronald (ELS-SDG) 
<r.dan...@elsevier.com> wrote:
Hi all,

Assume I have read the lines of a text file into an RDD:

    textFile = sc.textFile("SomeArticle.txt")

Also assume that the sentence breaks in SomeArticle.txt were done by machine 
and have some errors, such as the break at Fig. in the sample text below.

Index   Text
N       ...as shown in Fig.
N+1     1.
N+2     The figure shows...

What I want is an RDD with:

N       ...as shown in Fig. 1.
N+1     The figure shows...

Is there some way a filter() can look at neighboring elements in an RDD? That 
way I could examine adjacent lines in parallel and produce a new RDD that may 
have a different number of elements. Or do I just have to iterate through the 
RDD sequentially?
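
To make the desired merge concrete, here is a minimal sketch in plain Python of 
the per-line logic. It is not an answer from the thread. The function name 
merge_broken_sentences and the ABBREVIATIONS list are hypothetical, chosen only 
for illustration, and the rule used (join a line to the previous one when the 
previous one ends in a known abbreviation) is an assumption about what counts 
as a bad break.

```python
# Hypothetical list of abbreviations that should not end a sentence.
ABBREVIATIONS = ("Fig.", "Eq.", "Sec.")


def merge_broken_sentences(lines):
    """Yield lines, joining a line onto the previous one whenever the
    previous one ends in a known abbreviation (an assumed heuristic)."""
    pending = None  # the line we are still allowed to extend
    for line in lines:
        line = line.strip()
        if pending is None:
            pending = line
        elif pending.endswith(ABBREVIATIONS):
            # Bad break: glue this line onto the pending one.
            pending = pending + " " + line
        else:
            # The pending line is complete; emit it and start a new one.
            yield pending
            pending = line
    if pending is not None:
        yield pending


sample = ["...as shown in Fig.", "1.", "The figure shows..."]
merged = list(merge_broken_sentences(sample))
print(merged)
```

Because the function takes an iterator and yields results, it has the shape 
that RDD.mapPartitions expects, so something like 
textFile.mapPartitions(merge_broken_sentences) would apply it to each 
partition in parallel. That would still miss a bad break that falls exactly 
on a partition boundary, which needs separate handling.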

Thanks,
Ron

