Hi all,

Assume I have read the lines of a text file into an RDD:

    textFile = sc.textFile("SomeArticle.txt")

Also assume that the sentence breaks in SomeArticle.txt were generated by
machine and contain some errors, such as the break after "Fig." in the sample
text below.

Index   Text
N       ...as shown in Fig.
N+1     1.
N+2     The figure shows...

What I want is an RDD with:

N       ...as shown in Fig. 1.
N+1     The figure shows...

Is there some way for a filter() (or a similar transformation) to look at
neighboring elements in an RDD? That way I could compare adjacent lines in
parallel and produce a new RDD that may have a different number of elements.
Or do I just have to iterate through the RDD sequentially?
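
To make the merge rule concrete, here is a minimal sequential sketch in plain Python (no Spark involved). It assumes a break is wrong whenever the previous line ends with a known abbreviation. The ABBREVIATIONS tuple and the merge_broken_lines name are placeholders made up for illustration, and a real rule would probably need to be smarter.

```python
# Hypothetical abbreviations that should never end a sentence (an assumption,
# not an exhaustive list).
ABBREVIATIONS = ("Fig.", "Eq.", "e.g.", "i.e.")


def merge_broken_lines(lines):
    """Join a line onto the previous one when the previous line ends with
    one of the abbreviations above."""
    merged = []
    for line in lines:
        # str.endswith accepts a tuple of suffixes.
        if merged and merged[-1].rstrip().endswith(ABBREVIATIONS):
            merged[-1] = merged[-1].rstrip() + " " + line.strip()
        else:
            merged.append(line)
    return merged


sample = ["...as shown in Fig.", "1.", "The figure shows..."]
print(merge_broken_lines(sample))
```

This is the sequential loop I would like to avoid. What I am after is the same result computed in parallel on the RDD.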

Thanks,
Ron

