There is a sliding method implemented in MLlib
(https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala),
which is used to compute the Area Under the Curve (AUC):
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation/AreaUnderCurve.scala#L45
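
For reference, that file just slides a 2-point window over the sorted (x, y)
curve and sums trapezoid areas. A condensed sketch of the same idea
(paraphrased, not the exact file contents; depending on the Spark version the
window may be an Array rather than a Seq, matched with case Array(...)):

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.rdd.RDDFunctions._  // adds sliding() to RDDs

// Trapezoidal rule over consecutive curve points; each window holds
// exactly two (x, y) points.
def areaUnderCurve(curve: RDD[(Double, Double)]): Double =
  curve.sliding(2).aggregate(0.0)(
    seqOp = (auc, window) => window match {
      case Seq((x1, y1), (x2, y2)) => auc + (x2 - x1) * (y1 + y2) / 2.0
    },
    combOp = _ + _
  )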

With it, you can process neighboring lines (the implicit that adds sliding()
to an RDD comes from org.apache.spark.mllib.rdd.RDDFunctions):

import org.apache.spark.mllib.rdd.RDDFunctions._

rdd.sliding(3).map { case Seq(l0, l1, l2) => ... }
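
For example, here is a minimal sketch of how sliding could re-join bad
sentence breaks like the "Fig." case below. badBreak() and the sentinel
padding are illustrative assumptions, not part of the MLlib API:

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.rdd.RDDFunctions._  // adds sliding() to RDDs

// Illustrative predicate: treat a line ending in "Fig." as a bad break
// that should absorb the following line.
def badBreak(line: String): Boolean = line.trim.endsWith("Fig.")

def mergeBadBreaks(sc: SparkContext, path: String): RDD[String] = {
  val lines = sc.textFile(path)
  // Pad with empty sentinel lines so every real line shows up as the
  // middle element of exactly one 3-line window.
  val padded = sc.parallelize(Seq("")) ++ lines ++ sc.parallelize(Seq(""))
  padded.sliding(3).flatMap {
    case Seq(prev, cur, next) =>
      if (badBreak(prev)) None                              // cur was absorbed by prev
      else if (badBreak(cur)) Some((cur + " " + next).trim) // absorb the next line
      else Some(cur)
  }
}

This only repairs one bad break at a time; runs of consecutive bad breaks
would need another pass.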

-Xiangrui

On Wed, Sep 3, 2014 at 11:30 AM, Daniel, Ronald (ELS-SDG)
<r.dan...@elsevier.com> wrote:
> Thanks for the pointer to that thread. Looks like there is some demand for
> this capability, but not a lot yet. Also doesn't look like there is an easy
> answer right now.
>
> Thanks,
>
> Ron
>
> From: Victor Tso-Guillen [mailto:v...@paxata.com]
> Sent: Wednesday, September 03, 2014 10:40 AM
> To: Daniel, Ronald (ELS-SDG)
> Cc: user@spark.apache.org
> Subject: Re: Accessing neighboring elements in an RDD
>
> Interestingly, there was an almost identical question posed on Aug 22 by
> cjwang. Here's the link to the archive:
> http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-td12621.html#a12664
>
> On Wed, Sep 3, 2014 at 10:33 AM, Daniel, Ronald (ELS-SDG)
> <r.dan...@elsevier.com> wrote:
>
> Hi all,
>
> Assume I have read the lines of a text file into an RDD:
>
>     textFile = sc.textFile("SomeArticle.txt")
>
> Also assume that the sentence breaks in SomeArticle.txt were made automatically
> and contain some errors, such as the spurious break after "Fig." in the sample
> text below.
>
> Index   Text
> N       ...as shown in Fig.
> N+1     1.
> N+2     The figure shows...
>
> What I want is an RDD with:
>
> N       ... as shown in Fig. 1.
> N+1     The figure shows...
>
> Is there some way a filter() can look at neighboring elements in an RDD?
> That way I could examine neighboring elements in parallel and come up with
> a new RDD that may have a different number of elements. Or do I just have
> to iterate through the RDD sequentially?
>
> Thanks,
> Ron
