Thanks Xiangrui, that looks very helpful.

Best regards,
Ron
> -----Original Message-----
> From: Xiangrui Meng [mailto:men...@gmail.com]
> Sent: Wednesday, September 03, 2014 1:19 PM
> To: Daniel, Ronald (ELS-SDG)
> Cc: Victor Tso-Guillen; user@spark.apache.org
> Subject: Re: Accessing neighboring elements in an RDD
>
> There is a sliding method implemented in MLlib
> (https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala),
> which is used in computing Area Under Curve:
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation/AreaUnderCurve.scala#L45
>
> With it, you can process neighbor lines by
>
> rdd.sliding(3).map { case Seq(l0, l1, l2) => ... }
>
> -Xiangrui
>
> On Wed, Sep 3, 2014 at 11:30 AM, Daniel, Ronald (ELS-SDG)
> <r.dan...@elsevier.com> wrote:
> > Thanks for the pointer to that thread. Looks like there is some demand
> > for this capability, but not a lot yet. Also doesn't look like there
> > is an easy answer right now.
> >
> > Thanks,
> >
> > Ron
> >
> > From: Victor Tso-Guillen [mailto:v...@paxata.com]
> > Sent: Wednesday, September 03, 2014 10:40 AM
> > To: Daniel, Ronald (ELS-SDG)
> > Cc: user@spark.apache.org
> > Subject: Re: Accessing neighboring elements in an RDD
> >
> > Interestingly, there was an almost identical question posed on Aug 22
> > by cjwang. Here's the link to the archive:
> > http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-td12621.html#a12664
> >
> > On Wed, Sep 3, 2014 at 10:33 AM, Daniel, Ronald (ELS-SDG)
> > <r.dan...@elsevier.com> wrote:
> >
> > Hi all,
> >
> > Assume I have read the lines of a text file into an RDD:
> >
> > textFile = sc.textFile("SomeArticle.txt")
> >
> > Also assume that the sentence breaks in SomeArticle.txt were done by
> > machine and have some errors, such as the break at Fig. in the sample
> > text below.
> >
> > Index  Text
> > N      ...as shown in Fig.
> > N+1    1.
> > N+2    The figure shows...
> >
> > What I want is an RDD with:
> >
> > N      ...as shown in Fig. 1.
> > N+1    The figure shows...
> >
> > Is there some way a filter() can look at neighboring elements in an RDD?
> > That way I could look, in parallel, at neighboring elements in an RDD
> > and come up with a new RDD that may have a different number of
> > elements. Or do I just have to sequentially iterate through the RDD?
> >
> > Thanks,
> > Ron
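A minimal local sketch of the sliding-window idea applied to Ron's "Fig." example, written in plain Python so it runs without a cluster. `sliding` is a hypothetical stand-in modeled on MLlib's `sliding(n)` (every window of n consecutive elements, and nothing at all if there are fewer than n). `merge_window`, `merge_false_breaks`, and `BREAK_MARKERS` are illustrative names, not Spark APIs. Each 3-line window `(l0, l1, l2)` decides what to emit for its middle line, mirroring the `case Seq(l0, l1, l2)` pattern above. The sketch assumes a continuation line (the "1.") never itself ends with a break marker; chained breaks would need more context than one window gives.

```python
from typing import Iterator, List, Optional, Sequence, Tuple

# Hypothetical: abbreviations after which the machine sentence splitter
# wrongly broke a line. Extend as needed.
BREAK_MARKERS: Tuple[str, ...] = ("Fig.",)


def sliding(items: Sequence, n: int) -> Iterator[tuple]:
    """Yield each window of n consecutive items (none if len(items) < n).

    Local stand-in modeled on MLlib's RDD sliding(n); not a Spark API.
    """
    for i in range(len(items) - n + 1):
        yield tuple(items[i:i + n])


def ends_with_false_break(line: Optional[str]) -> bool:
    """True if the line ends with one of the BREAK_MARKERS."""
    return line is not None and line.rstrip().endswith(BREAK_MARKERS)


def merge_window(window: tuple) -> List[str]:
    """Decide what to emit for the middle line l1 of a 3-line window."""
    l0, l1, l2 = window
    if ends_with_false_break(l0):
        # l1 is a continuation; it was already appended to l0 in l0's window.
        return []
    if ends_with_false_break(l1) and l2 is not None:
        # Glue the continuation line onto the falsely broken line.
        return [l1.rstrip() + " " + l2.lstrip()]
    return [l1]


def merge_false_breaks(lines: Sequence[str]) -> List[str]:
    """Merge falsely broken lines; assumes continuations are never broken themselves."""
    # Pad with None so every real line is the middle of exactly one window.
    padded = [None, *lines, None]
    merged: List[str] = []
    for window in sliding(padded, 3):
        merged.extend(merge_window(window))
    return merged


if __name__ == "__main__":
    sample = ["...as shown in Fig.", "1.", "The figure shows..."]
    print(merge_false_breaks(sample))
    # ['...as shown in Fig. 1.', 'The figure shows...']
```

On a real RDD the body of `merge_window` would go inside the function passed to `rdd.sliding(3)`, using `flatMap` rather than `map` because some windows emit nothing. Unlike the padded local version, the first and last lines of the RDD are never the middle of a window there, so they would need separate handling.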