Thanks Xiangrui, that looks very helpful.
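
For anyone finding this thread later, here is a minimal plain-Python sketch of the windowing idea applied to my example. It does not use Spark at all: it runs sequentially on a local list, purely to show the pairwise logic. The names `sliding`, `merge_broken_sentences`, and `ABBREVIATIONS` are my own illustrative choices, not Spark or MLlib APIs, and the abbreviation list is an assumption that would need tuning for real text.

```python
# Hypothetical abbreviation list; a real one would be tuned to the corpus.
ABBREVIATIONS = ("Fig.", "Eq.", "et al.")


def sliding(items, size):
    """Yield every run of `size` consecutive items as a tuple."""
    items = list(items)
    for start in range(len(items) - size + 1):
        yield tuple(items[start:start + size])


def merge_broken_sentences(lines):
    """Join a line onto the previous output line when the line before it
    ends in a known abbreviation (i.e. the sentence break was spurious)."""
    lines = list(lines)
    if not lines:
        return []
    merged = [lines[0]]
    # Look at each (previous, current) pair of neighboring lines.
    for prev, cur in sliding(lines, 2):
        if prev.rstrip().endswith(ABBREVIATIONS):
            merged[-1] = merged[-1] + " " + cur
        else:
            merged.append(cur)
    return merged


sample = ["...as shown in Fig.", "1.", "The figure shows..."]
result = merge_broken_sentences(sample)
```

With the sample lines from my original question, `result` should hold two elements: the repaired "...as shown in Fig. 1." sentence followed by "The figure shows...". Doing the same thing in parallel on an RDD would still need care at partition boundaries, which is what the MLlib sliding implementation is meant to handle.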

Best regards,
Ron


> -----Original Message-----
> From: Xiangrui Meng [mailto:men...@gmail.com]
> Sent: Wednesday, September 03, 2014 1:19 PM
> To: Daniel, Ronald (ELS-SDG)
> Cc: Victor Tso-Guillen; user@spark.apache.org
> Subject: Re: Accessing neighboring elements in an RDD
> 
> There is a sliding method implemented in MLlib
> (https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala),
> which is used in computing Area Under Curve:
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation/AreaUnderCurve.scala#L45
> 
> With it, you can process neighboring lines like this:
> 
> rdd.sliding(3).map { case Seq(l0, l1, l2) => ... }
> 
> -Xiangrui
> 
> On Wed, Sep 3, 2014 at 11:30 AM, Daniel, Ronald (ELS-SDG)
> <r.dan...@elsevier.com> wrote:
> > Thanks for the pointer to that thread. Looks like there is some demand
> > for this capability, but not a lot yet. Also doesn't look like there
> > is an easy answer right now.
> >
> > Thanks,
> > Ron
> >
> > From: Victor Tso-Guillen [mailto:v...@paxata.com]
> > Sent: Wednesday, September 03, 2014 10:40 AM
> > To: Daniel, Ronald (ELS-SDG)
> > Cc: user@spark.apache.org
> > Subject: Re: Accessing neighboring elements in an RDD
> >
> >
> >
> > Interestingly, there was an almost identical question posed on Aug 22
> > by cjwang. Here's the link to the archive:
> > http://apache-spark-user-list.1001560.n3.nabble.com/Finding-previous-and-next-element-in-a-sorted-RDD-td12621.html#a12664
> >
> >
> >
> > On Wed, Sep 3, 2014 at 10:33 AM, Daniel, Ronald (ELS-SDG)
> > <r.dan...@elsevier.com> wrote:
> >
> > Hi all,
> >
> > Assume I have read the lines of a text file into an RDD:
> >
> >     textFile = sc.textFile("SomeArticle.txt")
> >
> > Also assume that the sentence breaks in SomeArticle.txt were done by
> > machine and have some errors, such as the break at Fig. in the sample text
> > below.
> >
> > Index   Text
> > N        ...as shown in Fig.
> > N+1     1.
> > N+2     The figure shows...
> >
> > What I want is an RDD with:
> >
> > N       ... as shown in Fig. 1.
> > N+1     The figure shows...
> >
> > Is there some way a filter() can look at neighboring elements in an RDD?
> > That way I could look, in parallel, at neighboring elements in an RDD
> > and come up with a new RDD that may have a different number of
> > elements.  Or do I just have to sequentially iterate through the RDD?
> >
> > Thanks,
> > Ron
> >
> >
