Hello!

You could try something like this:

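import org.apache.spark.rdd.RDD

// True if the lower bound of the approximate count of matches is at least n.
def exists[T](rdd: RDD[T])(f: T => Boolean, n: Int): Boolean = {
  rdd.filter(f).countApprox(timeout = 10000).getFinalValue().low >= n
}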

It would work for large datasets and large values of n.
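
Here is a slightly fuller sketch of the approximate check next to the exact take(n) check quoted below, in case it helps. It assumes a local SparkContext, and the names ExistsSketch, approxExists, exactExists and timeoutMs are only illustrative, not part of the Spark API. As far as I know, getFinalValue() blocks until the whole count has finished, so the sketch reads initialValue instead, which is the estimate available when the timeout expires.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ExistsSketch {

  // Approximate check: initialValue is the estimate available when the
  // timeout expires. `low` is the lower end of the confidence interval,
  // so a `true` result is likely but not guaranteed when the count is partial.
  def approxExists[T](rdd: RDD[T])(f: T => Boolean, n: Int,
                                   timeoutMs: Long = 10000L): Boolean =
    rdd.filter(f).countApprox(timeoutMs, confidence = 0.95).initialValue.low >= n

  // Exact check: take(n) returns a local Array; it scans partitions
  // incrementally and stops once n matching elements have been collected.
  def exactExists[T](rdd: RDD[T])(f: T => Boolean, n: Int): Boolean =
    rdd.filter(f).take(n).length >= n

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a quick experiment.
    val conf = new SparkConf().setAppName("exists-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val words = sc.parallelize(Seq("spark", "rdd", "spark", "scala"))
      println(exactExists(words)(_ == "spark", 2))  // true
      println(exactExists(words)(_ == "spark", 3))  // false
      println(approxExists(words)(_ == "spark", 2))
    } finally {
      sc.stop()
    }
  }
}
```

The exact variant is probably the better choice for small n, since it can stop early. The approximate variant trades certainty for a bounded wait when n is large.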

Have a nice day,

Jonathan



On 31 July 2015 at 11:29, Carsten Schnober <
schno...@ukp.informatik.tu-darmstadt.de> wrote:

> Hi,
> the RDD class does not have an exists() method (in the Scala API), but
> the functionality you need seems easy to reproduce with the existing
> methods:
>
> val containsNMatchingElements =
> data.filter(qualifying_function).take(n).length >= n
>
> Note: I am not sure whether the intermediate take(n) really increases
> performance, but the idea is to limit the number of matching elements
> that are collected before counting, because we are not interested in the
> full count.
>
> If you need to check specifically whether there is at least one matching
> occurrence, it is probably preferable to use isEmpty() instead of
> count() and check whether the result is false:
>
> val contains1MatchingElement =
> !(data.filter(qualifying_function).isEmpty())
>
> Best,
> Carsten
>
>
>
> On 31.07.2015 at 11:11, Sandeep Giri wrote:
> > Dear Spark Dev Community,
> >
> > I am wondering if there is already a function to solve my problem. If
> > not, then should I work on this?
> >
> > Say you just want to check if a word exists in a huge text file. I could
> > not find better ways than those mentioned here
> > <http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-2#q6>.
> >
> > So I propose adding a function called exists to RDD with the
> > following signature:
> >
> > # Returns true if n elements exist that satisfy our criterion.
> > # The qualifying function would receive the element and its index and
> > # return true or false.
> > def exists(qualifying_function, n):
> >      ....
> >
> >
> > Regards,
> > Sandeep Giri,
> > +1 347 781 4573 (US)
> > +91-953-899-8962 (IN)
> >
> > www.KnowBigData.com. <http://KnowBigData.com.>
> > Phone: +1-253-397-1945 (Office)
> >
> >
>
> --
> Carsten Schnober
> Doctoral Researcher
> Ubiquitous Knowledge Processing (UKP) Lab
> FB 20 / Computer Science Department
> Technische Universität Darmstadt
> Hochschulstr. 10, D-64289 Darmstadt, Germany
> phone [+49] (0)6151 16-6227, fax -5455, room S2/02/B111
> schno...@ukp.informatik.tu-darmstadt.de
> www.ukp.tu-darmstadt.de
>
> Web Research at TU Darmstadt (WeRC): www.werc.tu-darmstadt.de
> GRK 1994: Adaptive Preparation of Information from Heterogeneous Sources
> (AIPHES): www.aiphes.tu-darmstadt.de
> PhD program: Knowledge Discovery in Scientific Literature (KDSL)
> www.kdsl.tu-darmstadt.de
>
