Hello! You could try something like this:

    import org.apache.spark.rdd.RDD

    // True if at least n elements of rdd satisfy f (timeout in milliseconds).
    def exists[T](rdd: RDD[T])(f: T => Boolean, n: Int): Boolean = {
      rdd.filter(f).countApprox(timeout = 10000).getFinalValue().low >= n
    }

It would work for large datasets and large values of n.

Have a nice day,
Jonathan

On 31 July 2015 at 11:29, Carsten Schnober <schno...@ukp.informatik.tu-darmstadt.de> wrote:

> Hi,
> the RDD class does not have an exists() method (in the Scala API), but
> the functionality you need seems easy to reproduce with the existing
> methods:
>
> val containsNMatchingElements =
>   data.filter(qualifying_function).take(n).length >= n
>
> Note: I am not sure whether the intermediate take(n) really increases
> performance, but the idea is to cap the number of elements before
> counting, because we are not interested in the full count.
>
> If you need to check specifically whether there is at least one matching
> occurrence, it is probably preferable to use isEmpty() instead of
> count() and check whether the result is false:
>
> val contains1MatchingElement =
>   !(data.filter(qualifying_function).isEmpty())
>
> Best,
> Carsten
>
> On 31.07.2015 at 11:11, Sandeep Giri wrote:
> > Dear Spark Dev Community,
> >
> > I am wondering if there is already a function to solve my problem. If
> > not, then should I work on this?
> >
> > Say you just want to check if a word exists in a huge text file. I could
> > not find better ways than those mentioned here
> > <http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-2#q6>.
> >
> > So, I was proposing that we have a function called exists in RDD with
> > the following signature:
> >
> > # returns true if n elements exist that satisfy our criteria.
> > # The qualifying function would receive the element and its index and
> > # return true or false.
> > def exists(qualifying_function, n):
> >     ....
> >
> > Regards,
> > Sandeep Giri
> > www.KnowBigData.com
>
> --
> Carsten Schnober
> Doctoral Researcher
> Ubiquitous Knowledge Processing (UKP) Lab
> Technische Universität Darmstadt
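
For reference, here is a minimal sketch of the two filter-based checks discussed in this thread, written as standalone helpers. The names ExistsSketch, existsAtLeast and existsAny are made up for illustration and are not part of the Spark API; the sketch assumes a Spark version that provides RDD.isEmpty() and runs on a local master with a tiny in-memory dataset.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Spark.
object ExistsSketch {

  // True if at least n elements satisfy f. take(n) returns a local Array
  // holding at most n matching elements, so we compare its length with n.
  def existsAtLeast[T](rdd: RDD[T])(f: T => Boolean, n: Int): Boolean =
    rdd.filter(f).take(n).length >= n

  // True if at least one element satisfies f.
  def existsAny[T](rdd: RDD[T])(f: T => Boolean): Boolean =
    !rdd.filter(f).isEmpty()

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for this demo only.
    val conf = new SparkConf().setMaster("local[*]").setAppName("exists-sketch")
    val sc = new SparkContext(conf)
    try {
      val words = sc.parallelize(Seq("spark", "rdd", "spark", "scala"))
      println(existsAtLeast(words)(_ == "spark", 2))
      println(existsAny(words)(_ == "hadoop"))
    } finally {
      sc.stop()
    }
  }
}
```

As far as I understand, take(n) scans partitions incrementally and can stop once it has collected n matches, which is why it may be cheaper than a full count() when matches are common. When matches are rare, it may still end up scanning the whole dataset.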