Re: Cartesian join on RDDs taking too much time

Priya Ch Wed, 25 May 2016 04:28:51 -0700

Why do I need to deploy Solr for text analytics? I have files placed in
HDFS. I just need to compare each string in one file against the strings in
the other and generate those records whose match is > 85%. We are trying to
implement fuzzy-match logic.
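
To make the requirement concrete, here is a minimal plain-Python sketch of
the matching I have in mind. It is only an illustration under assumptions:
the score uses difflib's SequenceMatcher ratio as a stand-in for whatever
fuzzy metric we end up with, and the names similarity and fuzzy_matches are
hypothetical helpers, not part of Spark.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]; difflib's ratio is an assumed stand-in metric."""
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_matches(left, right, threshold=0.85):
    """Compare every string in `left` with every string in `right`
    and keep the pairs scoring strictly above `threshold`."""
    matches = []
    for a, b in product(left, right):  # all pairs, i.e. the Cartesian product
        score = similarity(a, b)
        if score > threshold:
            matches.append((a, b, score))
    return matches


# Sample strings from the earlier message in this thread.
strings_a = ["hi", "bye", "ch"]
strings_b = ["padma", "hihi", "chch", "priya"]

for a, b, score in fuzzy_matches(strings_a, strings_b, threshold=0.6):
    print(a, b, round(score, 2))
```

On RDDs, the all-pairs step corresponds to calling cartesian on one RDD with
the other and then filtering on the score, which is the part that is slow
for us.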


How can I use map/reduce operations across two RDDs?

Thanks,
Padma Ch

On Wed, May 25, 2016 at 4:49 PM, Jörn Franke <jornfra...@gmail.com> wrote:

>
> Alternatively, depending on the exact use case, you may employ Solr on
> Hadoop for text analytics.
>
> > On 25 May 2016, at 12:57, Priya Ch <learnings.chitt...@gmail.com> wrote:
> >
> > Let's say I have an RDD A of strings {"hi","bye","ch"} and another RDD B
> > of strings {"padma","hihi","chch","priya"}. For every string in RDD A, I
> > need to check for matches in RDD B. For example, for the string "hi" I
> > have to check against all strings in RDD B, which means I need to generate
> > every possible combination r
>
