Re: Review Request 24309: Review of DATAFU-65 - Add AhoCorasickMatch UDF

Russell Jurney Tue, 05 Aug 2014 16:33:40 -0700


> On Aug. 5, 2014, 9:14 p.m., Matthew Hayes wrote:
> > How do you expect this to be used in practice?  Would one large dictionary 
> > be applied to a large collection of strings to identify the matches within 
> > each string?  Or, do you expect a different dictionary to be applied to 
> > each string?  If you expect the same dictionary to be used, then it seems 
> > we miss out on the potential with this implementation to build the trie 
> > once and reuse it over and over.  Should the dictionary instead be loaded 
> > from HDFS via the distributed cache and lazy loaded on the first call to 
> > exec()?  This way you only build the trie once.


You make a good point. The way I plan to use this is to GROUP a relation of
match words ALL, then CROSS it with the text to be matched against, so the
same words will be matched against a large number of strings. Compared to your
suggestion, my plan is dumb. I think I will do what you suggest.
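The build-once idea can be sketched in plain Java. This is a minimal sketch under stated assumptions, not the DATAFU-65 code. The class name `LazyAhoCorasick` and its `exec(String)` method are hypothetical. The `Supplier` stands in for reading a dictionary file shipped through the distributed cache. No Pig classes (`EvalFunc`, `Tuple`) are used, so the UDF wiring is omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: an Aho-Corasick matcher whose automaton is built
 * lazily on the first exec() call and reused for every later string.
 * The Supplier is an assumed stand-in for loading the dictionary from
 * a distributed-cache file.
 */
class LazyAhoCorasick {

    /** One trie state: children by character, failure link, words ending here. */
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;
        final List<String> outputs = new ArrayList<>();
    }

    private final Supplier<List<String>> dictionarySource;
    private Node root;      // stays null until the first exec() call
    private int buildCount; // number of times the automaton was built

    LazyAhoCorasick(Supplier<List<String>> dictionarySource) {
        this.dictionarySource = dictionarySource;
    }

    /** Returns every dictionary word found in text, ordered by match end position. */
    List<String> exec(String text) {
        if (root == null) {
            root = build(dictionarySource.get()); // built once, then reused
            buildCount++;
        }
        List<String> matches = new ArrayList<>();
        Node state = root;
        for (char c : text.toCharArray()) {
            // Follow failure links until a transition on c exists or we reach the root.
            while (state != root && !state.next.containsKey(c)) {
                state = state.fail;
            }
            state = state.next.getOrDefault(c, root);
            matches.addAll(state.outputs);
        }
        return matches;
    }

    int buildCount() {
        return buildCount;
    }

    private static Node build(List<String> words) {
        Node root = new Node();
        root.fail = root;
        // Phase 1: insert every word into the trie.
        for (String w : words) {
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.next.computeIfAbsent(c, k -> new Node());
            }
            n.outputs.add(w);
        }
        // Phase 2: breadth-first pass that sets failure links and merges outputs.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            for (Map.Entry<Character, Node> e : n.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = n.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                // The failure target is shallower, so its outputs are already complete.
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
        return root;
    }
}
```

In an actual Pig UDF, the same null check would sit at the top of `exec(Tuple)`, and the dictionary path would be requested by overriding `EvalFunc.getCacheFiles()` (assuming a Pig version that supports it). Each task then builds the automaton once per UDF instance rather than once per input tuple. That per-tuple rebuild is the cost the GROUP ALL / CROSS plan pays.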


- Russell


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24309/#review49639
-----------------------------------------------------------


On Aug. 5, 2014, 4:14 p.m., Russell Jurney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24309/
> -----------------------------------------------------------
> 
> (Updated Aug. 5, 2014, 4:14 p.m.)
> 
> 
> Review request for DataFu, Jakob Homan, Matthew Hayes, and Sam Shah.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> See DATAFU-65
> 
> 
> Diffs
> -----
> 
>   datafu-pig/build.gradle e21a5b1 
>   datafu-pig/src/main/java/datafu/pig/text/AhoCorasickMatch.java PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/text/AhoCorasickMatchTest.java PRE-CREATION 
>   gradle/dependency-versions.gradle eb24e4a 
> 
> Diff: https://reviews.apache.org/r/24309/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Russell Jurney
> 
>
