Re: Review Request 24309: Review of DATAFU-65 - Add AhoCorasickMatch UDF

Matthew Hayes Tue, 05 Aug 2014 14:16:37 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24309/#review49639
-----------------------------------------------------------

How do you expect this to be used in practice?  Would one large dictionary be
applied to a large collection of strings to identify the matches within each
string?  Or do you expect a different dictionary to be applied to each string?
If you expect the same dictionary to be used, then this implementation seems
to miss the chance to build the trie once and reuse it across calls.  Should
the dictionary instead be shipped from HDFS via the distributed cache and
loaded lazily on the first call to exec()?  That way the trie is built only
once per UDF instance rather than once per input string.
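
To make the suggestion concrete, here is a minimal sketch of the lazy-build
pattern.  It is not the code under review: LazyDictionaryMatcher,
getBuildCount() and the in-memory word list are hypothetical stand-ins, and the
Pig EvalFunc and distributed-cache plumbing are left out so the example runs on
its own.  In a real UDF the word list would presumably be read from the cached
file inside the same first-call branch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-in for the UDF: no Pig or HDFS dependencies.
public class LazyDictionaryMatcher {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        final List<String> outputs = new ArrayList<>();
        Node fail;
    }

    // Assumption: stands in for the dictionary file from the distributed cache.
    private final List<String> dictionary;
    private Node root;          // stays null until the first exec() call
    private int buildCount = 0; // only here to show the trie is built once

    public LazyDictionaryMatcher(List<String> dictionary) {
        this.dictionary = dictionary;
    }

    public int getBuildCount() {
        return buildCount;
    }

    // Builds the trie and the Aho-Corasick failure links.
    private void buildTrie() {
        root = new Node();
        for (String word : dictionary) {
            Node n = root;
            for (char c : word.toCharArray()) {
                n = n.next.computeIfAbsent(c, k -> new Node());
            }
            n.outputs.add(word);
        }
        // Breadth-first pass: a node's failure link points at the longest
        // proper suffix of its path that is also a path in the trie.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node cur = queue.remove();
            for (Map.Entry<Character, Node> e : cur.next.entrySet()) {
                Node child = e.getValue();
                Node f = cur.fail;
                while (f != root && !f.next.containsKey(e.getKey())) {
                    f = f.fail;
                }
                Node target = f.next.get(e.getKey());
                child.fail = (target != null) ? target : root;
                // Inherit matches that end at the failure target.
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
        buildCount++;
    }

    // Returns every dictionary word found in text, in order of match end.
    // Not synchronized: assumes one thread per instance.
    public List<String> exec(String text) {
        if (root == null) {
            buildTrie(); // lazy: happens on the first call only
        }
        List<String> matches = new ArrayList<>();
        Node n = root;
        for (char c : text.toCharArray()) {
            while (n != root && !n.next.containsKey(c)) {
                n = n.fail;
            }
            n = n.next.getOrDefault(c, root);
            matches.addAll(n.outputs);
        }
        return matches;
    }
}
```

With the dictionary he, she, his, hers, calling exec("ushers") returns
[she, he, hers], and any number of further exec() calls on the same instance
reuse the trie built on that first call.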

- Matthew Hayes

On Aug. 5, 2014, 4:14 p.m., Russell Jurney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24309/
> -----------------------------------------------------------
> 
> (Updated Aug. 5, 2014, 4:14 p.m.)
> 
> 
> Review request for DataFu, Jakob Homan, Matthew Hayes, and Sam Shah.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> See DATAFU-65
> 
> 
> Diffs
> -----
> 
>   datafu-pig/build.gradle e21a5b1 
>   datafu-pig/src/main/java/datafu/pig/text/AhoCorasickMatch.java PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/text/AhoCorasickMatchTest.java PRE-CREATION 
>   gradle/dependency-versions.gradle eb24e4a 
> 
> Diff: https://reviews.apache.org/r/24309/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Russell Jurney
> 
>
