-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24309/#review49639
-----------------------------------------------------------

How do you expect this to be used in practice? Would one large dictionary be applied to a large collection of strings to identify the matches within each string, or do you expect a different dictionary to be applied to each string? If the same dictionary is reused, then it seems this implementation misses the opportunity to build the trie once and reuse it over and over. Should the dictionary instead be loaded from HDFS via the distributed cache and lazily loaded on the first call to exec()? That way the trie is built only once.

- Matthew Hayes


On Aug. 5, 2014, 4:14 p.m., Russell Jurney wrote:
> 
> (Updated Aug. 5, 2014, 4:14 p.m.)
> 
> 
> Review request for DataFu, Jakob Homan, Matthew Hayes, and Sam Shah.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> See DATAFU-65
> 
> 
> Diffs
> -----
> 
>   datafu-pig/build.gradle e21a5b1 
>   datafu-pig/src/main/java/datafu/pig/text/AhoCorasickMatch.java PRE-CREATION 
>   datafu-pig/src/test/java/datafu/test/pig/text/AhoCorasickMatchTest.java PRE-CREATION 
>   gradle/dependency-versions.gradle eb24e4a 
> 
> Diff: https://reviews.apache.org/r/24309/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Russell Jurney
> 
>
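
The "build the trie once, lazily, on the first call to exec()" suggestion in the review comment can be sketched in plain Java. This is a minimal, hypothetical sketch, not the AhoCorasickMatch code under review: the class name LazyAhoCorasick, its methods, and the Supplier-based loader (standing in for reading a dictionary file shipped through the distributed cache) are all assumptions, and it deliberately avoids Pig's EvalFunc API so it stays dependency-free.

```java
import java.util.*;
import java.util.function.Supplier;

// Hypothetical sketch (not the AhoCorasickMatch code under review): an
// Aho-Corasick matcher whose automaton is built lazily on the first exec()
// call and then reused for every later input string.
class LazyAhoCorasick {
    // A trie node: child edges, failure link, and dictionary words that end here.
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;
        final List<String> outputs = new ArrayList<>();
    }

    // Assumption: stands in for reading the dictionary from a file shipped via
    // the distributed cache; any loader returning the word list will do.
    private final Supplier<List<String>> dictionaryLoader;
    private Node root;          // null until the first exec() call
    private int buildCount = 0; // number of times the automaton has been built

    LazyAhoCorasick(Supplier<List<String>> dictionaryLoader) {
        this.dictionaryLoader = dictionaryLoader;
    }

    // Builds the trie, then sets failure links breadth-first so each node's
    // outputs also include the words of its failure (longest proper suffix) node.
    private static Node build(List<String> words) {
        Node trieRoot = new Node();
        for (String w : words) {
            if (w.isEmpty()) continue; // an empty pattern would match everywhere
            Node cur = trieRoot;
            for (char c : w.toCharArray()) {
                cur = cur.next.computeIfAbsent(c, k -> new Node());
            }
            cur.outputs.add(w);
        }
        Deque<Node> queue = new ArrayDeque<>();
        for (Node child : trieRoot.next.values()) {
            child.fail = trieRoot;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.poll();
            for (Map.Entry<Character, Node> e : node.next.entrySet()) {
                char c = e.getKey();
                Node child = e.getValue();
                Node f = node.fail;
                while (f != null && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                child.fail = (f == null) ? trieRoot : f.next.get(c);
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
        return trieRoot;
    }

    // Returns every dictionary word found in the input, ordered by match end position.
    // Not synchronized: assumes one instance is called from a single thread.
    List<String> exec(String input) {
        if (root == null) { // lazy initialization: build once, on first use
            root = build(dictionaryLoader.get());
            buildCount++;
        }
        List<String> matches = new ArrayList<>();
        Node cur = root;
        for (char c : input.toCharArray()) {
            while (cur != root && !cur.next.containsKey(c)) {
                cur = cur.fail; // fall back to the longest suffix with an edge on c
            }
            cur = cur.next.getOrDefault(c, root);
            matches.addAll(cur.outputs);
        }
        return matches;
    }

    int getBuildCount() {
        return buildCount;
    }
}
```

For example, with the dictionary `he, she, his, hers`, calling `exec("ushers")` returns `[she, he, hers]`, and later calls such as `exec("this")` reuse the same automaton, so `getBuildCount()` stays at 1. In an actual Pig UDF, exec() would take a Tuple rather than a String; the String signature here only keeps the sketch self-contained.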