> On Nov. 14, 2014, 12:46 a.m., Matthew Hayes wrote:
> > datafu-pig/src/main/macros/nlp/tf_idf.pig, line 27
> > <https://reviews.apache.org/r/27820/diff/1/?file=756916#file756916line27>
> >
> > We should give some thought towards how to best namespace this and
> > other macros.
> >
> > Although a bit wordy, this would avoid conflicts in the future:
> >
> > DataFu_TFIDF_OpenNlp_Simple
> >
> > If we supported a maximum entropy version later we could have:
> >
> > DataFu_TFIDF_OpenNlp_MaxEnt
> >
> > I am open to ideas :)
> >
> > We may also want to have a version of the macro in the future where the
> > tokens can be fed in, without tokenization of raw text.
>
> Russell Jurney wrote:
> This sounds pretty reasonable. Actually, why don't I make the sample UDF
> configurable? An option of the Macro.
>
> Matthew Hayes wrote:
> It might be hard to parameterize all the NLP options. For example, the
> TokenizeME takes a parameter for the tokenization data. Unfortunately macros
> cannot include other macros so we'd have to either copy and paste a lot or
> come up with some build mechanism to templatize these.
You're right about the different interfaces to the tokenizers. Who wrote those things, anyway? :) I'll use the name you suggested.

However, macros can call other macros. Is that what you meant by include? I did a post on this here: http://datasyndrome.com/post/17186084960/the-power-of-pig-macros


- Russell


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27820/#review61350
-----------------------------------------------------------


On Nov. 10, 2014, 6:33 p.m., Russell Jurney wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27820/
> -----------------------------------------------------------
>
> (Updated Nov. 10, 2014, 6:33 p.m.)
>
>
> Review request for DataFu, pig, Joseph Adler, Jakob Homan, Matthew Hayes, and
> Sam Shah.
>
>
> Repository: datafu
>
>
> Description
> -------
>
> DATAFU-61 - Add TF-IDF Macro to DataFu
>
>
> Diffs
> -----
>
> datafu-pig/src/main/macros/nlp/tf_idf.pig PRE-CREATION
> datafu-pig/src/test/macros/nlp/test_tf_idf.pig PRE-CREATION
>
> Diff: https://reviews.apache.org/r/27820/diff/
>
>
> Testing
> -------
>
> Works for me, but testing not automated. See
> https://issues.apache.org/jira/browse/DATAFU-61
>
>
> Thanks,
>
> Russell Jurney
>
>