> On Nov. 14, 2014, 12:46 a.m., Matthew Hayes wrote:
> > datafu-pig/src/main/macros/nlp/tf_idf.pig, line 27
> > <https://reviews.apache.org/r/27820/diff/1/?file=756916#file756916line27>
> >
> >     We should give some thought towards how to best namespace this and 
> > other macros.
> >     
> >     Although a bit wordy, this would avoid conflicts in the future:
> >     
> >     DataFu_TFIDF_OpenNlp_Simple
> >     
> >     If we supported a maximum entropy version later we could have:
> >     
> >     DataFu_TFIDF_OpenNlp_MaxEnt
> >     
> >     I am open to ideas :)
> >     
> >     We may also want to have a version of the macro in the future where the 
> > tokens can be fed in, without tokenization of raw text.
> 
> Russell Jurney wrote:
>     This sounds pretty reasonable. Actually, why don't I make the sample UDF 
> configurable? An option of the Macro.
> 
> Matthew Hayes wrote:
>     It might be hard to parameterize all the NLP options.  For example, the 
> TokenizeME takes a parameter for the tokenization data.  Unfortunately macros 
> cannot include other macros so we'd have to either copy and paste a lot or 
> come up with some build mechanism to templatize these.
> 
> Russell Jurney wrote:
>     You're right about the different interfaces to the tokenizers. Who wrote 
> those things, anyway? :) I'll use the name you suggested. 
>     
>     However, macros can call other macros. Is that what you meant by include? 
> I did a post on this here: 
> http://datasyndrome.com/post/17186084960/the-power-of-pig-macros

Oh, I didn't know that macros could call other macros.  That's what I meant by 
include.  I thought they couldn't, but I guess I am just not up to date on the 
documentation :)  If that works, then you can put the majority of the code in a 
common macro that the others call.
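
For example, here is a rough, untested sketch of that layout.  The macro 
DataFu_TFIDF_Common, its parameter names, and the doc_id/token field names are 
all made up for illustration.  The shared body only counts terms per document 
as a placeholder for the real TF-IDF logic, and Pig's built-in TOKENIZE stands 
in for the OpenNLP tokenizer UDF:

```pig
-- Hypothetical shared macro: holds the logic common to every variant.
-- Placeholder body: counts how often each token occurs in each document.
-- Assumes the input relation has fields named doc_id and token.
DEFINE DataFu_TFIDF_Common(doc_tokens) RETURNS term_counts {
  grouped = GROUP $doc_tokens BY (doc_id, token);
  $term_counts = FOREACH grouped GENERATE
    FLATTEN(group) AS (doc_id, token),
    COUNT($doc_tokens) AS term_freq;
};

-- Variant macro: does its own tokenization, then calls the shared macro.
-- TOKENIZE is Pig's built-in tokenizer, used here only as a stand-in.
DEFINE DataFu_TFIDF_OpenNlp_Simple(documents, id_field, text_field) RETURNS term_counts {
  doc_tokens = FOREACH $documents GENERATE
    $id_field AS doc_id,
    FLATTEN(TOKENIZE($text_field)) AS token;
  $term_counts = DataFu_TFIDF_Common(doc_tokens);
};
```

A MaxEnt variant would then only need its own tokenization step before calling 
the same common macro.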


- Matthew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27820/#review61350
-----------------------------------------------------------


On Nov. 10, 2014, 6:33 p.m., Russell Jurney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27820/
> -----------------------------------------------------------
> 
> (Updated Nov. 10, 2014, 6:33 p.m.)
> 
> 
> Review request for DataFu, pig, Joseph Adler, Jakob Homan, Matthew Hayes, and 
> Sam Shah.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> DATAFU-61 - Add TF-IDF Macro to DataFu
> 
> 
> Diffs
> -----
> 
>   datafu-pig/src/main/macros/nlp/tf_idf.pig PRE-CREATION 
>   datafu-pig/src/test/macros/nlp/test_tf_idf.pig PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/27820/diff/
> 
> 
> Testing
> -------
> 
> Works for me, but testing not automated. See 
> https://issues.apache.org/jira/browse/DATAFU-61
> 
> 
> Thanks,
> 
> Russell Jurney
> 
>
