-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27820/#review61350
-----------------------------------------------------------



datafu-pig/src/main/macros/nlp/tf_idf.pig
<https://reviews.apache.org/r/27820/#comment102924>

    Overall, I like the simplicity of this macro; it looks really easy to use.
I would add a note explaining how tokenization is done (i.e. using TokenizeSimple,
which uses character classes) and that it uses augmented term frequency.
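
For the note, here is a rough Python sketch of what "augmented term frequency" usually means: each raw count is divided by the count of the most frequent term in the same document, then rescaled into the range 0.5 to 1.0. This is only an illustration under assumptions, not the macro's code. The regex tokenizer is a stand-in for TokenizeSimple, the function names are hypothetical, and the natural-log IDF is the textbook form, which may differ from what tf_idf.pig actually computes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Stand-in tokenizer (assumption): lowercase runs of ASCII letters/digits.
    # This does not reproduce the behavior of OpenNLP's simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def augmented_tf(tokens):
    # Augmented term frequency: 0.5 + 0.5 * count / max_count_in_document.
    counts = Counter(tokens)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {term: 0.5 + 0.5 * count / max_count for term, count in counts.items()}


def tf_idf(docs):
    # docs maps a document id to its raw text.
    # Returns a dict mapping each document id to {term: tf-idf score}.
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    n_docs = len(tokenized)
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = augmented_tf(tokens)
        scores[doc_id] = {
            term: weight * math.log(n_docs / doc_freq[term])
            for term, weight in tf.items()
        }
    return scores
```

With this definition, a term that appears in every document gets an IDF of zero, so its score is zero regardless of its term frequency. That is worth mentioning in the macro's documentation as well.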



datafu-pig/src/main/macros/nlp/tf_idf.pig
<https://reviews.apache.org/r/27820/#comment102925>

    We should give some thought to how best to namespace this and other
macros.
    
    Although a bit wordy, this would avoid conflicts in the future:
    
    DataFu_TFIDF_OpenNlp_Simple
    
    If we supported a maximum entropy version later we could have:
    
    DataFu_TFIDF_OpenNlp_MaxEnt
    
    I am open to ideas :)
    
    In the future, we may also want a version of the macro that accepts
pre-tokenized input, so that it does not need to tokenize raw text itself.


- Matthew Hayes


On Nov. 10, 2014, 6:33 p.m., Russell Jurney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27820/
> -----------------------------------------------------------
> 
> (Updated Nov. 10, 2014, 6:33 p.m.)
> 
> 
> Review request for DataFu, pig, Joseph Adler, Jakob Homan, Matthew Hayes, and 
> Sam Shah.
> 
> 
> Repository: datafu
> 
> 
> Description
> -------
> 
> DATAFU-61 - Add TF-IDF Macro to DataFu
> 
> 
> Diffs
> -----
> 
>   datafu-pig/src/main/macros/nlp/tf_idf.pig PRE-CREATION 
>   datafu-pig/src/test/macros/nlp/test_tf_idf.pig PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/27820/diff/
> 
> 
> Testing
> -------
> 
> Works for me, but testing not automated. See 
> https://issues.apache.org/jira/browse/DATAFU-61
> 
> 
> Thanks,
> 
> Russell Jurney
> 
>
