[ 
https://issues.apache.org/jira/browse/SPARK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512765#comment-14512765
 ] 

Joseph K. Bradley commented on SPARK-7143:
------------------------------------------

Do you have references to recent papers and current industry use cases, 
especially ones showing that BM25 is much better than TF-IDF?  It would be 
good to figure out whether it is clearly better than TF-IDF, or whether it is 
best only in specialized cases (and would then fit better as a Spark package).

Also, can you please comment on which variant you're implementing?  The 
Wikipedia page makes it sound like some corrections are necessary for the basic 
BM25 in order to make it more practical.

Thanks!

> Add BM25 Estimator
> ------------------
>
>                 Key: SPARK-7143
>                 URL: https://issues.apache.org/jira/browse/SPARK-7143
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Liang-Chi Hsieh
>
> [BM25|http://en.wikipedia.org/wiki/Okapi_BM25] is a retrieval function used 
> to rank documents. It is commonly used in IR tasks and can be parallelized. 
> This issue proposes adding it to Spark ML.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
