[ https://issues.apache.org/jira/browse/SPARK-7143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512765#comment-14512765 ]
Joseph K. Bradley commented on SPARK-7143:
------------------------------------------

Do you have some references to recent papers and current use cases in industry, especially ones showing BM25 is much better than TF-IDF? It will be good to figure out whether it is clearly better than TF-IDF, or if it is best in specialized cases (and would then be better as a Spark package).

Also, can you please comment on which variant you're implementing? The Wikipedia page makes it sound like some corrections are necessary for the basic BM25 in order to make it more practical.

Thanks!

> Add BM25 Estimator
> ------------------
>
>                 Key: SPARK-7143
>                 URL: https://issues.apache.org/jira/browse/SPARK-7143
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Liang-Chi Hsieh
>
> [BM25|http://en.wikipedia.org/wiki/Okapi_BM25] is a retrieval function used
> to rank documents. It is commonly used in IR tasks and can be parallel. This
> issue is proposed to add it into Spark ML.
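
For reference when discussing variants, here is a minimal sketch of basic Okapi BM25 scoring in plain Python. It is not the proposed Spark ML estimator and says nothing about how that would be implemented. The function name `bm25_scores`, the `smooth_idf` flag, and the defaults `k1=1.2`, `b=0.75` are assumptions chosen for illustration. The sketch also shows one commonly cited issue with the basic form: the classic IDF term becomes negative for a term that appears in more than half of the documents, and adding 1 inside the logarithm is one common correction.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75, smooth_idf=False):
    """Score each tokenized document in `docs` against a tokenized `query`.

    Hypothetical helper for illustration; assumes `docs` is a non-empty
    list of non-empty token lists.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))

    def idf(term):
        n = df.get(term, 0)
        ratio = (n_docs - n + 0.5) / (n + 0.5)
        # The classic form log(ratio) is negative when n > n_docs / 2.
        # The smoothed form log(1 + ratio) is always positive.
        return math.log(1.0 + ratio) if smooth_idf else math.log(ratio)

    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1.0 - b + b * len(d) / avg_len)
        score = sum(
            idf(t) * tf[t] * (k1 + 1.0) / (tf[t] + norm)
            for t in query
            if t in tf
        )
        scores.append(score)
    return scores


docs = [
    ["spark", "ml", "bm25"],
    ["spark", "sql"],
    ["spark", "streaming", "ml"],
]
rare = bm25_scores(["bm25"], docs)            # only the first document matches
common = bm25_scores(["spark"], docs)         # term occurs in every document
common_smooth = bm25_scores(["spark"], docs, smooth_idf=True)
```

With the classic IDF, a query term that occurs in every document gives each matching document a negative score, while the smoothed form keeps the scores positive. Which correction, if any, the estimator should use is the open question raised in the comment above.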