
yuhao yang (JIRA) Wed, 22 Jun 2016 21:55:27 -0700

    [ https://issues.apache.org/jira/browse/SPARK-16149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345736#comment-15345736 ]


yuhao yang commented on SPARK-16149:
------------------------------------

As a general guideline, I would vote for consistency with the existing APIs in
MLlib. Using different names for similar parameters in different algorithms
only confuses users.
For this specific issue, we could deprecate the current minTF/minDF params and
add new minTermFreq/minDocFreq params in their place.
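A minimal sketch of the deprecate-and-rename idea, written in plain Python rather than against MLlib's real Param machinery: the new names hold the values, and the old names survive as aliases that emit a DeprecationWarning and forward to the new ones. The class name `CountVectorizerParams` and the default values of 1.0 are hypothetical, chosen only for illustration; this is not how Spark's `Params` API actually implements deprecation.

```python
import warnings


class CountVectorizerParams:
    """Hypothetical stand-in for CountVectorizer's params (not Spark's class).

    minDocFreq / minTermFreq are the proposed new names; minDF / minTF are
    kept as deprecated aliases that forward reads and writes to them.
    """

    def __init__(self, minDocFreq=1.0, minTermFreq=1.0):
        # Illustrative defaults only.
        self.minDocFreq = minDocFreq
        self.minTermFreq = minTermFreq

    @staticmethod
    def _warn(old, new):
        # stacklevel=3 attributes the warning to the code that touched the
        # old alias, not to this helper or the property accessor.
        warnings.warn(f"{old} is deprecated; use {new} instead",
                      DeprecationWarning, stacklevel=3)

    @property
    def minDF(self):
        self._warn("minDF", "minDocFreq")
        return self.minDocFreq

    @minDF.setter
    def minDF(self, value):
        self._warn("minDF", "minDocFreq")
        self.minDocFreq = value

    @property
    def minTF(self):
        self._warn("minTF", "minTermFreq")
        return self.minTermFreq

    @minTF.setter
    def minTF(self, value):
        self._warn("minTF", "minTermFreq")
        self.minTermFreq = value


if __name__ == "__main__":
    p = CountVectorizerParams()
    p.minDF = 2.0         # warns, then sets minDocFreq
    print(p.minDocFreq)   # 2.0
```

Keeping the value in one place (the new name) means old and new spellings can never disagree, and the alias can later be deleted without touching any other code.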


> API consistency discussion: CountVectorizer.{minDF -> minDocFreq, minTF -> 
> minTermFreq}
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-16149
>                 URL: https://issues.apache.org/jira/browse/SPARK-16149
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: MLlib
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>
> We used `minDF` and `minTF` in CountVectorizer and `minDocFreq` in IDF. It 
> would be nice to keep the naming consistent. This was discussed in 
> https://github.com/apache/spark/pull/7388 and the decision was made based on 
> sklearn compatibility. However, we didn't look broadly across MLlib APIs. 
> Maybe we can live with this small inconsistency, but it would be nice to
> discuss the guideline (consistency with other libraries or with existing
> APIs in MLlib).
> cc: [~josephkb] [~yuhaoyan]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

