[ https://issues.apache.org/jira/browse/SPARK-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386924#comment-14386924 ]
Tomasz Bartczak commented on SPARK-6239:
----------------------------------------

I also stumbled upon this little inconvenience in the API. My points in the discussion are:

1. FPGrowth internally uses a Long value, minCount (see https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala#L120 and https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala#L146), so it is more performant to specify it directly, without having to count the dataset first.
2. A good API can be used in multiple use cases.

The PR https://github.com/apache/spark/pull/5246 allows 'minCount' to be specified while keeping the existing API untouched. Why would it be a bad idea to include such an option?

> Spark MLlib fpm#FPGrowth minSupport should use long instead
> -----------------------------------------------------------
>
>                 Key: SPARK-6239
>                 URL: https://issues.apache.org/jira/browse/SPARK-6239
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.3.0
>            Reporter: Littlestar
>            Priority: Minor
>
> Spark MLlib fpm#FPGrowth minSupport should use long instead
> ==============
> val minCount = math.ceil(minSupport * count).toLong
> because:
> 1. [count] The number of records in the dataset is not known before it is read.
> 2. [minSupport] Double precision.
> from mahout#FPGrowthDriver.java
> addOption("minSupport", "s", "(Optional) The minimum number of times a
> co-occurrence must be present."
> + " Default Value: 3", "3");
> I just want to set minCount=2 for test.
> Thanks.
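
To make the double-precision point from the quoted issue description concrete, here is a minimal sketch in plain Scala with no Spark dependency. The object and method names are hypothetical and exist only for illustration; the method body is the expression quoted in the issue. It is not the FPGrowth implementation and not the API proposed in the PR.

```scala
// Illustration only: MinCountSketch and minCountFromSupport are hypothetical names.
// The body mirrors the expression quoted in the issue description.
object MinCountSketch {
  // Derive an absolute occurrence threshold from a relative support
  // and the total number of transactions.
  def minCountFromSupport(minSupport: Double, count: Long): Long =
    math.ceil(minSupport * count).toLong

  def main(args: Array[String]): Unit = {
    // Assumed scenario: 100 transactions, intended threshold of 7 occurrences.
    val count = 100L
    val derived = minCountFromSupport(0.07, count)
    println(s"minSupport = 0.07, count = $count -> derived minCount = $derived")

    // An absolute threshold needs neither the dataset count
    // nor a floating-point multiplication.
    val direct = 7L
    println(s"directly specified minCount = $direct")
  }
}
```

With these inputs the derived value comes out as 8 rather than the intended 7: 0.07 has no exact binary representation, so the product lands slightly above 7 before it is rounded up. A directly specified Long threshold avoids both this rounding and the need to know the dataset size in advance.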