[ https://issues.apache.org/jira/browse/SPARK-19940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joseph K. Bradley reassigned SPARK-19940:
-----------------------------------------

    Assignee: Maciej Szymkiewicz

> FPGrowthModel.transform should skip duplicated items
> ----------------------------------------------------
>
>                 Key: SPARK-19940
>                 URL: https://issues.apache.org/jira/browse/SPARK-19940
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Maciej Szymkiewicz
>            Assignee: Maciej Szymkiewicz
>            Priority: Minor
>
> Due to a misplaced {{distinct}}, {{FPGrowthModel.transform}} generates duplicated
> items in the "prediction" column:
> {code}
> scala> val data = spark.read.text("data/mllib/sample_fpgrowth.txt").select(split($"value", "\\s+").alias("features"))
> data: org.apache.spark.sql.DataFrame = [features: array<string>]
> scala> fpm.transform(Seq(Array("t", "s")).toDF("features")).show(1, false)
> +--------+---------------------+
> |features|prediction           |
> +--------+---------------------+
> |[t, s]  |[y, x, z, x, y, x, z]|
> +--------+---------------------+
> {code}
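The following is a minimal plain-Scala sketch (no Spark dependency) that shows how the placement of {{distinct}} changes the prediction. It assumes that a prediction is the union of the consequents of every rule whose antecedent is contained in the input items, excluding items already present. The {{Rule}} case class, {{predictBuggy}}, {{predict}}, and the sample rules are hypothetical illustrations. They are not Spark's actual internals or the actual patch.

{code}
// Hypothetical stand-in for an association rule: antecedent => consequent.
// This is NOT Spark's internal representation.
final case class Rule(antecedent: Seq[String], consequent: Seq[String])

object PredictSketch {
  // Illustrative "misplaced distinct": each rule's consequent is deduplicated
  // on its own, so an item produced by several matching rules still repeats.
  def predictBuggy(items: Seq[String], rules: Seq[Rule]): Seq[String] = {
    val itemset = items.toSet
    rules.flatMap { r =>
      if (r.antecedent.forall(itemset.contains))
        r.consequent.filterNot(itemset.contains).distinct
      else Seq.empty
    }
  }

  // Deduplicate once, after the consequents of all matching rules are flattened.
  def predict(items: Seq[String], rules: Seq[Rule]): Seq[String] = {
    val itemset = items.toSet
    rules.flatMap { r =>
      if (r.antecedent.forall(itemset.contains))
        r.consequent.filterNot(itemset.contains) // skip items already in the basket
      else Seq.empty
    }.distinct
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical rules, chosen so that "x" is implied by two different rules.
    val rules = Seq(
      Rule(Seq("t"), Seq("y")),
      Rule(Seq("t"), Seq("x")),
      Rule(Seq("s"), Seq("x")),
      Rule(Seq("t", "s"), Seq("z"))
    )
    println(predictBuggy(Seq("t", "s"), rules)) // List(y, x, x, z)
    println(predict(Seq("t", "s"), rules))      // List(y, x, z)
  }
}
{code}

Applying {{distinct}} to the flattened result is what removes duplicates contributed by different rules. Deduplicating each consequent separately cannot do this.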