[ https://issues.apache.org/jira/browse/SPARK-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864872#comment-15864872 ]
Nick Pentreath commented on SPARK-14503:
----------------------------------------

It seems {{PrefixSpan}} even takes a different input: {{Array[Array[T]]}} vs. {{Array[T]}} for {{FPGrowth}}. So it may be tricky to unify.

However, we do have the case where e.g. {{QuantileDiscretizer}} returns a {{Bucketizer}} as the {{Model}} from {{fit}}. In that case {{Bucketizer}} can be instantiated directly and independently, but in theory some other estimator could also return a {{Bucketizer}} as its model. So we could perhaps think about both {{FPGrowth}} and {{PrefixSpan}} returning an {{AssociationRuleModel}} from {{fit}}. This could work if the input can be generalized to {{Seq[T]}}, where for {{FPGrowth}} it would be {{Seq[Item]}} and for {{PrefixSpan}} it would be {{Seq[Seq[Item]]}}. The output of {{transform}} for the model would be the predicted items as above. It would expose {{getFreqItems}} and {{getAssociationRules}}, both returning a {{DataFrame}}.

Is there something in the nature of {{PrefixSpan}} vs. {{FPGrowth}} that makes this too difficult? (I'll have to go read the papers when I get some time!)

Having said that, it could be pretty complex to try to support this. If so, unless there's a compelling argument, I'd go for [~josephkb]'s suggestion above and hide the association rule class for now (it can be exposed later as needed). Then {{PrefixSpan}} will be totally independent and return its own {{PrefixSpanModel}} (which may also expose a {{transform}} method with similar semantics but different internals).

> spark.ml Scala API for FPGrowth
> -------------------------------
>
> Key: SPARK-14503
> URL: https://issues.apache.org/jira/browse/SPARK-14503
> Project: Spark
> Issue Type: Sub-task
> Components: ML
> Reporter: Joseph K. Bradley
>
> This task is the first port of spark.mllib.fpm functionality to spark.ml
> (Scala).
> This will require a brief design doc to confirm a reasonable DataFrame-based
> API, with details for this class.
> The doc could also look ahead to the other
> fpm classes, especially if their API decisions will affect FPGrowth.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
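A minimal sketch of the shared-model idea from the comment, under loudly stated assumptions: plain Scala collections stand in for DataFrames, so this only illustrates how one model type could serve both estimators; it is not a spark.ml API. {{AssociationRuleModel}}, {{getFreqItems}} and {{getAssociationRules}} are the names proposed in the comment. {{Rule}}, {{Matchers}}, the {{contains}} parameter and the prediction rule in {{transform}} (emit the consequents of every rule whose antecedent the row contains, minus elements already present) are hypothetical. The type parameter {{T}} is the element type of an input row {{Seq[T]}}: {{String}} items for FPGrowth, {{Seq[String]}} itemsets for PrefixSpan.

```scala
// Hypothetical sketch only: plain Scala collections stand in for DataFrames,
// and none of these types exist in spark.ml.

// A rule "antecedent => consequent" with its confidence (hypothetical type).
case class Rule[T](antecedent: Seq[T], consequent: Seq[T], confidence: Double)

// Proposed shared model that both FPGrowth.fit and PrefixSpan.fit could return,
// analogous to QuantileDiscretizer.fit returning a Bucketizer. An input row is
// a Seq[T]: T = Item for FPGrowth, T = Seq[Item] for PrefixSpan.
class AssociationRuleModel[T](
    freqItems: Seq[(Seq[T], Long)],
    rules: Seq[Rule[T]],
    // Assumption: how a rule antecedent is matched against a row is pluggable,
    // because this is where the two algorithms differ.
    contains: (Seq[T], Seq[T]) => Boolean) {

  // Stand-ins for the proposed DataFrame-returning getters.
  def getFreqItems: Seq[(Seq[T], Long)] = freqItems
  def getAssociationRules: Seq[Rule[T]] = rules

  // For each row, predict the consequents of every rule whose antecedent the
  // row contains, dropping elements the row already has.
  def transform(rows: Seq[Seq[T]]): Seq[Seq[T]] = rows.map { row =>
    rules
      .filter(r => contains(row, r.antecedent))
      .flatMap(_.consequent)
      .filterNot(row.contains)
      .distinct
  }
}

// Hypothetical matching strategies.
object Matchers {
  // FPGrowth-style: every antecedent item appears somewhere in the row.
  def unordered[T](row: Seq[T], ante: Seq[T]): Boolean = ante.forall(row.contains)

  // PrefixSpan-style: each antecedent itemset is a subset of some itemset of
  // the row, matched left to right without reusing a row element.
  def subsequence[I](row: Seq[Seq[I]], ante: Seq[Seq[I]]): Boolean = {
    var i = 0
    ante.forall { itemset =>
      while (i < row.length && !itemset.forall(row(i).contains)) i += 1
      val found = i < row.length
      i += 1 // the next antecedent itemset must match a later row element
      found
    }
  }
}

// FPGrowth-style usage: rows are transactions of String items.
val fp = new AssociationRuleModel[String](
  freqItems = Seq((Seq("a", "b"), 3L)),
  rules = Seq(Rule(Seq("a"), Seq("b"), 0.75)),
  contains = (row, ante) => Matchers.unordered(row, ante))
val fpOut = fp.transform(Seq(Seq("a", "c"), Seq("c")))
// fpOut == Seq(Seq("b"), Seq())

// PrefixSpan-style usage: rows are sequences of itemsets.
val ps = new AssociationRuleModel[Seq[String]](
  freqItems = Seq((Seq(Seq("a"), Seq("b")), 2L)),
  rules = Seq(Rule(Seq(Seq("a"), Seq("b")), Seq(Seq("c")), 0.5)),
  contains = (row, ante) => Matchers.subsequence(row, ante))
val psOut = ps.transform(Seq(Seq(Seq("a"), Seq("b", "d")), Seq(Seq("b"), Seq("a"))))
// psOut == Seq(Seq(Seq("c")), Seq()): the second row has both itemsets, but out of order
```

Pulling the matching rule out into a parameter makes one real difference visible: FPGrowth itemsets are unordered, while a PrefixSpan antecedent has to appear in order, so the second PrefixSpan row gets no prediction even though it contains both itemsets. Whether that difference, plus the different mining internals, justifies separate model classes is the open question raised in the comment.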