[ 
https://issues.apache.org/jira/browse/SPARK-23269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350308#comment-16350308
 ] 

Sean Owen commented on SPARK-23269:
-----------------------------------

Doesn't this incur similar overhead for every caller though?

> FP-growth: Provide last transaction for each detected frequent pattern
> ----------------------------------------------------------------------
>
>                 Key: SPARK-23269
>                 URL: https://issues.apache.org/jira/browse/SPARK-23269
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.1
>            Reporter: Arseniy Tashoyan
>            Priority: Minor
>              Labels: MLlib, fp-growth
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> The FP-growth implementation returns patterns and their frequencies:
> _model.freqItemsets_:
> ||items||freq||
> |[5]|3|
> |[5, 1]|3|
> It would be great to know when each pattern last occurred, i.e. which is the 
> last transaction containing this pattern.
> To do so, FPGrowth would need to be told which column of the transactions 
> data frame holds the timestamp:
> {code:java}
> val fpgrowth = new FPGrowth()
>   .setItemsCol("items")
>   .setTimestampCol("timestamp")
> {code}
> So the data frame with patterns could look like:
> ||items||freq||lastOccurrence||
> |[5]|3|2018-01-01 12:15:00|
> |[5, 1]|3|2018-01-01 12:15:00|
> Without this functionality, it is necessary to traverse the transactions data 
> frame again with the set of detected patterns and determine the last transaction 
> for each pattern. Why traverse the transactions a second time if that has already 
> been done during FP-growth execution?
>  
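
For illustration, the extra pass described in the issue can be sketched in plain Scala on in-memory collections. This is only a rough sketch of the logic, not Spark code and not part of the proposal: Transaction, Pattern, LastOccurrence and the sample data are hypothetical names and values made up for this example, standing in for the transactions data frame and model.freqItemsets.

```scala
import java.time.LocalDateTime

// Hypothetical in-memory stand-ins for one row of the transactions data frame
// and one row of model.freqItemsets (assumed shapes, for illustration only).
final case class Transaction(items: Set[Int], timestamp: LocalDateTime)
final case class Pattern(items: Set[Int], freq: Long)

object LastOccurrence {

  // The second traversal: for every detected pattern, scan all transactions
  // and keep the latest timestamp among those containing all of its items.
  // The result is None for a pattern that no transaction contains.
  def lastOccurrences(
      patterns: Seq[Pattern],
      transactions: Seq[Transaction]): Map[Set[Int], Option[LocalDateTime]] =
    patterns.map { p =>
      val matching = transactions.collect {
        case t if p.items.subsetOf(t.items) => t.timestamp
      }
      p.items -> matching.reduceOption((a, b) => if (a.isAfter(b)) a else b)
    }.toMap

  // Made-up sample data, chosen to be consistent with the tables in the issue.
  val sampleTransactions: Seq[Transaction] = Seq(
    Transaction(Set(1, 5), LocalDateTime.parse("2018-01-01T12:05:00")),
    Transaction(Set(1, 2, 5), LocalDateTime.parse("2018-01-01T12:10:00")),
    Transaction(Set(1, 5, 7), LocalDateTime.parse("2018-01-01T12:15:00")),
    Transaction(Set(2, 7), LocalDateTime.parse("2018-01-01T12:20:00"))
  )
  val samplePatterns: Seq[Pattern] = Seq(Pattern(Set(5), 3L), Pattern(Set(5, 1), 3L))
}
```

In this form the work grows with the number of patterns times the number of transactions, and it is paid only by callers that actually need the last occurrence.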



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
