[ 
https://issues.apache.org/jira/browse/SPARK-20324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cyril de Vogelaere updated SPARK-20324:
---------------------------------------
    Description: 
The idea behind this improvement would be to allow better control over the size 
of itemSets in solution patterns.

For example, assume you possess a huge dataset of sequences of products bought 
together, one sequence per client, and you want to find items frequently bought 
in pairs, so as to offer interesting promotions to your clients or boost certain 
sales.

In the current implementation, all solutions have to be computed before the 
user can sort through them and select only the interesting ones.
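
As a rough illustration of that workaround, the sketch below filters patterns 
that have already been mined, keeping only those made entirely of pairs. It is 
plain Python and does not call Spark; the `patterns` data and the 
`keep_by_itemset_size` helper are hypothetical, with each pattern written as a 
list of itemsets plus a frequency.

```python
def keep_by_itemset_size(patterns, min_items, max_items):
    """Keep patterns in which every itemset has min_items..max_items items."""
    return [
        (sequence, freq)
        for sequence, freq in patterns
        if all(min_items <= len(itemset) <= max_items for itemset in sequence)
    ]


# Hypothetical mining output: (sequence of itemsets, frequency).
patterns = [
    ([["bread"]], 5),
    ([["bread", "butter"]], 4),
    ([["bread", "butter"], ["jam", "tea"]], 2),
    ([["bread", "butter", "jam"]], 2),
    ([["bread"], ["jam", "tea"]], 3),
]

# Every pattern above had to be computed first; only two of them survive.
pairs_only = keep_by_itemset_size(patterns, min_items=2, max_items=2)
```

The filter itself is cheap; the cost lies in mining and collecting the patterns 
that are thrown away afterwards.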

What I'm proposing here is the addition of two parameters:

First, a maxItemPerItemset parameter, which would limit the maximum number of 
items per itemset to a certain size X. This could significantly reduce the 
search space, hastening the process of finding these specific solutions.

Second, a companion minItemPerItemset parameter, which would limit the minimum 
number of items per itemset, discarding solutions that do not fit this 
constraint. Although this wouldn't entail a reduction of the search space, it 
should still allow interested users to reduce the number of solutions collected 
by the driver.
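
To make the difference between the two limits concrete, here is a toy sketch, 
not the PrefixSpan implementation. It grows single itemsets one item at a time, 
ignores support counting and sequences entirely, and treats every candidate as 
frequent. The `grow_itemsets` function and its `min_items` / `max_items` 
arguments are hypothetical stand-ins for the proposed parameters.

```python
def grow_itemsets(items, min_items=1, max_items=None):
    """Grow itemsets item by item; return (candidates explored, itemsets reported).

    `items` must be a list of distinct, ordered items.
    """
    explored = 0
    reported = []
    stack = [()]  # start from the empty itemset
    while stack:
        current = stack.pop()
        # Lower bound: only decides what is reported, not what is explored.
        if len(current) >= max(min_items, 1):
            reported.append(current)
        # Upper bound: a full itemset is never extended, which prunes the search.
        if max_items is not None and len(current) >= max_items:
            continue
        start = items.index(current[-1]) + 1 if current else 0
        for item in items[start:]:
            explored += 1
            stack.append(current + (item,))
    return explored, reported


items = ["a", "b", "c", "d"]

explored_all, reported_all = grow_itemsets(items)               # 15 explored, 15 reported
explored_max, reported_max = grow_itemsets(items, max_items=2)  # 10 explored, 10 reported
explored_min, reported_min = grow_itemsets(items, min_items=2)  # 15 explored, 11 reported
```

In this toy setting the upper bound cuts the number of candidates explored, 
while the lower bound leaves exploration unchanged and only shrinks the result, 
since an undersized itemset may still grow into a valid one.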

If this proposal seems interesting to the community, I will implement it, 
along with tests to guarantee the correctness of the implementation.




> Control itemSets length in PrefixSpan
> -------------------------------------
>
>                 Key: SPARK-20324
>                 URL: https://issues.apache.org/jira/browse/SPARK-20324
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.1.0
>            Reporter: Cyril de Vogelaere
>            Priority: Minor
>   Original Estimate: 0h
>  Remaining Estimate: 0h


