FP-Growth clarification for Market Basket Analysis

2018-10-18 Thread aditipatel
Hello everyone,

We are working to develop an Also-Bought field using Spark FP-Growth. I use
the transform function to find products that are sold together most often.
When we use the transform function to determine consequents, I was wondering:
are the predictions ordered from most to least likely? If not, is there a way
for me to make sure that they are ordered?

Thank you,
Aditi Patel
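
For what it's worth, as far as I can tell the documentation for transform does not promise any ordering of the prediction column, so one cautious option is to rank the consequents yourself from the rule confidences. Below is a minimal plain-Python sketch of that idea, not Spark code. The tuples stand in for rows of the model's associationRules table (antecedent, consequent, confidence); the function name rank_consequents and the example rules are made up for illustration.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# (antecedent, consequent, confidence), mirroring the columns of an
# association-rules table. The layout is an assumption for this sketch.
Rule = Tuple[Sequence[str], Sequence[str], float]


def rank_consequents(basket: Iterable[str], rules: Iterable[Rule]) -> List[Tuple[str, float]]:
    """Rank items not already in the basket by the highest confidence
    of any rule whose antecedent is fully contained in the basket."""
    items = set(basket)
    best: Dict[str, float] = {}
    for antecedent, consequent, confidence in rules:
        if not set(antecedent) <= items:
            continue  # rule does not apply to this basket
        for item in consequent:
            if item in items:
                continue  # never recommend what is already in the basket
            if confidence > best.get(item, 0.0):
                best[item] = confidence
    # Highest confidence first; ties broken alphabetically for determinism.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical rules, for illustration only.
example_rules: List[Rule] = [
    (["razor"], ["batteries"], 0.40),
    (["razor"], ["shaving gel"], 0.65),
    (["razor", "shaving gel"], ["aftershave"], 0.55),
    (["bread"], ["butter"], 0.70),
]

ranked = rank_consequents(["razor"], example_rules)
print(ranked)  # [('shaving gel', 0.65), ('batteries', 0.4)]
```

Taking the maximum confidence per item is only one possible choice; lift or a sum over matching rules would give a different ranking.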




--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Market Basket Analysis by deploying FP Growth algorithm

2017-04-05 Thread Patrick Plaatje
Hi Arun,

We have been running into the same issue (with only 1000 unique items in
100MM transactions), but have not investigated the root cause. We decided
to run this on a cluster instead (4*16 / 64GB RAM), after which the OOM
issue went away. However, we then found that the FPGrowth implementation
starts spilling over to disk, and we had to increase the /tmp partition.

Hope it helps.

BR,
-patrick






Market Basket Analysis by deploying FP Growth algorithm

2017-04-05 Thread asethia
Hi,

We are currently working on a Market Basket Analysis by deploying FP Growth
algorithm on Spark to generate association rules for product recommendation.
We are running on close to 24 million invoices over an assortment of more
than 100k products. However, whenever we relax the support threshold below a
certain level, the stack overflows. We are using Spark 1.6.2 but can somehow
invoke 1.6.3 to counter this error. The problem though is even when we
invoke Spark 1.6.3 and increase the stack size to 100M we are running out of
memory. We believe the tree grows exponentially and is stored in memory
which causes this problem. Can anyone suggest a solution to this issue
please?

Thanks
Arun



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Market-Basket-Analysis-by-deploying-FP-Growth-algorithm-tp28569.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Market Basket Analysis

2014-12-05 Thread Sean Owen
Generally I don't think frequent-item-set algorithms are that useful.
They're simple and not probabilistic; they don't tell you what sets
occurred unusually frequently. Usually people ask for frequent item
set algos when they really mean they want to compute item similarity
or make recommendations. What's your use case?




Re: Market Basket Analysis

2014-12-05 Thread Rohit Pujari
This is a typical use case: people who buy electric razors also tend to
buy batteries and shaving gel along with them. The goal is to build a model
which will look through POS records and find which product categories have
a higher likelihood of appearing together in a given transaction.

What would you recommend?
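
One simple starting point for the goal described above is to count how often each pair of categories lands in the same transaction. The sketch below is plain Python, assuming each POS record has already been reduced to a list of category names; the function name pair_supports and the sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Sequence, Tuple


def pair_supports(transactions: Iterable[Sequence[str]]) -> Dict[Tuple[str, str], float]:
    """Map each unordered pair of categories to the fraction of
    transactions that contain both."""
    counts: Counter = Counter()
    total = 0
    for transaction in transactions:
        total += 1
        # De-duplicate and sort so each pair has one canonical key.
        for pair in combinations(sorted(set(transaction)), 2):
            counts[pair] += 1
    if total == 0:
        return {}
    return {pair: count / total for pair, count in counts.items()}


# Made-up transactions, for illustration only.
sample = [
    ["razor", "batteries", "shaving gel"],
    ["razor", "shaving gel"],
    ["bread", "butter"],
    ["razor", "batteries"],
]

supports = pair_supports(sample)
for pair, support in sorted(supports.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, support)
```

Raw co-occurrence like this favors categories that are popular in general, so it is only a first look and not a measure of how unusual a pairing is.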





-- 
Rohit Pujari
Solutions Engineer, Hortonworks
rpuj...@hortonworks.com
716-430-6899

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: Market Basket Analysis

2014-12-05 Thread Debasish Das
Apriori can be thought of as post-processing on a product similarity graph...I
call it product similarity, but for each product you build a node which
keeps the distinct users visiting the product, and two product nodes are
connected by an edge if the intersection > 0...you are assuming that if a
user has not visited a keyword, he is not going to visit it in the
future...this graph is not for prediction but only keeps user visits...

Anyway, once you have built this graph on GraphX, you can do interesting
path-based analysis...Pick a product and trace its fanout to see, once
people bought this product, which other products they bought etc etc..A
first stab at the analysis is to calculate the product similarities...

You can also generate naturally occurring clusters of products, but then you
are partitioning the graph using spectral or other graph partitioners like
METIS...Even the ad hoc analysis of the product graph will give a lot of useful
insights (hopefully deeper than Apriori)...
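
To make the graph construction above concrete, here is a small plain-Python sketch (not GraphX). It assumes the input is a list of (user, product) visit pairs; the function names build_product_graph and fanout, and the sample data, are hypothetical. Each product keeps its set of distinct users, and two products get an edge when those sets intersect.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_product_graph(visits: Iterable[Tuple[str, str]]):
    """Return (users_by_product, edges), where edges maps a sorted product
    pair to the number of users the two products share."""
    users_by_product: Dict[str, Set[str]] = defaultdict(set)
    for user, product in visits:
        users_by_product[product].add(user)
    edges: Dict[Tuple[str, str], int] = {}
    # Pairwise comparison is quadratic in the number of products; fine for a
    # toy example, too slow for a large catalogue.
    for a, b in combinations(sorted(users_by_product), 2):
        shared = users_by_product[a] & users_by_product[b]
        if shared:  # edge only if the intersection is non-empty
            edges[(a, b)] = len(shared)
    return dict(users_by_product), edges


def fanout(product: str, edges: Dict[Tuple[str, str], int]) -> List[Tuple[str, int]]:
    """Neighbours of `product`, largest user overlap first."""
    neighbours = []
    for (a, b), weight in edges.items():
        if a == product:
            neighbours.append((b, weight))
        elif b == product:
            neighbours.append((a, weight))
    return sorted(neighbours, key=lambda nw: (-nw[1], nw[0]))


# Made-up visits, for illustration only.
sample_visits = [
    ("u1", "razor"), ("u1", "gel"),
    ("u2", "razor"), ("u2", "gel"), ("u2", "batteries"),
    ("u3", "bread"),
]

users_by_product, edges = build_product_graph(sample_visits)
print(fanout("razor", edges))  # [('gel', 2), ('batteries', 1)]
```

The edge weight here is the raw overlap count; a normalized similarity such as Jaccard could be substituted without changing the structure.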

On Fri, Dec 5, 2014 at 12:25 PM, Sean Owen so...@cloudera.com wrote:

 I doubt Amazon uses Apriori for this, but who knows. Usually you want
 "also bought" functionality, which is a form of similar-item
 computation. But you don't want to favor items that are simply
 frequently purchased in general.

 You probably want to look at pairs of items that co-occur in purchase
 histories unusually frequently by looking at (log) likelihood ratios,
 which is a straightforward item similarity computation.
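
The log-likelihood ratio mentioned above can be sketched as follows, using the usual 2x2 contingency table for a pair of items A and B. This is a minimal plain-Python illustration of the G^2 statistic under stated assumptions, not code from the thread; the function name and the argument names k11 (both), k12 (A only), k21 (B only), k22 (neither) are choices made for this example.

```python
import math


def _x_log_x(x: float) -> float:
    # By convention 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts: float) -> float:
    # Unnormalized entropy: N*log(N) - sum(k*log(k)).
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 statistic for a 2x2 table of transaction counts.

    k11: transactions with both A and B
    k12: transactions with A but not B
    k21: transactions with B but not A
    k22: transactions with neither
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


# Made-up counts: the first pair co-occurs far more often than chance
# would suggest, the second is exactly what independence predicts.
strong = log_likelihood_ratio(100, 10, 10, 1000)
independent = log_likelihood_ratio(10, 10, 10, 10)
print(strong > independent)  # True
```

Note that the score is also large when two items co-occur unusually rarely, so in practice one would check the direction of the association before treating a high score as "bought together".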

 On Fri, Dec 5, 2014 at 11:43 AM, Ashic Mahtab as...@live.com wrote:
  This can definitely be useful. "Frequently bought together" is something
  Amazon does, though surprisingly, you don't get a discount. Perhaps it
  can lead to offering (or avoiding!) deals on frequent itemsets.
 
  This is a good resource for frequent itemset implementations:
  http://infolab.stanford.edu/~ullman/mmds/ch6.pdf
 
  

Market Basket Analysis

2014-12-04 Thread Rohit Pujari
Hello Folks:

I'd like to do market basket analysis using spark, what're my options?

Thanks,
Rohit Pujari
Solutions Architect, Hortonworks



Re: Market Basket Analysis

2014-12-04 Thread Tobias Pfeiffer
Hi,

On Thu, Dec 4, 2014 at 11:58 PM, Rohit Pujari rpuj...@hortonworks.com
wrote:

 I'd like to do market basket analysis using spark, what're my options?


To do it or not to do it ;-)

Seriously, could you elaborate a bit on what you want to know?

Tobias


Re: Market Basket Analysis

2014-12-04 Thread Rohit Pujari
Sure, I'm looking to perform frequent item set analysis on a POS data set.
Apriori is a classic algorithm used for such tasks. Since an Apriori
implementation is not part of MLlib yet (see
https://issues.apache.org/jira/browse/SPARK-4001), what are some other
options/algorithms I could use to perform a similar task? If there's no
spoon-to-spoon substitute, spoon-to-fork will suffice too.

Hopefully this provides some clarification.

Thanks,
Rohit


