FP-Growth clarification for Market Basket Analysis
Hello everyone,
We are working to develop an Also-Bought field using Spark FP-Growth. I use the transform function to find products that are sold together most often. When we use the transform function to determine consequents, are the predictions ordered from most to least likely? If not, is there a way for me to make sure that they are ordered?
Thank you, Aditi Patel
-- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
- To unsubscribe e-mail: user-unsubscr...@spark.apache.org
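One way to avoid depending on whatever order transform returns is to rank the consequents yourself from the rule confidences. The sketch below is plain Python, not Spark code, and assumes the rules have already been collected into records with antecedent, consequent and confidence fields (in Spark ML these are the columns of `model.associationRules`). The `Rule` type, the `ranked_consequents` helper and the sample rules are hypothetical names made up for illustration.

```python
from collections import namedtuple

# Hypothetical rule record; the field names mirror the antecedent / consequent /
# confidence columns of an association-rules table.
Rule = namedtuple("Rule", ["antecedent", "consequent", "confidence"])


def ranked_consequents(basket, rules):
    """Return (item, confidence) pairs for `basket`, most confident first."""
    basket = set(basket)
    best = {}  # item -> highest confidence of any applicable rule suggesting it
    for rule in rules:
        # A rule applies only when the basket contains its whole antecedent.
        if not set(rule.antecedent) <= basket:
            continue
        for item in rule.consequent:
            if item in basket:
                continue  # do not suggest something already in the basket
            if rule.confidence > best.get(item, 0.0):
                best[item] = rule.confidence
    # Sort by confidence, descending; ties are broken by item name.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up rules, for illustration only.
rules = [
    Rule(["razor"], ["batteries"], 0.6),
    Rule(["razor"], ["shaving gel"], 0.8),
    Rule(["razor", "shaving gel"], ["aftershave"], 0.9),
    Rule(["toothbrush"], ["toothpaste"], 0.95),
]

print(ranked_consequents(["razor"], rules))
# → [('shaving gel', 0.8), ('batteries', 0.6)]
```

Taking the maximum confidence per item is only one possible choice. Summing confidences or ranking by lift would give a different order, so the scoring rule is worth deciding explicitly.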
Re: Market Basket Analysis by deploying FP Growth algorithm
Hi Arun,
We have been running into the same issue (with only 1000 unique items in 100MM transactions), but have not investigated its root cause. We decided to run this on a cluster instead (4*16 / 64GB RAM), after which the OOM issue went away. However, the FPGrowth implementation then started spilling over to disk, and we had to increase the /tmp partition. Hope it helps.
BR, -patrick
On 05/04/2017, 10:29, "asethia" <sethia.a...@gmail.com> wrote:
Hi, We are currently working on a Market Basket Analysis by deploying the FP-Growth algorithm on Spark to generate association rules for product recommendation. We are running on close to 24 million invoices over an assortment of more than 100k products. However, whenever we relax the support threshold below a certain level, the stack overflows. We are using Spark 1.6.2 but can somehow invoke 1.6.3 to counter this error. The problem, though, is that even when we invoke Spark 1.6.3 and increase the stack size to 100M, we run out of memory. We believe the tree grows exponentially and is stored in memory, which causes this problem. Can anyone suggest a solution to this issue please? Thanks, Arun
Market Basket Analysis by deploying FP Growth algorithm
Hi,
We are currently working on a Market Basket Analysis by deploying the FP-Growth algorithm on Spark to generate association rules for product recommendation. We are running on close to 24 million invoices over an assortment of more than 100k products. However, whenever we relax the support threshold below a certain level, the stack overflows. We are using Spark 1.6.2 but can somehow invoke 1.6.3 to counter this error. The problem, though, is that even when we invoke Spark 1.6.3 and increase the stack size to 100M, we run out of memory. We believe the tree grows exponentially and is stored in memory, which causes this problem. Can anyone suggest a solution to this issue please?
Thanks, Arun
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Market-Basket-Analysis-by-deploying-FP-Growth-algorithm-tp28569.html
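To see why relaxing the support threshold matters so much, it can help to look at the first pass of FP-Growth in isolation: count how many transactions contain each item and keep only the items that meet the threshold. Only the surviving items go into the tree, so a lower threshold means more distinct items and longer paths. The sketch below is a toy, single-machine illustration in plain Python, not Spark's implementation. The `frequent_items` helper and the sample baskets are made up for illustration.

```python
from collections import Counter


def frequent_items(transactions, min_support):
    """Return {item: count} for items found in at least `min_support` of the transactions."""
    transactions = list(transactions)
    min_count = min_support * len(transactions)
    counts = Counter()
    for basket in transactions:
        counts.update(set(basket))  # count each item once per transaction
    return {item: c for item, c in counts.items() if c >= min_count}


# Made-up baskets, for illustration only.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "d"],
    ["b", "e"],
    ["a", "b", "f"],
]

for support in (0.5, 0.1):
    kept = frequent_items(transactions, support)
    print(support, sorted(kept))
# → 0.5 ['a', 'b']
# → 0.1 ['a', 'b', 'c', 'd', 'e', 'f']
```

Running the same count on a sample of the real invoices for a few candidate thresholds gives a rough idea of how many of the 100k products would survive before committing to a full run.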
Re: Market Basket Analysis
Generally I don't think frequent-item-set algorithms are that useful. They're simple and not probabilistic; they don't tell you what sets occurred unusually frequently. Usually people ask for frequent item set algos when they really mean they want to compute item similarity or make recommendations. What's your use case?
On Thu, Dec 4, 2014 at 8:23 PM, Rohit Pujari rpuj...@hortonworks.com wrote:
Sure, I'm looking to perform frequent item set analysis on a POS data set. Apriori is a classic algorithm used for such tasks. Since an Apriori implementation is not part of MLlib yet (see https://issues.apache.org/jira/browse/SPARK-4001), what are some other options/algorithms I could use to perform a similar task? If there's no spoon-to-spoon substitute, spoon-to-fork will suffice too. Hopefully this provides some clarification. Thanks, Rohit
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Market Basket Analysis
This is a typical use case: people who buy electric razors also tend to buy batteries and shaving gel along with them. The goal is to build a model which will look through POS records and find which product categories have a higher likelihood of appearing together in a given transaction. What would you recommend?
On Fri, Dec 5, 2014 at 7:21 AM, Sean Owen so...@cloudera.com wrote:
Generally I don't think frequent-item-set algorithms are that useful. They're simple and not probabilistic; they don't tell you what sets occurred unusually frequently. Usually people ask for frequent item set algos when they really mean they want to compute item similarity or make recommendations. What's your use case?
-- Rohit Pujari Solutions Engineer, Hortonworks rpuj...@hortonworks.com 716-430-6899
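For the razor / batteries / shaving gel use case, a simple starting point is lift: how much more often two categories appear in the same transaction than would be expected if they were bought independently. The sketch below is plain Python over in-memory lists, not Spark code. The `pair_lift` helper and the sample transactions are hypothetical and only meant to show the arithmetic.

```python
from collections import Counter
from itertools import combinations


def pair_lift(transactions):
    """Return {(a, b): lift} for every pair of categories that co-occurs at least once."""
    transactions = list(transactions)
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # de-duplicate and fix the order inside each pair
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    # lift(a, b) = P(a and b) / (P(a) * P(b)) = count(a, b) * n / (count(a) * count(b))
    return {
        (a, b): c * n / (item_counts[a] * item_counts[b])
        for (a, b), c in pair_counts.items()
    }


# Made-up POS transactions, for illustration only.
transactions = [
    ["razor", "batteries"],
    ["razor", "batteries", "milk"],
    ["milk", "bread"],
    ["milk"],
]

for pair, lift in sorted(pair_lift(transactions).items(), key=lambda kv: -kv[1]):
    print(pair, round(lift, 2))
```

A lift above 1 suggests the pair appears together more often than chance, and below 1 less often. With few transactions the values are noisy, so a minimum co-occurrence count is usually applied as well.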
Re: Market Basket Analysis
Apriori can be thought of as post-processing on a product similarity graph. I call it product similarity, but for each product you build a node which keeps the distinct users visiting the product, and two product nodes are connected by an edge if the intersection of their user sets is > 0. You are assuming that if a user does not visit a keyword, he is not going to visit it in the future; this graph is not for prediction but only keeps user visits. Anyway, once you have built this graph on GraphX, you can do interesting path-based analysis: pick a product and trace its fanout to see which other products people bought once they bought this one, and so on. A first stab at the analysis is to calculate the product similarities. You can also generate naturally occurring clusters of products, but then you are partitioning the graph using spectral or other graph partitioners like METIS. Even ad hoc analysis of the product graph will give a lot of useful insights (hopefully deeper than Apriori).
On Fri, Dec 5, 2014 at 12:25 PM, Sean Owen so...@cloudera.com wrote:
I doubt Amazon uses Apriori for this, but who knows. Usually you want "also bought" functionality, which is a form of similar-item computation. But you don't want to favor items that are simply frequently purchased in general. You probably want to look at pairs of items that co-occur in purchase histories unusually frequently by looking at (log) likelihood ratios, which is a straightforward item similarity computation.
On Fri, Dec 5, 2014 at 11:43 AM, Ashic Mahtab as...@live.com wrote:
This can definitely be useful. "Frequently bought together" is something Amazon does, though surprisingly, you don't get a discount. Perhaps it can lead to offering (or avoiding!) deals on frequent itemsets. This is a good resource for frequent itemsets implementations: http://infolab.stanford.edu/~ullman/mmds/ch6.pdf
From: rpuj...@hortonworks.com Date: Fri, 5 Dec 2014 10:31:17 -0600 Subject: Re: Market Basket Analysis To: so...@cloudera.com CC: t...@preferred.jp; user@spark.apache.org
This is a typical use case: people who buy electric razors also tend to buy batteries and shaving gel along with them. The goal is to build a model which will look through POS records and find which product categories have a higher likelihood of appearing together in a given transaction. What would you recommend?
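The two suggestions in this message can be combined in a small sketch: keep the set of distinct users per product, connect two products when those sets intersect, and score each edge with a log-likelihood ratio computed from the 2x2 table of users who bought both, only one, or neither. This is plain Python rather than GraphX, and it uses the entropy form of the G-squared statistic as one common way to compute the ratio. The function names and the sample purchases are hypothetical.

```python
import math
from itertools import combinations


def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    # Unnormalised entropy: N*log(N) - sum(k*log(k)), with N the sum of the counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table (k11 = both, k12/k21 = only one, k22 = neither)."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative rounding errors


def similar_products(purchases):
    """purchases: iterable of (user, product) pairs -> [(score, a, b)], highest score first."""
    users_by_product = {}
    all_users = set()
    for user, product in purchases:
        users_by_product.setdefault(product, set()).add(user)
        all_users.add(user)
    n = len(all_users)
    edges = []
    for a, b in combinations(sorted(users_by_product), 2):
        users_a, users_b = users_by_product[a], users_by_product[b]
        both = len(users_a & users_b)
        if both == 0:
            continue  # no edge when the user sets do not intersect
        only_a = len(users_a) - both
        only_b = len(users_b) - both
        neither = n - both - only_a - only_b
        edges.append((llr(both, only_a, only_b, neither), a, b))
    return sorted(edges, reverse=True)


# Made-up purchase history, for illustration only.
purchases = [
    ("u1", "razor"), ("u1", "gel"),
    ("u2", "razor"), ("u2", "gel"),
    ("u3", "milk"),
    ("u4", "milk"), ("u4", "razor"),
    ("u5", "milk"),
    ("u6", "bread"),
]

for score, a, b in similar_products(purchases):
    print(a, b, round(score, 3))
```

One caveat: the ratio is large whenever the co-occurrence count differs from what independence would predict, in either direction, so in practice it may be worth also checking that the pair co-occurs more often than expected before treating a high score as "also bought".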
Market Basket Analysis
Hello Folks: I'd like to do market basket analysis using spark, what're my options? Thanks, Rohit Pujari Solutions Architect, Hortonworks
Re: Market Basket Analysis
Hi, On Thu, Dec 4, 2014 at 11:58 PM, Rohit Pujari rpuj...@hortonworks.com wrote: I'd like to do market basket analysis using spark, what're my options? To do it or not to do it ;-) Seriously, could you elaborate a bit on what you want to know? Tobias
Re: Market Basket Analysis
Sure, I'm looking to perform frequent item set analysis on a POS data set. Apriori is a classic algorithm used for such tasks. Since an Apriori implementation is not part of MLlib yet (see https://issues.apache.org/jira/browse/SPARK-4001), what are some other options/algorithms I could use to perform a similar task? If there's no spoon-to-spoon substitute, spoon-to-fork will suffice too. Hopefully this provides some clarification.
Thanks, Rohit
From: Tobias Pfeiffer t...@preferred.jp Date: Thursday, December 4, 2014 at 7:20 PM To: Rohit Pujari rpuj...@hortonworks.com Cc: user@spark.apache.org Subject: Re: Market Basket Analysis
Hi, On Thu, Dec 4, 2014 at 11:58 PM, Rohit Pujari rpuj...@hortonworks.com wrote: I'd like to do market basket analysis using spark, what're my options? To do it or not to do it ;-) Seriously, could you elaborate a bit on what you want to know? Tobias