Re: DBSCAN implementation in Mahout
No, there is no DBSCAN, OPTICS, or any other density flavor AFAIK. Sent from my phone.

On Nov 28, 2014 11:41 AM, Chirag Nagpal chiragnagpal_12...@aitpune.edu.in wrote:

Hello, I am Chirag Nagpal, a third-year Computer Engineering student at the University of Pune, India, currently interning at SERC, Indian Institute of Science, Bangalore. My work involves using density-based clustering algorithms such as DBSCAN on geo-referenced data like tweets; typically the dataset consists of millions of points. I would like to know if there is any MapReduce implementation of DBSCAN available.

Thank you,
Chirag
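For readers landing on this thread: DBSCAN itself is simple to state; the hard part at scale is the neighborhood queries. A minimal single-machine sketch of the core expansion loop, purely illustrative (naive O(n^2) neighbor search; a real implementation would use an R-tree or k-d tree):

import java.util.*;

// Minimal DBSCAN sketch: points are double[2], distance is Euclidean.
// Labels: 0 = unvisited, -1 = noise, >0 = cluster id.
class Dbscan {
    static int[] run(double[][] pts, double eps, int minPts) {
        int[] label = new int[pts.length];
        int cluster = 0;
        for (int i = 0; i < pts.length; i++) {
            if (label[i] != 0) continue;
            List<Integer> seeds = neighbors(pts, i, eps);
            if (seeds.size() < minPts) { label[i] = -1; continue; } // noise, may later become a border point
            cluster++;
            label[i] = cluster;
            // Expand the cluster from the seed set (it grows as we find new core points).
            for (int k = 0; k < seeds.size(); k++) {
                int j = seeds.get(k);
                if (label[j] == -1) label[j] = cluster;    // former noise: border point
                if (label[j] != 0) continue;               // already assigned
                label[j] = cluster;
                List<Integer> jn = neighbors(pts, j, eps);
                if (jn.size() >= minPts) seeds.addAll(jn); // j is a core point, keep expanding
            }
        }
        return label;
    }

    static List<Integer> neighbors(double[][] p, int i, double eps) {
        List<Integer> out = new ArrayList<>();
        for (int j = 0; j < p.length; j++) {
            double dx = p[i][0] - p[j][0], dy = p[i][1] - p[j][1];
            if (Math.sqrt(dx * dx + dy * dy) <= eps) out.add(j);
        }
        return out;
    }
}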
Re: DBSCAN implementation in Mahout
Hi Dmitriy,

Thanks for the reply. Since density-based clustering algorithms are used extensively, especially by GIS research groups, it is unfortunate that no MapReduce implementation is available. I plan to propose writing MapReduce implementations of DBSCAN and OPTICS for GSoC '15. I would like your input on how much significance this would have for the community in general.

Thanks,
Chirag Nagpal
University of Pune, India
www.chiragnagpal.com

From: Dmitriy Lyubimov dlie...@gmail.com
Sent: Saturday, November 29, 2014 11:29 PM
To: user@mahout.apache.org
Subject: Re: DBSCAN implementation in Mahout

No, there is no DBSCAN, OPTICS, or any other density flavor AFAIK.
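A note for anyone picking this proposal up: published MapReduce formulations of DBSCAN (e.g. MR-DBSCAN by He et al.) generally follow a partition / local-cluster / merge design. Below is a sketch of what the stage-1 partitioning mapper might look like; the input layout ("id,lon,lat" lines), the key format, and the cell/eps constants are all illustrative assumptions of mine, not Mahout code:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Stage 1 of a partition/merge DBSCAN: bin each point into a grid cell,
// replicating points that lie within eps of a cell edge into the adjacent
// cell so the per-cell clusterings can later be stitched together.
public class GridPartitionMapper extends Mapper<LongWritable, Text, Text, Text> {
    private static final double CELL = 0.1;  // cell width in degrees (illustrative)
    private static final double EPS  = 0.01; // DBSCAN radius; must be much smaller than CELL

    @Override
    protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
        String[] f = value.toString().split(",");   // assumed layout: "id,lon,lat"
        double lon = Double.parseDouble(f[1]);
        double lat = Double.parseDouble(f[2]);
        long cx = (long) Math.floor(lon / CELL);
        long cy = (long) Math.floor(lat / CELL);
        // Emit to the home cell, plus any neighboring cell whose border is within EPS.
        for (long dx = -1; dx <= 1; dx++)
            for (long dy = -1; dy <= 1; dy++) {
                if (dx != 0 || dy != 0) {
                    double ex = (dx < 0) ? lon - cx * CELL : (cx + 1) * CELL - lon;
                    double ey = (dy < 0) ? lat - cy * CELL : (cy + 1) * CELL - lat;
                    if ((dx != 0 && ex > EPS) || (dy != 0 && ey > EPS)) continue;
                }
                ctx.write(new Text((cx + dx) + ":" + (cy + dy)), value);
            }
    }
}
// Each reducer would then run plain DBSCAN on its cell, and a final pass
// merges clusters that share replicated border points.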
Re: User based recommender
Can you give me some more details on the Hadoop mapreduce item-based cooccurrence recommender?

Best Regards,
Yash Patel

On Fri, Nov 28, 2014 at 7:21 PM, Pat Ferrel p...@occamsmachete.com wrote:

I built this app with it: https://guide.finderbots.com The app uses MongoDB, Ruby on Rails, and Solr 4.3. Once the model comes out of the job it is csv text—therefore language and architecture neutral. I load the data from spark-itemsimilarity into MongoDB using Java. Solr is set up for full-text indexing and queries using data from MongoDB. The queries are made to Solr through REST from Ruby UX code. You can replace any component in this stack with whatever you wish and use whatever language you are comfortable with. Alternatively you could modify the UI of Solr or Elasticsearch—both are in Java. If you use any of the other Mahout recommenders, they create all recs for all known users, so you'll still need to build a way to serve those results. People often use DBs for this and integrate with their web app framework.

On Nov 28, 2014, at 10:03 AM, Yash Patel yashpatel1...@gmail.com wrote:

I looked up spark row similarity but I am not sure it will suit my needs, as I want to build my recommender as a Java application, possibly with an interface.

On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel p...@occamsmachete.com wrote:

Some references:
- a small free book, which talks about the general idea: https://www.mapr.com/practical-machine-learning
- a preso, which talks about mixing actions or other indicators: http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
- two blog posts: http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/ and http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
- mahout docs: http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html

Build Mahout from this source: https://github.com/apache/mahout This will run stand-alone on a dev machine; then, if your data is too big for a single machine, you can run it on a Spark + Hadoop cluster. The data this creates can be put into a DB or indexed directly by a search engine (Solr or Elasticsearch). Choose the search engine you want; queries of a user's item id history will go there, and the results will be an ordered list of item ids to recommend. The core piece is the command line job "mahout spark-itemsimilarity", which can parse csv data. The options specify which columns are used for ids. Start out simple by looking only at user and item IDs. Then you can add other cross-cooccurrence indicators for multiple actions later pretty easily.

On Nov 28, 2014, at 12:14 AM, Yash Patel yashpatel1...@gmail.com wrote:

The Mahout + search engine recommender seems best for the data I have. Kindly get back to me at your earliest convenience.

Best Regards,
Yash Patel

On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel p...@occamsmachete.com wrote:

Mahout has several recommenders, so there is no need to create one from components. They all make use of the similarity of preferences between users—that's why they are in the category of collaborative filtering.

Primary Mahout recommenders:
1) Hadoop mapreduce item-based cooccurrence recommender. Creates all recs for all users. Uses “Mahout IDs”.
2) ALS-WR Hadoop mapreduce. Uses matrix factorization to reduce noise in the data; sometimes better for small data sets than #1. Uses “Mahout IDs”.
3) Mahout + search engine: cooccurrence type. Extremely flexible, works with multiple actions (multi-modal), works for new users that have some history, and has a scalable server (from the search engine), but is more difficult to integrate than #1 or #2. Uses your own ids and reads csv files.

The rest of the data seems to apply either to the user or the item and so would be used in different ways. #1 and #2 can only use user id and item id, but some post-recommendation weighting or filtering can be applied. #3 can use multiple attributes in different ways. For instance, if category is an item attribute, you can create two actions: user-pref-for-an-item and user-pref-for-a-category. Assuming you want to recommend an item (not a category), you can create a cross-cooccurrence indicator for the second action and use that data to make your item recs better. #3 is the only method that supports this. Pick a recommender and we can help more with data prep.

On Nov 26, 2014, at 1:34 PM, Yash Patel yashpatel1...@gmail.com wrote:

Hello everyone, wow, I am quite happy to see so many inputs from people. I apologize for not providing more details. Although this is not my complete dataset, the fields I have chosen to use are:

customer id - numeric
item id - text
postal code - text
item category - text
potential growth - text
territory - text
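To make the query step of recommender #3 above concrete, this is roughly what "queries of a user's item id history" look like from Java with SolrJ (the Solr 4.x client API; the core name, the "indicators" field, and the example item ids here are illustrative assumptions, not from the thread — the field would hold the similar-item ids that spark-itemsimilarity emitted for each item):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class RecQuery {
    public static void main(String[] args) throws SolrServerException {
        // Core/collection name is hypothetical.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/items");

        // The user's recent item-id history becomes the query terms, run
        // against the field holding each item's cooccurrence indicators.
        String history = "iphone ipad";                  // example ids only
        SolrQuery q = new SolrQuery("indicators:(" + history + ")");
        q.setRows(10);                                   // top-10 recommendations

        QueryResponse rsp = solr.query(q);
        for (SolrDocument doc : rsp.getResults()) {      // ranked by relevance score
            System.out.println(doc.getFieldValue("id"));
        }
    }
}

The search engine's relevance ranking is doing the actual recommendation math here, which is why the serving layer scales for free.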
Re: User based recommender
The Mahout site is a good starting point for using any of the recommenders: http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html

On Nov 29, 2014, at 1:33 PM, Yash Patel yashpatel1...@gmail.com wrote:

Can you give me some more details on the Hadoop mapreduce item-based cooccurrence recommender?

Best Regards,
Yash Patel
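For reference, the Hadoop item-based job on that page is driven entirely from the mahout command line. Per the intro-itembased-hadoop docs, the input is a CSV of userID,itemID[,preferencevalue] using Mahout's integer IDs, for example:

1,101,5.0
1,102,3.0
2,101,2.0
2,103,4.0

mahout recommenditembased -s SIMILARITY_LOGLIKELIHOOD -i /path/to/input -o /path/to/output --numRecommendations 10

The paths here are placeholders; see the page above for the full option list. The output is, for each user ID, a list of item IDs with estimated preference scores.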
Re: User based recommender
Thank you for the guidance. I will try building something rough and ask questions if I run into any errors.

On Sat, Nov 29, 2014 at 10:38 PM, Pat Ferrel p...@occamsmachete.com wrote:

The Mahout site is a good starting point for using any of the recommenders: http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html