Re: DBSCAN implementation in Mahout

2014-11-29 Thread Dmitriy Lyubimov
No, there is no DBSCAN, OPTICS, or any other density-based flavor, as far as I know.

Sent from my phone.
On Nov 28, 2014 11:41 AM, 3316 Chirag Nagpal 
chiragnagpal_12...@aitpune.edu.in wrote:


 Hello,
 I am Chirag Nagpal, a third-year student of Computer Engineering at the
 University of Pune, India, currently interning at SERC, Indian Institute
 of Science, Bangalore.

 My work involves using density-based clustering algorithms like DBSCAN on
 geo-referenced data such as tweets. Typically the dataset consists of millions
 of points. I would like to know whether any MapReduce implementation of
 DBSCAN is available.

 Thank you,
 Chirag



Re: DBSCAN implementation in Mahout

2014-11-29 Thread 3316 Chirag Nagpal
Hi Dmitriy,

Thanks for the reply

Density-based clustering algorithms are used extensively, especially by GIS
research groups, so it is a bit sad that there isn't a MapReduce
implementation available.

I think I will propose to write MapReduce code for DBSCAN and OPTICS for GSoC 
'15.

I would like your input: how significant would this be to the community in
general?

Thanks,

Chirag Nagpal
University of Pune, India
www.chiragnagpal.com

From: Dmitriy Lyubimov dlie...@gmail.com
Sent: Saturday, November 29, 2014 11:29 PM
To: user@mahout.apache.org
Subject: Re: DBSCAN implementation in Mahout

No, there is no DBSCAN, OPTICS, or any other density-based flavor, as far as I know.

Sent from my phone.



Re: User based recommender

2014-11-29 Thread Yash Patel
Can you give me some more details on the Hadoop MapReduce item-based
cooccurrence recommender?


Best Regards,
Yash Patel

On Fri, Nov 28, 2014 at 7:21 PM, Pat Ferrel p...@occamsmachete.com wrote:

 I built this app with it: https://guide.finderbots.com

 The app uses MongoDB, Ruby on Rails, and Solr 4.3. Once the model comes
 out of the job it is CSV text, and therefore language- and architecture-neutral.
 I load the data from spark-itemsimilarity into MongoDB using Java. Solr is
 set up for full-text indexing and queries using data from MongoDB. The
 queries are made to Solr through REST from Ruby UX code. You can replace
 any component in this stack with whatever you wish and use whatever
 language you are comfortable with.

 Alternatively you could modify the UI of Solr or Elasticsearch—both are in
 Java.

 If you use any of the other Mahout recommenders, they create all recs for
 all known users, so you’ll still need to build a way to serve those results.
 People often use DBs for this and integrate with their web app framework.

 On Nov 28, 2014, at 10:03 AM, Yash Patel yashpatel1...@gmail.com wrote:

 I looked up spark-rowsimilarity, but I am not sure it will suit my needs,
 as I want to build my recommender as a Java application, possibly with an
 interface.


 On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel p...@occamsmachete.com wrote:

  Some references:
 
  a small free book, which covers the general idea:
  https://www.mapr.com/practical-machine-learning
  a presentation, which talks about mixing actions or other indicators:
 
 http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
  two blog posts:
 
 http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
 
 http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
  mahout docs:
  http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
 
  Build Mahout from this source: https://github.com/apache/mahout This will
  run stand-alone on a dev machine; then, if your data is too big for a single
  machine, you can run it on a Spark + Hadoop cluster. The data this creates
  can be put into a DB or indexed directly by a search engine (Solr or
  Elasticsearch). Choose the search engine you want; queries made with a
  user’s item-id history will go there, and the results will be an ordered
  list of item ids to recommend.
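 
  As a hypothetical sketch of such a query (the core name, the indicator
  field name, and the item ids below are invented placeholders; q, fl, and
  rows are standard Solr parameters):
 
      # Search the indicator field with a user's recent item ids; Solr
      # returns hits ordered by score, which serve as the recommendations.
      curl 'http://localhost:8983/solr/items/select?q=indicators:(item1+item2+item3)&fl=id&rows=10'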
 
  The core piece is the command-line job, “mahout spark-itemsimilarity”,
  which can parse CSV data. The options specify which columns hold the ids.
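 
  A minimal sketch of an invocation (the paths and the Spark master are
  placeholders; run “mahout spark-itemsimilarity --help” for the
  authoritative option list):
 
      # Input here is assumed to be simple preference pairs, one per line:
      #   userID,itemID
      mahout spark-itemsimilarity \
        --input /path/to/prefs.csv \
        --output /path/to/indicators \
        --master local[4]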
 
  Start out simple by looking only at user and item IDs. Then you can add
  other cross-cooccurrence indicators for multiple actions later pretty
  easily.
 
 
  On Nov 28, 2014, at 12:14 AM, Yash Patel yashpatel1...@gmail.com
 wrote:
 
  The Mahout + search engine recommender seems like it would be best for the
  data I have.
 
  Kindly get back to me at your earliest convenience.
 
 
 
  Best Regards,
  Yash Patel
 
  On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel p...@occamsmachete.com
 wrote:
 
  Mahout has several recommenders, so there is no need to create one from
  components. They all make use of the similarity of preferences between
  users; that’s why they are in the category of collaborative filtering.
 
  Primary Mahout recommenders:
  1) Hadoop MapReduce item-based cooccurrence recommender. Creates all recs
  for all users. Uses “Mahout IDs”.
  2) ALS-WR Hadoop MapReduce recommender. Uses matrix factorization to reduce
  noise in the data; sometimes better for small data sets than #1. Uses
  “Mahout IDs”.
  3) Mahout + search engine: cooccurrence type. Extremely flexible: works
  with multiple actions (multi-modal), works for new users that have some
  history, and has a scalable server (from the search engine), but is more
  difficult to integrate than #1 or #2. Uses your own ids and reads CSV
  files.
 
  The rest of the data seems to apply either to the user or the item, and so
  would be used in different ways. #1 and #2 can only use user id and item
  id, but some post-recommendation weighting or filtering can be applied. #3
  can use multiple attributes in different ways. For instance, if category is
  an item attribute you can create two actions, user-pref-for-an-item and
  user-pref-for-a-category. Assuming you want to recommend an item (not a
  category), you can create a cross-cooccurrence indicator for the second
  action and use that data to make your item recs better. #3 is the only
  method that supports this.
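 
  A hedged sketch of that two-action case, assuming a single CSV of
  userID,itemID,action rows (the --filter1/--filter2/--filterColumn options
  are described on the intro-cooccurrence-spark page linked above, but the
  action names and file layout here are invented, so verify against --help):
 
      # --filterColumn points at the action column; --filter1 selects the
      # primary action and --filter2 the secondary one, producing both
      # cooccurrence and cross-cooccurrence indicators.
      mahout spark-itemsimilarity \
        --input /path/to/actions.csv \
        --filterColumn 2 \
        --filter1 item-pref \
        --filter2 category-pref \
        --output /path/to/indicators \
        --master local[4]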
 
  Pick a recommender and we can help more with data prep.
 
 
  On Nov 26, 2014, at 1:34 PM, Yash Patel yashpatel1...@gmail.com
 wrote:
 
  Hello everyone,
 
  Wow, I am quite happy to see so many inputs from people.
 
  I apologize for not providing more details.
 
  Although this is not my complete dataset, the fields I have chosen to use
  are:
 
  customer id - numeric
  item id - text
  postal code - text
  item category - text
  potential growth - text
  territory - text
 

Re: User based recommender

2014-11-29 Thread Pat Ferrel
The Mahout site is a good starting point for using any of the recommenders.

http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html

On Nov 29, 2014, at 1:33 PM, Yash Patel yashpatel1...@gmail.com wrote:

Can you give me some more details on the Hadoop MapReduce item-based
cooccurrence recommender?



Re: User based recommender

2014-11-29 Thread Yash Patel
Thank you for the guidance.

I will try building something rough and ask questions if I run into any
errors.

On Sat, Nov 29, 2014 at 10:38 PM, Pat Ferrel p...@occamsmachete.com wrote:

 The Mahout site is a good starting point for using any of the recommenders.

 http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
