Re: DBSCAN implementation in Mahout

2014-11-29 Thread Dmitriy Lyubimov
No, there is no DBSCAN, OPTICS, or any other density-based flavor, as far as I know.

Sent from my phone.
On Nov 28, 2014 11:41 AM, 3316 Chirag Nagpal 
chiragnagpal_12...@aitpune.edu.in wrote:


 Hello,
 I am Chirag Nagpal, a third-year student of Computer Engineering at the
 University of Pune, India, currently interning at SERC, Indian Institute
 of Science, Bangalore.

 My work involves using density-based clustering algorithms like DBSCAN on
 geo-referenced data such as tweets. Typically the dataset consists of millions
 of points. I would like to know whether any MapReduce implementation of
 DBSCAN is available.

 Thank you,
 Chirag



Re: DBSCAN implementation in Mahout

2014-11-29 Thread 3316 Chirag Nagpal
Hi Dmitriy,

Thanks for the reply

Density-based clustering algorithms are used extensively, especially by GIS
research groups, so it is a bit sad that there isn't a MapReduce
implementation available.

I think I will propose to write MapReduce code for DBSCAN and OPTICS for GSoC 
'15.

I would like your input: how significant would this be to the community in
general?

Thanks,

Chirag Nagpal
University of Pune, India
www.chiragnagpal.com

From: Dmitriy Lyubimov dlie...@gmail.com
Sent: Saturday, November 29, 2014 11:29 PM
To: user@mahout.apache.org
Subject: Re: DBSCAN implementation in Mahout

No, there is no DBSCAN, OPTICS, or any other density-based flavor, as far as I know.

Sent from my phone.



Re: User based recommender

2014-11-29 Thread Yash Patel
Can you give me some more details on the Hadoop MapReduce item-based
cooccurrence recommender?


Best Regards,
Yash Patel

On Fri, Nov 28, 2014 at 7:21 PM, Pat Ferrel p...@occamsmachete.com wrote:

 I built this app with it: https://guide.finderbots.com

 The app uses MongoDB, Ruby on Rails, and Solr 4.3. Once the model comes
 out of the job it is CSV text, and therefore language- and architecture-neutral.
 I load the data from spark-itemsimilarity into MongoDB using Java. Solr is
 set up for full-text indexing and queries using data from MongoDB. The
 queries are made to Solr through REST from Ruby UX code. You can replace
 any component in this stack with whatever you wish and use whatever
 language you are comfortable with.

 Alternatively you could modify the UI of Solr or Elasticsearch—both are in
 Java.

 If you use any of the other Mahout recommenders, they create all recs for
 all known users, so you’ll still need to build a way to serve those results.
 People often use DBs for this and integrate with their web app framework.

 On Nov 28, 2014, at 10:03 AM, Yash Patel yashpatel1...@gmail.com wrote:

 I looked up spark-rowsimilarity, but I am not sure it will suit my needs,
 as I want to build my recommender as a Java application, possibly with an
 interface.


 On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel p...@occamsmachete.com wrote:

  Some references:
 
  a small free book, which covers the general idea:
  https://www.mapr.com/practical-machine-learning
  a presentation, which talks about mixing actions or other indicators:
 
 http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
  two blog posts:
 
 http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
 
 http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
  mahout docs:
  http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
 
  Build Mahout from this source: https://github.com/apache/mahout This will
  run stand-alone on a dev machine; then, if your data is too big for a single
  machine, you can run it on a Spark + Hadoop cluster. The data this creates
  can be put into a DB or indexed directly by a search engine (Solr or
  Elasticsearch). Choose the search engine you want; queries made with a
  user’s item-id history will go there, and the results will be an ordered
  list of item ids to recommend.
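 
  As a hypothetical sketch of such a query (the core name, the indicator
  field name, and the item ids below are invented placeholders; q, fl, and
  rows are standard Solr parameters):
 
      # Search the indicator field with a user's recent item ids; Solr
      # returns hits ordered by score, which serve as the recommendations.
      curl 'http://localhost:8983/solr/items/select?q=indicators:(item1+item2+item3)&fl=id&rows=10'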
 
  The core piece is the command-line job, “mahout spark-itemsimilarity”,
  which can parse CSV data. The options specify which columns hold the ids.
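 
  A minimal sketch of an invocation (the paths and the Spark master are
  placeholders; run “mahout spark-itemsimilarity --help” for the
  authoritative option list):
 
      # Input here is assumed to be simple preference pairs, one per line:
      #   userID,itemID
      mahout spark-itemsimilarity \
        --input /path/to/prefs.csv \
        --output /path/to/indicators \
        --master local[4]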
 
  Start out simple by looking only at user and item IDs. Then you can add
  other cross-cooccurrence indicators for multiple actions later pretty
  easily.
 
 
  On Nov 28, 2014, at 12:14 AM, Yash Patel yashpatel1...@gmail.com
 wrote:
 
  The Mahout + search engine recommender seems like it would be best for the
  data I have.
 
  Kindly get back to me at your earliest convenience.
 
 
 
  Best Regards,
  Yash Patel
 
  On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel p...@occamsmachete.com
 wrote:
 
  Mahout has several recommenders, so there is no need to create one from
  components. They all make use of the similarity of preferences between
  users; that’s why they are in the category of collaborative filtering.
 
  Primary Mahout recommenders:
  1) Hadoop MapReduce item-based cooccurrence recommender. Creates all recs
  for all users. Uses “Mahout IDs”.
  2) ALS-WR Hadoop MapReduce recommender. Uses matrix factorization to reduce
  noise in the data; sometimes better for small data sets than #1. Uses
  “Mahout IDs”.
  3) Mahout + search engine: cooccurrence type. Extremely flexible: works
  with multiple actions (multi-modal), works for new users that have some
  history, and has a scalable server (from the search engine), but is more
  difficult to integrate than #1 or #2. Uses your own ids and reads CSV
  files.
 
  The rest of the data seems to apply either to the user or the item, and so
  would be used in different ways. #1 and #2 can only use user id and item
  id, but some post-recommendation weighting or filtering can be applied. #3
  can use multiple attributes in different ways. For instance, if category is
  an item attribute you can create two actions, user-pref-for-an-item and
  user-pref-for-a-category. Assuming you want to recommend an item (not a
  category), you can create a cross-cooccurrence indicator for the second
  action and use that data to make your item recs better. #3 is the only
  method that supports this.
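 
  A hedged sketch of that two-action case, assuming a single CSV of
  userID,itemID,action rows (the --filter1/--filter2/--filterColumn options
  are described on the intro-cooccurrence-spark page linked above, but the
  action names and file layout here are invented, so verify against --help):
 
      # --filterColumn points at the action column; --filter1 selects the
      # primary action and --filter2 the secondary one, producing both
      # cooccurrence and cross-cooccurrence indicators.
      mahout spark-itemsimilarity \
        --input /path/to/actions.csv \
        --filterColumn 2 \
        --filter1 item-pref \
        --filter2 category-pref \
        --output /path/to/indicators \
        --master local[4]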
 
  Pick a recommender and we can help more with data prep.
 
 
  On Nov 26, 2014, at 1:34 PM, Yash Patel yashpatel1...@gmail.com
 wrote:
 
  Hello everyone,
 
  Wow, I am quite happy to see so many inputs from people.
 
  I apologize for not providing more details.
 
  Although this is not my complete dataset, the fields I have chosen to use
  are:
 
  customer id - numeric
  item id - text
  postal code - text
  item category - text
  potential growth - text
  territory - text
 

Re: User based recommender

2014-11-29 Thread Pat Ferrel
The Mahout site is a good starting point for using any of the recommenders.

http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html

On Nov 29, 2014, at 1:33 PM, Yash Patel yashpatel1...@gmail.com wrote:

Can you give me some more details on the Hadoop MapReduce item-based
cooccurrence recommender?



Re: User based recommender

2014-11-29 Thread Yash Patel
Thank you for the guidance.

I will try building something rough and ask questions if I run into any
errors.

On Sat, Nov 29, 2014 at 10:38 PM, Pat Ferrel p...@occamsmachete.com wrote:

 The Mahout site is a good starting point for using any of the recommenders.

 http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
