Re: User based recommender

2014-12-04 Thread Ted Dunning
On Wed, Dec 3, 2014 at 6:22 AM, Yash Patel yashpatel1...@gmail.com wrote:

 I have multiple different columns such as category, shipping location,
 item price, online user, etc.

 How can I use all these different columns to improve recommendation
 quality (i.e., calculate a more precise similarity between users by using
 location and item price)?


For some kinds of information, you can build cross recommenders off of that
other information.  That incorporates this other information in an
item-based system.

Simply hand-coding a similarity usually doesn't work well.  The problem is
that you don't really know which factors represent actionable and
non-redundant user similarity.


Re: User based recommender

2014-12-04 Thread Yash Patel
Calculating similarity using multiple column values is what I had in mind. I
looked through the example, but it only mentions content filtering in
passing and does not implement it explicitly.

Can you point me to a working example, or do I need to use classifier or
clustering algorithms?
Also, if I have to, can I use those results with the recommenders
provided in Mahout?


Best Regards,
Yash Patel

On Wed, Dec 3, 2014 at 3:43 PM, parnab kumar parnab.2...@gmail.com wrote:

 1. Why not use the other columns as evidence and come up with a preference
 score (UID, ITEMID, PREF_SCORE), then see whether results improve? You might
 use other machine learning algorithms to generate such preference
 scores.


 2. Another way may be to implement a custom similarity score, instead of
 the ones that ship with Mahout, that uses these column values to decide
 on the similarity of the users. Have a look at Mahout in Action.
 There is an example of a dating recommender. Your USERID, ITEMID problem
 can be mapped back to the same problem described there. Try
 implementing the similarity score using the other column values.


 Maybe some expert in this area can come up with a better solution. If I
 were you, I would certainly test the waters in the way I mentioned.

 Parnab...
 CSE, IIT Kharagpur
 BIS, University College Cork



Few Questions related Mahout used for Text Clustering

2014-12-04 Thread Viral Parikh
Hi Mahout Users!

Firstly, this community is great, and I appreciate all the Q&A back and
forth!

I am currently working on text clustering, using Mahout and its clustering
algorithms (kmeans, krunner, canopy, etc.).

If anyone has worked on a similar project, please let me know. I have two
questions:

1. To choose the optimal k, I am running krunner across my vectorized
dataset. To choose the right “k”, I am trying to understand the spread of
my observations across all clusters and minimize cluster 1 (which
apparently looks like the catch-all bucket; can anyone confirm?), but I
observe that the final count varies depending on k. See below (please
ignore the blank cells).

Any idea why the final count varies depending on the chosen k?

 [image: Inline image 1]

2. Another thing I noticed: some of my clusters have just n=1 observation.
That doesn’t make sense to me. Is there a way to avoid this, or a
particular parameter I can tweak?

Thank you, and I look forward to your reply.

Cheers,
Viral


Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Shahid Shaikh
Hi All,
   I have been trying Mahout clustering on unstructured data, i.e.,
human-written text. I have tried Mahout clustering algorithms such as
Kmeans, Canopy+Kmeans and LDA, but the results produced are not helpful.

I think the problem is with the way the data is written. Can someone please
give me some pointers on how to proceed with unstructured data for
clustering?


I have also written an analyzer that uses lower-case and stop-word filters.

thanks :)


Regards,
Shaikh Shahid G .
+91 9503954781


Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Donni Khan
Hi,
It depends on the nature of the data you are clustering. If you have
knowledge about your data, you can interpret the results and also set the
correct parameters for the clustering algorithm, such as the number of
topics or the number of clusters.

Cheers,
Donni




Re: User based recommender

2014-12-04 Thread Yash Patel
Cross recommenders don't seem applicable because this dataset doesn't
represent different actions by a user; it just contains transaction
history (i.e., customer id, item id, shipping location, sales amount of
that item, item category, etc.).

Maybe location or sales per item could be used, since similarity on those
might reveal people who share the same purchasing patterns.





Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Shahid Shaikh
Hey Donni, thanks, but I have used those configurations and obtained the
clusters. The results are not promising enough. I was looking for any known
techniques I can follow, specifically while generating vectors.

Thanks




-- 
Regards,
Shaikh Shahid G .
+91 9503954781


Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Brian Dolan
My experience has been that it's best to leave the data processing for Python.  
I strongly suggest you re-write your ETL and let Mahout only do the clustering. 
The built-in vectorization routines are fairly primitive.

Then I would wash the features, basically set up your own list of stop words or 
phrases, before you let Mahout do anything.
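
As a rough illustration of that kind of pre-cleaning, here is a minimal
Python sketch. The stop-word and stop-phrase lists, the token pattern, and
the function name clean_document are all placeholders I made up for the
example, not anything Mahout provides; you would build the lists from your
own corpus and then write the cleaned text out in whatever input format
your Mahout job expects.

```python
import re

# Hypothetical, domain-specific lists; replace with terms from your own corpus.
STOP_WORDS = {"the", "a", "an", "and", "is", "to", "of", "please", "thanks"}
STOP_PHRASES = ["best regards", "sent from my phone"]

# Assumed token pattern: runs of lower-case letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def clean_document(text):
    """Lower-case, drop stop phrases, tokenize, then drop stop words."""
    lowered = text.lower()
    # Remove multi-word boilerplate before tokenizing, so the phrase is
    # matched as a whole rather than word by word.
    for phrase in STOP_PHRASES:
        lowered = lowered.replace(phrase, " ")
    tokens = TOKEN_RE.findall(lowered)
    # Also drop single-character tokens, which rarely carry meaning.
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


if __name__ == "__main__":
    raw = "Thanks! The printer is jammed AND the tray is empty. Best regards, Sam"
    print(" ".join(clean_document(raw)))
```

Removing phrases before single words matters here: once the text is split
into tokens, a phrase such as a sign-off can no longer be recognized as a
unit.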




Re: User based recommender

2014-12-04 Thread Pat Ferrel
User1 purchases = infant car seat, infant stroller
User2 purchases = infant car seat, infant stroller, infant crib mobile

The obvious recommendation for User1 is an infant crib mobile. From the 
purchase history the users look similar. Here similarity is in “taste”. User or 
item information that does not relate to taste may be misleading for recs. If 
you look at their profiles:

User1: male, 55 years old, upper 75% income
User2: female, 29 years old, lower 25% income

User1 is actually a doting grandfather, User2 a doting mother. Their profiles 
are quite dissimilar though their taste is similar. 

The point is that those other pieces of data may not relate to user
similarity *of taste*. The cross-recommendation process applies
cooccurrence analysis to check whether the secondary data correlates in an
important way with the action you know is important.

For this reason it’s usually best to start out ignoring that information and
using just UID, ITEMID for the important action.

Later you may find uses for the extra data, or may consider viewing or 
purchasing from a certain category as a secondary action and use 
cross-recommendations to improve things.
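
To make the purchase-history example concrete, here is a minimal Python
sketch of recommending from UID, ITEMID pairs alone. It is only an
illustration under simplifying assumptions: it scores items by raw
cooccurrence counts, whereas a real system would normally weight
cooccurrences statistically, and the function names build_cooccurrence and
recommend are made up for this example rather than taken from Mahout.

```python
from collections import defaultdict
from itertools import permutations


def build_cooccurrence(purchases):
    """Build per-user item sets and item-item cooccurrence counts.

    purchases: iterable of (user_id, item_id) pairs.
    cooc[a][b] is the number of users who bought both a and b.
    """
    items_by_user = defaultdict(set)
    for user, item in purchases:
        items_by_user[user].add(item)

    cooc = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in permutations(items, 2):
            cooc[a][b] += 1
    return items_by_user, cooc


def recommend(user, items_by_user, cooc, top_n=3):
    """Rank items the user does not own by summed cooccurrence counts."""
    owned = items_by_user.get(user, set())
    scores = defaultdict(int)
    for item in owned:
        for other, count in cooc.get(item, {}).items():
            if other not in owned:
                scores[other] += count
    # Sort by descending score, then by name for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


if __name__ == "__main__":
    purchases = [
        ("User1", "infant car seat"),
        ("User1", "infant stroller"),
        ("User2", "infant car seat"),
        ("User2", "infant stroller"),
        ("User2", "infant crib mobile"),
    ]
    items_by_user, cooc = build_cooccurrence(purchases)
    print(recommend("User1", items_by_user, cooc))
```

Note that no profile data (age, income, location) enters the calculation;
the two users look similar purely because their purchases overlap.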

 



Mahout used for Text Clustering

2014-12-04 Thread Viral Parikh
Hi Mahout Users!
I am currently working on text clustering, using Mahout and its clustering
algorithms (kmeans, LDA, canopy, etc.).
I have the following questions:
1. Why is Mahout producing clusters with only 1 observation?
2. Is cluster 1 always the catch-all cluster?
3. When I change the k in kmeans and run clusterdump, the total number of
observations changes as k changes. Why is that? Am I missing anything?
4. Does normalization (when creating the vectors) lead to better-quality
clustering results, especially for unstructured data? In my case it's text
data!

Thank you in advance for your help!

Cheers,
V


Topological data analysis

2014-12-04 Thread Andrew Musselman
Any interest in a topological data analysis package in Mahout?

https://www.google.com/search?q=topological+data+analysis

http://danifold.net/mapper/introduction.html
http://danifold.net/mapper

Would be nice to be able to run jobs and export to JSON for consumption
in D3 or other plotting/visualization tools.


Re: Topological data analysis

2014-12-04 Thread Brian Dolan
Though I don't have an immediate use case, I'd +1 the idea!
