Re: Mar 20 minutes

2024-03-25 Thread Peng Zhang
Wow, I am curious who the conspirators are.
“Happy hour some week soon, invite collaborators and conspirators”

Cheers,
Peng

On Sat, Mar 23, 2024 at 00:54 Andrew Musselman  wrote:

> Community meeting minutes posted at
> https://mahout.apache.org/minutes/2024/03/20/Meeting-Minutes.html
>
> Meeting Minutes
>
> 2024-03-20 08:00:00 +
> Weekly community meeting
>
> Attendees
>
>- Trevor Grant
>- Tommy Naugle
>
> Old Business
>
>1. Happy hour some week soon, invite collaborators and conspirators
>2. Drop this meeting time from two hours to a half hour
>3. Coordinate on JIRA
>   - Web site cleanup (~210 broken links fixed out of ~220, tommy
>   continuing)
>   - Continued qumat data structure work (tommy in flight, akm to
> review)
>4. Ask INFRA to help us make sure PRs are defaulting to main instead of
>trunk (akm) (done)
>5. Kernel method research spike:
>https://issues.apache.org/jira/browse/MAHOUT-2200
>6. Make ticket to add notebooks to notebooks directory in source tree (
>https://issues.apache.org/jira/browse/MAHOUT-2198)
>7. Add execute method to qumat
>https://issues.apache.org/jira/browse/MAHOUT-2201
>8. Rebuild JIRA - now that we have wiped it clean, on the qumat side
>anyway, let's start grooming tasks into the appropriate
>components/releases/etc (todo)
>   - Including adding filters to all boards so only those tickets show
>   up (todo)
>
> New Business
>
>1. Tommy is working on making a docker container for previewing website
>builds
>2. Trevor is pivoting from kernel research into implementing a POC for
>cirq, i.e. the 9 gates and circuit execute
>
> Other Business
>


Re: PyMahout (incore) (alpha v0.1)

2021-01-06 Thread Peng Zhang
Well done Trevor.

-peng

On Thu, Jan 7, 2021 at 04:45 Trevor Grant  wrote:

> Hey all,
>
> I made a branch for a thing I'm toying with. PyMahout.
>
> See https://github.com/rawkintrevo/pymahout/tree/trunk
>
> Right now, it's sort of dumb- it just makes a couple of random incore
> matrices... but it _does_ make them.
>
> Next I want to show I can do something with DRMs.
>
> Once I know it's all possible- I'll make a batch of JIRA tickets and we can
> start implementing a python like package so that in theory in a pyspark
> workbook you could
>
> ```jupyter
> !pip install pymahout
>
> import pymahout
>
> # do pymahout things here... in python.
>
> ```
>
> So if you're interested in helping /playing- reach out on here or direct-
> if there is a bunch of interest I can commit all of this to a branch as we
> play with it.
>
> Thanks!
> tg
>


Re: [ANNOUNCE] Apache Mahout 0.14.0 Release

2019-03-07 Thread Peng Zhang
I’m very happy to hear this news~

On Thu, Mar 7, 2019 at 10:25 Andrew Musselman  wrote:

> The Apache Mahout PMC is pleased to announce the release of Mahout 0.14.0.
> Mahout's goal is to create an environment for quickly creating
> machine-learning applications that scale and run on the highest-performance
> parallel computation engines available. Mahout comprises an interactive
> environment and library that support generalized scalable linear algebra
> and include many modern machine-learning algorithms. This release ships
> some major changes from 0.13.0, most in support of simplicity and tidiness.
>
> To get started with Apache Mahout 0.14.0, download the release artifacts
> and signatures from http://www.apache.org/dist/mahout/0.14.0.
>
> Many thanks to the contributors and committers who were part of this
> release.
>
>
> RELEASE HIGHLIGHTS
>
> The theme of the 0.14.0 release is a major refactor for simplicity of usage
> and maintenance. Non-core items have been moved to the new “community”
> module, and a new “experimental” area has been created for cutting-edge
> work that may require user tuning for specific hardware configurations.
>
>
> STATS
>
> A total of 15 separate JIRA issues are addressed in this release [1].
>
>
> GETTING STARTED
>
> Download the release artifacts and signatures at
> https://mahout.apache.org/general/downloads.html. The examples directory
> contains several working examples of the core functionality available in
> Mahout. These can be run via scripts in the examples/bin directory. Most
> examples do not need a Hadoop cluster in order to run.
>
>
> FUTURE PLANS
>
> 0.14.1
>
> As the project moves towards a 0.14.1 release, we are working on the
> following:
>
> * Further Native Integration for increased speedups
>
> * JCuda backing for In-core Matrices and CUDA solvers
>
> * Enumeration across multiple GPUs per JVM instance on a given instance
>
> * GPU/OpenMP Acceleration for linear solvers
>
> * Further integration with other libraries such as MLLib and SparkML
>
> * Incorporate more statistical operations
>
> * Runtime probing and optimization of available hardware for caching of
> correct/most optimal solver
>
>
> CONTRIBUTING
>
> If you are interested in contributing, please see our How to Contribute [2]
> page or contact us via email at d...@mahout.apache.org.
>
>
> CREDITS
>
> As with every release, we wish to thank all of the users and contributors
> to Mahout. Please see the JIRA Release Notes [1] for individual credits.
> Big thanks to Trevor Grant for a large effort on the refactoring and
> cleanup in this release.
>
>
> KNOWN ISSUES:
>
> * The classify-wikipedia.sh example has an outdated link to the data files.
> A workaround is to change the download section of the script to:
>
> `curl https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles10.xml-p002336425p003046511.bz2 -o ${WORK_DIR}/wikixml/enwiki-latest-pages-articles.xml.bz2`
>
> * Currently GPU acceleration for supported operations is limited to a
> single JVM instance
>
> * Occasional segfault with certain GPU models and computations
>
> * On older GPUs some tests fail when building ViennaCL due to card
> limitations
>
> * Currently automatic probing of a system’s hardware happens at each
> supported operation, adding some overhead
>
> * Currently the example in the main README errors out due to a packaging
> error; we will be fixing this in the next point release
>
>
>
> [1]
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20MAHOUT%20AND%20issuetype%20in%20(standardIssueTypes()%2C%20subTaskIssueTypes())%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20in%20(0.14.0)
>
> [2] https://mahout.apache.org/developers/how-to-contribute
>


Re: Apache Mahout Slack Channel

2018-02-08 Thread Peng Zhang
Hi Trevor, send me invitation as well. Thanks in advance.

-Peng

On Thu, 8 Feb 2018 at 21:29 Trevor Grant  wrote:

> It is- but if you email me (most have been without user/dev CC'd) I can
> send you an invite so you don't need it.
>
> I'll send you the invite now Khatwani,
>
> When you log in- go to the #mahout channel.
>
>
> On Wed, Feb 7, 2018 at 9:30 PM, KHATWANI PARTH BHARAT <
> h2016...@pilani.bits-pilani.ac.in> wrote:
>
> > Is email with @apache.org domain necessary to sign up for the slack?
> >
> > Thanks & Regards
> > Parth Khatwani
> >
> > On 08-Feb-2018 7:48 am, "Trevor Grant"  wrote:
> >
> > > For those who've been invited- when you get into the slack, look for the
> > > channel #mahout
> > >
> > > Thanks
> > >
> > > On Wed, Feb 7, 2018 at 9:18 AM, Aditya wrote:
> > >
> > > > Great! Can't wait to join the channel!
> > > >
> > > > On Wed, Feb 7, 2018 at 8:12 PM, Trevor Grant <trevor.d.gr...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello everyone!
> > > > >
> > > > > I wanted to make you all aware that we are using a Slack Channel (#mahout)
> > > > > on the-asf.slack.com
> > > > >
> > > > > If anyone is interested in joining- I'm pretty sure you can just go there
> > > > > and sign up.  Email me privately if you need an invite.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > tg
> > > > >
> > > > >
> > > > > > PS>
> > > > > > https://www.youtube.com/watch?v=PWjd9xyFrzk&index=9&list=PLqxhJj6bcnY8Mb5qSKiQ_tpYR-gF-YhvG
> > > > >
> > > >
> > >
> >
>


Re: New Committer: Holden Karau

2017-07-19 Thread Peng Zhang
Welcome onboard, Holden!

- Peng

On Wed, 19 Jul 2017 at 02:26 Andrew Palumbo  wrote:

> Welcome again, Holden, Great to have you on board!
>
> --andy
>
> 
> From: Trevor Grant 
> Sent: Tuesday, July 18, 2017 12:32:07 AM
> To: user@mahout.apache.org; Mahout Dev List
> Subject: New Committer: Holden Karau
>
> The Project Management Committee (PMC) for Apache Mahout
> has invited Holden Karau to become a committer and we are pleased
> to announce that she has accepted.
>
> Holden brings a great deal of expertise and knowledge around the
> Apache Spark project, and is working to improve the integration
> between the two projects.
>
> Being a committer enables easier contribution to the
> project since there is no need to go via the patch
> submission process. This should enable better productivity.
>
> Please join me in giving Holden a very warm welcome.
>


Re: Welcome New Committer Nikolay Sakharnykh

2017-04-22 Thread Peng Zhang
Welcome Nikolay.


On Sat, 22 Apr 2017 at 12:17 Andrew Musselman  wrote:

> The Apache Mahout PMC is pleased to announce that we have asked Nikolay
> Sakharnykh to become a committer and he has accepted. His contribution of
> an initial set of CUDA bindings into the project is good progress toward
> our goal of simplifying matrix math at scale.
>
> Being a committer allows you to contribute more easily to the project,
> since in addition to posting pull requests and patches you're also granted
> write access to the code repository; which in turn means you can review and
> accept community contributions, and help others pitch in.
>
> Nikolay, we're looking forward to working with you in the future; welcome!
> It is customary for new committers to introduce themselves with a few words
> :)
>
> Best
> Andrew
>


Universal Recommender. How to rank items returned by query on three types of indicators?

2017-02-05 Thread Peng Zhang
Hi,

Suppose we have created three types of indicators (cooccurrence, content and
intrinsic) and indexed them into Elasticsearch (ES). Then we query on
these three types of indicators of a user to get recommended items. How
does Universal Recommender rank the items recommended based on these three
types of indicators?

I have gone thru the slides on Universal Recommender created by Pat. It's
very informative. Here is the link:
https://www.slideshare.net/mobile/pferrel/unified-recommender-39986309

Thanks
-Peng


Re: New Mahout "Samsara" Book

2016-02-25 Thread Peng Zhang
I have been looking forward to it for a long time.

Sent from my iPhone

> On Feb 25, 2016, at 22:07, mario.al...@gmail.com wrote:
> 
> put in the basket
> 
>> On Thu, Feb 25, 2016 at 2:04 PM, Andrew Palumbo  wrote:
>> 
>> The new book, "Apache Mahout: Beyond MapReduce" has been released. Written
>> by Mahout committers Dmitriy Lyubimov and Andrew Palumbo, this book covers
>> previously undocumented features of Mahout releases 0.10 and 0.11.
>> For more information please see the announcement page:
>> 
>> http://www.weatheringthroughtechdays.com/2016/02/mahout-samsara-book-is-out.html
>> Thank You
>> 
>> 


Re: Question on RecommenderJob

2014-09-15 Thread Peng Zhang
Mahout does not guarantee the specified number of recs for each user. There are
many reasons, e.g., there might not be enough similar users or items for a user.

Peng Zhang

--
Sent from my iPhone

 On Sep 15, 2014, at 3:15 PM, Wei Li wei.le...@gmail.com wrote:
 
 Hi Mahout Users:
 
    We are using the RecommenderJob to perform the item-based
 recommendations, the following settings are used:
 
 similarityClassname=SIMILARITY_COOCCURRENCE
 numRecommendations=20
 other parameters are set to default values
 
 while we see that the size of the recommendation results for some users is
 less than 20, only 1 or 2. Since we have no time to dive into the source
 code now, we do not know if we set the right parameters. Can anyone help
 us on this issue? many thanks :)
 
 Best
 Wei


Re: Question on RecommenderJob

2014-09-15 Thread Peng Zhang
As far as I know, Mahout does not do any post-processing to remove records 
from recs. I assume that by recs you mean recommendation candidates.
If you find that some user is not in recs (the output), please make sure he/she 
has sufficient interactions with the items.

On Sep 16, 2014, at 10:17 AM, Wei Li wei.le...@gmail.com wrote:

 Thanks Peng.
 
 Yes, I agree with your points, there may not be enough interactions between
 users and items to do the recommendations, but does our Mahout code do some
 extra filtering to remove the recommendation candidates?
 
 On Mon, Sep 15, 2014 at 3:35 PM, Peng Zhang pzhang.x...@gmail.com wrote:
 
 Mahout does not guarantee specified recs for each user. There are many
 reasons, e.g, there might not be enough similar users or items for a user.
 
 Peng Zhang
 
 --
 Sent from my iPhone
 
 On Sep 15, 2014, at 3:15 PM, Wei Li wei.le...@gmail.com wrote:
 
 Hi Mahout Users:
 
    We are using the RecommenderJob to perform the item-based
 recommendations, the following settings are used:
 
 similarityClassname=SIMILARITY_COOCCURRENCE
 numRecommendations=20
 other parameters are set to default values
 
 while we see that the size of the recommendation results for some users is
 less than 20, only 1 or 2. Since we have no time to dive into the source
 code now, we do not know if we set the right parameters. Can anyone help
 us on this issue? many thanks :)
 
 Best
 Wei
 



Re: New Mahout Recommender Service

2014-09-09 Thread Peng Zhang
Discussing it on this list makes it convenient to stay tuned, so no objection.

Peng Zhang

--
Sent from my iPhone

 On Sep 10, 2014, at 12:16 AM, Pat Ferrel p...@occamsmachete.com wrote:
 
 No Jira yet. There are too many moving parts and we’d have to see if it’s 
 appropriate for Mahout inclusion or as an “example” project. It would be 
 great to include but we’ll have to see what others think as it takes better 
 form. All components should be Apache license compatible though.
 
 I’ll start a Github project. Does anyone object to using this list for 
 discussion?
 
 On Sep 9, 2014, at 8:46 AM, Saikat Kanjilal sxk1...@hotmail.com wrote:
 
 @Pat Any interest in using http://vertx.io instead of play, have heard some 
 really good perf stats around this
 
 We should really start a jira with a list of use cases and then back into a 
 tech stack and outline the design in jira, thoughts ?
 
 Sent from my iPhone
 
 On Sep 9, 2014, at 8:44 AM, Martin, Nick nimar...@pssd.com wrote:
 
 Would absolutely love an ES integration.
 
 -Original Message-
 From: Pat Ferrel [mailto:p...@occamsmachete.com]
 Sent: Tuesday, September 09, 2014 10:29 AM
 To: user@mahout.apache.org
 Subject: New Mahout Recommender Service
 
 Now that we have the basis of several significant improvements to Mahout's 
 recommender it seems like we need to go the last step and provide a service. 
 Without this it is left to the user to do a lot of integration making the 
 current next gen somewhat incomplete.
 
 Using the Hadoop mapreduce code you can get all recs for all people using 
 collaborative filtering data or you can use the in-memory single machine 
 recommender if you have a small dataset. 
 
 The next generation would require Solr or Elasticsearch so why not go the 
 extra step and provide a recommender API on top? At very least it would give 
 users a single machine API they can call, analogous to the in-memory 
 recommender of Mahout 0.9. But it would also be indefinitely scalable.
 
 Is anyone interested in discussing this here?
 


Re: jar file for org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel

2014-09-07 Thread Peng Zhang
Hey Vinayak,

Please add mahout-integration-0.9.jar to your classpath, and this jar file is 
in mahout-distribution-0.9/integration/target directory.

Peng




On Sep 7, 2014, at 8:12 PM, vinayakb malagatti vinayakbmalaga...@gmail.com 
wrote:

 Hi all,
 
 I am getting an error for the missing jar. Could you please help me find the
 jar?
 
 
 Thanks and Regards,
 Vinayak B



Re: ALS recommender with Long ids

2014-09-03 Thread Peng Zhang
Hi Nilesh,

Glad to see you've fixed the issue. 

Can you show a sample of your input data with long ids?

Thanks,
Peng Zhang

--
Sent from my iPhone

 On Sep 4, 2014, at 5:08 AM, Nil Kulkarni nilesh...@yahoo.com.INVALID wrote:
 
 Found the issue. Had forgotten to include the flag --usesLongIDs in the 
 second job. :P
 
 The correct command is:
 
 mahout recommendfactorized --input 
 /mahout/output_data/alsrecommender_longs/userRatings --userFeatures 
 /mahout/output_data/alsrecommender_longs/U/ --itemFeatures 
 /mahout/output_data/alsrecommender_longs/M/ --usesLongIDs true --itemIDIndex 
 /mahout/output_data/alsrecommender_longs/itemIDIndex/ --userIDIndex 
 /mahout/output_data/alsrecommender_longs/userIDIndex/ 
 --numRecommendations 100 --output 
 /mahout/output_data/alsrecommender_longs/user_recommendations_longs 
 --maxRating 1 --numThreads 4 --tempDir /mahout/tmp 
 -Dmapred.reduce.tasks=256
 -Nilesh
 
 (: Tongue tied, speechless)
 
 
 
 On Wednesday, September 3, 2014 11:27 AM, Nil Kulkarni 
 nilesh...@yahoo.com.INVALID wrote:
 
 
 
 hi folks,
 
 
 
   I was trying to get the ALS recommender working for our data.  
 
 I have user_ids and item_ids that are longs. To handle this,
 I set the usesLongIDs ‘true’ in the parallelALS job first. 
 
 
 mahout parallelALS --input /mahout/input_data/user_item_interest/ --output 
 /mahout/output_data/alsrecommender_longs/ --lambda 0.1 --implicitFeedback 
 true --alpha 0.1 --numFeatures 20 --numIterations 10 --numThreadsPerSolver 4 
 --usesLongIDs true --tempDir /mahout/tmp -Dmapred.reduce.tasks=256
 
 
 This job created the output folders /userIDIndex and /itemIDIndex along
 with the /U and the /M latent factor folders. Then,I gave these 
 paths for the generated userIDIndex and itemIDIndex to
 the recommendfactorized job. 
 
 
 mahout recommendfactorized --input 
 /mahout/output_data/alsrecommender_longs/userRatings --userFeatures 
 /mahout/output_data/alsrecommender_longs/U/ --itemFeatures 
 /mahout/output_data/alsrecommender_longs/M/ --itemIDIndex 
 /mahout/output_data/alsrecommender_longs/itemIDIndex/ --userIDIndex 
 /mahout/output_data/alsrecommender_longs/userIDIndex/ --numRecommendations 
 100 --output 
 /mahout/output_data/alsrecommender_longs/user_recommendations_longs 
 --maxRating 1 --numThreads 4 --tempDir /mahout/tmp -Dmapred.reduce.tasks=256
 
 Both the jobs completed successfully. I was assuming logically that since my 
 input user and item ids were longs, the computed user recommendations would 
 have the user_ids and recommended item_ids as the original longs. 
 
 
 However, the output seems to still hold the recommendations in the ints that 
 it internally created as part of its first job. Is there any other parameter 
 to be set to get back the recommendations in their original long ids? It 
 seems absurd that the factorizedrecommender job does not have a post-processing 
 step that would map the ints back to the longs.
 
 Thanks,
 Nilesh
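
For readers following this thread, the post-processing step asked about above amounts to a reverse lookup from int index to original long id, which is what the itemIDIndex and userIDIndex paths feed. A rough sketch in Python, assuming a made-up mapping function (toy_id_to_index is hypothetical and is not Mahout's actual hashing, and a real mapping can collide):

```python
def build_index(long_ids, id_to_index):
    """Remember which original long id produced each int index."""
    index = {}
    for long_id in long_ids:
        index[id_to_index(long_id)] = long_id
    return index


def restore_ids(int_recs, user_index, item_index):
    """Translate {int_user: [(int_item, score), ...]} back to long ids."""
    return {
        user_index[user]: [(item_index[item], score) for item, score in items]
        for user, items in int_recs.items()
    }


def toy_id_to_index(long_id):
    # Hypothetical mapping used only for this illustration.
    return long_id % 1_000_003


users = [9_000_000_001]
items = [5_000_000_007, 5_000_000_011]
user_index = build_index(users, toy_id_to_index)
item_index = build_index(items, toy_id_to_index)

# Recommendations as an int-keyed job would emit them.
int_recs = {
    toy_id_to_index(9_000_000_001): [
        (toy_id_to_index(5_000_000_007), 0.8),
        (toy_id_to_index(5_000_000_011), 0.5),
    ]
}
long_recs = restore_ids(int_recs, user_index, item_index)
```

If the index is not supplied to the second job, nothing can perform this lookup, which matches the int output described above.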


Re: RecommenderJob

2014-08-25 Thread Peng Zhang
If an item is not similar to any other item, and a user has only interacted with 
this item, this user doesn't get any recommended items. 

This is just one example. 
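
To illustrate, here is a toy item-based scorer in Python. It is only a sketch under simplifying assumptions (scores are plain sums of similarities, and the names recommend, user_items and item_sims are made up for this example); it is not how RecommenderJob is implemented, but it shows why a user can be missing from the output:

```python
from collections import defaultdict


def recommend(user_items, item_sims, top_n=20):
    """Score unseen items by summing their similarity to items the user has."""
    recs = {}
    for user, seen in user_items.items():
        scores = defaultdict(float)
        for item in seen:
            # An item with no similar items contributes no candidates.
            for other, sim in item_sims.get(item, {}).items():
                if other not in seen:
                    scores[other] += sim
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        if ranked:
            # Users without any candidate are left out of the output.
            recs[user] = ranked
    return recs


user_items = {"u1": {"a", "b"}, "u2": {"z"}}
item_sims = {"a": {"b": 0.9, "c": 0.4}, "b": {"a": 0.9, "d": 0.2}}
result = recommend(user_items, item_sims)
```

Here u2 only interacted with item z, which has no similar items, so u2 does not appear in result at all, and u1 gets fewer than top_n items because only two candidates exist.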

Peng Zhang

--
Sent from my iPhone

 On Aug 25, 2014, at 2:22 PM, Wei Li wei.le...@gmail.com wrote:
 
 Hi Mahout users:
 
    We have tried the item-based CF recommender with user_id, item_id,
 rating data. But the recommendation output is smaller than we expected. For
 example, if we have 1000 users, the output should have 1000 records, one
 for each user, right?
 
 Best
 Wei


Re: RecommenderJob

2014-08-25 Thread Peng Zhang
If there are no suitable recommendations for a user, the output will not 
contain any records related to this user.


Peng Zhang


On Aug 25, 2014, at 4:38 PM, Wei Li wei.le...@gmail.com wrote:

 Thanks for Peng's answers. Yes, I know this case, but does RecommenderJob not
 output these records?
 
 
 On Mon, Aug 25, 2014 at 3:37 PM, Peng Zhang pzhang.x...@gmail.com wrote:
 
 If an item is not similar to any other item, and a user has only interacted
 with this item, this user doesn't get any recommended items.
 
 This is just one example.
 
 Peng Zhang
 
 --
 Sent from my iPhone
 
 On Aug 25, 2014, at 2:22 PM, Wei Li wei.le...@gmail.com wrote:
 
 Hi Mahout users:
 
    We have tried the item-based CF recommender with user_id, item_id,
 rating data. But the recommendation output is smaller than we expected. For
 example, if we have 1000 users, the output should have 1000 records, one
 for each user, right?
 
 Best
 Wei
 



Can user id and item id be negative integers?

2014-08-05 Thread Peng Zhang
Hi,

Is it OK that we use negative ids?

I have the impression that user id and item id should be natural numbers like 
0, 1, 2, because they are used as column and row numbers.

But today I am trying negative user ids and item ids, and they are working 
well with the item recommender and SVD recommender.

Many thanks,
Peng Zhang
pzhang.x...@gmail.com







Re: easy recommender question

2014-08-05 Thread Peng Zhang
Hi,

Please try to upload your input file test.txt to HDFS (Hadoop Distributed File 
System), and run again. 

Peng Zhang

--
Sent from my iPhone

 On Aug 5, 2014, at 5:59 PM, François Bossière francois.bossi...@gmail.com 
 wrote:
 
 Hi,
 
 I am discovering Mahout which I installed on a mapr cluster using the 
 mapr-mahout package.
 I am trying a very small test of the recommenders:
 My input is the following $MAHOUT_HOME/input/test.txt:
 0,1,4
 0,4,2
 5,1,2
 5,4,3
 5,7,5
 8,1,2
 8,7,1
 
 I go in $MAHOUT_HOME and I run:
 mahout recommenditembased -s SIMILARITY_PEARSON_CORRELATION -i input/test.txt 
 -o output --numRecommendations 5
 
 I get the following error message:
 
 No MAHOUT_CONF_DIR found
 Running on hadoop, using /opt/mapr/hadoop/hadoop-0.20.2/bin/hadoop and 
 HADOOP_CONF_DIR=
 MAHOUT-JOB: /opt/mapr/mahout/mahout-0.9/mahout-examples-0.9-mapr-job.jar
 14/08/05 09:53:53 WARN driver.MahoutDriver: No recommenditembased.props found 
 on classpath, will use command-line arguments only
 14/08/05 09:53:54 INFO common.AbstractJob: Command line arguments: 
 {--booleanData=[false], --endPhase=[2147483647], --input=[input/test.txt], 
 --maxPrefsInItemSimilarity=[500], --maxPrefsPerUser=[10], 
 --maxSimilaritiesPerItem=[100], --minPrefsPerUser=[1], 
 --numRecommendations=[5], --output=[output], 
 --similarityClassname=[SIMILARITY_PEARSON_CORRELATION], --startPhase=[0], 
 --tempDir=[temp]}
 14/08/05 09:53:54 INFO common.AbstractJob: Command line arguments: 
 {--booleanData=[false], --endPhase=[2147483647], --input=[input/test.txt], 
 --minPrefsPerUser=[1], --output=[temp/preparePreferenceMatrix], 
 --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
 14/08/05 09:53:54 INFO fs.JobTrackerWatcher: Current running JobTracker is: 
 fb-mapr1.c.mindful-origin-252.internal/10.240.1.96:9001
 14/08/05 09:53:54 INFO mapred.JobClient: Cleaning up the staging area 
 maprfs:/var/mapr/cluster/mapred/jobTracker/staging/mapr/.staging/job_201408041502_0035
 Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: input/test.txt
     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:248)
     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:273)
     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1033)
     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1050)
     at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:173)
     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:934)
     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:885)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:415)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:885)
     at org.apache.hadoop.mapreduce.Job.submit(Job.java:573)
     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:603)
     at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:77)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
     at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:164)
     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
     at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:322)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:606)
     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:606)
     at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
 
 Note: I set MAHOUT_CONF_DIR=/opt/mapr/mahout/mahout-0.9/conf.new because 
 I did not know what it should be set to.
 Can you help me solve this very frustrating problem?
 


Re: Can user id and item id be negative integers?

2014-08-05 Thread Peng Zhang
Hi,

Does this support the possibility that user/item id can be negative?

I am reading through the source code of 
org.apache.mahout.cf.taste.impl.model.AbstractIDMigrator. The hash() function 
converts a string id to a long id like this. It’s quite possible 
that the long id returned is negative, when the leading bit is 1 :) 
  
  protected final long hash(String value) {
    byte[] md5hash;
    synchronized (md5Digest) {
      md5hash = md5Digest.digest(value.getBytes(Charsets.UTF_8));
      md5Digest.reset();
    }
    long hash = 0L;
    for (int i = 0; i < 8; i++) {
      hash = hash << 8 | md5hash[i] & 0x00FFL;
    }
    return hash;
  }
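
The effect of the leading bit can be checked outside Mahout. Below is a minimal sketch in Python, assuming the same scheme as the hash() function quoted here (the first 8 bytes of the MD5 digest folded big-endian into 64 bits and read as a signed Java long). The name md5_long_hash is only for illustration and is not part of Mahout:

```python
import hashlib


def md5_long_hash(value: str) -> int:
    """Fold the first 8 MD5 digest bytes into a signed 64-bit integer."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    h = 0
    for b in digest[:8]:
        # Python bytes are already 0..255, so no 0xFF mask is needed here.
        h = (h << 8) | b
    # Reinterpret the unsigned 64-bit value the way a Java long would.
    return h - (1 << 64) if h >= (1 << 63) else h


sample = {str(i): md5_long_hash(str(i)) for i in range(8)}
negative_ids = [key for key, h in sample.items() if h < 0]
```

Whenever the first digest byte is 0x80 or higher the result is negative, so under this scheme roughly half of all string ids would end up as negative longs.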


Hi Ted,
I am running the in memory version of GenericItemBasedRecommender and 
SVDRecommender, i.e. I am using them in my Java code.


Hi Pat,
Not all user ids are negative. Input file sample:
...
-1250,6929,1
-1250,7059,1
-1250,7654,1
-1250,8094,1
-1250,9486,1
-1250,9563,3
10018000,11080,1
10018000,11176,1
10018000,11196,1
10018000,12220,1
10018000,12447,1
10018000,13213,1
...

Item based recommender output sample:
User,Brand,Scoring
-1250,12352,5.0
-1250,14261,5.0
-1250,15934,4.309238
-1250,16463,3.0
-1250,3627,1.0
1025250,29099,1.0
1025250,18741,1.0
1025250,14261,1.0
…

SVD recommender output sample:
User,Brand,Scoring
-1250,3627,3.9108906
-1250,27791,3.8262475
-1250,251,3.744943
-1250,20979,3.5778444
-1250,14482,3.5494242
1025250,27791,2.2692947
1025250,251,1.9651389
1025250,14482,1.9196383
1025250,12220,1.9153352
...


Thank you,

Peng Zhang
M: +86 186-1658-7856
pzhang.x...@gmail.com





On Aug 6, 2014, at 7:26 AM, Pat Ferrel p...@occamsmachete.com wrote:

 Are they ALL negative? Maybe only the non-negatives are working or there are 
 some conditions where negatives work. I certainly wouldn’t count on it 
 because I’ll bet it isn’t working as it should.
 
 
 On Aug 5, 2014, at 4:03 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 
 On Tue, Aug 5, 2014 at 3:21 AM, Peng Zhang pzhang.x...@gmail.com wrote:
 
 But today I am trying to use negative user id and item id, and they are
 working well with the item recommender and SVD recommender.
 
 
 Which programs are you using?
 



Re: recommenditembased returns 0 records from last map-reduce job

2014-07-21 Thread Peng Zhang
My personal comments:
1. Data cleansing. One beautiful characteristic of Mahout’s CF recommendation 
is the simplicity of input data, often times just three columns (user, item, 
preference). If any value is missing, just don’t put the record in the input 
file. Therefore I don’t see there is any need to do data cleaning given that 
the application has recorded user-item-preference correctly and you have 
translated user-id and item-id properly.
2. Oftentimes Loglikelihood has a better performance than PearsonCorrelation in 
Mahout’s Collaborative Filtering. The former is focused on discrete values and 
the latter is focused on continuous values. Refer to Ted’s popular post 
Surprise and Coincidence about the former.
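
To make point 2 more concrete, here is a rough sketch of the log-likelihood ratio for a 2x2 cooccurrence table, using the entropy formulation from Ted's post. It is written in Python for illustration only; the function names are my own and this is not Mahout's code. I assume k11 counts users who interacted with both items, k12 and k21 users who interacted with only one of them, and k22 users who interacted with neither:

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) is 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2 statistic) of a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


independent = llr(10, 10, 10, 10)
associated = llr(10, 0, 0, 10)
```

Only counts go in, never rating values, which is why it suits discrete data: independent comes out at about zero because the two items cooccur exactly as often as chance predicts, and associated is large because they always appear together.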


Peng Zhang
pzhang.x...@gmail.com





On Jul 21, 2014, at 3:37 PM, Serega Sheypak serega.shey...@gmail.com wrote:

 Thanks! I'll report this evening.
 
 Are there any articles about data preparation for mahout item
 recommendation? There are many books but most of them are copy-paste of
 javadoc and guides from mahout site.
 I'm -1 at math, my challenges are:
 
 1. approaches for data cleaning, do I have to apply dead-simple statistical
 rules?
 The empirical rule also states that approximately 95 percent of the data
 values will fall within two standard deviations from the mean.
 So if my user visits are described as a normal distribution, does it make
 sense? The idea is to put away all noise.
 
 2. similarityClassname - don't have any intuition here... I see that people
 use SIMILARITY_LOGLIKELIHOOD and PEARSON
 
 
 2014-07-21 11:18 GMT+04:00 Peng Zhang pzhang.x...@gmail.com:
 
 Serega,
 
 See the last line for how to pass the outputPathForSimilarityMatrix option to
 the recommenditembased command:
 
 sudo -u oozie mahout recommenditembased \
   --input visited_items_with_inverted_items \
   --output result \
   --similarityClassname SIMILARITY_LOGLIKELIHOOD \
   --usersFile inverted_items \
   --numRecommendations 500 \
   --booleanData false \
   --maxPrefsPerUser 100 \
   --maxSimilaritiesPerItem 500 \
   --minPrefsPerUser 0 \
   --maxPrefsPerUserInItemSimilarity 30 \
   --threshold 0.91 \
   --tempDir temp \
   --outputPathForSimilarityMatrix similarityMatri
 
 
 Peng Zhang
 pzhang.x...@gmail.com
 
 
 
 
 
 On Jul 21, 2014, at 3:09 PM, Serega Sheypak serega.shey...@gmail.com
 wrote:
 
 I've inspected the code, our approach wouldn't work with
 booleanData=false.
 We do calculate item similarity in the wrong way...(((
 Thank you
 1. We provide fake user_id and provide --usersFile in order to get
 recommendations for fake user_id, where user_id is a negative item_id. It
 worked when we did provide user_id-item_id pairs without preference.
 2. Our target is to get item similarities. We tried
 org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob but it
 returns a bad result compared to RecommenderJob with our fake user_id
 (inverted item_id)
 
 1. I'll try the option you provided.
 2. I will remove input with fake user_id and usersFile with these fake ids
 
 3.
 https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java
 I don't understand how to pass the --outputPathForSimilarityMatrix option to
 RecommenderJob
 
 
 2014-07-21 4:58 GMT+04:00 Peng Zhang pzhang.x...@gmail.com:
 
 Serega,
 
 I have two comments:
 1. Don’t use negative user ids. Since Mahout uses user id as well as item
 id as the row/column index, you’d better use 0, 1, 2, etc as ids
 2. If you want to get the item similarity information, you can use
 --outputPathForSimilarityMatrix in the command
 
 Regards,
 Peng Zhang
 M: +86 186-1658-7856
 pzhang.x...@gmail.com
 
 
 
 
 

Re: recommenditembased returns 0 records from last map-reduce job

2014-07-20 Thread Peng Zhang
Serega,

I have two comments:
1. Don’t use negative user ids. Since Mahout uses user ids as well as item ids as
the row/column index, you’d better use 0, 1, 2, etc. as ids.
2. If you want to get the item similarity information, you can use
--outputPathForSimilarityMatrix in the command.
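
One way to act on the first comment is to remap whatever ids the data uses onto contiguous non-negative integers before writing the input file. The following is a minimal sketch in Python, not part of Mahout; the helper name `build_index` and the sample rows (modeled on the data quoted in this thread) are illustrative assumptions.

```python
def build_index(raw_ids):
    """Map each distinct raw id to 0, 1, 2, ... in first-seen order."""
    index = {}
    for raw in raw_ids:
        if raw not in index:
            index[raw] = len(index)
    return index


# Illustrative (user_id, item_id, pref) rows, including negative user ids.
rows = [(-1, 1, 1.0), (-2, 2, 1.0), (11, 1, 1.6), (11, 2, 1.6), (123, 3, 2.0)]

user_index = build_index(user for user, _, _ in rows)
item_index = build_index(item for _, item, _ in rows)

# Rewrite every row with the remapped ids; preference values are unchanged.
remapped = [(user_index[u], item_index[i], p) for u, i, p in rows]

for u, i, p in remapped:
    print(f"{u},{i},{p}")
```

Keeping `user_index` and `item_index` around (or their inverses) makes it possible to translate the recommender output back to the original ids afterwards.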

Regards,
Peng Zhang
M: +86 186-1658-7856
pzhang.x...@gmail.com





On Jul 21, 2014, at 4:00 AM, Serega Sheypak serega.shey...@gmail.com wrote:

 All bad things happen here:
 
 
 
 Name:                 RecommenderJob-PartialMultiplyMapper-Reducer
 User:                 oozie
 Process User:         oozie
 Group:                oozie
 Mapper Class:         PartialMultiplyMapper
 Reducer Class:        AggregateAndRecommendReducer
 Job Input Directory:  hdfs://nameservice1/itemrec/temp/partialMultiply
 Job Output Directory: hdfs://nameservice1/itemrec/output/
 
 14/07/20 23:57:47 INFO mapred.JobClient: Map input records=3312879
 14/07/20 23:57:47 INFO mapred.JobClient: Map output records=3313251
 14/07/20 23:57:47 INFO mapred.JobClient: Reduce input records=3313251
 14/07/20 23:57:47 INFO mapred.JobClient: Reduce output records=0
 
 Why does Mahout return 0 rows? It works when booleanData=true (are
 preferences ignored in that case?)
 
 
 
 2014-07-20 23:19 GMT+04:00 Serega Sheypak serega.shey...@gmail.com:
 
 the version is: CDH-4.7.0-1.cdh4.7.0.p0.40
 users_file:
 --inverted_item_id
 -1
 -2
 -3
 -4
 
 users_items_prefs
 --inverted item_id
 -1 1 1.0
 -2 2 1.0
 -3 3 1.0
 -4 4 1.0
 --user_id item_id pref_value
 11   1 1.6
 11   2 1.6
 123 3 2.0
 123 4 2.0
 333 1 2.0
 333 2 1.6
 --e.t.c.
 
 If I set --booleanData true,
 then Mahout returns a result.
 
 
 
 
 2014-07-20 23:12 GMT+04:00 Andrew Musselman andrew.mussel...@gmail.com:
 
 I'm confused about how you're constructing the user file, and why there
 are negated item ids here.
 
 Can you post some more details please, including Mahout version and some
 sample data sets?
 
 On Jul 20, 2014, at 11:57 AM, Serega Sheypak serega.shey...@gmail.com
 wrote:
 
 Hi, I'm trying to compute item similarity.
 I gather the items that users visit during shopping and then create a file:
 user_id, item_id, weight (where weight can be one of [1.0, 1.6, 1.9],
 depending on the user action type and data source)
 UNION
 -item_id, item_id, 1.0 (from the items dictionary)
 
 and I provide a users file, where user_id = -item_id.
 
 The idea is to get item similarity. If any user visits an item named A, I
 want to show him items B, C, xxx using the preferences of other users.
 
 The problem is that the last (???) MapReduce job returns 0 rows.
 
 Here are my settings:
 
 
 sudo -u oozie mahout recommenditembased \
   --input visited_items_with_inverted_items \
   --output result \
   --similarityClassname SIMILARITY_LOGLIKELIHOOD \
   --usersFile inverted_items \
   --numRecommendations 500 \
   --booleanData false \
   --maxPrefsPerUser 100 \
   --maxSimilaritiesPerItem 500 \
   --minPrefsPerUser 0 \
   --maxPrefsPerUserInItemSimilarity 30 \
   --threshold 0.91 \
   --tempDir temp \
 
 Some counters follow. I don't understand what they mean:
 
 14/07/20 22:43:08 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
 14/07/20 22:43:08 INFO mapred.JobClient: USERS=7528530
 14/07/20 22:43:43 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
 14/07/20 22:43:43 INFO mapred.JobClient: USER_RATINGS_NEGLECTED=1,798,738
 14/07/20 22:43:43 INFO mapred.JobClient: USER_RATINGS_USED=12,429,693
 
 14/07/20 22:44:24 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
 14/07/20 22:44:24 INFO mapred.JobClient: ROWS=3312879
 14/07/20 22:45:18 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
 14/07/20 22:45:18 INFO mapred.JobClient: COOCCURRENCES=35882374
 14/07/20 22:45:18 INFO mapred.JobClient: PRUNED_COOCCURRENCES=0
 14/07/20 22:46:00 INFO mapred.JobClient: Map input records=3312879
 14/07/20 22:46:00 INFO mapred.JobClient: Map output records=17570268
 14/07/20 22:46:00 INFO mapred.JobClient: Reduce input records=5221907
 14/07/20 22:46:00 INFO mapred.JobClient: Reduce output records=3312879
 
 14/07/20 22:46:34 INFO mapred.JobClient: Reduce input records=3312879
 14/07/20 22:46:34 INFO mapred.JobClient: Reduce output records=3312879
 14/07/20 22:46:34 INFO mapred.JobClient: Reduce input records=3312879
 14/07/20 22:46:34 INFO mapred.JobClient: Reduce output records=3312879
 14/07/20 22:47:06 INFO mapred.JobClient: Map input records=7528530
 14/07/20 22:47:06 INFO mapred.JobClient: Map output records=3313251
 14/07/20 22:47:06 INFO mapred.JobClient: Reduce input records

Re: Theory behind LogisticRegression in Mahout

2014-05-22 Thread Peng Zhang
Namit,

I think Mahout’s logistic regression is trained with stochastic gradient
descent (SGD), an online optimizer, rather than with a batch maximum-likelihood solver.
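
To make the idea concrete, here is a minimal sketch of logistic regression trained by SGD in plain Python. It is a generic illustration, not Mahout's code: the function names, the fixed learning rate, and the toy data are all assumptions, and Mahout's own classes add features that are omitted here. Each step nudges the weights along the gradient of the log-likelihood for a single example, which is how SGD and likelihood-based fitting relate.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    """Probability of the positive class; the last weight is the bias."""
    x = list(features) + [1.0]
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train_sgd(examples, n_features, learning_rate=0.1, epochs=200, seed=0):
    """Fit the weights by updating on one example at a time."""
    rng = random.Random(seed)
    weights = [0.0] * (n_features + 1)
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for features, label in data:
            x = list(features) + [1.0]
            # (label - prediction) * x is the per-example gradient
            # of the log-likelihood with respect to the weights.
            error = label - predict(weights, features)
            for j, xi in enumerate(x):
                weights[j] += learning_rate * error * xi
    return weights


# Toy one-dimensional data: small values are class 0, large values class 1.
examples = [([0.0], 0), ([1.0], 0), ([3.0], 1), ([4.0], 1)]
weights = train_sgd(examples, n_features=1)
print(predict(weights, [0.0]), predict(weights, [4.0]))
```

An annealing schedule, as mentioned in the question, would correspond to shrinking `learning_rate` over time instead of keeping it fixed.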

Best Regards,
Peng Zhang



On May 22, 2014, at 2:29 PM, namit maheshwari namitmaheshwa...@gmail.com 
wrote:

 Hello Everyone,
 
 Could anyone please let me know the algorithm used behind
 LogisticRegression in Mahout. Also AdaptiveLogisticRegression mentions an
 *annealing* schedule.
 
 I would be grateful if someone could guide me towards the theory behind it.
 
 Thanks
 Namit



Re: SVD mahout implementation

2014-05-20 Thread Peng Zhang
Hi Celal,

Refer to page 26 of the following document, which gives a code snippet for
Mahout’s SVD.

http://ir.cs.georgetown.edu/cs422/files/DM-ItemRecommenders.pdf
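
For intuition about what an SVD-style recommender computes, the core idea (factor the ratings into user and item feature vectors whose dot product approximates each rating) can be sketched in a few lines of Python. This is a generic SGD matrix factorization under illustrative assumptions, not Mahout's implementation and not the code from the linked document; the function names, hyperparameters, and sample ratings are hypothetical.

```python
import random


def predict_rating(users, items, u, i):
    """Estimated rating: dot product of user and item feature vectors."""
    return sum(uf * vf for uf, vf in zip(users[u], items[i]))


def factorize(ratings, n_users, n_items, n_features=2,
              learning_rate=0.05, regularization=0.02, epochs=500, seed=0):
    """Learn feature vectors from (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    users = [[rng.uniform(0.1, 0.5) for _ in range(n_features)]
             for _ in range(n_users)]
    items = [[rng.uniform(0.1, 0.5) for _ in range(n_features)]
             for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, rating in ratings:
            error = rating - predict_rating(users, items, u, i)
            for f in range(n_features):
                uf, vf = users[u][f], items[i][f]
                # Move both vectors to reduce the error, with L2 shrinkage.
                users[u][f] += learning_rate * (error * vf - regularization * uf)
                items[i][f] += learning_rate * (error * uf - regularization * vf)
    return users, items


def squared_error(ratings, users, items):
    return sum((r - predict_rating(users, items, u, i)) ** 2
               for u, i, r in ratings)


# Hypothetical ratings with 0-based user and item indices.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
users, items = factorize(ratings, n_users=3, n_items=3)
print(squared_error(ratings, users, items))
```

Unrated (user, item) pairs are scored with `predict_rating`, and the highest-scoring items become the recommendations.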

Best Regards,
Peng Zhang


On May 21, 2014, at 3:44 AM, Celal SAVUR c.sa...@gmail.com wrote:

 Hello Everyone,
 
 I have recently been working on a project, and I need to implement an SVD
 recommender system. I am looking for a simple example to see how I can do
 it in Mahout.
 
 If you can share a simple example with me, it will be perfect.
 
 Thank you in advance.
 
 Celal



Re: Parsing mahout output

2014-05-20 Thread Peng Zhang
Hi Jamal,

Maybe you can use the getUserFeatures() and getItemFeatures() methods in the
Factorization class to look at the latent features.
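
Once the feature vectors have been pulled out, writing them as CSV is straightforward. Below is a minimal sketch in Python that assumes the vectors are already available as plain lists keyed by id; the function name `write_features_csv`, the column names, and the sample values are hypothetical and not produced by Mahout.

```python
import csv
import io


def write_features_csv(features_by_id, out):
    """Write one row per id: the id followed by its latent feature values."""
    writer = csv.writer(out, lineterminator="\n")
    n_features = len(next(iter(features_by_id.values())))
    writer.writerow(["id"] + [f"f{k}" for k in range(n_features)])
    for entity_id in sorted(features_by_id):
        writer.writerow([entity_id] + list(features_by_id[entity_id]))


# Hypothetical user feature vectors keyed by user id.
user_features = {11: [0.12, -0.4], 123: [0.9, 0.05]}

buffer = io.StringIO()
write_features_csv(user_features, buffer)
print(buffer.getvalue(), end="")
```

Passing an open file instead of the `io.StringIO` buffer writes the same rows to disk; the item features can be exported the same way.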

Best Regards,
Peng Zhang





On May 21, 2014, at 5:42 AM, jamal sasha jamalsha...@gmail.com wrote:

 Hi,
  I want to convert the output of the CF modules (ALS, SVD, etc.) to CSV. How do I
 do the conversion?
 I want to look at those latent features.
 Thanks
 Jamal



Re: Using clustering output for classification

2014-05-05 Thread Peng Zhang
Angel,

I think Ted means that each example falls into exactly one cluster. If you have
k clusters, each example gets one of the encodings 1, 2, …, k.
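
As a small illustration, a 1-of-n encoding turns a cluster id into a vector with a single 1 in that cluster's position, which can then be appended to the original features as Ted describes in the quoted message below. This is a minimal Python sketch; the function names `one_of_n` and `augment` and the sample values are assumptions made for the example.

```python
def one_of_n(cluster_id, n_clusters):
    """Return the 1-of-n vector for a cluster id in 1..n_clusters."""
    if not 1 <= cluster_id <= n_clusters:
        raise ValueError(f"cluster_id must be in 1..{n_clusters}")
    return [1.0 if k == cluster_id else 0.0 for k in range(1, n_clusters + 1)]


def augment(features, cluster_id, n_clusters):
    """Append the cluster encoding to the original feature vector."""
    return list(features) + one_of_n(cluster_id, n_clusters)


print(one_of_n(2, 3))             # [0.0, 1.0, 0.0]
print(augment([0.5, 1.5], 3, 3))  # [0.5, 1.5, 0.0, 0.0, 1.0]
```

The classifier is then trained on the augmented vectors, so it sees both the original features and the cluster membership.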

On May 6, 2014, at 5:27 AM, Angel Luis Scull ascu...@facinf.uho.edu.cu wrote:

 What do you mean by getting a 1 of n encoding?
 
 On 05/05/14 16:59, Ted Dunning wrote:
 In theory, what you need to do is take your training data for your
 classifier and run your clustering to get a 1 of n encoding of the cluster
 for each example in the training data.
 
 Then train the classifier using original and new features.
 
 Does that help?  I have a simple demo of the process in R that I do if that
 would help.
 
 
 
 
 On Mon, May 5, 2014 at 5:53 PM, Angel Luis Scull
 ascu...@facinf.uho.edu.cu wrote:
 
 Hello to all
 
 I have a document dataset to which I applied k-means and obtained
 clusters. Now I want to use the association between the vectors and
 clusters as input for a classification algorithm.
 
 How can I achieve that?
 
 thanks in advance