New PMC member and committer: Shinsuke Sugaya

2017-05-10 Thread Donald Szeto
Hi all,

The Project Management Committee (PMC) for Apache PredictionIO (incubating)
has asked Shinsuke Sugaya to become a PMC member and committer, and we are
pleased to announce that he has accepted. He is also a committer of the
Apache Portals project.

He has made major contributions to the PredictionIO 0.11.0 release by
adding Elasticsearch 5 support. This shows solid understanding of the core
PredictionIO codebase. In addition, he helped clean up and refactor code
in the core. Having him join forces with us will be beneficial for
PredictionIO's growth.

Being a committer enables easier contribution to the project since there is
no need to go via the patch submission process. This should enable better
productivity. Being a PMC member enables him to assist with the management
and guide the direction of the project.

Please join us in welcoming Shinsuke.

Regards,
Donald


Re: Problem scaling UR

2017-05-10 Thread Pat Ferrel
What is the physical architecture? Do you have HBase, Elasticsearch, and Spark
running on separate machines? If the CPU load is low then it must be IO bound
reading from HBase or writing to Elasticsearch. Do you have any input event
load yet or are you making queries? These will all change the equation and are
why running the services on separate machines makes the most sense.


On May 10, 2017, at 1:47 PM, Bolmo Joosten  wrote:

Thanks for your suggestion. I forgot to mention in my last email that the 
$plus$plus stage takes most time (95%+) and is using only 1-3 CPUs.

I will give it a try with lower driver memory and higher executor memory.

Maybe a hard question: any idea what kind of training time I should expect with
this data size on this cluster?

We modified the default UR template to create the eventRDDs from CSV files
instead of HBase. HBase was unable to process this amount of data on the
cluster. This means we can't provide any personalized recommendations, but that
is ok for now.

2017-05-10 10:22 GMT-07:00 Pat Ferrel <p...@occamsmachete.com>:
You can’t bypass HBase. You can import JSON to HBase directly, so I assume this
is what you are saying.

Executor memory should be higher and driver memory lower. Spark loves memory 
and in this case the lower limit is all your input events and BiMaps for all 
user and item ids. If you don’t have an OOM you are above minimum but 
increasing the executor mem might help, also executor CPUs. The lower limit for 
the driver mem is roughly equal to the amount per executor.

One unfortunate thing about Spark is that you can scale it to do the job in
minutes, but when you go to read or write to/from HBase or Elasticsearch, this
large a cluster will overload the DBs. So a long training time is not all
that bad a thing since the cluster will probably not be overloading the IO.
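
A minimal sketch of the memory advice above, written in Python only to make the arithmetic explicit. The flags are the ones from the train command in this thread; the helper name `pio_train_command` and the values 32G and 4 cores are illustrative assumptions, not tested recommendations for this cluster.

```python
def pio_train_command(executor_memory_gb, executor_cores, driver_memory_gb=None):
    """Build a `pio train` argument list (hypothetical helper).

    Follows the rule of thumb from the message above: give executors more
    memory, and keep driver memory roughly equal to the per-executor amount.
    """
    if driver_memory_gb is None:
        # Assumption: default the driver to the same amount as one executor.
        driver_memory_gb = executor_memory_gb
    return [
        "pio", "train", "--",
        "--driver-memory", f"{driver_memory_gb}G",
        "--executor-memory", f"{executor_memory_gb}G",
        "--executor-cores", str(executor_cores),
    ]


# Illustrative values only; tune against what the Spark UI shows.
command = pio_train_command(executor_memory_gb=32, executor_cores=4)
print(" ".join(command))
```

Compared with the original command (64G driver, 8G executors), this shifts memory from the driver to the executors; whether it shortens the training time still depends on the IO limits described above.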


On May 10, 2017, at 8:45 AM, Bolmo Joosten <bolmo.joos...@gmail.com> wrote:

Hi all,

I have trouble scaling the Universal Recommender to a dataset with 250M events 
(purchase, view, atb). It trains ok on a couple of million events, but the 
training time becomes very long (>48h) on the large dataset.

Hardware specs:

- Standalone cluster
- 20 cores (40 hyper threading)
- 264GB RAM

Input data size format:

- We load directly from CSV files and bypass HBASE. Size of CSV is 19 GB.
- PIO JSON format equivalent size: 150 GB

Train command:

pio train -- --driver-memory 64G --executor-memory 8G --executor-cores 2

I have tried various combinations of driver memory, executor memory and number
of cores, but the training time does not seem to be affected by this.

Spark UI tells me the save method (collect > $plus$plus) in URModel.scala takes 
a very long time. See attached dumps of the Spark UI for details. 

Any suggestions?

Thanks, Bolmo

Re: UR PredictionIO quickstart

2017-05-10 Thread Pat Ferrel
The first thing you’ll run into is in-memory storage for all user and item ids:
20,000 products that have sold and 42,500 users who have bought. This might fit
in a machine with 16g of memory but also might require 32g. The number of sales
is not a big factor. You may even be able to connect it to your sales channel,
site or app. Sounds like the load will be low to start.


On May 10, 2017, at 1:21 PM, Dennis Honders  wrote:

65000 orders. 100.000 items. Not many items per order.
80.000 products. Only 20.000 are sold at least once.
85.000 customers. Half of the customers have bought at least one product
according to this training data.
1500 categories.
150 manufacturers.

Currently a maximum of 5 properties for the products and customers. 

What setup do you recommend?



On 10 May 2017 at 20:44, Pat Ferrel <p...@occamsmachete.com> wrote:

> Probably, how many users and items?
> 
> It will certainly work on a single machine, you may have to pick a less than 
> minimal instance type. We recommend R3 instances and you can upgrade in place 
> if you start out too small. 
> 
> 
> On May 10, 2017, at 10:45 AM, Dennis Honders wrote:
> 
> Okay, thanks for the answer. Will also take a look at the update next week. 
> 
> In my case I have like 65000 orders and the complete dataset is about 700.000 
> records. 
> For confirmation, this is considered a small dataset, and small enough for 
> experimenting (Not using it in production) with the UR? 
> 
> 2017-05-10 19:13 GMT+02:00 Pat Ferrel:
> Yes unless you have large-ish data. We also have an AWS AMI all set up here:
> http://actionml.com/docs/awssetupguide. Both should be fine for
> experimentation but will be too small for big-data.
>
> BTW all are being updated to the UR V0.6.0 and PIO 0.11.0 by next week though
> the current versions work fine.
> 
> 
> On May 10, 2017, at 8:35 AM, Dennis Honders wrote:
>
> Is the quickstart for PredictionIO 0.11.0 suitable enough for a very basic
> setup or is the single machine (http://actionml.com/docs/single_machine)
> setup a minimal requirement?
> 
> 
> 



Re: Spark pipeline in predictionIO

2017-05-10 Thread Pankil Doshi
Thanks Donald.

I will reach out to the dev mailing list.

Pankil

On Wed, May 10, 2017 at 4:23 PM, Donald Szeto  wrote:

> We are going to publish a roadmap soon and one of the roadmap items is to
> support Spark ML pipelines natively, in parallel with DASE.
>
> If you are interested in helping develop the core code to support it,
> please reach out on the dev mailing list.
>
> On Wed, May 10, 2017 at 1:02 PM Ravi Kiran  wrote:
>
>> Can you let me know once you are able to run it successfully? At least I
>> will have a template to start with.
>>
>> On Thu, May 11, 2017 at 1:28 AM, Pankil Doshi 
>> wrote:
>>
>>> We also have a similar use case.
>>>
>>> Trying to use spark pipeline as a model and run the pipeline at the time
>>> of scoring?
>>>
>>>
>>> > On May 9, 2017, at 11:30 AM, Ravi Kiran  wrote:
>>> >
>>> > Hi,
>>> > Most of the templates I have seen don't use Spark pipelines. I have a
>>> > classification model in the form of a Spark pipeline and I want to run it
>>> > in PredictionIO. Is there a sample template available for using a Spark
>>> > pipeline in PredictionIO?
>>> >
>>> >
>>>
>>
>>


Re: Spark pipeline in predictionIO

2017-05-10 Thread Donald Szeto
We are going to publish a roadmap soon and one of the roadmap items is to
support Spark ML pipelines natively, in parallel with DASE.

If you are interested in helping develop the core code to support it,
please reach out on the dev mailing list.

On Wed, May 10, 2017 at 1:02 PM Ravi Kiran  wrote:

> Can you let me know once you are able to run it successfully? At least I
> will have a template to start with.
>
> On Thu, May 11, 2017 at 1:28 AM, Pankil Doshi  wrote:
>
>> We also have a similar use case.
>>
>> Trying to use spark pipeline as a model and run the pipeline at the time
>> of scoring?
>>
>>
>> > On May 9, 2017, at 11:30 AM, Ravi Kiran  wrote:
>> >
>> > Hi,
>> > Most of the templates I have seen don't use Spark pipelines. I have a
>> > classification model in the form of a Spark pipeline and I want to run it
>> > in PredictionIO. Is there a sample template available for using a Spark
>> > pipeline in PredictionIO?
>> >
>> >
>>
>
>


Re: Problem scaling UR

2017-05-10 Thread Bolmo Joosten
Thanks for your suggestion. I forgot to mention in my last email that the
$plus$plus stage takes most time (95%+) and is using only 1-3 CPUs.

I will give it a try with lower driver memory and higher executor memory.

Maybe a hard question: any idea what kind of training time I should expect
with this data size on this cluster?

We modified the default UR template to create the eventRDDs from CSV files
instead of HBase. HBase was unable to process this amount of data on the
cluster. This means we can't provide any personalized recommendations, but
that is ok for now.

2017-05-10 10:22 GMT-07:00 Pat Ferrel :

> You can’t bypass HBase. You can import JSON to HBase directly, so I assume
> this is what you are saying.
>
> Executor memory should be higher and driver memory lower. Spark loves
> memory and in this case the lower limit is all your input events and BiMaps
> for all user and item ids. If you don’t have an OOM you are above minimum
> but increasing the executor mem might help, also executor CPUs. The lower
> limit for the driver mem is roughly equal to the amount per executor.
>
> One unfortunate thing about Spark is that you can scale it to do the job
> in minutes, but when you go to read or write to/from HBase or Elasticsearch,
> this large a cluster will overload the DBs. So a long training time is
> not all that bad a thing since the cluster will probably not be overloading
> the IO.
>
>
> On May 10, 2017, at 8:45 AM, Bolmo Joosten 
> wrote:
>
> Hi all,
>
> I have trouble scaling the Universal Recommender to a dataset with 250M
> events (purchase, view, atb). It trains ok on a couple of million events,
> but the training time becomes very long (>48h) on the large dataset.
>
> Hardware specs:
>
> - Standalone cluster
> - 20 cores (40 hyper threading)
> - 264GB RAM
>
> Input data size format:
>
> - We load directly from CSV files and bypass HBASE. Size of CSV is 19 GB.
> - PIO JSON format equivalent size: 150 GB
>
> Train command:
>
> pio train -- --driver-memory 64G --executor-memory 8G --executor-cores 2
>
> I have used various variations with driver, executor memory and number of
> cores, but the training time does not seem to be affected by this.
>
> Spark UI tells me the save method (collect > $plus$plus) in URModel.scala
> takes a very long time. See attached dumps of the Spark UI for details.
>
> Any suggestions?
>
> Thanks, Bolmo
>
>
>
>
>
>


Re: UR PredictionIO quickstart

2017-05-10 Thread Dennis Honders
65000 orders. 100.000 items. Not many items per order.
80.000 products. Only 20.000 are sold at least once.
85.000 customers. Half of the customers have bought at least one product
according to this training data.
1500 categories.
150 manufacturers.

Currently a maximum of 5 properties for the products and customers. 

What setup do you recommend?



> On 10 May 2017 at 20:44, Pat Ferrel wrote:
> 
> Probably, how many users and items?
> 
> It will certainly work on a single machine, you may have to pick a less than 
> minimal instance type. We recommend R3 instances and you can upgrade in place 
> if you start out too small. 
> 
> 
> On May 10, 2017, at 10:45 AM, Dennis Honders  wrote:
> 
> Okay, thanks for the answer. Will also take a look at the update next week. 
> 
> In my case I have like 65000 orders and the complete dataset is about 700.000 
> records. 
> For confirmation, this is considered a small dataset, and small enough for 
> experimenting (Not using it in production) with the UR? 
> 
> 2017-05-10 19:13 GMT+02:00 Pat Ferrel :
>> Yes unless you have large-ish data. We also have an AWS AMI all set up
>> here: http://actionml.com/docs/awssetupguide. Both should be fine for
>> experimentation but will be too small for big-data.
>>
>> BTW all are being updated to the UR V0.6.0 and PIO 0.11.0 by next week
>> though the current versions work fine.
>> 
>> 
>> On May 10, 2017, at 8:35 AM, Dennis Honders  wrote:
>> 
>> Is the quickstart for PredictionIO 0.11.0 suitable enough for a very basic 
>> setup or is the single machine (http://actionml.com/docs/single_machine) 
>> setup a minimal requirement?
>> 
> 
> 


Re: Docs Universal Recommender

2017-05-10 Thread Marius Rabenarivo
My items are products with name and description and maybe caption extracted
from the image too.

2017-05-11 0:12 GMT+04:00 Pat Ferrel :

> What are your items? How much text? What other content? Unless you are
> recommending long-form blogs or news, NLP won’t give you much except maybe
> word2vec, which, if it has a good model, will give better results than
> bag-of-words.
>
>
> On May 10, 2017, at 1:05 PM, Marius Rabenarivo 
> wrote:
>
> So in your opinion, should the NLP task be done in the Engine part using a
> library like Mallet, or should it be implemented in an algorithm-focused
> library such as Mahout?
>
> 2017-05-10 23:52 GMT+04:00 Pat Ferrel :
>
>> That is how to make personalized content-based recommendations. You’d have
>> to input content by attaching it to items and recording it separately as a
>> usage event per content bit. The input, for instance, would be every term
>> in the description of an item the user purchased. The input would be huge
>> and the current UR + PIO is not optimized for that kind of input. It is not
>> a recommended mode to use the UR and is of dubious value without NLP
>> techniques such as word2vec or NER instead of bag-of-word type content. It
>> might be ok if you have rich metadata like categories or tags.
>>
>> In general content based recommendations are often little better than
>> some filtering of popular or rotating promoted items (with no purchase
>> history), both can be done fairly easily with the UR.
>>
>> Content based with NLP techniques for short lived items like news can
>> work well but require extra phases in front of the recommender to do the NLP.
>>
>>
>>
>> On May 10, 2017, at 12:33 PM, Marius Rabenarivo <
>> mariusrabenar...@gmail.com> wrote:
>>
>> Hello,
>>
>> So what do the matrix T and vector h_t in this slide correspond to?
>> https://docs.google.com/presentation/d/1MzIGFsATNeAYnLfoR6797ofcLeFRKSX7KB8GAYNtNPY/edit#slide=id.gf4d43b9e8_1_24
>>
>> 2017-05-10 21:10 GMT+04:00 Pat Ferrel :
>>
>>> Content based recommendations are based on, well, content. You can
>>> really only make recs if you have an example item as with the
>>> recommendations you see at the bottom of product page on Amazon.
>>>
>>> For this make sure to have lots of properties of items, even keywords
>>> from descriptions will work, but also categories, tags, brands, price
>>> ranges, etc. These all must be encoded as JSON arrays of strings so prices
>>> might be one of [“$0-$1”, “$1-$5”, …]; other things like descriptions,
>>> categories, or tags can have several strings attached.
>>>
>>> Then issue an item-based query with itemBias set higher (>1) to make use
>>> of usage information first before content since it performs better. Then
>>> add query fields for the various properties but include the values of the
>>> item referenced in the “item” field.
>>>
>>> You will get similar items based on usage data unless there is none then
>>> content will take over to recommend things with similar content. Play with
>>> the itemBias, try >1 by varying amounts since you want usage based
>>> similarity over content most of the time you have usage based data in the
>>> model. There is no hard rule for the bias.
>>>
>>>
>>> On May 10, 2017, at 6:36 AM, Dennis Honders 
>>> wrote:
>>>
>>> According to the docs, the UR is considered as hybrid collaborative
>>> filtering / content-based filtering.
>>> In my case I have a purchase history. Quite a lot of products are never
>>> bought so traditional techniques won't be able to make recommendations. For
>>> those products (never bought/sold), will recommendations be made with
>>> content-based filtering techniques?
>>> If so, what techniques are used in UR?
>>>
>>> 2017-05-08 19:02 GMT+02:00 Pat Ferrel :
>>>
>>>> yes to all for UR v0.5.0
>>>>
>>>> UR v0.6.0 is sitting in the `develop` branch waiting for one more minor
>>>> fix to be released. It uses the latest release of Mahout 0.13.0 so no need
>>>> to build it for the project. Several new features too. I expect it to be
>>>> out this week.
>>>>
>>>>
>>>> On May 8, 2017, at 3:07 AM, Dennis Honders
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Are the following docs up-to-date?
>>>>
>>>> PredictionIO: http://actionml.com/docs/pio_quickstart.
>>>> Is version 0.11.0 suitable for UR?
>>>>
>>>> The UR: http://actionml.com/docs/ur.
>>>> Is 0.5.0 the latest version?
>>>> Is Mahout still necessary?
>>>>
>>>> Thanks,
>>>>
>>>> Dennis
>>>>
>>>>
>>>
>>>
>>
>>
>
> --
> You received this message because you are subscribed to the Google Groups
> "actionml-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to actionml-user+unsubscr...@googlegroups.com.
> To post to this group, send email to actionml-u...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/actionml-user/CAC-ATVGvbEM3nzmAPk4%2BD4GM6z1e1t9yJf4irR1kN1y5%
> 3DAk4Ag%40mail.gmail.com
> 

Re: Docs Universal Recommender

2017-05-10 Thread Pat Ferrel
What are your items? How much text? What other content? Unless you are
recommending long-form blogs or news, NLP won’t give you much except maybe
word2vec, which, if it has a good model, will give better results than
bag-of-words.


On May 10, 2017, at 1:05 PM, Marius Rabenarivo  
wrote:

So in your opinion, should the NLP task be done in the Engine part using a
library like Mallet, or should it be implemented in an algorithm-focused
library such as Mahout?

2017-05-10 23:52 GMT+04:00 Pat Ferrel <p...@occamsmachete.com>:
That is how to make personalized content-based recommendations. You’d have to
input content by attaching it to items and recording it separately as a usage
event per content bit. The input, for instance, would be every term in the
description of an item the user purchased. The input would be huge and the
current UR + PIO is not optimized for that kind of input. It is not a
recommended way to use the UR and is of dubious value without NLP techniques
such as word2vec or NER instead of bag-of-words type content. It might be ok if
you have rich metadata like categories or tags.

In general content based recommendations are often little better than some 
filtering of popular or rotating promoted items (with no purchase history), 
both can be done fairly easily with the UR. 

Content based with NLP techniques for short lived items like news can work well
but require extra phases in front of the recommender to do the NLP.



On May 10, 2017, at 12:33 PM, Marius Rabenarivo <mariusrabenar...@gmail.com> wrote:

Hello,

So what do the matrix T and vector h_t in this slide correspond to?
https://docs.google.com/presentation/d/1MzIGFsATNeAYnLfoR6797ofcLeFRKSX7KB8GAYNtNPY/edit#slide=id.gf4d43b9e8_1_24
 


2017-05-10 21:10 GMT+04:00 Pat Ferrel <p...@occamsmachete.com>:
Content based recommendations are based on, well, content. You can really only 
make recs if you have an example item as with the recommendations you see at 
the bottom of product page on Amazon.

For this make sure to have lots of properties of items, even keywords from
descriptions will work, but also categories, tags, brands, price ranges, etc.
These all must be encoded as JSON arrays of strings so prices might be one of
[“$0-$1”, “$1-$5”, …]; other things like descriptions, categories, or tags can
have several strings attached.
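
A minimal sketch of the encoding described above, assuming the standard PredictionIO `$set` event shape for item properties. The property names (`price`, `categories`, `tags`), the item id, the helper names, and every price bucket beyond the two quoted in the message are hypothetical.

```python
import json

# Only "$0-$1" and "$1-$5" come from the message; the rest are assumptions.
PRICE_BUCKETS = [(0, 1, "$0-$1"), (1, 5, "$1-$5"), (5, 20, "$5-$20")]


def price_bucket(price):
    """Map a numeric price to a string label so it can be stored as a string."""
    for low, high, label in PRICE_BUCKETS:
        if low <= price < high:
            return label
    return "$20+"  # hypothetical catch-all bucket


def item_set_event(item_id, price, categories, tags):
    """Build a `$set` event whose properties are all JSON arrays of strings."""
    return {
        "event": "$set",
        "entityType": "item",
        "entityId": item_id,
        "properties": {
            "price": [price_bucket(price)],
            "categories": list(categories),
            "tags": list(tags),
        },
    }


event = item_set_event("sku-123", 3.50, ["kitchen"], ["steel", "dishwasher-safe"])
print(json.dumps(event, indent=2))
```

Every property value is a list of strings, including the single-valued price, which is the constraint the message states.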

Then issue an item-based query with itemBias set higher (>1) to make use of 
usage information first before content since it performs better. Then add query 
fields for the various properties but include the values of the item referenced 
in the “item” field. 

You will get similar items based on usage data unless there is none then 
content will take over to recommend things with similar content. Play with the 
itemBias, try >1 by varying amounts since you want usage based similarity over 
content most of the time you have usage based data in the model. There is no 
hard rule for the bias.
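
The query described above could be sketched as follows. The `item`, `itemBias`, `fields` (with `name`, `values`, `bias`), and `num` keys follow the UR query format as far as I know it; the helper name, the property names, and the bias values 2.0 and 1.5 are placeholders to experiment with, not recommended settings.

```python
import json


def item_based_query(item_id, item_properties, item_bias=2.0, field_bias=1.5, num=10):
    """Build an item-based UR query that also passes the item's own properties.

    item_properties maps a property name to the list of string values stored
    on the item referenced in the "item" field.
    """
    fields = [
        {"name": name, "values": list(values), "bias": field_bias}
        for name, values in sorted(item_properties.items())
    ]
    return {"item": item_id, "itemBias": item_bias, "fields": fields, "num": num}


query = item_based_query(
    "sku-123",
    {"categories": ["kitchen"], "price": ["$1-$5"]},
)
print(json.dumps(query, indent=2))
```

With `itemBias` above 1, usage-based similarity should dominate whenever the model has usage data for the item, and the property fields give the query something to match on when it does not.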

  
On May 10, 2017, at 6:36 AM, Dennis Honders <dennishond...@gmail.com> wrote:

According to the docs, the UR is considered as hybrid collaborative filtering / 
content-based filtering. 
In my case I have a purchase history. Quite a lot of products are never bought 
so traditional techniques won't be able to make recommendations. For those 
products (never bought/sold), will recommendations be made with content-based 
filtering techniques?
If so, what techniques are used in UR?

2017-05-08 19:02 GMT+02:00 Pat Ferrel <p...@occamsmachete.com>:
yes to all for UR v0.5.0

UR v0.6.0 is sitting in the `develop` branch waiting for one more minor fix to 
be released. It uses the latest release of Mahout 0.13.0 so no need to build it 
for the project. Several new features too. I expect it to be out this week.


On May 8, 2017, at 3:07 AM, Dennis Honders <dennishond...@gmail.com> wrote:

Hi, 

Are the following docs up-to-date?

PredictionIO: http://actionml.com/docs/pio_quickstart.
Is version 0.11.0 suitable for UR?

The UR: http://actionml.com/docs/ur.
Is 0.5.0 the latest version?
Is Mahout still necessary?

Thanks,

Dennis









Re: Docs Universal Recommender

2017-05-10 Thread Marius Rabenarivo
So in your opinion, should the NLP task be done in the Engine part using a
library like Mallet, or should it be implemented in an algorithm-focused
library such as Mahout?

2017-05-10 23:52 GMT+04:00 Pat Ferrel :

> That is how to make personalized content-based recommendations. You’d have
> to input content by attaching it to items and recording it separately as a
> usage event per content bit. The input, for instance, would be every term
> in the description of an item the user purchased. The input would be huge
> and the current UR + PIO is not optimized for that kind of input. It is not
> a recommended mode to use the UR and is of dubious value without NLP
> techniques such as word2vec or NER instead of bag-of-word type content. It
> might be ok if you have rich metadata like categories or tags.
>
> In general content based recommendations are often little better than some
> filtering of popular or rotating promoted items (with no purchase history),
> both can be done fairly easily with the UR.
>
> Content based with NLP techniques for short lived items like news can work
> well but require extra phases in front of the recommender to do the NLP.
>
>
>
> On May 10, 2017, at 12:33 PM, Marius Rabenarivo <
> mariusrabenar...@gmail.com> wrote:
>
> Hello,
>
> So what do the matrix T and vector h_t in this slide correspond to?
> https://docs.google.com/presentation/d/1MzIGFsATNeAYnLfoR6797ofcLeFRKSX7KB8GAYNtNPY/edit#slide=id.gf4d43b9e8_1_24
>
> 2017-05-10 21:10 GMT+04:00 Pat Ferrel :
>
>> Content based recommendations are based on, well, content. You can really
>> only make recs if you have an example item as with the recommendations you
>> see at the bottom of product page on Amazon.
>>
>> For this make sure to have lots of properties of items, even keywords from
>> descriptions will work, but also categories, tags, brands, price ranges,
>> etc. These all must be encoded as JSON arrays of strings so prices might be
>> one of [“$0-$1”, “$1-$5”, …]; other things like descriptions, categories, or
>> tags can have several strings attached.
>>
>> Then issue an item-based query with itemBias set higher (>1) to make use
>> of usage information first before content since it performs better. Then
>> add query fields for the various properties but include the values of the
>> item referenced in the “item” field.
>>
>> You will get similar items based on usage data unless there is none then
>> content will take over to recommend things with similar content. Play with
>> the itemBias, try >1 by varying amounts since you want usage based
>> similarity over content most of the time you have usage based data in the
>> model. There is no hard rule for the bias.
>>
>>
>> On May 10, 2017, at 6:36 AM, Dennis Honders 
>> wrote:
>>
>> According to the docs, the UR is considered as hybrid collaborative
>> filtering / content-based filtering.
>> In my case I have a purchase history. Quite a lot of products are never
>> bought so traditional techniques won't be able to make recommendations. For
>> those products (never bought/sold), will recommendations be made with
>> content-based filtering techniques?
>> If so, what techniques are used in UR?
>>
>> 2017-05-08 19:02 GMT+02:00 Pat Ferrel :
>>
>>> yes to all for UR v0.5.0
>>>
>>> UR v0.6.0 is sitting in the `develop` branch waiting for one more minor
>>> fix to be released. It uses the latest release of Mahout 0.13.0 so no need
>>> to build it for the project. Several new features too. I expect it to be
>>> out this week.
>>>
>>>
>>> On May 8, 2017, at 3:07 AM, Dennis Honders 
>>> wrote:
>>>
>>> Hi,
>>>
>>> Are the following docs up-to-date?
>>>
>>> PredictionIO: http://actionml.com/docs/pio_quickstart.
>>> Is version 0.11.0 suitable for UR?
>>>
>>> The UR: http://actionml.com/docs/ur.
>>> Is 0.5.0 the latest version?
>>> Is Mahout still necessary?
>>>
>>> Thanks,
>>>
>>> Dennis
>>>
>>>
>>
>>
>
>


Re: Spark pipeline in predictionIO

2017-05-10 Thread Ravi Kiran
Can you let me know once you are able to run it successfully? At least I
will have a template to start with.

On Thu, May 11, 2017 at 1:28 AM, Pankil Doshi  wrote:

> We also have a similar use case.
>
> Trying to use spark pipeline as a model and run the pipeline at the time
> of scoring?
>
>
> > On May 9, 2017, at 11:30 AM, Ravi Kiran  wrote:
> >
> > Hi,
> > Most of the templates I have seen don't use Spark pipelines. I have a
> > classification model in the form of a Spark pipeline and I want to run it
> > in PredictionIO. Is there a sample template available for using a Spark
> > pipeline in PredictionIO?
> >
> >
>


Re: Spark pipeline in predictionIO

2017-05-10 Thread Pankil Doshi
We also have a similar use case.

Trying to use spark pipeline as a model and run the pipeline at the time of 
scoring?


> On May 9, 2017, at 11:30 AM, Ravi Kiran  wrote:
> 
> Hi, 
> Most of the templates I have seen don't use Spark pipelines. I have a
> classification model in the form of a Spark pipeline and I want to run it in
> PredictionIO. Is there a sample template available for using a Spark pipeline
> in PredictionIO?
> 
> 


Re: Docs Universal Recommender

2017-05-10 Thread Pat Ferrel
That is how to make personalized content-based recommendations. You’d have to
input content by attaching it to items and recording it separately as a usage
event per content bit. The input, for instance, would be every term in the
description of an item the user purchased. The input would be huge and the
current UR + PIO is not optimized for that kind of input. It is not a
recommended way to use the UR and is of dubious value without NLP techniques
such as word2vec or NER instead of bag-of-words type content. It might be ok if
you have rich metadata like categories or tags.
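
To make the size concern concrete, here is a rough sketch of what "a usage event per content bit" could look like with plain bag-of-words terms. The event name `description-term`, the tokenizer, and the helper name are hypothetical; the event shape assumes the usual PredictionIO layout with a user entity and a target entity.

```python
import re


def description_term_events(user_id, description, event_name="description-term"):
    """Emit one usage-style event per distinct term in a purchased item's description."""
    # Naive bag-of-words tokenization (an assumption; no stemming or stop words).
    terms = sorted(set(re.findall(r"[a-z0-9]+", description.lower())))
    return [
        {
            "event": event_name,
            "entityType": "user",
            "entityId": user_id,
            "targetEntityType": "item",
            "targetEntityId": term,
        }
        for term in terms
    ]


events = description_term_events("u-1", "Stainless steel kettle, steel handle")
print(len(events), [e["targetEntityId"] for e in events])
```

Even this five-word description yields four events for a single purchase, which illustrates why the input grows so quickly with real descriptions.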

In general content based recommendations are often little better than some 
filtering of popular or rotating promoted items (with no purchase history), 
both can be done fairly easily with the UR. 

Content-based recommendations with NLP techniques for short-lived items like
news can work well but require extra phases in front of the recommender to do
the NLP.


On May 10, 2017, at 12:33 PM, Marius Rabenarivo  
wrote:

Hello,

So what do the matrix T and vector h_t in this slide correspond to?
https://docs.google.com/presentation/d/1MzIGFsATNeAYnLfoR6797ofcLeFRKSX7KB8GAYNtNPY/edit#slide=id.gf4d43b9e8_1_24
 


2017-05-10 21:10 GMT+04:00 Pat Ferrel <p...@occamsmachete.com>:
Content based recommendations are based on, well, content. You can really only 
make recs if you have an example item as with the recommendations you see at 
the bottom of product page on Amazon.

For this make sure to have lots of properties of items, even keywords from
descriptions will work, but also categories, tags, brands, price ranges, etc.
These all must be encoded as JSON arrays of strings so prices might be one of
[“$0-$1”, “$1-$5”, …]; other things like descriptions, categories, or tags can
have several strings attached.

Then issue an item-based query with itemBias set higher (>1) to make use of 
usage information first before content since it performs better. Then add query 
fields for the various properties but include the values of the item referenced 
in the “item” field. 

You will get similar items based on usage data unless there is none then 
content will take over to recommend things with similar content. Play with the 
itemBias, try >1 by varying amounts since you want usage based similarity over 
content most of the time you have usage based data in the model. There is no 
hard rule for the bias.

  
On May 10, 2017, at 6:36 AM, Dennis Honders <dennishond...@gmail.com> wrote:

According to the docs, the UR is considered as hybrid collaborative filtering / 
content-based filtering. 
In my case I have a purchase history. Quite a lot of products are never bought 
so traditional techniques won't be able to make recommendations. For those 
products (never bought/sold), will recommendations be made with content-based 
filtering techniques?
If so, what techniques are used in UR?

2017-05-08 19:02 GMT+02:00 Pat Ferrel <p...@occamsmachete.com>:
yes to all for UR v0.5.0

UR v0.6.0 is sitting in the `develop` branch waiting for one more minor fix to 
be released. It uses the latest release of Mahout 0.13.0 so no need to build it 
for the project. Several new features too. I expect it to be out this week.


On May 8, 2017, at 3:07 AM, Dennis Honders <dennishond...@gmail.com> wrote:

Hi, 

Are the following docs up-to-date?

PredictionIO: http://actionml.com/docs/pio_quickstart.
Is version 0.11.0 suitable for UR?

The UR: http://actionml.com/docs/ur.
Is 0.5.0 the latest version?
Is Mahout still necessary?

Thanks,

Dennis







Re: Docs Universal Recommender

2017-05-10 Thread Marius Rabenarivo
Hello,

So what do the matrix T and vector h_t in this slide correspond to?
https://docs.google.com/presentation/d/1MzIGFsATNeAYnLfoR6797ofcLeFRKSX7KB8GAYNtNPY/edit#slide=id.gf4d43b9e8_1_24

2017-05-10 21:10 GMT+04:00 Pat Ferrel :

> Content based recommendations are based on, well, content. You can really
> only make recs if you have an example item as with the recommendations you
> see at the bottom of product page on Amazon.
>
> For this make sure to have lots of properties of items, even keywords from
> descriptions will work, but also categories, tags, brands, price ranges,
> etc. These all must be encoded as JSON arrays of strings so prices might be
> one of [“$0-$1”, “$1-$5”, …]; other things like descriptions, categories, or
> tags can have several strings attached.
>
> Then issue an item-based query with itemBias set higher (>1) to make use
> of usage information first before content since it performs better. Then
> add query fields for the various properties but include the values of the
> item referenced in the “item” field.
>
> You will get similar items based on usage data unless there is none, in
> which case content will take over to recommend things with similar content.
> Play with the itemBias, trying values >1 by varying amounts, since you want
> usage-based similarity over content whenever you have usage-based data in
> the model. There is no hard rule for the bias.
>
>
> On May 10, 2017, at 6:36 AM, Dennis Honders 
> wrote:
>
> According to the docs, the UR is considered as hybrid collaborative
> filtering / content-based filtering.
> In my case I have a purchase history. Quite a lot of products are never
> bought so traditional techniques won't be able to make recommendations. For
> those products (never bought/sold), will recommendations be made with
> content-based filtering techniques?
> If so, what techniques are used in UR?
>
> 2017-05-08 19:02 GMT+02:00 Pat Ferrel :
>
>> yes to all for UR v0.5.0
>>
>> UR v0.6.0 is sitting in the `develop` branch waiting for one more minor
>> fix to be released. It uses the latest release of Mahout 0.13.0 so no need
>> to build it for the project. Several new features too. I expect it to be
>> out this week.
>>
>>
>> On May 8, 2017, at 3:07 AM, Dennis Honders 
>> wrote:
>>
>> Hi,
>>
>> Are the following docs up-to-date?
>>
>> PredictionIO: http://actionml.com/docs/pio_quickstart.
>> Is version 0.11.0 suitable for UR?
>>
>> The UR: http://actionml.com/docs/ur.
>> Is 0.5.0 the latest version?
>> Is Mahout still necessary?
>>
>> Thanks,
>>
>> Dennis
>>
>>
>
>


Re: UR PredictionIO quickstart

2017-05-10 Thread Pat Ferrel
Probably, how many users and items?

It will certainly work on a single machine, though you may have to pick 
something larger than the minimal instance type. We recommend R3 instances and 
you can upgrade in place if you start out too small. 


On May 10, 2017, at 10:45 AM, Dennis Honders  wrote:

Okay, thanks for the answer. Will also take a look at the update next week. 

In my case I have like 65000 orders and the complete dataset is about 700.000 
records. 
For confirmation, this is considered a small dataset, and small enough for 
experimenting (Not using it in production) with the UR? 

2017-05-10 19:13 GMT+02:00 Pat Ferrel:
Yes unless you have large-ish data. We also have an AWS AMI all set up here: 
http://actionml.com/docs/awssetupguide. Both should be fine for 
experimentation but will be too small for big-data.

BTW all are being updated to the UR V0.6.0 and PIO 0.11.0 by next week though 
the current versions work fine.


On May 10, 2017, at 8:35 AM, Dennis Honders wrote:

Is the quickstart for PredictionIO 0.11.0 suitable enough for a very basic 
setup or is the single machine (http://actionml.com/docs/single_machine) 
setup a minimal requirement?





Re: UR PredictionIO quickstart

2017-05-10 Thread Dennis Honders
Okay, thanks for the answer. Will also take a look at the update next week.

In my case I have like 65000 orders and the complete dataset is about
700.000 records.
For confirmation, this is considered a small dataset, and small enough for
experimenting (Not using it in production) with the UR?

2017-05-10 19:13 GMT+02:00 Pat Ferrel :

> Yes unless you have large-ish data. We also have an AWS AMI all set up
> here: http://actionml.com/docs/awssetupguide. Both should be fine for
> experimentation but will be too small for big-data.
>
> BTW all are being updated to the UR V0.6.0 and PIO 0.11.0 by next week
> though the current versions work fine.
>
>
> On May 10, 2017, at 8:35 AM, Dennis Honders 
> wrote:
>
> Is the quickstart for PredictionIO 0.11.0 suitable enough for a very basic
> setup or is the single machine (http://actionml.com/docs/single_machine)
> setup a minimal requirement?
>
>


Re: UR PredictionIO quickstart

2017-05-10 Thread Pat Ferrel
Yes unless you have large-ish data. We also have an AWS AMI all set up here: 
http://actionml.com/docs/awssetupguide. Both should be fine for 
experimentation but will be too small for big-data.

BTW all are being updated to the UR V0.6.0 and PIO 0.11.0 by next week though 
the current versions work fine.


On May 10, 2017, at 8:35 AM, Dennis Honders  wrote:

Is the quickstart for PredictionIO 0.11.0 suitable enough for a very basic 
setup or is the single machine (http://actionml.com/docs/single_machine) 
setup a minimal requirement?



Re: Docs Universal Recommender

2017-05-10 Thread Pat Ferrel
Content based recommendations are based on, well, content. You can really only 
make recs if you have an example item as with the recommendations you see at 
the bottom of product page on Amazon.

For this, make sure to have lots of item properties; even keywords from 
descriptions will work, but also categories, tags, brands, price ranges, etc. 
These must all be encoded as JSON arrays of strings, so prices might be one of 
[“$0-$1”, “$1-$5”, …]; other things like descriptions, categories, or tags can 
have several strings attached.
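
A minimal sketch of the encoding described above, assuming items are sent to 
the PredictionIO event server as `$set` events. The property names 
(`categories`, `tags`, `price_range`), the bucket boundaries, and the item id 
are hypothetical and only illustrate that every property value is an array of 
strings.

```python
import json

# Hypothetical price buckets; the boundaries are illustrative only.
PRICE_BUCKETS = [(0, 1, "$0-$1"), (1, 5, "$1-$5"), (5, 20, "$5-$20")]


def price_bucket(price):
    """Return the label of the first bucket whose range [low, high) holds price."""
    for low, high, label in PRICE_BUCKETS:
        if low <= price < high:
            return label
    return "$20+"  # catch-all label for anything above the last bucket


def item_set_event(item_id, categories, tags, price):
    """Build a $set event whose properties are all arrays of strings."""
    return {
        "event": "$set",
        "entityType": "item",
        "entityId": item_id,
        "properties": {
            "categories": list(categories),
            "tags": list(tags),
            # A single price becomes a one-element array holding its bucket label.
            "price_range": [price_bucket(price)],
        },
    }


event = item_set_event("sku-123", ["kitchen"], ["steel", "knife"], 3.50)
print(json.dumps(event, indent=2))
```

Bucketing the price turns a numeric value into a string token, so two items 
in the same range share a value that can be matched like any other tag.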

Then issue an item-based query with itemBias set higher (>1) to make use of 
usage information before content, since usage performs better. Then add query 
fields for the various properties, filling in the values of the item referenced 
in the “item” field. 

You will get similar items based on usage data unless there is none, in which 
case content will take over to recommend things with similar content. Play with 
the itemBias, trying values >1 by varying amounts, since you want usage-based 
similarity over content whenever you have usage-based data in the model. There 
is no hard rule for the bias.
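
A minimal sketch of such an item-based query, assuming the UR query keys 
`item`, `itemBias`, `fields` (each with `name`, `values`, `bias`), and `num`; 
check them against the UR docs for the version in use. The helper name, the 
property names, and the bias values are hypothetical starting points, not 
recommended settings.

```python
import json


def item_query(item_id, item_properties, item_bias=2.0, field_bias=1.0, num=10):
    """Build an item-based query that also passes the item's own properties.

    item_properties maps a property name to the list of strings stored for
    the referenced item, so content can be matched when usage data is missing.
    """
    fields = [
        {"name": name, "values": list(values), "bias": field_bias}
        for name, values in sorted(item_properties.items())
    ]
    return {
        "item": item_id,
        "itemBias": item_bias,  # >1 favors usage-based similarity to this item
        "fields": fields,
        "num": num,
    }


query = item_query(
    "sku-123",
    {"tags": ["steel", "knife"], "categories": ["kitchen"]},
)
print(json.dumps(query, indent=2))
```

The resulting JSON would typically be POSTed to the deployed engine's 
`queries.json` endpoint. Since there is no hard rule for the bias, 
`item_bias` is left as a parameter so several values can be compared.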

  
On May 10, 2017, at 6:36 AM, Dennis Honders  wrote:

According to the docs, the UR is considered as hybrid collaborative filtering / 
content-based filtering. 
In my case I have a purchase history. Quite a lot of products are never bought 
so traditional techniques won't be able to make recommendations. For those 
products (never bought/sold), will recommendations be made with content-based 
filtering techniques?
If so, what techniques are used in UR?

2017-05-08 19:02 GMT+02:00 Pat Ferrel:
yes to all for UR v0.5.0

UR v0.6.0 is sitting in the `develop` branch waiting for one more minor fix to 
be released. It uses the latest release of Mahout 0.13.0 so no need to build it 
for the project. Several new features too. I expect it to be out this week.


On May 8, 2017, at 3:07 AM, Dennis Honders wrote:

Hi, 

Are the following docs up-to-date?

PredictionIO: http://actionml.com/docs/pio_quickstart.
Is version 0.11.0 suitable for UR?

The UR: http://actionml.com/docs/ur.
Is 0.5.0 the latest version? 
Is Mahout still necessary?

Thanks,

Dennis





UR PredictionIO quickstart

2017-05-10 Thread Dennis Honders
Is the quickstart for PredictionIO 0.11.0 suitable enough for a very basic
setup or is the single machine (http://actionml.com/docs/single_machine)
setup a minimal requirement?


Re: Docs Universal Recommender

2017-05-10 Thread Dennis Honders
According to the docs, the UR is considered as hybrid collaborative
filtering / content-based filtering.
In my case I have a purchase history. Quite a lot of products are never
bought so traditional techniques won't be able to make recommendations. For
those products (never bought/sold), will recommendations be made with
content-based filtering techniques?
If so, what techniques are used in UR?

2017-05-08 19:02 GMT+02:00 Pat Ferrel :

> yes to all for UR v0.5.0
>
> UR v0.6.0 is sitting in the `develop` branch waiting for one more minor
> fix to be released. It uses the latest release of Mahout 0.13.0 so no need
> to build it for the project. Several new features too. I expect it to be
> out this week.
>
>
> On May 8, 2017, at 3:07 AM, Dennis Honders 
> wrote:
>
> Hi,
>
> Are the following docs up-to-date?
>
> PredictionIO: http://actionml.com/docs/pio_quickstart.
> Is version 0.11.0 suitable for UR?
>
> The UR: http://actionml.com/docs/ur.
> Is 0.5.0 the latest version?
> Is Mahout still necessary?
>
> Thanks,
>
> Dennis
>
>