Number of features for ALS

2014-03-27 Thread Sebastian Schelter

Hi,

does anyone know of a principled approach to choosing the number of 
features for ALS (other than cross-validation)?


--sebastian


Re: Number of features for ALS

2014-03-27 Thread Ted Dunning
How can there be any other practical method?  Essentially all of the
mathematical assumptions underpinning ALS are violated by the real world.
 Why would any mathematical consideration of the number of features be much
more than a heuristic?

That said, you can make an information-content argument.  You can also make
the argument that if you take too many features it doesn't hurt much, so
you should always take as many as you can compute.





Re: Number of features for ALS

2014-03-27 Thread Tevfik Aytekin
Interesting topic.
Ted, can you give examples of those mathematical assumptions
underpinning ALS that are violated by the real world?



Re: Number of features for ALS

2014-03-27 Thread Ted Dunning
Least-squares techniques in general depend on an assumption of normally 
distributed errors. With counts, that is only plausible for large values. 

Also, decompositions like this make linearity assumptions, which imply that all 
items/words are independent.  They are clearly not. 
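One way to see the count issue concretely: the skewness of a Poisson count with mean lam is 1/sqrt(lam), so small counts are strongly skewed away from normal, and only large counts look approximately symmetric. A minimal sketch (the function name is mine, not anything from the thread):

```python
import math

# Theoretical skewness of a Poisson(lam) count is 1/sqrt(lam).  A normal
# distribution has skewness 0, so the least-squares assumption of normal
# errors is only plausible once counts get large.
def poisson_skewness(lam: float) -> float:
    return 1.0 / math.sqrt(lam)

for lam in (1, 4, 25, 100):
    print(f"mean count {lam:>3}: skewness = {poisson_skewness(lam):.2f}")
# skewness falls from 1.00 at lam=1 to 0.10 at lam=100
```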

Sent from my iPhone



Re: Number of features for ALS

2014-03-27 Thread j.barrett Strausser
For my team it has usually been heteroscedasticity and time inhomogeneity.







-- 


https://github.com/bearrito
@deepbearrito


Re: Number of features for ALS

2014-03-27 Thread Ted Dunning
For the poly-syllable challenged,

heteroscedasticity - the degree of variation changes.  This is common with
counts because you expect the standard deviation of count data to be
proportional to sqrt(n).

time inhomogeneity - behavior changes over time.  One way to handle this
(roughly) is to first remove variation in personal and item means over time
(if using ratings) and then to segment user histories into episodes.  By
including both short and long episodes you get some repair for changes in
personal preference.  A great example of how this works/breaks is Christmas
music.  On December 26th, you want to *stop* recommending this music, so it
really pays to limit histories at this point.  By having an episodic user
session that starts around November and runs to Christmas, you can get good
recommendations for seasonal songs and not pollute the rest of the universe.
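The episode idea can be sketched as a simple gap-based split of a user's history; the 30-day threshold and all names below are illustrative, not anything from Mahout:

```python
from datetime import datetime, timedelta

# Split one user's event timestamps into episodes wherever the gap between
# consecutive events exceeds max_gap.  A short recent episode (e.g. a
# November-December run of Christmas listening) can then be used without
# polluting the long-term profile.
def split_into_episodes(timestamps, max_gap=timedelta(days=30)):
    episodes, current = [], []
    for t in sorted(timestamps):
        if current and t - current[-1] > max_gap:
            episodes.append(current)
            current = []
        current.append(t)
    if current:
        episodes.append(current)
    return episodes

history = [datetime(2013, 11, 5), datetime(2013, 11, 20),
           datetime(2013, 12, 10), datetime(2013, 12, 24),
           datetime(2014, 2, 10)]
print([len(e) for e in split_into_episodes(history)])  # [4, 1]
```

The 48-day silence after December 24th starts a new episode, so the seasonal run is isolated from later behavior.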





Re: Number of features for ALS

2014-03-27 Thread j.barrett Strausser
Thanks Ted,

Yes, for the time problem we tend to use aggregations of session data. So
instead of asking for user recommendations, we do things like user+session
recommendations.

Of course, deciding when sessions start and stop isn't trivial. Ideally, what
I would want to do is time-weight views using a kernel or convolution.
That's a bit heavy, so we typically have a global model, which is basically
all preferences over time, plus these user+session models. We can then
combine these at another level to give recommendations based on what you
like throughout time versus what you have been doing recently.
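The two-level combination might look like a weighted blend of a long-term score and a recent-session score; alpha and the function names here are hypothetical, chosen only for illustration:

```python
# Blend a long-term ("global") preference score with a recent-session score.
# Larger alpha favors what the user has been doing recently.
def blended_score(global_score: float, session_score: float,
                  alpha: float = 0.7) -> float:
    return alpha * session_score + (1.0 - alpha) * global_score

# A user whose recent session strongly suggests an item gets a high combined
# score even if the long-term profile is lukewarm.
print(blended_score(global_score=0.2, session_score=0.9))
```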



-b





-- 


https://github.com/bearrito
@deepbearrito


Re: Number of features for ALS

2014-03-30 Thread Niklas Ekvall
Hi,

My name is Niklas Ekvall and I have an implementation of the recommender
algorithm "Large-scale Parallel Collaborative Filtering for the Netflix
Prize", and now I'm wondering how to choose the number of features and
lambda. Could any of you help me with a stepwise strategy for choosing
or optimizing these two parameters?

Best regards, Niklas




Re: Number of features for ALS

2014-03-30 Thread Sebastian Schelter
Use k-fold cross-validation or hold-out tests to estimate the quality 
of different parameter combinations.
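In practice that amounts to a grid search: for each (rank, lambda) pair, train on k-1 folds and score the held-out fold, keeping the pair with the best mean score. In the sketch below the real "train ALS and score the fold" call is replaced by a placeholder objective, since that call depends on your implementation:

```python
import itertools
import random

# Shuffle rating indices once, then deal them into k folds.
def k_fold_indices(n, k, seed=42):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def grid_search(n_ratings, ranks, lambdas, k=5):
    folds = k_fold_indices(n_ratings, k)
    best = None
    for rank, lam in itertools.product(ranks, lambdas):
        fold_scores = []
        for held_out in folds:
            # Placeholder: a real run would train ALS with (rank, lam) on the
            # other folds and compute RMSE (or MAP@n) on `held_out`.
            fold_scores.append((rank - 30) ** 2 / 1000 + abs(lam - 0.1))
        mean_score = sum(fold_scores) / len(fold_scores)
        if best is None or mean_score < best[0]:
            best = (mean_score, rank, lam)
    return best

score, rank, lam = grid_search(1000, ranks=[10, 20, 30, 50],
                               lambdas=[0.01, 0.1, 1.0])
print(rank, lam)  # 30 0.1
```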


--sebastian







Re: Number of features for ALS

2014-03-30 Thread Niklas Ekvall
Hello Sebastian, could you give a deeper explanation or point to an article
that covers the subject?

Best regards, Niklas




Re: Number of features for ALS

2014-03-30 Thread Ted Dunning
Niklas,

http://en.wikipedia.org/wiki/Cross-validation_(statistics)

http://statweb.stanford.edu/~tibs/sta306b/cvwrong.pdf





Re: Number of features for ALS

2014-03-30 Thread Pat Ferrel
Seems like most people agree that ranking is more important than rating in
most recommender deployments. RMSE was used for a long time with
cross-validation (partly because it was the choice of Netflix during the
competition) but it is really a measure of total rating error.  In the past
we've used mean average precision as a good measure of ranking quality. We
chose hold-out tests based on time, so something like 10% of the most
recent data was held out for cross-validation and we measured MAP@n for
tuning parameters.

http://en.wikipedia.org/wiki/Information_retrieval#Mean_average_precision

For our data (ecommerce shopping data) most of the ALS tuning parameters
had very little effect on MAP. However, cooccurrence recommenders performed
much better using the same data. Unfortunately, comparing two algorithms
with offline tests is of questionable value. Still, with nothing else to go
on, we went with the cooccurrence recommender.



Re: Number of features for ALS

2014-03-30 Thread Ted Dunning
Yeah... what Pat said.

Off-line evaluations are difficult.  At most, they provide directional
guidance to be refined using live A/B testing.  Of course, A/B testing of
recommenders comes with a new set of tricky issues like different
recommenders learning from each other.

On Sun, Mar 30, 2014 at 4:54 PM, Pat Ferrel  wrote:

> Seems like most people agree that ranking is more important than rating in
> most recommender deployments. RMSE was used for a long time with
> cross-validation (partly because it was the choice of Netflix during the
> competition) but it is really a measure of total rating error.  In the past
> we’ve used mean-average-precision as a good measure of ranking quality. We
> chose hold-out tests based on time, so something like 10% of the most
> recent data was held out for cross-validation and we measured MAP@n for
> tuning parameters.
>
> http://en.wikipedia.org/wiki/Information_retrieval#Mean_average_precision
>
> For our data (ecommerce shopping data) most of the ALS tuning parameters
> had very little effect on MAP. However, cooccurrence recommenders performed
> much better using the same data. Unfortunately comparing two algorithms
> with offline tests is of questionable value. Still with nothing else to go
> on we went with the cooccurrence recommender.
>
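The MAP@n measure discussed in this thread can be sketched directly; `recs` and `held_out` are hypothetical structures, a ranked recommendation list and a time-based hold-out set per user:

```python
# Average precision at n for one user: each hit contributes the precision at
# its rank; normalize by the best achievable number of hits in the top n.
def average_precision_at_n(ranked, relevant, n):
    hits, score = 0, 0.0
    for i, item in enumerate(ranked[:n]):
        if item in relevant:
            hits += 1
            score += hits / (i + 1)
    return score / min(len(relevant), n) if relevant else 0.0

# MAP@n: mean of per-user average precision over users with hold-out data.
def map_at_n(recs, held_out, n=10):
    users = [u for u in recs if held_out.get(u)]
    return sum(average_precision_at_n(recs[u], held_out[u], n)
               for u in users) / len(users)

recs = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y", "z"]}
held_out = {"u1": {"a", "c"}, "u2": {"z"}}
print(round(map_at_n(recs, held_out, n=3), 4))  # 0.5833
```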
>


Re: Number of features for ALS

2014-04-06 Thread Niklas Ekvall
Hi Pat and Ted!

Yes, I agree about ranking and MAP. But in that case, what is a good
initial guess for the parameters *number of features* and *lambda*?

Where can I find the best article about the cooccurrence recommender? And can
one use this approach for different types of data, e.g., ratings, purchase
histories, or click histories?

Best, Niklas




Re: Number of features for ALS

2014-04-06 Thread Pat Ferrel
> 
> On Apr 6, 2014, at 2:48 AM, Niklas Ekvall  wrote:
> 
> Hi Pat and Ted!
> 
> Yes, I agree about ranking and MAP. But in that case, what is a good
> initial guess for the parameters *number of features* and *lambda*?

20 or 30 features depending on the variance in your data; more is theoretically 
better but usually gives rapidly diminishing returns. I forget what lambdas we 
tried.
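
[The tuning loop implied here and earlier in the thread can be sketched as a
small grid search scored on held-out data. This is an illustrative sketch:
`train` and `score` are caller-supplied placeholders for your actual ALS
trainer and MAP@n scorer on a time-based hold-out; the toy stand-ins below
only exist to make the sketch runnable.]

```python
import itertools

def grid_search(train, score, ranks, lambdas):
    """Try each (rank, lambda) pair; return the best by held-out score.

    train(rank, lam) returns a fitted model; score(model) returns e.g.
    MAP@n on a hold-out of the most recent ~10% of interactions.
    """
    best = None
    for rank, lam in itertools.product(ranks, lambdas):
        s = score(train(rank, lam))
        if best is None or s > best[0]:
            best = (s, rank, lam)
    return best  # (score, rank, lambda)

# Toy stand-ins: pretend the score peaks around rank 20, lambda 0.1.
toy_train = lambda rank, lam: (rank, lam)
toy_score = lambda m: -abs(m[0] - 20) - abs(m[1] - 0.1)
print(grid_search(toy_train, toy_score, [10, 20, 30], [0.01, 0.1, 1.0]))
# -> (0.0, 20, 0.1)
```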

> 
> Where can I find the best article about cooccurrence recommender? And can
> one use this approach for different types of data, e.g., ratings, purchase
> histories or click histories?

Absolutely, but remember that the data you train on is what you are 
recommending. So if you train on detail views (click paths), the recommender 
will return items to look at, not necessarily the same as items to purchase. If 
you train on what you want to recommend, then all of the above will work.

If you want to train on click paths and recommend purchases, you probably need 
a cross-recommender, another discussion altogether.



Re: Number of features for ALS

2014-04-06 Thread Niklas Ekvall
Thanks Pat!

I did find a book by Ted Dunning and Ellen Friedman (Practical Machine
Learning: Innovations in Recommendations). I guess I can use it to read more
about the co-occurrence recommender, or co-occurrence analysis in general.

Best, Niklas





Re: Number of features for ALS

2014-04-07 Thread Ted Dunning
That book is a fine beginning, but doesn't have a lot of detail.

Check out Pat's very nice demo site for more information.  I have also
given a ton of talks on the subject.

And, to answer your question, cooccurrence recommendation works great with
diverse sources of behavior.
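
[The core idea can be sketched briefly. This is my own illustrative Python,
not Mahout's item-similarity job: count how often item pairs occur in the
same user histories, then keep pairs whose log-likelihood ratio (the LLR
test Ted has written about) indicates the cooccurrence is anomalous rather
than mere popularity.]

```python
import math
from collections import Counter
from itertools import combinations

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 cooccurrence contingency table."""
    def entropy(*ks):
        total = sum(ks)
        return total * math.log(total) - sum(k * math.log(k) for k in ks if k > 0)
    return 2.0 * (entropy(k11 + k12, k21 + k22)    # row entropy
                  + entropy(k11 + k21, k12 + k22)  # column entropy
                  - entropy(k11, k12, k21, k22))   # matrix entropy

def llr_cooccurrence(histories, threshold=1.0):
    """histories: list of per-user item sets. Returns {(a, b): LLR score}."""
    item_count = Counter(i for h in histories for i in set(h))
    pair_count = Counter(p for h in histories
                         for p in combinations(sorted(set(h)), 2))
    n = len(histories)
    scores = {}
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11   # users with a but not b
        k21 = item_count[b] - k11   # users with b but not a
        k22 = n - k11 - k12 - k21   # users with neither
        score = llr(k11, k12, k21, k22)
        if score >= threshold:
            scores[(a, b)] = score
    return scores
```

[Note the test uses counts of events, not rating values, which is why mixed
behavior sources (clicks, purchases, ratings thresholded into "liked") can
all feed it.]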





Re: Number of features for ALS

2014-04-08 Thread Niklas Ekvall
Thank you Ted!

Do you plan to do any talks in Sweden soon?

Best, Niklas




Re: Number of features for ALS

2014-04-08 Thread Ted Dunning
On Tue, Apr 8, 2014 at 9:40 AM, Niklas Ekvall wrote:

> Do you plan to do any talks in Sweden soon?
>

Is last week soon enough?

:-(