A single pass with L2 regularization is relatively easy with a small number
of variables since the penalty is a quadratic form.

For L1 or elastic band, this is much more difficult.  It isn't clear to me
that Michael is still proposing a single pass algorithm for that or not.

It isn't hard to reduce the fitting part of the problem to a relatively
small dense problem if you have a small number of variables.  From there,
iterating on a single machine is pretty easy and as Michael says, the 1000
variable case can be solved quickly.  Presumably the 5000x5000 case can be
solved quickly enough to be still interesting.

The basic idea there is to convert a disk resident problem into a memory
resident problem.  Whether the memory resident part is solved in parallel
or not is a separate question.


To the extent non-sparse problems are important, this is a good step.  My
experience is that big-data optimizers almost always have sparse inputs.




On Sun, Jul 21, 2013 at 2:38 PM, DB Tsai <dbt...@dbtsai.com> wrote:

> Coordinate decent is essentially a iterative algorithm,  how can you do it
> in single pass of data with L2 regularization?
> On Jul 21, 2013 2:09 PM, "Michael Kun Yang" <kuny...@stanford.edu> wrote:
>
>> I will update the document to detail the algorithm.
>>
>>
>> On Sun, Jul 21, 2013 at 1:50 PM, Ted Dunning <ted.dunn...@gmail.com>
>> wrote:
>>
>> > On Sun, Jul 21, 2013 at 1:41 PM, Kun Yang <kuny...@stanford.edu> wrote:
>> >
>> > > The algorithm is not solving the normal equation as in the ordinary
>> > linear
>> > > regression. I did not detail the algorithm to solve the penalized
>> > > optimization in the paper. To solve the penalized version, I will use
>> the
>> > > coordinate descent which is well documented in other paper (see
>> > Freedman's
>> > > paper, for 1000 variables, it takes ~1min to do cross validation in
>> the R
>> > > package) and is very efficient.
>> > >
>> > > As I discussed in the conclusion section, to solve the problem with
>> large
>> > > number of predictors, it is still a challenge even though in the
>> single
>> > > machine or MPI version, but the proposed algorithm can handle the
>> number
>> > of
>> > > variable at the order of 5000 and it covers lots of applications.
>> > >
>> >
>> > Should the document be updated to describe what you intend to do?
>> >
>>
>

Reply via email to