The key tricks are:

- do the updates of the averaged model in a sparse fashion. This will
  require doubling the space kept by the model
- determine when to switch to averaging

In addition, we should bring in at the same time:

- more flexibility on the loss function (to allow the code to implement SVM)

On Thu, Nov 17, 2011 at 2:26 AM, urun dogan <[email protected]> wrote:
> Hi Ted;
>
> I have started to read the paper and I think I will finish it today. It is
> a quite nice approach, and thanks for the supervision.
>
> Cheers
> Ürün
>
> On Wed, Nov 16, 2011 at 8:14 PM, Ted Dunning <[email protected]>
> wrote:
>
> > On Wed, Nov 16, 2011 at 9:50 AM, urun dogan <[email protected]> wrote:
> >
> > > I had written the previous email before reading Josh's email. Are
> > > there any objections if I conclude that implementation of SGD/ASGD-based
> > > methods has priority, and that I will therefore start implementing these
> > > methods soon?
> >
> > I think that they are important. But I haven't been able to partition off
> > enough time to actually do it, so my vote is degraded somewhat. I do know
> > that people I have worked with would benefit from the results shown in the
> > Xu paper.
> >
> > > @Ted: If this is the case, I am looking forward to having your
> > > supervision on this issue.
> >
> > Excellent.
> >
> > Have you looked at the Xu paper?
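To make the two tricks concrete, here is a minimal Python sketch (my own illustration, not Mahout code; the class and parameter names are hypothetical). It keeps a second copy of the weights for the running average (the doubled space), updates the average lazily only for features that actually appear in an example (the sparse trick, using the closed form for the averaging recursion over a stretch where a weight is constant), delays averaging until a configurable step (the switch point), and uses a hinge-loss subgradient as the SVM-style loss. The constant learning rate is a simplification.

```python
from collections import defaultdict

class SparseASGD:
    """Averaged SGD where the averaged weights are updated lazily.

    Space is doubled relative to plain SGD: we keep the live weights w,
    the running average w_avg, and a per-feature timestamp so the
    average can be caught up in closed form only when a feature is seen.
    (Hypothetical sketch; names and defaults are illustrative.)
    """

    def __init__(self, eta=0.1, avg_start=0):
        self.eta = eta                 # constant learning rate (a simplification)
        self.avg_start = avg_start     # step at which we switch to averaging
        self.t = 0                     # global step counter
        self.w = defaultdict(float)
        self.w_avg = defaultdict(float)
        self.last = defaultdict(int)   # step through which w_avg[j] is current

    def _n(self, t):
        # number of averaging steps taken by global step t
        return max(0, t - self.avg_start)

    def _catch_up(self, j, upto):
        # While w[j] is constant, the recursion  w_avg += (w - w_avg)/n
        # telescopes to  w_avg(n_t) = w + (w_avg(n_s) - w) * n_s / n_t,
        # so an untouched feature can be caught up in O(1) when seen again.
        s, t = self._n(self.last[j]), self._n(upto)
        if t > s:
            self.w_avg[j] = self.w[j] + (self.w_avg[j] - self.w[j]) * s / t
        self.last[j] = upto

    def update(self, x, y):
        """One hinge-loss (SVM) step on a sparse example x = {feature: value}, y in {-1, +1}."""
        self.t += 1
        margin = y * sum(self.w[j] * v for j, v in x.items())
        if margin < 1.0:  # hinge subgradient is nonzero
            for j, v in x.items():
                self._catch_up(j, self.t - 1)   # average the constant stretch first
                self.w[j] += self.eta * y * v
                self._catch_up(j, self.t)       # then fold the new value in
        # features absent from x stay stale until they next appear

    def averaged(self, features):
        """Averaged weights for the given features, caught up through step t."""
        for j in features:
            self._catch_up(j, self.t)
        return {j: self.w_avg[j] for j in features}
```

As a sanity check on the lazy formula: with avg_start=0 and three updates on x={'a': 1.0}, y=+1, eta=0.1, the live weight goes 0.1, 0.2, 0.3 and the lazily maintained average ends at (0.1 + 0.2 + 0.3)/3 = 0.2, matching the dense averaging recursion.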
