On Wed, Apr 23, 2014 at 10:18 PM, DB Tsai <dbt...@dbtsai.com> wrote:
> ps, it doesn't make sense to have the weight and gradient sparse unless
> there is a strong L1 penalty.
>

Sure, I was just checking the obvious things. Have you run it through a
profiler to see where the problem is?

> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Wed, Apr 23, 2014 at 10:17 PM, DB Tsai <dbt...@dbtsai.com> wrote:
> > In MLlib, the weight and gradient are dense. Only the features are sparse.
> >
> > Sincerely,
> >
> > DB Tsai
> > -------------------------------------------------------
> > My Blog: https://www.dbtsai.com
> > LinkedIn: https://www.linkedin.com/in/dbtsai
> >
> >
> > On Wed, Apr 23, 2014 at 10:16 PM, David Hall <d...@cs.berkeley.edu> wrote:
> >> Was the weight vector sparse? The gradients? Or just the feature vectors?
> >>
> >>
> >> On Wed, Apr 23, 2014 at 10:08 PM, DB Tsai <dbt...@dbtsai.com> wrote:
> >>>
> >>> The figure showing the Log-Likelihood vs. Time can be found here:
> >>>
> >>> https://github.com/dbtsai/spark-lbfgs-benchmark/raw/fd703303fb1c16ef5714901739154728550becf4/result/a9a11M.pdf
> >>>
> >>> Let me know if you cannot open it.
> >>>
> >>> Sincerely,
> >>>
> >>> DB Tsai
> >>> -------------------------------------------------------
> >>> My Blog: https://www.dbtsai.com
> >>> LinkedIn: https://www.linkedin.com/in/dbtsai
> >>>
> >>>
> >>> On Wed, Apr 23, 2014 at 9:34 PM, Shivaram Venkataraman <shiva...@eecs.berkeley.edu> wrote:
> >>>
> >>> > I don't think the attachment came through on the list. Could you
> >>> > upload the results somewhere and link to them?
> >>> >
> >>> >
> >>> > On Wed, Apr 23, 2014 at 9:32 PM, DB Tsai <dbt...@dbtsai.com> wrote:
> >>> >
> >>> >> 123 features per row, and on average 89% of them are zeros.
> >>> >> On Apr 23, 2014 9:31 PM, "Evan Sparks" <evan.spa...@gmail.com> wrote:
> >>> >>
> >>> >> > What is the number of non-zeros per row (and the number of features)
> >>> >> > in the sparse case? We've hit some issues with Breeze sparse support
> >>> >> > in the past, but for sufficiently sparse data it's still pretty good.
> >>> >> >
> >>> >> > > On Apr 23, 2014, at 9:21 PM, DB Tsai <dbt...@stanford.edu> wrote:
> >>> >> > >
> >>> >> > > Hi all,
> >>> >> > >
> >>> >> > > I'm benchmarking logistic regression in MLlib using the newly added
> >>> >> > > LBFGS optimizer and GD. I'm using the same dataset and the same
> >>> >> > > methodology as in this paper: http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf
> >>> >> > >
> >>> >> > > I want to know how Spark scales as workers are added, and how the
> >>> >> > > optimizers and the input format (sparse or dense) impact performance.
> >>> >> > >
> >>> >> > > The benchmark code can be found here:
> >>> >> > > https://github.com/dbtsai/spark-lbfgs-benchmark
> >>> >> > >
> >>> >> > > The first dataset I benchmarked is a9a, which is only 2.2MB. I
> >>> >> > > duplicated the dataset to 762MB so that it has 11M rows. This
> >>> >> > > dataset has 123 features, and 11% of the entries are non-zero.
> >>> >> > >
> >>> >> > > In this benchmark, the whole dataset is cached in memory.
> >>> >> > >
> >>> >> > > As we expect, LBFGS converges faster than GD, and at some point, no
> >>> >> > > matter how we push GD, it converges more and more slowly.
> >>> >> > >
> >>> >> > > However, it's surprising that the sparse format runs slower than the
> >>> >> > > dense format. I did see that the sparse format takes a significantly
> >>> >> > > smaller amount of memory when caching the RDD, but sparse is 40%
> >>> >> > > slower than dense. I think sparse should be faster: when we compute
> >>> >> > > x * w^T, since x is sparse, we only need to touch its non-zero
> >>> >> > > entries. I wonder if there is anything I'm doing wrong.
> >>> >> > >
> >>> >> > > The attachment is the benchmark result.
> >>> >> > >
> >>> >> > > Thanks.
> >>> >> > >
> >>> >> > > Sincerely,
> >>> >> > >
> >>> >> > > DB Tsai
> >>> >> > > -------------------------------------------------------
> >>> >> > > My Blog: https://www.dbtsai.com
> >>> >> > > LinkedIn: https://www.linkedin.com/in/dbtsai
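
[Editor's note] On the x * w^T point in the quoted benchmark email: a minimal sketch
(hypothetical plain Scala, not MLlib's or Breeze's actual dot-product code) of why the
sparse form should, in principle, be cheaper. With a dense weight vector, the product
only has to touch the non-zero entries of x, roughly 11% of the 123 features for a9a.

    object SparseDotSketch {
      // Dense feature vector: every slot is multiplied, zeros included.
      def denseDot(x: Array[Double], w: Array[Double]): Double = {
        var sum = 0.0
        var i = 0
        while (i < x.length) { sum += x(i) * w(i); i += 1 }
        sum
      }

      // Sparse feature vector as (indices, values): only the non-zeros are touched,
      // because the dense weight vector can be indexed directly.
      def sparseDot(indices: Array[Int], values: Array[Double], w: Array[Double]): Double = {
        var sum = 0.0
        var k = 0
        while (k < indices.length) { sum += values(k) * w(indices(k)); k += 1 }
        sum
      }

      def main(args: Array[String]): Unit = {
        val w = Array(0.5, -1.0, 2.0, 0.0, 1.5)
        val xDense = Array(1.0, 0.0, 3.0, 0.0, 0.0)
        println(denseDot(xDense, w))                        // 6.5, five multiply-adds
        println(sparseDot(Array(0, 2), Array(1.0, 3.0), w)) // 6.5, two multiply-adds
      }
    }

If the arithmetic alone were the cost, sparse would win; the observed 40% slowdown
presumably comes from per-element overhead in the sparse iteration path, which is what
the profiler question and the remark about Breeze's sparse support above are getting at.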
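[Editor's note] On the "ps" about weight and gradient sparsity: a small illustration
(hypothetical code, not MLlib's L1Updater) of why the weight vector only becomes sparse
under a strong L1 penalty. The proximal/soft-thresholding step zeroes exactly those
coefficients whose magnitude falls below the shrinkage amount, so with no or weak L1
regularization the weights stay dense.

    object SoftThresholdSketch {
      // One proximal step for the L1 term: shrink each weight toward zero by
      // `shrinkage` (typically regParam * stepSize) and clip small ones to exactly 0.
      def softThreshold(w: Array[Double], shrinkage: Double): Array[Double] =
        w.map { wi =>
          if (math.abs(wi) <= shrinkage) 0.0
          else wi - math.signum(wi) * shrinkage
        }

      def main(args: Array[String]): Unit = {
        val w = Array(0.75, -0.03, 0.002, -1.25, 0.05)
        // Strong penalty: three of five weights become exactly zero.
        println(softThreshold(w, 0.25).mkString(", "))  // 0.5, 0.0, 0.0, -1.0, 0.0
        // No penalty: the weights stay fully dense.
        println(softThreshold(w, 0.0).mkString(", "))   // unchanged
      }
    }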
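[Editor's note] For context on the benchmark itself: a rough sketch (my reading of the
roughly Spark 1.0-era developer API, not the actual spark-lbfgs-benchmark code) of how
the LBFGS-vs-GD comparison is typically wired up. The same cached RDD, LogisticGradient,
and updater are reused and only the optimizer changes; converting loadLibSVMFile's
sparse vectors to dense ones gives the sparse/dense comparison discussed above.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.optimization.{GradientDescent, LBFGS, LogisticGradient, SquaredL2Updater}
    import org.apache.spark.mllib.util.MLUtils

    object LbfgsVsGdSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("lbfgs-vs-gd-sketch"))

        // a9a in LIBSVM format; labels mapped from {-1, +1} to {0, 1} for LogisticGradient.
        // loadLibSVMFile yields sparse vectors; Vectors.dense(p.features.toArray) would
        // give the dense variant of the benchmark.
        val points = MLUtils.loadLibSVMFile(sc, "data/a9a")
          .map(p => (if (p.label > 0) 1.0 else 0.0, p.features))
          .cache()                                  // the whole dataset is cached, as above

        val numFeatures = 123
        val initialWeights = Vectors.dense(new Array[Double](numFeatures))
        val gradient = new LogisticGradient()
        val updater = new SquaredL2Updater()

        // L-BFGS: history of 10 corrections, tight tolerance, L2 regularization.
        val (wLbfgs, lossLbfgs) = LBFGS.runLBFGS(
          points, gradient, updater, 10, 1e-6, 100, 0.01, initialWeights)

        // Batch gradient descent: miniBatchFraction = 1.0, fixed step size.
        val (wGd, lossGd) = GradientDescent.runMiniBatchSGD(
          points, gradient, updater, 1.0, 100, 0.01, 1.0, initialWeights)

        println(s"LBFGS final loss: ${lossLbfgs.last}; GD final loss: ${lossGd.last}")
        sc.stop()
      }
    }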