Not yet, since it's running in the cluster. I will run it locally with a
profiler. Thanks for the help.

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Wed, Apr 23, 2014 at 10:22 PM, David Hall <d...@cs.berkeley.edu> wrote:
> On Wed, Apr 23, 2014 at 10:18 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>>
>> PS: it doesn't make sense to make the weights and gradient sparse unless
>> there is a strong L1 penalty.
>
>
> Sure, I was just checking the obvious things. Have you run it through a
> profiler to see where the problem is?
>
>
>>
>>
>> Sincerely,
>>
>> DB Tsai
>> -------------------------------------------------------
>> My Blog: https://www.dbtsai.com
>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>
>>
>> On Wed, Apr 23, 2014 at 10:17 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>> > In MLlib, the weight and gradient vectors are dense; only the feature
>> > vectors are sparse.
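>> >
>> > For reference, a minimal Breeze sketch of that setup (the dimension 123 and
>> > the indices below are just illustrative):
>> >
>> >   import breeze.linalg.{DenseVector, SparseVector}
>> >
>> >   // weights (and gradient) are dense; only the feature vector is sparse
>> >   val weights  = DenseVector.zeros[Double](123)
>> >   val features = SparseVector.zeros[Double](123)
>> >   features(3)  = 1.0
>> >   features(42) = 0.5
>> >
>> >   // the dot product only needs the stored non-zeros of `features`
>> >   val margin: Double = features dot weights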
>> >
>> > Sincerely,
>> >
>> > DB Tsai
>> > -------------------------------------------------------
>> > My Blog: https://www.dbtsai.com
>> > LinkedIn: https://www.linkedin.com/in/dbtsai
>> >
>> >
>> > On Wed, Apr 23, 2014 at 10:16 PM, David Hall <d...@cs.berkeley.edu>
>> > wrote:
>> >> Was the weight vector sparse? The gradients? Or just the feature
>> >> vectors?
>> >>
>> >>
>> >> On Wed, Apr 23, 2014 at 10:08 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>> >>>
>> >>> The figure showing the Log-Likelihood vs Time can be found here.
>> >>>
>> >>>
>> >>>
>> >>> https://github.com/dbtsai/spark-lbfgs-benchmark/raw/fd703303fb1c16ef5714901739154728550becf4/result/a9a11M.pdf
>> >>>
>> >>> Let me know if you cannot open it.
>> >>>
>> >>> Sincerely,
>> >>>
>> >>> DB Tsai
>> >>> -------------------------------------------------------
>> >>> My Blog: https://www.dbtsai.com
>> >>> LinkedIn: https://www.linkedin.com/in/dbtsai
>> >>>
>> >>>
>> >>> On Wed, Apr 23, 2014 at 9:34 PM, Shivaram Venkataraman <
>> >>> shiva...@eecs.berkeley.edu> wrote:
>> >>>
>> >>> > I don't think the attachment came through on the list. Could you upload
>> >>> > the results somewhere and link to them?
>> >>> >
>> >>> >
>> >>> > On Wed, Apr 23, 2014 at 9:32 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>> >>> >
>> >>> >> 123 features per row, and on average, 89% are zeros.
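>> >>> >>
>> >>> >> In case it helps, that statistic can be computed with something like this
>> >>> >> sketch (assuming the data is loaded via MLUtils.loadLibSVMFile; `sc` is the
>> >>> >> SparkContext and the "a9a" path is illustrative):
>> >>> >>
>> >>> >>   import org.apache.spark.mllib.util.MLUtils
>> >>> >>
>> >>> >>   val data = MLUtils.loadLibSVMFile(sc, "a9a")
>> >>> >>   // average fraction of zero entries per row
>> >>> >>   val zeroFraction = data
>> >>> >>     .map(p => p.features.toArray.count(_ == 0.0).toDouble / p.features.size)
>> >>> >>     .mean()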
>> >>> >> On Apr 23, 2014 9:31 PM, "Evan Sparks" <evan.spa...@gmail.com>
>> >>> >> wrote:
>> >>> >>
>> >>> >> > What is the number of non-zeros per row (and the number of features)
>> >>> >> > in the sparse case? We've hit some issues with Breeze sparse support
>> >>> >> > in the past, but for sufficiently sparse data it's still pretty good.
>> >>> >> >
>> >>> >> > > On Apr 23, 2014, at 9:21 PM, DB Tsai <dbt...@stanford.edu>
>> >>> >> > > wrote:
>> >>> >> > >
>> >>> >> > > Hi all,
>> >>> >> > >
>> >>> >> > > I'm benchmarking Logistic Regression in MLlib using the newly added
>> >>> >> > > optimizers LBFGS and GD. I'm using the same dataset and the same
>> >>> >> > > methodology as in this paper: http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf
>> >>> >> > >
>> >>> >> > > I want to know how Spark scales as workers are added, and how the
>> >>> >> > > optimizers and the input format (sparse or dense) impact performance.
>> >>> >> > >
>> >>> >> > > The benchmark code can be found here,
>> >>> >> > > https://github.com/dbtsai/spark-lbfgs-benchmark
>> >>> >> > >
>> >>> >> > > The first dataset I benchmarked is a9a, which is only 2.2MB. I
>> >>> >> > > duplicated the dataset to 762MB so that it has 11M rows. This dataset
>> >>> >> > > has 123 features, and 11% of the entries are non-zero.
>> >>> >> > >
>> >>> >> > > In this benchmark, the entire dataset is cached in memory.
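>> >>> >> > >
>> >>> >> > > Roughly, the sparse and dense cached inputs might be prepared along
>> >>> >> > > these lines (a sketch, not the exact benchmark code; see the repository
>> >>> >> > > linked above for the real setup):
>> >>> >> > >
>> >>> >> > >   import org.apache.spark.mllib.linalg.Vectors
>> >>> >> > >   import org.apache.spark.mllib.regression.LabeledPoint
>> >>> >> > >   import org.apache.spark.mllib.util.MLUtils
>> >>> >> > >
>> >>> >> > >   // sparse input: the libsvm loader yields SparseVector features
>> >>> >> > >   val sparseData = MLUtils.loadLibSVMFile(sc, "a9a").cache()
>> >>> >> > >   // dense input: the same rows, converted to DenseVector features
>> >>> >> > >   val denseData = sparseData
>> >>> >> > >     .map(p => LabeledPoint(p.label, Vectors.dense(p.features.toArray)))
>> >>> >> > >     .cache()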
>> >>> >> > >
>> >>> >> > > As expected, LBFGS converges faster than GD, and beyond some point, no
>> >>> >> > > matter how we tune GD, it converges more and more slowly.
>> >>> >> > >
>> >>> >> > > However, it's surprising that the sparse format runs slower than the
>> >>> >> > > dense format. I did see that the sparse format takes a significantly
>> >>> >> > > smaller amount of memory when caching the RDD, but sparse is 40% slower
>> >>> >> > > than dense. I think sparse should be faster, since when we compute
>> >>> >> > > x * w^T, x being sparse means we only touch its non-zero entries. I
>> >>> >> > > wonder if there is anything I'm doing wrong.
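>> >>> >> > >
>> >>> >> > > To make that intuition concrete, a sparse dot product only has to visit
>> >>> >> > > the stored entries of x, roughly like this sketch (written against
>> >>> >> > > Breeze's activeIterator, not the MLlib internals):
>> >>> >> > >
>> >>> >> > >   import breeze.linalg.{DenseVector, SparseVector}
>> >>> >> > >
>> >>> >> > >   // x * w^T touches only the non-zero (index, value) pairs of x
>> >>> >> > >   def sparseDot(x: SparseVector[Double], w: DenseVector[Double]): Double = {
>> >>> >> > >     var sum = 0.0
>> >>> >> > >     for ((i, v) <- x.activeIterator) sum += v * w(i)
>> >>> >> > >     sum
>> >>> >> > >   }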
>> >>> >> > >
>> >>> >> > > The attachment is the benchmark result.
>> >>> >> > >
>> >>> >> > > Thanks.
>> >>> >> > >
>> >>> >> > > Sincerely,
>> >>> >> > >
>> >>> >> > > DB Tsai
>> >>> >> > > -------------------------------------------------------
>> >>> >> > > My Blog: https://www.dbtsai.com
>> >>> >> > > LinkedIn: https://www.linkedin.com/in/dbtsai
>> >>> >> >
>> >>> >>
>> >>> >
>> >>> >
>> >>
>> >>
>
>
