Re: Spark LogisticRegression got stuck on dataset with millions of columns

2019-04-23 Thread Weichen Xu
Could you provide your code and details of the cluster you are running on?

On Tue, Apr 23, 2019 at 4:10 PM Qian He wrote:
> The dataset was using a sparse representation before feeding into
> LogisticRegression.
>
> On Tue, Apr 23, 2019 at 3:15 PM Weichen Xu wrote:
>> Hi Qian,
>>
>> Do your dataset use

Re: Spark LogisticRegression got stuck on dataset with millions of columns

2019-04-23 Thread Qian He
The dataset was already in a sparse representation before it was fed into LogisticRegression.

On Tue, Apr 23, 2019 at 3:15 PM Weichen Xu wrote:
> Hi Qian,
>
> Do your dataset use sparse vector format ?
>
> On Mon, Apr 22, 2019 at 5:03 PM Qian He wrote:
>> Hi all,
>>
>> I'm using Spark provided

Re: Spark LogisticRegression got stuck on dataset with millions of columns

2019-04-23 Thread Weichen Xu
Hi Qian,

Does your dataset use the sparse vector format?

On Mon, Apr 22, 2019 at 5:03 PM Qian He wrote:
> Hi all,
>
> I'm using Spark provided LogisticRegression to fit a dataset. Each row of
> the data has 1.7 million columns, but it is sparse with only hundreds of
> 1s. The Spark Ui reported
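In Spark ML a row like this would typically be built with `Vectors.sparse(size, indices, values)` (from `org.apache.spark.ml.linalg` in Scala or `pyspark.ml.linalg` in Python). The Python sketch below is not Spark code. It only illustrates what a sparse row stores (a logical size plus parallel index/value arrays) and why scoring one row costs time proportional to its few hundred non-zeros rather than its 1.7 million columns. The names `SparseRow` and `predict_proba` are hypothetical, chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SparseRow:
    """Hypothetical sparse row: logical size plus parallel index/value arrays,
    similar in spirit to the (size, indices, values) layout of a sparse vector."""
    size: int
    indices: List[int]   # column positions of the non-zero entries
    values: List[float]  # value stored at each position in `indices`

    def dot(self, dense: List[float]) -> float:
        """Dot product with a dense vector, touching only the non-zeros (O(nnz))."""
        if len(dense) != self.size:
            raise ValueError("dimension mismatch")
        return sum(v * dense[i] for i, v in zip(self.indices, self.values))


def predict_proba(x: SparseRow, coef: List[float], intercept: float) -> float:
    """Logistic-regression probability sigmoid(w . x + b) for one sparse row."""
    margin = x.dot(coef) + intercept
    return 1.0 / (1.0 + math.exp(-margin))


# Usage: a row with 1.7 million columns but only three 1s.
row = SparseRow(1_700_000, [12, 40_000, 1_650_000], [1.0, 1.0, 1.0])
weights = [0.0] * 1_700_000  # the coefficient vector itself is still dense
print(predict_proba(row, weights, 0.0))
```

The design point is that storing only the non-zeros keeps each row small, while the coefficient vector still has one entry per feature.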

Spark LogisticRegression got stuck on dataset with millions of columns

2019-04-22 Thread Qian He
Hi all,

I'm using Spark's LogisticRegression to fit a dataset. Each row of the data has 1.7 million columns, but it is sparse, with only a few hundred 1s. The Spark UI reported high GC time while the model was being trained, and my Spark application then stopped responding. I have
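A rough estimate from the numbers in this post shows why the storage format matters. The Python sketch below assumes 300 non-zeros for "hundreds of 1s" and counts only raw payload (4-byte indices, 8-byte doubles). It ignores JVM object headers and other overhead, so the figures are lower bounds, not measurements. It also assumes, as is typical for gradient-based logistic regression, that the coefficient and gradient vectors are dense, holding one double per feature, regardless of how the input rows are stored.

```python
NUM_FEATURES = 1_700_000   # columns per row, from the post
NNZ = 300                  # assumption: "hundreds of 1s" taken as 300 non-zeros
BYTES_PER_DOUBLE = 8
BYTES_PER_INT = 4


def dense_row_bytes(num_features: int) -> int:
    """Raw payload of a dense row: one 8-byte double per column."""
    return num_features * BYTES_PER_DOUBLE


def sparse_row_bytes(nnz: int) -> int:
    """Raw payload of a sparse row: a 4-byte index plus an 8-byte value per non-zero."""
    return nnz * (BYTES_PER_INT + BYTES_PER_DOUBLE)


dense = dense_row_bytes(NUM_FEATURES)
sparse = sparse_row_bytes(NNZ)
# Assumed dense coefficient/gradient buffer: one double per feature,
# whatever the row format.
coef = dense_row_bytes(NUM_FEATURES)

print(f"dense row:  {dense / 1e6:.1f} MB")
print(f"sparse row: {sparse / 1e3:.1f} KB")
print(f"dense coefficient/gradient vector: {coef / 1e6:.1f} MB")
```

Under these assumptions a dense row carries about 13.6 MB of payload versus about 3.6 KB for the sparse form. Any dense coefficient or gradient buffer is also about 13.6 MB per copy.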