I also think this is likely a memory-related issue. I just ran the
following snippet in a Jupyter notebook:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    model = SGDClassifier(loss='log', penalty=None, alpha=0.0,
                          l1_ratio=0.0, fit_intercept=False, n_iter=1,
                          shuffle=False, learning_rate='constant',
                          eta0=1.0)

    X = np.random.random((1000000, 1000))
    y = np.zeros(1000000)
    y[:1000] = 1

    model.fit(X, y)

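As a rough check on the memory involved, the size of a dense float64 array is
rows x columns x 8 bytes. Here is a minimal sketch of that arithmetic; the
helper name dense_float64_gb is made up for illustration, and the second call
assumes the data from the original question is stored as a dense float64
array, which the question does not state:

```python
# Rough size of a dense float64 array: rows * columns * 8 bytes.
# dense_float64_gb is a hypothetical helper, not part of NumPy or scikit-learn.
BYTES_PER_FLOAT64 = 8


def dense_float64_gb(n_rows, n_cols):
    """Return the approximate size in GB (10**9 bytes) of a dense float64 array."""
    return n_rows * n_cols * BYTES_PER_FLOAT64 / 1e9


snippet_gb = dense_float64_gb(1_000_000, 1_000)  # array used in the snippet above
question_gb = dense_float64_gb(900_000, 3_600)   # shape from the original question (assumed dense)
print(f"snippet: {snippet_gb:.1f} GB, question: {question_gb:.1f} GB")
```

Under that assumption, the data from the original question would need roughly
three times as much memory as my test array before any copy is made.
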
The dataset takes approx. 8 GB, but fitting the model consumes ~16 GB,
probably because a copy of the X array is made somewhere in the code. The
notebook didn't crash, but I think this could be an issue on machines with
less RAM. One workaround you could try is to fit the model iteratively using
partial_fit, for example, 1000 samples at a time or so:

    indices = np.arange(y.shape[0])
    batch_size = 1000

    # Feed the model one batch at a time; classes must list every label
    # because a single batch may not contain all of them.
    for start_idx in range(0, indices.shape[0] - batch_size + 1,
                           batch_size):
        index_slice = indices[start_idx:start_idx + batch_size]
        model.partial_fit(X[index_slice], y[index_slice], classes=[0, 1])

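Note that the loop above skips any trailing samples when the number of rows is
not a multiple of batch_size. A small sketch of a generator that also yields
the shorter final batch; batch_slices is a hypothetical helper name, not a
scikit-learn function:

```python
def batch_slices(n_samples, batch_size):
    """Yield slices covering range(n_samples), including a shorter final batch."""
    for start in range(0, n_samples, batch_size):
        yield slice(start, min(start + batch_size, n_samples))


# Example with a row count that is not a multiple of the batch size.
slices = list(batch_slices(2500, 1000))
print(slices)

# Hypothetical use with the model and arrays from the snippet above:
# for sl in batch_slices(X.shape[0], 1000):
#     model.partial_fit(X[sl], y[sl], classes=[0, 1])
```

Indexing with slice objects instead of an index array also gives NumPy views
rather than copies of each batch.
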
Best,
Sebastian
> On Jun 2, 2017, at 6:50 AM, Iván Vallés Pérez <[email protected]>
> wrote:
>
> Are you monitoring your RAM memory consumption? I would say that it is the
> cause of the majority of the kernel crashes
> On Fri, Jun 2, 2017 at 12:45, Aymen J <[email protected]> wrote:
> Hey Guys,
>
>
> So I'm trying to fit an SGD classifier on a dataset that has 900,000 samples
> and about 3,600 features (high cardinality).
>
>
> Here is my model:
>
>
> model = SGDClassifier(loss='log', penalty=None, alpha=0.0,
>                       l1_ratio=0.0, fit_intercept=False, n_iter=1,
>                       shuffle=False, learning_rate='constant',
>                       eta0=1.0)
>
> When I run the model.fit function, the program runs for about 5 minutes, and
> I receive the message "the kernel has died" from Jupyter.
>
> Any idea what may cause that? Is my training data too big (in terms of
> features)? Can I do anything (parameters) to finish training?
>
> Thanks in advance for your help!
>
>
>
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn