Hm, weird that their platform seems to be so picky about it. Have you tried to 
just make the output of the pipeline dense? I.e., 

(model.predict(X)).toarray()
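
One caveat: depending on the estimator and on how the labels were passed to fit, predict may already return a plain NumPy array, and in that case .toarray() raises an AttributeError. A minimal sketch of a guard for that case (predict_dense is just a hypothetical helper name, not part of scikit-learn; it assumes the sparse result exposes .toarray(), as scipy.sparse matrices do):

```python
def predict_dense(model, X):
    """Call model.predict(X) and densify the result only if it is sparse."""
    y_pred = model.predict(X)
    # scipy.sparse matrices expose .toarray(); NumPy arrays do not,
    # so anything without that method is returned unchanged.
    if hasattr(y_pred, "toarray"):
        return y_pred.toarray()
    return y_pred
```

This only helps when you control the calling code. It does not change what the platform sees when it calls predict on the serialized model itself.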

Best,
Sebastian

> On Apr 10, 2019, at 1:10 PM, Liam Geron <l...@chatdesk.com> wrote:
> 
> Hi Sebastian,
> 
> Thanks for the advice! Luckily, the model works fine on its own in Python, 
> so I don't think that is the issue exactly. I have tried rolling my own 
> estimator to wrap the pipeline and have it call the predict_proba method to 
> return a dense array. However, I then ran into the problem that the custom 
> estimator would have to be defined on the Cloud ML end, which I'm unsure 
> how to do.
> 
> Thanks,
> Liam
> 
> On Wed, Apr 10, 2019 at 2:06 PM Sebastian Raschka <m...@sebastianraschka.com> 
> wrote:
> Hi Liam,
> 
> Not sure what your exact error message is, but it may also be that the 
> XGBClassifier only accepts dense arrays? I think the TfidfVectorizer returns 
> sparse arrays. You could probably fix your issue by inserting a 
> "DenseTransformer" into your pipeline (a simple class that just converts an 
> array from a sparse to a dense format). I've implemented something like that, 
> which you can import or copy and paste from here:
> 
> https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/dense_transformer.py
> 
> The usage would then basically be
> 
> model = Pipeline([('tfidf', TfidfVectorizer()), ('to_dense', 
> DenseTransformer()), ('clf', OneVsRestClassifier(XGBClassifier()))])
> 
> Best,
> Sebastian
> 
> 
> 
> 
> > On Apr 10, 2019, at 12:25 PM, Liam Geron <l...@chatdesk.com> wrote:
> > 
> > Hi all,
> > 
> > I was hoping to get some guidance on changing the result of the predict 
> > method of the OneVsRestClassifier to return a dense array rather than a 
> > sparse array, given that Google Cloud ML only accepts dense numpy arrays as 
> > the result of a given model's predict method. Right now my model architecture 
> > looks like:
> > 
> > model = Pipeline([('tfidf', TfidfVectorizer()), ('clf', 
> > OneVsRestClassifier(XGBClassifier()))])
> > 
> > This returns a sparse array from the predict method. I saw the Stack 
> > Overflow post here: 
> > https://stackoverflow.com/questions/52151548/google-cloud-ml-engine-scikit-learn-prediction-probability-predict-proba
> > 
> > which recommends overriding the predict method with the predict_proba 
> > method. However, I found that I can't serialize the model after doing so. I 
> > also have a Stack Overflow post here: 
> > https://stackoverflow.com/questions/55366454/how-to-convert-scikit-learn-onevsrestclassifier-predict-method-output-to-dense-a
> > which details the specific pickling error.
> > 
> > Is this a known issue? Is there an accepted way to convert this into a 
> > dense array?
> > 
> > Thanks,
> > Liam Geron
> > _______________________________________________
> > scikit-learn mailing list
> > scikit-learn@python.org
> > https://mail.python.org/mailman/listinfo/scikit-learn
> 

