Unfortunately, I don't believe you get that level of freedom. Prediction is an API call that automatically invokes the model's predict method, so I don't think I can specify something like model.predict(X).toarray(). I could be wrong, however; I don't pretend to be an expert on Cloud ML by any stretch.

Thanks,
Liam

On Wed, Apr 10, 2019 at 3:23 PM Sebastian Raschka <m...@sebastianraschka.com> wrote:

> Hm, weird that their platform seems to be so picky about it. Have you
> tried to just make the output of the pipeline dense? I.e.,
>
>     (model.predict(X)).toarray()
>
> Best,
> Sebastian
>
> > On Apr 10, 2019, at 1:10 PM, Liam Geron <l...@chatdesk.com> wrote:
> >
> > Hi Sebastian,
> >
> > Thanks for the advice! Luckily, the model works fine on its own in
> > Python, so I don't think that is exactly the issue. I have tried
> > rolling my own estimator to wrap the pipeline and have it call the
> > predict_proba method to return a dense array. However, I then ran into
> > the problem that the custom estimator would have to be defined on the
> > Cloud ML end, which I'm unsure how to do.
> >
> > Thanks,
> > Liam
> >
> > On Wed, Apr 10, 2019 at 2:06 PM Sebastian Raschka
> > <m...@sebastianraschka.com> wrote:
> >
> > Hi Liam,
> >
> > not sure what your exact error message is, but it may also be that the
> > XGBClassifier only accepts dense arrays? I think the TfidfVectorizer
> > returns sparse arrays. You could probably fix your issues by inserting
> > a "DenseTransformer" into your pipeline (a simple class that just
> > transforms an array from a sparse to a dense format). I've implemented
> > something like that, which you can import or copy and paste from here:
> >
> > https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/dense_transformer.py
> >
> > The usage would then basically be
> >
> >     model = Pipeline([('tfidf', TfidfVectorizer()),
> >                       ('to_dense', DenseTransformer()),
> >                       ('clf', OneVsRestClassifier(XGBClassifier()))])
> >
> > Best,
> > Sebastian
> >
> > > On Apr 10, 2019, at 12:25 PM, Liam Geron <l...@chatdesk.com> wrote:
> > >
> > > Hi all,
> > >
> > > I was hoping to get some guidance re: changing the result of the
> > > predict method of the OneVsRestClassifier to return a dense array
> > > rather than a sparse array, given that Google Cloud ML only accepts
> > > dense NumPy arrays as the result of a given model's predict method.
> > > Right now my model architecture looks like:
> > >
> > >     model = Pipeline([('tfidf', TfidfVectorizer()),
> > >                       ('clf', OneVsRestClassifier(XGBClassifier()))])
> > >
> > > which returns a sparse array from the predict method. I saw the Stack
> > > Overflow post here:
> > > https://stackoverflow.com/questions/52151548/google-cloud-ml-engine-scikit-learn-prediction-probability-predict-proba
> > >
> > > which recommends overwriting the predict method with the
> > > predict_proba method; however, I found that I can't serialize the
> > > model after doing so. I also have a Stack Overflow post here:
> > > https://stackoverflow.com/questions/55366454/how-to-convert-scikit-learn-onevsrestclassifier-predict-method-output-to-dense-a
> > > which details the specific pickling error.
> > >
> > > Is this a known issue? Is there an accepted way to convert this into
> > > a dense array?
> > >
> > > Thanks,
> > > Liam Geron
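
For reference, the "DenseTransformer" idea from the thread can be sketched roughly as below. This is a minimal illustration, not the mlxtend implementation linked above. The toy documents and labels are made up, and LogisticRegression is swapped in for XGBClassifier purely so the snippet runs without xgboost installed. Note that this step only densifies the classifier's input; whether predict itself returns a sparse result may depend separately on how the labels were encoded, which is an assumption I have not checked against Cloud ML.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline


class DenseTransformer(BaseEstimator, TransformerMixin):
    """Pipeline step that converts a SciPy sparse matrix to a dense array."""

    def fit(self, X, y=None):
        # Nothing to learn; present only to satisfy the estimator interface.
        return self

    def transform(self, X):
        # Sparse input becomes a dense ndarray; dense input passes through.
        return X.toarray() if sparse.issparse(X) else np.asarray(X)


# Hypothetical toy data, for illustration only.
docs = [
    "refund my order", "where is my order",
    "cancel my account", "delete my account",
    "update my address", "change shipping address",
]
labels = ["order", "order", "account", "account", "address", "address"]

# LogisticRegression stands in for XGBClassifier (assumption, see above).
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("to_dense", DenseTransformer()),
    ("clf", OneVsRestClassifier(LogisticRegression())),
])
model.fit(docs, labels)

pred = model.predict(["order status"])
# Defensive conversion in case a sparse result comes back.
pred = pred.toarray() if sparse.issparse(pred) else np.asarray(pred)
```

Because DenseTransformer is a user-defined class, a pickled pipeline containing it still needs that class to be importable wherever the model is loaded, which is the same deployment constraint raised earlier in the thread for custom estimators.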
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn