That's a great tip actually, I was unaware of the MultiOutputClassifier option. I'll give it a try!
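For reference, a minimal sketch of what that swap might look like. Assumptions: LogisticRegression stands in for XGBClassifier so the snippet runs without xgboost, the texts and label sets are made-up toy data, and Y is built as a dense indicator matrix with MultiLabelBinarizer (MultiOutputClassifier does not support sparse Y).

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each text carries one or more labels.
texts = [
    "refund my order",
    "where is my order",
    "cancel my subscription",
    "refund and cancel please",
    "order status update",
    "cancel the refund",
]
labels = [
    {"refund"},
    {"order"},
    {"cancel"},
    {"refund", "cancel"},
    {"order"},
    {"cancel", "refund"},
]

# Dense 0/1 indicator matrix, one column per label.
Y = MultiLabelBinarizer().fit_transform(labels)

# Same pipeline shape as in the thread, with MultiOutputClassifier in place of
# OneVsRestClassifier. LogisticRegression is a stand-in for XGBClassifier.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(LogisticRegression())),
])
model.fit(texts, Y)

pred = model.predict(texts)
print(type(pred).__name__, pred.shape, sparse.issparse(pred))
```

The point of the sketch is only the return type of predict: a plain numpy array with one column per label, so no .toarray() call is needed on the serving side.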
Thanks,
Liam

On Wed, Apr 10, 2019 at 11:03 PM Joel Nothman <joel.noth...@gmail.com> wrote:

> I think it's a bit weird if we're returning sparse output from
> OneVsRestClassifier.predict if it wasn't fit on sparse Y.
>
> Actually, I would be in favour of deprecating multilabel support in
> OneVsRestClassifier, since it is performing "binary relevance method" for
> multilabel, not actually OvR. MultiOutputClassifier duplicates this
> functionality (more or less), outputs a dense array (indeed it doesn't
> support sparse Y and perhaps it should) and lives closer to functional
> alternatives to binary relevance, such as ClassifierChain.
>
> On Thu, 11 Apr 2019 at 05:32, Liam Geron <l...@chatdesk.com> wrote:
>
>> Unfortunately I don't believe that you get that level of freedom, it's an
>> API call that automatically calls the model's predict method so I don't
>> think that I get to specify something like model.predict(X).toarray(). I
>> could be wrong however, I don't pretend to be an expert on Cloud ML by any
>> stretch.
>>
>> Thanks,
>> Liam
>>
>> On Wed, Apr 10, 2019 at 3:23 PM Sebastian Raschka
>> <m...@sebastianraschka.com> wrote:
>>
>>> Hm, weird that their platform seems to be so picky about it. Have you
>>> tried to just make the output of the pipeline dense? I.e.,
>>>
>>> (model.predict(X)).toarray()
>>>
>>> Best,
>>> Sebastian
>>>
>>> > On Apr 10, 2019, at 1:10 PM, Liam Geron <l...@chatdesk.com> wrote:
>>> >
>>> > Hi Sebastian,
>>> >
>>> > Thanks for the advice! The model actually works on its own in Python
>>> > fine luckily, so I don't think that that is the issue exactly. I have
>>> > tried rolling my own estimator to wrap the pipeline to have it call the
>>> > predict_proba method to return a dense array, however I then came
>>> > across the problem that I would have to have that custom estimator
>>> > defined on the Cloud ML end, which I'm unsure how to do.
>>> >
>>> > Thanks,
>>> > Liam
>>> >
>>> > On Wed, Apr 10, 2019 at 2:06 PM Sebastian Raschka
>>> > <m...@sebastianraschka.com> wrote:
>>> > Hi Liam,
>>> >
>>> > not sure what your exact error message is, but it may also be that the
>>> > XGBClassifier only accepts dense arrays? I think the TfidfVectorizer
>>> > returns sparse arrays. You could probably fix your issues by inserting
>>> > a "DenseTransformer" into your pipeline (a simple class that just
>>> > transforms an array from a sparse to a dense format). I've implemented
>>> > something like that, which you can import or copy and paste from here:
>>> >
>>> > https://github.com/rasbt/mlxtend/blob/master/mlxtend/preprocessing/dense_transformer.py
>>> >
>>> > The usage would then basically be
>>> >
>>> > model = Pipeline([('tfidf', TfidfVectorizer()),
>>> >                   ('to_dense', DenseTransformer()),
>>> >                   ('clf', OneVsRestClassifier(XGBClassifier()))])
>>> >
>>> > Best,
>>> > Sebastian
>>> >
>>> > > On Apr 10, 2019, at 12:25 PM, Liam Geron <l...@chatdesk.com> wrote:
>>> > >
>>> > > Hi all,
>>> > >
>>> > > I was hoping to get some guidance re: changing the result of the
>>> > > predict method of the OneVsRestClassifier to return a dense array
>>> > > rather than a sparse array, given that Google Cloud ML only accepts
>>> > > dense numpy arrays as a result of a given model's predict method.
>>> > > Right now my model architecture looks like:
>>> > >
>>> > > model = Pipeline([('tfidf', TfidfVectorizer()),
>>> > >                   ('clf', OneVsRestClassifier(XGBClassifier()))])
>>> > >
>>> > > Which returns a sparse array with the predict method. I saw the
>>> > > Stack Overflow post here:
>>> > > https://stackoverflow.com/questions/52151548/google-cloud-ml-engine-scikit-learn-prediction-probability-predict-proba
>>> > >
>>> > > which recommends overwriting the predict method with the
>>> > > predict_proba method, however I found that I can't serialize the
>>> > > model after doing so. I also have a Stack Overflow post here:
>>> > > https://stackoverflow.com/questions/55366454/how-to-convert-scikit-learn-onevsrestclassifier-predict-method-output-to-dense-a
>>> > > which details the specific pickling error.
>>> > >
>>> > > Is this a known issue? Is there an accepted way to convert this into
>>> > > a dense array?
>>> > >
>>> > > Thanks,
>>> > > Liam Geron
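For the "DenseTransformer" suggestion quoted above, a rough sketch of what such a step could look like. This is not the mlxtend implementation; the class name SparseToDense and the two example texts are hypothetical, chosen only for illustration. Note that a step like this densifies the features fed to the classifier, not the output of predict, and that a custom class has to be importable wherever the pickled pipeline is loaded, which is the obstacle described in the thread for Cloud ML.

```python
import numpy as np
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


class SparseToDense(TransformerMixin, BaseEstimator):
    """Hypothetical pipeline step: turn a scipy sparse matrix into a dense array."""

    def fit(self, X, y=None):
        # Nothing to learn; present only to satisfy the transformer API.
        return self

    def transform(self, X):
        # Leave input untouched if it is already dense.
        return X.toarray() if sparse.issparse(X) else np.asarray(X)


# Feature-extraction part of the pipeline only; a classifier step would follow.
features = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("to_dense", SparseToDense()),
])
X_dense = features.fit_transform(["refund my order", "cancel my subscription"])
print(type(X_dense).__name__, X_dense.shape)
```

Densifying TF-IDF features can use a lot of memory on a large vocabulary, so this is only sensible when the downstream estimator really cannot take sparse input.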
_______________________________________________ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn