Re: [scikit-learn] scikit-learn 0.19.0 is out!
Thanks a lot for all the hard work and congratz! Best, Raga

On Aug 12, 2017 1:21 AM, "Sebastian Raschka" wrote:
> Yay, as an avid user, thanks to all the developers! This is a great release indeed -- no breaking changes (at least for my code base) and so many improvements and additions (that I need to check out in detail) :)

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
Re: [scikit-learn] scikit-learn 0.19.0 is out!
Yay, as an avid user, thanks to all the developers! This is a great release indeed -- no breaking changes (at least for my code base) and so many improvements and additions (that I need to check out in detail) :)
Re: [scikit-learn] scikit-learn 0.19.0 is out!
Hurray, thank you everybody. This is a good one! (as always).

Gaël

--
Gael Varoquaux
Researcher, INRIA Parietal
NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info  http://twitter.com/GaelVaroquaux
Re: [scikit-learn] scikit-learn 0.19.0 is out!
Congrats guys

--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
[scikit-learn] Question-Early Stopping MLPClassifer RandomizedSearchCV
Hello Scikit-Learn Team,

I've got a question concerning the implementation of early stopping in MLPClassifier. I am using it in combination with RandomizedSearchCV. The fraction used for validation in early stopping is set with the parameter validation_fraction of MLPClassifier. How is the validation set extracted from the training set? Does the function simply take the last X% of the training set? Is there a way to set this validation set manually?

I also wonder whether I correctly understand the functionality: the neural net is trained on the training data, and the performance is evaluated after every epoch on the validation set (which is selected internally by the MLPClassifier)? Once the net stops training, the performance on the held-out data (parameter "cv" in RandomizedSearchCV) is determined?

Thank you very much for your help!

Kind regards,
Fabian Sippl
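For reference, the setup described above can be sketched as follows. The data and hyperparameter ranges here are made up for illustration, not the poster's actual configuration; the point is that with early_stopping=True, MLPClassifier itself holds out validation_fraction of whatever training data fit() receives, independently of the outer cv splits of RandomizedSearchCV.

```python
# Sketch of the setup described above: early stopping inside
# MLPClassifier combined with an outer RandomizedSearchCV.
# Synthetic data and illustrative parameter ranges (assumptions).
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# With early_stopping=True, fit() internally holds out
# validation_fraction of the training data it is given and stops when
# the validation score stops improving; this internal split is separate
# from the outer cv splits below.
mlp = MLPClassifier(early_stopping=True, validation_fraction=0.1,
                    random_state=0)

search = RandomizedSearchCV(
    mlp,
    param_distributions={"hidden_layer_sizes": [(16,), (32,), (16, 8)],
                         "alpha": [1e-4, 1e-3, 1e-2]},
    n_iter=3, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)
```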
Re: [scikit-learn] scikit-learn 0.19.0 is out!
Thank you everybody for making the release possible, in particular Olivier and Joel :)

Wohoo!
[scikit-learn] scikit-learn 0.19.0 is out!
Grab it with pip or conda! Quoting the release highlights from the website:

We are excited to release a number of great new features including neighbors.LocalOutlierFactor for anomaly detection, preprocessing.QuantileTransformer for robust feature transformation, and the multioutput.ClassifierChain meta-estimator to simply account for dependencies between classes in multilabel problems. We have some new algorithms in existing estimators, such as multiplicative update in decomposition.NMF and multinomial linear_model.LogisticRegression with L1 penalty (use solver='saga').

Cross validation is now able to return the results from multiple metric evaluations. The new model_selection.cross_validate can return many scores on the test data as well as training set performance and timings, and we have extended the scoring and refit parameters for grid/randomized search to handle multiple metrics.

You can also learn faster. For instance, the new option to cache transformations in pipeline.Pipeline makes grid search over pipelines including slow transformations much more efficient. And you can predict faster: if you're sure you know what you're doing, you can turn off validating that the input is finite using config_context.

We've made some important fixes too. We've fixed a longstanding implementation error in metrics.average_precision_score, so please be cautious with prior results reported from that function. A number of errors in the manifold.TSNE implementation have been fixed, particularly in the default Barnes-Hut approximation. semi_supervised.LabelSpreading and semi_supervised.LabelPropagation have had substantial fixes. LabelPropagation was previously broken. LabelSpreading should now correctly respect its alpha parameter.

Please see the full changelog at: http://scikit-learn.org/0.19/whats_new.html#version-0-19

Notably some models have changed behaviors (bug fixes) and some methods or parameters that are part of the public API have been deprecated.
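Two of the highlights above can be sketched in a few lines: multi-metric cross_validate, and transformer caching in Pipeline via its new memory parameter. This is a minimal illustration on a toy dataset, not part of the announcement itself.

```python
# Minimal illustration of two 0.19 highlights: multi-metric
# cross_validate and transformation caching in Pipeline.
from tempfile import mkdtemp
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# memory=<cache dir> stores fitted transformers, so repeated fits over
# the same data (e.g. during a grid search) reuse the cached scaler
# instead of recomputing it.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression())],
                memory=mkdtemp())

# Several metrics evaluated in one pass, plus fit and score timings.
scores = cross_validate(pipe, X, y, cv=5,
                        scoring=["accuracy", "f1_macro"],
                        return_train_score=False)
print(sorted(scores))
# ['fit_time', 'score_time', 'test_accuracy', 'test_f1_macro']
```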
A big thank you to everyone who made this release possible, and to Joel in particular.

--
Olivier
Re: [scikit-learn] Truncated svd not working for complex matrices
On Fri, Aug 11, 2017 at 12:37:12PM -0400, Andreas Mueller wrote:
> I opened https://github.com/scikit-learn/scikit-learn/issues/9528
> I suggest to first error everywhere and then fix those for which it seems easy and worth it, as Joel said, probably mostly in decomposition. Though adding support even in a few places seems like dangerous feature creep.

I am trying to pretend that I am offline and on vacation, so I shouldn't answer. But I do have a clear-cut opinion here. I believe that we should decide _not_ to support complex data everywhere. The reason is that the support for complex data will always be incomplete and risks being buggy. Indeed, complex data is very infrequent in machine learning (unlike in signal processing). Hence, it will receive little usage. In addition, many machine learning algorithms cannot easily be adapted to complex data. To manage user expectations and to ensure the quality of the codebase, let us error on complex data.

Should we move this discussion to the issue opened by Andy?

Gaël
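The "error on complex data" behavior argued for here could be implemented by a small validation helper along these lines. This is a hypothetical sketch: ensure_no_complex_data is not an actual scikit-learn utility, just one way to express the policy.

```python
# Hypothetical input-validation helper implementing the policy discussed
# above: reject complex-valued input with an explicit error.
import numpy as np

def ensure_no_complex_data(X):
    """Raise ValueError if X is complex-valued (hypothetical helper)."""
    X = np.asarray(X)
    if np.iscomplexobj(X):
        raise ValueError(
            "Complex data not supported: got dtype %r" % X.dtype)
    return X

ensure_no_complex_data([[1.0, 2.0]])  # real input passes through

try:
    ensure_no_complex_data([[1 + 1j, 2.0]])
except ValueError as exc:
    print("rejected:", exc)
```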
Re: [scikit-learn] Truncated svd not working for complex matrices
I opened https://github.com/scikit-learn/scikit-learn/issues/9528

I suggest to first error everywhere and then fix those for which it seems easy and worth it, as Joel said, probably mostly in decomposition. Though adding support even in a few places seems like dangerous feature creep.
[scikit-learn] Overflow Error with Cross-Validation (but not normally fitting the data)
To all,

I am working on a scikit-learn estimator that performs a version of SVC with a custom kernel. Unfortunately, I have been presented with a problem: when running a grid search (or even using the cross_val_score function), my estimator encounters an overflow error when evaluating my kernel (specifically, in an array multiplication operation). What is particularly strange about this is that, when I train the estimator on the whole dataset, this error does not occur. In other words, the problem only appears when the data is split into folds. Is this something that has been seen before? How ought I to fix this? I have attached the source code below (in particular, see the notebook for how the problem arises).

Best,
Sam

import numpy as np

"""assume m positive"""
def shift(A, j, m):
    newA = np.roll(A, m, axis=j)
    for index, x in np.ndenumerate(newA):
        if (index[j]
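The attached shift function is cut off in the archive. Assuming the intent is a shift along axis j whose wrapped-around entries are zeroed (suggested by np.roll and the "assume m positive" note), a plausible completion might look like this; the body of the loop is a guess, not the poster's actual code.

```python
import numpy as np

def shift(A, j, m):
    """Shift A by m positions along axis j, zero-filling (assumes m > 0).

    Hypothetical reconstruction of the truncated attachment: np.roll
    wraps entries around the end, so the first m slices along axis j
    (the wrapped-around region) are zeroed here.
    """
    newA = np.roll(A, m, axis=j)
    for index, x in np.ndenumerate(newA):
        if index[j] < m:
            newA[index] = 0
    return newA

A = np.arange(6).reshape(2, 3)
print(shift(A, 1, 1))  # [[0 0 1]
                       #  [0 3 4]]
```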
Re: [scikit-learn] Truncated svd not working for complex matrices
Although the first priority should be correctness (in implementation and documentation) and it makes sense to explicitly test for inputs for which code will give the wrong answer, it would be great if we could support complex data types, especially where it is very little extra work.

Raphael

On 11 August 2017 at 05:41, Joel Nothman wrote:
> Should we be more explicitly forbidding complex data in most estimators, and perhaps allow it in a few where it is tested (particularly decomposition)?
>
> On 11 August 2017 at 01:08, André Melo wrote:
>>
>> Actually, it makes more sense to change
>>
>> B = safe_sparse_dot(Q.T, M)
>>
>> To
>>
>> B = safe_sparse_dot(Q.T.conj(), M)
>>
>> On 10 August 2017 at 16:56, André Melo wrote:
>> > Hi Olivier,
>> >
>> > Thank you very much for your reply. I was convinced it couldn't be a fundamental mathematical issue because the singular values were coming out exactly right, so it had to be a problem with the way complex values were being handled.
>> >
>> > I decided to look at the source code and it turns out the problem is when the following transformation is applied:
>> >
>> > U = np.dot(Q, Uhat)
>> >
>> > Replacing this by
>> >
>> > U = np.dot(Q.conj(), Uhat)
>> >
>> > solves the issue! Should I report this on github?
>> >
>> > On 10 August 2017 at 16:13, Olivier Grisel wrote:
>> >> I have no idea whether the randomized SVD method is supposed to work for complex data or not (from a mathematical point of view). I think that all scikit-learn estimators assume real data (or integer data for class labels) and our input validation utilities will cast numeric values to float64 by default. This might be the cause of your problem. Have a look at the source code to confirm. The reference to the paper can also be found in the docstring of those functions.
>> >>
>> >> --
>> >> Olivier
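The fix discussed in this thread can be checked in isolation. In a Halko-style randomized SVD, projecting a complex matrix with the plain transpose Q.T loses the conjugation, whereas the conjugate transpose (the proposed B = safe_sparse_dot(Q.T.conj(), M)) makes the reconstruction exact. A minimal NumPy sketch, standing in for the scikit-learn internals:

```python
# Minimal randomized SVD of a complex matrix, illustrating why the
# projection must use the conjugate transpose Q^H (the fix proposed in
# this thread), after which U = Q @ Uhat reconstructs M.
import numpy as np

rng = np.random.RandomState(0)
n, m, rank = 40, 30, 5

# Random complex matrix of known low rank.
M = (rng.randn(n, rank) + 1j * rng.randn(n, rank)) @ \
    (rng.randn(rank, m) + 1j * rng.randn(rank, m))

# Range finder: sample the column space of M with a random test matrix.
Omega = rng.randn(m, rank + 5) + 1j * rng.randn(m, rank + 5)
Q, _ = np.linalg.qr(M @ Omega)

# Project with the conjugate transpose (Q^H M); using plain Q.T here
# breaks the reconstruction below for complex M.
B = Q.conj().T @ M
Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ Uhat

print(np.allclose(U @ np.diag(s) @ Vt, M))  # True
```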