Re: [scikit-learn] Truncated svd not working for complex matrices
I agree with Gaël on this. If you want to support complex values, just copy the estimators / functions you want and maintain them in a separate package. +1 to erroring when complex data are passed.

___
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
Re: [scikit-learn] Truncated svd not working for complex matrices
On Fri, Aug 11, 2017 at 12:37:12PM -0400, Andreas Mueller wrote:
> I opened https://github.com/scikit-learn/scikit-learn/issues/9528
> I suggest to first error everywhere and then fix those for which it seems
> easy and worth it, as Joel said, probably mostly in decomposition.
> Though adding support even in a few places seems like dangerous feature
> creep.

I am trying to pretend that I am offline and on vacation, so I shouldn't answer. But I do have a clear-cut opinion here.

I believe that we should decide _not_ to support complex data anywhere. The reason is that support for complex data will always be incomplete and risks being buggy. Indeed, complex data is very infrequent in machine learning (unlike in signal processing), hence it will receive little usage. In addition, many machine learning algorithms cannot easily be adapted to complex data.

To manage user expectations and to ensure the quality of the codebase, let us error on complex data.

Should we move this discussion to the issue opened by Andy?

Gaël

--
Gael Varoquaux
Researcher, INRIA Parietal
NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info  http://twitter.com/GaelVaroquaux
Re: [scikit-learn] Truncated svd not working for complex matrices
I opened https://github.com/scikit-learn/scikit-learn/issues/9528

I suggest to first error everywhere and then fix those for which it seems easy and worth it; as Joel said, probably mostly in decomposition. Though adding support even in a few places seems like dangerous feature creep.

On 08/11/2017 03:16 AM, Raphael C wrote:
> Although the first priority should be correctness (in implementation
> and documentation) and it makes sense to explicitly test for inputs
> for which code will give the wrong answer, it would be great if we
> could support complex data types, especially where it is very little
> extra work.
Re: [scikit-learn] Truncated svd not working for complex matrices
Although the first priority should be correctness (in implementation and documentation), and it makes sense to explicitly test for inputs for which code will give the wrong answer, it would be great if we could support complex data types, especially where it is very little extra work.

Raphael

On 11 August 2017 at 05:41, Joel Nothman wrote:
> Should we be more explicitly forbidding complex data in most estimators, and
> perhaps allow it in a few where it is tested (particularly decomposition)?
Re: [scikit-learn] Truncated svd not working for complex matrices
Should we be more explicitly forbidding complex data in most estimators, and perhaps allow it in a few where it is tested (particularly decomposition)?

On 11 August 2017 at 01:08, André Melo wrote:
> Actually, it makes more sense to change
>
>     B = safe_sparse_dot(Q.T, M)
>
> to
>
>     B = safe_sparse_dot(Q.T.conj(), M)
Re: [scikit-learn] Truncated svd not working for complex matrices
Actually, it makes more sense to change

    B = safe_sparse_dot(Q.T, M)

to

    B = safe_sparse_dot(Q.T.conj(), M)

On 10 August 2017 at 16:56, André Melo wrote:
> I decided to look at the source code and it turns out the problem is
> when the following transformation is applied:
>
>     U = np.dot(Q, Uhat)
>
> Replacing this by
>
>     U = np.dot(Q.conj(), Uhat)
>
> solves the issue! Should I report this on github?
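As a quick sanity check on why the conjugate matters here: for a complex matrix Q with orthonormal columns, it is the conjugate transpose Q^H, not the plain transpose Q.T, that satisfies Q^H Q = I. A small illustration in plain NumPy (not the scikit-learn code itself):

```python
import numpy as np

rng = np.random.default_rng(0)
# A random complex matrix, orthonormalized with QR.
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(A)

print(np.allclose(Q.conj().T @ Q, np.eye(4)))  # True: Q^H Q = I
print(np.allclose(Q.T @ Q, np.eye(4)))         # False for complex Q
```

So a projection written with `Q.T` instead of `Q.T.conj()` is only correct for real matrices, which is exactly the bug pattern in this thread.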
Re: [scikit-learn] Truncated svd not working for complex matrices
Hi Olivier,

Thank you very much for your reply. I was convinced it couldn't be a fundamental mathematical issue, because the singular values were coming out exactly right, so it had to be a problem with the way complex values were being handled.

I decided to look at the source code and it turns out the problem is when the following transformation is applied:

    U = np.dot(Q, Uhat)

Replacing this by

    U = np.dot(Q.conj(), Uhat)

solves the issue! Should I report this on github?

On 10 August 2017 at 16:13, Olivier Grisel wrote:
> I have no idea whether the randomized SVD method is supposed to work for
> complex data or not (from a mathematical point of view). I think that all
> scikit-learn estimators assume real data (or integer data for class labels)
> and our input validation utilities will cast numeric values to float64 by
> default. This might be the cause of your problem.
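For reference, the whole pipeline can be sketched in plain NumPy (a simplified version of the Halko et al. randomized SVD scheme, not scikit-learn's implementation). Note that once the projection B is formed with the conjugate transpose Q^H, the back-projection is simply U = Q @ Uhat, with no extra conjugation needed:

```python
import numpy as np

def randomized_svd_complex(M, n_components, n_iter=4, seed=0):
    # Sketch of randomized SVD written to handle complex input.
    rng = np.random.default_rng(seed)
    n = M.shape[1]
    # Complex input matrix -> complex random test matrix.
    Omega = rng.standard_normal((n, n_components))
    if np.iscomplexobj(M):
        Omega = Omega + 1j * rng.standard_normal((n, n_components))
    # Orthonormal basis for the approximate range of M.
    Q, _ = np.linalg.qr(M @ Omega)
    # Power iterations sharpen the range estimate; note the
    # conjugate transpose M^H, not the plain transpose.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(M.conj().T @ Q)
        Q, _ = np.linalg.qr(M @ Q)
    # Project: B = Q^H M -- the conjugate here is the crux of the thread.
    B = Q.conj().T @ M
    Uhat, s, Vh = np.linalg.svd(B, full_matrices=False)
    U = Q @ Uhat  # no conjugate needed once B uses Q^H
    return U, s, Vh

rng = np.random.default_rng(1)
a = rng.random((3, 3)) * (1 + 1j)
U, s, Vh = randomized_svd_complex(a, n_components=3)
print(np.allclose(a, U @ np.diag(s) @ Vh))  # True
```

With n_components equal to the full rank, Q is unitary and the reconstruction is exact up to floating point, which is the check that fails with the buggy plain-transpose version.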
Re: [scikit-learn] Truncated svd not working for complex matrices
I have no idea whether the randomized SVD method is supposed to work for complex data or not (from a mathematical point of view). I think that all scikit-learn estimators assume real data (or integer data for class labels), and our input validation utilities will cast numeric values to float64 by default. This might be the cause of your problem. Have a look at the source code to confirm. The reference to the paper can also be found in the docstring of those functions.

--
Olivier
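On the casting point: NumPy will indeed convert complex values to float64 if asked, and it silently discards the imaginary part while doing so (emitting a ComplexWarning rather than an error), so such a cast during input validation would quietly corrupt complex input. A small illustration in plain NumPy (scikit-learn's actual validation path may differ):

```python
import numpy as np
import warnings

a = np.array([1 + 2j, 3 + 4j])
with warnings.catch_warnings():
    # Casting complex -> float64 only warns (ComplexWarning)...
    warnings.simplefilter("ignore")
    b = a.astype(np.float64)
# ...and the imaginary part is silently dropped.
print(b)  # [1. 3.]
```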
[scikit-learn] Truncated svd not working for complex matrices
Hello all,

I'm trying to use the randomized version of scikit-learn's TruncatedSVD (although I'm actually calling the internal function randomized_svd to get the actual u, s, v matrices). While it is working fine for real matrices, for complex matrices I can't get back the original matrix, even though the singular values are exactly correct:

    >>> import numpy as np
    >>> from sklearn.utils.extmath import randomized_svd
    >>> N = 3
    >>> a = np.random.rand(N, N)*(1 + 1j)
    >>> u1, s1, v1 = np.linalg.svd(a)
    >>> u2, s2, v2 = randomized_svd(a, n_components=N, n_iter=7)
    >>> np.allclose(s1, s2)
    True
    >>> np.allclose(a, u1.dot(np.diag(s1)).dot(v1))
    True
    >>> np.allclose(a, u2.dot(np.diag(s2)).dot(v2))
    False

Any idea what could be wrong? Thank you!

Best regards,
Andre Melo