[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113431#comment-13113431 ] greg sterijevski commented on MATH-607: --- Yes, I concur. Besides, having such a long interface name is not a good idea either. Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets. - Key: MATH-607 URL: https://issues.apache.org/jira/browse/MATH-607 Project: Commons Math Issue Type: New Feature Affects Versions: 3.0 Environment: Java Reporter: greg sterijevski Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma Fix For: 3.0 Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces Original Estimate: 840h Remaining Estimate: 840h The current multiple regression class does a QR decomposition on the complete data set. This necessitates loading the complete dataset in-core. For large datasets, or for large datasets combined with a requirement to do data mining or stepwise regression, this is not practical. There are techniques which form the normal equations on the fly, as well as ones which update the QR decomposition incrementally. I am proposing, first, the specification of an UpdatingLinearRegression interface which defines the basic functionality all such techniques must fulfill. Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This is to ensure that subsequent additions of observations do not corrupt or render inconsistent the parameter estimates. I am calling this interface RegressionResults. Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place. Thanks, -Greg -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
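The "normal equations on the fly" approach mentioned in the ticket can be sketched as a small accumulator that folds one observation at a time into X'X and X'y, so the full data set never has to sit in memory. The class and method names below are illustrative only, not the proposed UpdatingLinearRegression API:

```java
// Sketch (not the actual Commons Math API): accumulating the normal
// equations X'X and X'y one observation at a time, so the full data
// set never needs to be held in memory.
public class NormalEquationsAccumulator {
    private final int k;            // number of regressors
    private final double[][] xtx;   // running X'X, k x k
    private final double[] xty;     // running X'y, k x 1
    private long n;                 // observations seen so far

    public NormalEquationsAccumulator(int numRegressors) {
        this.k = numRegressors;
        this.xtx = new double[numRegressors][numRegressors];
        this.xty = new double[numRegressors];
    }

    /** Folds a single observation (x, y) into the running cross products. */
    public void addObservation(double[] x, double y) {
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                xtx[i][j] += x[i] * x[j];
            }
            xty[i] += x[i] * y;
        }
        n++;
    }

    public long getN() { return n; }
    public double[][] getXtX() { return xtx; }
    public double[] getXtY() { return xty; }
}
```

Solving the accumulated system then yields the same coefficients as a batch regression, without the in-core data requirement.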
[jira] [Created] (MATH-675) MathUtils should have a static method which checks whether an array of doubles or Comparables is monotone
MathUtils should have a static method which checks whether an array of doubles or Comparables is monotone -- Key: MATH-675 URL: https://issues.apache.org/jira/browse/MATH-675 Project: Commons Math Issue Type: New Feature Affects Versions: 3.0 Environment: Java Reporter: greg sterijevski Assignee: greg sterijevski Priority: Minor Fix For: 3.0 The static method checkOrder in MathUtils is a useful piece of code which checks for monotonically increasing or decreasing elements in an array. It would be useful to have a similar method for Comparables. Furthermore, this new method would simply return true or false; unlike the current checkOrder, no exception would be thrown if the array were not monotone. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
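A minimal sketch of what such a method might look like (the name isMonotoneIncreasing and the strict flag are illustrative, not the final MathUtils signature):

```java
// Sketch of the proposed check: returns true/false instead of throwing,
// and works on any Comparable type. Names are hypothetical.
public class MonotoneCheck {
    /**
     * Returns true if val is strictly increasing (strict = true)
     * or non-decreasing (strict = false).
     */
    public static <T extends Comparable<? super T>> boolean isMonotoneIncreasing(
            T[] val, boolean strict) {
        for (int i = 1; i < val.length; i++) {
            int cmp = val[i - 1].compareTo(val[i]);
            // a violation of the requested ordering ends the scan early
            if (strict ? cmp >= 0 : cmp > 0) {
                return false;
            }
        }
        return true;
    }
}
```

A decreasing variant would mirror this with the comparison reversed.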
[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113120#comment-13113120 ] greg sterijevski commented on MATH-607: --- I am pushing some changes to SimpleRegression which will allow it to support the UpdatingMultipleRegression interface. There are a couple of additions to Localizable. Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets. - Key: MATH-607 URL: https://issues.apache.org/jira/browse/MATH-607 Project: Commons Math Issue Type: New Feature Affects Versions: 3.0 Environment: Java Reporter: greg sterijevski Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma Fix For: 3.0 Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces Original Estimate: 840h Remaining Estimate: 840h The current multiple regression class does a QR decomposition on the complete data set. This necessitates loading the complete dataset in-core. For large datasets, or for large datasets combined with a requirement to do data mining or stepwise regression, this is not practical. There are techniques which form the normal equations on the fly, as well as ones which update the QR decomposition incrementally. I am proposing, first, the specification of an UpdatingLinearRegression interface which defines the basic functionality all such techniques must fulfill. Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object. This is to ensure that subsequent additions of observations do not corrupt or render inconsistent the parameter estimates. I am calling this interface RegressionResults. Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place. 
Thanks, -Greg -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Closed] (MATH-649) SimpleRegression needs the ability to suppress the intercept
[ https://issues.apache.org/jira/browse/MATH-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski closed MATH-649. - Resolution: Fixed Assignee: greg sterijevski commit - r1167451 I have pushed the cleaned-up code. The changes consist of the introduction of a boolean, hasIntercept, and changes in the calculation of the slope/intercept. SimpleRegression needs the ability to suppress the intercept Key: MATH-649 URL: https://issues.apache.org/jira/browse/MATH-649 Project: Commons Math Issue Type: New Feature Affects Versions: 1.2, 2.1, 2.2 Environment: JAVA Reporter: greg sterijevski Assignee: greg sterijevski Priority: Minor Labels: NOINTERCEPT, SIMPLEREGRESSION Fix For: 3.0 Attachments: simplereg, simplereg2, simpleregtest Original Estimate: 2h Remaining Estimate: 2h The SimpleRegression class is a useful class for running regressions involving one independent variable. It lacks the ability to constrain the constant to be zero. I am attaching a patch which gives a constructor for setting NOINT. I am also checking in two NIST data sets for noint estimation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
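For reference, suppressing the intercept reduces the slope to a ratio of raw (not mean-centered) cross products. A standalone sketch of that calculation, not the actual SimpleRegression source:

```java
// Illustrative sketch of regression through the origin (the
// hasIntercept = false case): the slope reduces to sum(x*y) / sum(x*x),
// using raw rather than mean-centered cross products.
public class NoInterceptSlope {
    public static double slope(double[] x, double[] y) {
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
        }
        return sxy / sxx;  // intercept is constrained to zero
    }
}
```

With an intercept, the same formula would be applied to deviations from the means instead.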
[jira] [Updated] (MATH-649) SimpleRegression needs the ability to suppress the intercept
[ https://issues.apache.org/jira/browse/MATH-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-649: -- Attachment: simplereg2 Now without all the formatting changes! SimpleRegression needs the ability to suppress the intercept Key: MATH-649 URL: https://issues.apache.org/jira/browse/MATH-649 Project: Commons Math Issue Type: New Feature Affects Versions: 1.2, 2.1, 2.2 Environment: JAVA Reporter: greg sterijevski Priority: Minor Labels: NOINTERCEPT, SIMPLEREGRESSION Fix For: 3.0 Attachments: simplereg, simplereg2, simpleregtest Original Estimate: 2h Remaining Estimate: 2h The SimpleRegression class is a useful class for running regressions involving one independent variable. It lacks the ability to constrain the constant to be zero. I am attaching a patch which gives a constructor for setting NOINT. I am also checking in two NIST data sets for noint estimation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-649) SimpleRegression needs the ability to suppress the intercept
[ https://issues.apache.org/jira/browse/MATH-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100936#comment-13100936 ] greg sterijevski commented on MATH-649: --- I will check in both the source changes and the test changes once I have a clean build (maven site) and there are no style or formatting errors. SimpleRegression needs the ability to suppress the intercept Key: MATH-649 URL: https://issues.apache.org/jira/browse/MATH-649 Project: Commons Math Issue Type: New Feature Affects Versions: 1.2, 2.1, 2.2 Environment: JAVA Reporter: greg sterijevski Priority: Minor Labels: NOINTERCEPT, SIMPLEREGRESSION Fix For: 3.0 Attachments: simplereg, simplereg2, simpleregtest Original Estimate: 2h Remaining Estimate: 2h The SimpleRegression class is a useful class for running regressions involving one independent variable. It lacks the ability to constrain the constant to be zero. I am attaching a patch which gives a constructor for setting NOINT. I am also checking in two NIST data sets for noint estimation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-196) add support to constrained parameter estimation
[ https://issues.apache.org/jira/browse/MATH-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095734#comment-13095734 ] greg sterijevski commented on MATH-196: --- Luc, I am not sure if we are talking about the same implementation, but this paper argues differently: http://www.damtp.cam.ac.uk/user/na/NA_papers/NA2009_06.pdf -Greg add support to constrained parameter estimation --- Key: MATH-196 URL: https://issues.apache.org/jira/browse/MATH-196 Project: Commons Math Issue Type: New Feature Affects Versions: 1.2 Reporter: Luc Maisonobe Assignee: Luc Maisonobe Fix For: 3.0 The current estimation package supports only unconstrained problems. It should at least support simple bounds constrains on parameters. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-655) General framework for iterative algorithms
[ https://issues.apache.org/jira/browse/MATH-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095737#comment-13095737 ] greg sterijevski commented on MATH-655: --- In the IterativeAlgorithm class you use the generic Collection class, which you instantiate with an ArrayList. Don't you think it would be better to use a class like CopyOnWriteArraySet? This way you can have listeners attach and detach without explicit synchronization. General framework for iterative algorithms -- Key: MATH-655 URL: https://issues.apache.org/jira/browse/MATH-655 Project: Commons Math Issue Type: New Feature Affects Versions: 3.0 Reporter: Sébastien Brisard Priority: Minor Labels: algorithm, events Attachments: iterative-algorithm.zip Following the thread [Monitoring iterative algorithms|http://mail-archives.apache.org/mod_mbox/commons-dev/201108.mbox/%3CCAGRH7HrgcgoBA=jcoKovjiQU=TjpQHnspBkOGNCu7oDdKk=k...@mail.gmail.com%3E], here is a first attempt at defining a general enough framework for iterative algorithms at large. At the moment, the classes provide support for * maximum number of iterations * events handling ** initialization event (prior to entering the main loop), ** iteration event (after completion of one iteration), ** termination event (after termination of the main loop). These classes do not yet provide support for a stopping criterion. Some points worth noting: * For the time being, the classes are part of the o.a.c.m.linear package. * For the time being, {{IterativeAlgorithm.incrementIterationCount()}} throws a {{TooManyEvaluationsException}}. If the proposed new feature is integrated into CM, then a proper {{TooManyIterationsException}} should be created, from which the former could derive. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
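The suggestion can be sketched as follows; the class and listener names are hypothetical, not the attached patch:

```java
import java.util.concurrent.CopyOnWriteArraySet;

// Sketch of the suggestion: holding iteration listeners in a
// CopyOnWriteArraySet so listeners can attach and detach during
// iteration without explicit synchronization. Names are illustrative.
public class IterationEventSource {
    public interface IterationListener {
        void iterationPerformed(int iteration);
    }

    private final CopyOnWriteArraySet<IterationListener> listeners =
            new CopyOnWriteArraySet<>();

    public void addListener(IterationListener l)    { listeners.add(l); }
    public void removeListener(IterationListener l) { listeners.remove(l); }

    public void fireIterationPerformed(int iteration) {
        // Iteration works on a snapshot of the set, so concurrent
        // add/remove by listeners is safe (no ConcurrentModificationException).
        for (IterationListener l : listeners) {
            l.iterationPerformed(iteration);
        }
    }
}
```

The trade-off is that each mutation copies the underlying array, which is fine for listener sets that change rarely relative to how often events fire.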
[jira] [Created] (MATH-651) eigendecompimpl allocates space for array imagEigenvalues when it is not needed
eigendecompimpl allocates space for array imagEigenvalues when it is not needed --- Key: MATH-651 URL: https://issues.apache.org/jira/browse/MATH-651 Project: Commons Math Issue Type: Bug Affects Versions: 3.1 Environment: JAVA Reporter: greg sterijevski Priority: Minor Fix For: 3.1 The class variable imagEigenvalues is allocated even when there is no use for it. I propose leaving the reference null. Patch will follow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MATH-651) eigendecompimpl allocates space for array imagEigenvalues when it is not needed
[ https://issues.apache.org/jira/browse/MATH-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-651: -- Attachment: eigendecompimpl The patch with proposed changes... eigendecompimpl allocates space for array imagEigenvalues when it is not needed --- Key: MATH-651 URL: https://issues.apache.org/jira/browse/MATH-651 Project: Commons Math Issue Type: Bug Affects Versions: 3.1 Environment: JAVA Reporter: greg sterijevski Priority: Minor Labels: EIGENDECOMPOSITIONIMPL Fix For: 3.1 Attachments: eigendecompimpl The class variable imagEigenvalues is allocated even when there is no use for it. I propose leaving the reference null. Patch will follow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
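The proposed change amounts to lazy allocation. A standalone sketch under assumed names (not the EigenDecompositionImpl source):

```java
// Illustrative sketch of the lazy-allocation idea: keep the imaginary
// parts null until someone actually asks for them. Field and method
// names are hypothetical.
public class LazyImagParts {
    private double[] imagEigenvalues;  // stays null when unused
    private final int n;

    public LazyImagParts(int n) { this.n = n; }

    public double[] getImagEigenvalues() {
        if (imagEigenvalues == null) {
            // allocate on first access; all zeros in the real-symmetric case
            imagEigenvalues = new double[n];
        }
        return imagEigenvalues;
    }
}
```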
[jira] [Updated] (MATH-652) Tridiagonal QR decomposition has a faulty test for zero...
[ https://issues.apache.org/jira/browse/MATH-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-652: -- Attachment: tridiagonal Tridiagonal QR decomposition has a faulty test for zero... --- Key: MATH-652 URL: https://issues.apache.org/jira/browse/MATH-652 Project: Commons Math Issue Type: Bug Affects Versions: 3.1 Environment: JAVA Reporter: greg sterijevski Labels: TriDiagonalTransformer Fix For: 3.1 Attachments: tridiagonal Original Estimate: 1h Remaining Estimate: 1h In the method getQT() of TriDiagonalTransformer we have:

public RealMatrix getQT() {
    if (cachedQt == null) {
        final int m = householderVectors.length;
        cachedQt = MatrixUtils.createRealMatrix(m, m);
        // build up first part of the matrix by applying Householder transforms
        for (int k = m - 1; k >= 1; --k) {
            final double[] hK = householderVectors[k - 1];
            cachedQt.setEntry(k, k, 1);
            final double inv = 1.0 / (secondary[k - 1] * hK[k]);
            if (hK[k] != 0.0) {
                double beta = 1.0 / secondary[k - 1];

The faulty line is:

    final double inv = 1.0 / (secondary[k - 1] * hK[k]);

It should be put after the test for zero, e.g.:

public RealMatrix getQT() {
    if (cachedQt == null) {
        final int m = householderVectors.length;
        cachedQt = MatrixUtils.createRealMatrix(m, m);
        // build up first part of the matrix by applying Householder transforms
        for (int k = m - 1; k >= 1; --k) {
            final double[] hK = householderVectors[k - 1];
            cachedQt.setEntry(k, k, 1);
            if (hK[k] != 0.0) {
                final double inv = 1.0 / (secondary[k - 1] * hK[k]);
                double beta = 1.0 / secondary[k - 1];

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MATH-652) Tridiagonal QR decomposition has a faulty test for zero...
Tridiagonal QR decomposition has a faulty test for zero... --- Key: MATH-652 URL: https://issues.apache.org/jira/browse/MATH-652 Project: Commons Math Issue Type: Bug Affects Versions: 3.1 Environment: JAVA Reporter: greg sterijevski Fix For: 3.1 Attachments: tridiagonal In the method getQT() of TriDiagonalTransformer we have:

public RealMatrix getQT() {
    if (cachedQt == null) {
        final int m = householderVectors.length;
        cachedQt = MatrixUtils.createRealMatrix(m, m);
        // build up first part of the matrix by applying Householder transforms
        for (int k = m - 1; k >= 1; --k) {
            final double[] hK = householderVectors[k - 1];
            cachedQt.setEntry(k, k, 1);
            final double inv = 1.0 / (secondary[k - 1] * hK[k]);
            if (hK[k] != 0.0) {
                double beta = 1.0 / secondary[k - 1];

The faulty line is:

    final double inv = 1.0 / (secondary[k - 1] * hK[k]);

It should be put after the test for zero, e.g.:

public RealMatrix getQT() {
    if (cachedQt == null) {
        final int m = householderVectors.length;
        cachedQt = MatrixUtils.createRealMatrix(m, m);
        // build up first part of the matrix by applying Householder transforms
        for (int k = m - 1; k >= 1; --k) {
            final double[] hK = householderVectors[k - 1];
            cachedQt.setEntry(k, k, 1);
            if (hK[k] != 0.0) {
                final double inv = 1.0 / (secondary[k - 1] * hK[k]);
                double beta = 1.0 / secondary[k - 1];

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-649) SimpleRegression needs the ability to suppress the intercept
[ https://issues.apache.org/jira/browse/MATH-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088737#comment-13088737 ] greg sterijevski commented on MATH-649: --- Mea Culpa! SimpleRegression needs the ability to suppress the intercept Key: MATH-649 URL: https://issues.apache.org/jira/browse/MATH-649 Project: Commons Math Issue Type: New Feature Affects Versions: 1.2, 2.1, 2.2 Environment: JAVA Reporter: greg sterijevski Priority: Minor Labels: NOINTERCEPT, SIMPLEREGRESSION Fix For: 3.0 Attachments: simplereg, simpleregtest Original Estimate: 2h Remaining Estimate: 2h The SimpleRegression class is a useful class for running regressions involving one independent variable. It lacks the ability to constrain the constant to be zero. I am attaching a patch which gives a constructor for setting NOINT. I am also checking in two NIST data sets for noint estimation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MATH-649) SimpleRegression needs the ability to suppress the intercept
SimpleRegression needs the ability to suppress the intercept Key: MATH-649 URL: https://issues.apache.org/jira/browse/MATH-649 Project: Commons Math Issue Type: New Feature Affects Versions: 3.1 Environment: JAVA Reporter: greg sterijevski Priority: Minor Attachments: simplereg, simpleregtest The SimpleRegression class is a useful class for running regressions involving one independent variable. It lacks the ability to constrain the constant to be zero. I am attaching a patch which gives a constructor for setting NOINT. I am also checking in two NIST data sets for noint estimation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MATH-649) SimpleRegression needs the ability to suppress the intercept
[ https://issues.apache.org/jira/browse/MATH-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-649: -- Attachment: simpleregtest simplereg Simple regression updates... SimpleRegression needs the ability to suppress the intercept Key: MATH-649 URL: https://issues.apache.org/jira/browse/MATH-649 Project: Commons Math Issue Type: New Feature Affects Versions: 3.1 Environment: JAVA Reporter: greg sterijevski Priority: Minor Labels: NOINTERCEPT, SIMPLEREGRESSION Attachments: simplereg, simpleregtest Original Estimate: 2h Remaining Estimate: 2h The SimpleRegression class is a useful class for running regressions involving one independent variable. It lacks the ability to constrain the constant to be zero. I am attaching a patch which gives a constructor for setting NOINT. I am also checking in two NIST data sets for noint estimation. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-602) Inverse condition number
[ https://issues.apache.org/jira/browse/MATH-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084667#comment-13084667 ] greg sterijevski commented on MATH-602: --- I agree. I have been trying to cook up a nice illustration, but nothing is good enough yet. In the meanwhile, the R manual has a good discussion which eloquently (at least far more eloquently than I could) summarizes the usefulness of the inverse condition number. http://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/rcond.html Suffice it to say, having an index in [0..1] is a bit more useful in comparing matrices than an unbounded number. -Greg PS Will post a better example after I have concocted it. Inverse condition number Key: MATH-602 URL: https://issues.apache.org/jira/browse/MATH-602 Project: Commons Math Issue Type: Improvement Affects Versions: 2.2 Environment: All Reporter: greg sterijevski Priority: Minor Labels: Condition, Inverse, Number Fix For: 3.0 Attachments: svdinvcond, tstsvd Original Estimate: 1h Remaining Estimate: 1h In SingularValueDecompositionImpl, the condition number is given as the ratio of the largest singular value to the smallest singular value. While this is the correct calculation, because of concerns over rank deficiency, researchers have traditionally used the inverse of the condition number as a more stable indicator of rank deficiency. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-602) Inverse condition number
[ https://issues.apache.org/jira/browse/MATH-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084253#comment-13084253 ] greg sterijevski commented on MATH-602: --- Certainly, did not want to crowd up the ticket. Will submit it later in the day. -Greg Inverse condition number Key: MATH-602 URL: https://issues.apache.org/jira/browse/MATH-602 Project: Commons Math Issue Type: Improvement Affects Versions: 2.2 Environment: All Reporter: greg sterijevski Priority: Minor Labels: Condition, Inverse, Number Fix For: 3.0 Attachments: svdinvcond Original Estimate: 1h Remaining Estimate: 1h In SingularValueDecompositionImpl, the condition number is given as the ratio of the largest singular value to the smallest singular value. While this is the correct calculation, because of concerns over rank deficiency, researchers have traditionally used the inverse of the condition number as a more stable indicator of rank deficiency. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MATH-602) Inverse condition number
[ https://issues.apache.org/jira/browse/MATH-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-602: -- Attachment: tstsvd The first trivial test Inverse condition number Key: MATH-602 URL: https://issues.apache.org/jira/browse/MATH-602 Project: Commons Math Issue Type: Improvement Affects Versions: 2.2 Environment: All Reporter: greg sterijevski Priority: Minor Labels: Condition, Inverse, Number Fix For: 3.0 Attachments: svdinvcond, tstsvd Original Estimate: 1h Remaining Estimate: 1h In SingularValueDecompositionImpl, the condition number is given as the ratio of the largest singular value to the smallest singular value. While this is the correct calculation, because of concerns over rank deficiency, researchers have traditionally used the inverse of the condition number as a more stable indicator of rank deficiency. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-615) OLSMultipleRegression seems to fail on the Filippelli Data
[ https://issues.apache.org/jira/browse/MATH-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082163#comment-13082163 ] greg sterijevski commented on MATH-615: --- Yes, the major issue of singularity was one where I had a bug in the test. OLSMultipleRegression seems to fail on the Filippelli Data -- Key: MATH-615 URL: https://issues.apache.org/jira/browse/MATH-615 Project: Commons Math Issue Type: Bug Affects Versions: 3.0 Environment: Java Reporter: greg sterijevski Labels: Filippelli, NIST, OLSMultipleRegression, QR, data Attachments: filippelli2, tstdiff Running the Filippelli data results in an exception being thrown by OLSMultipleRegression. The exception states that the matrix is singular. http://www.itl.nist.gov/div898/strd/lls/data/Filip.shtml I have added the data to the OLSMultipleRegressionTest file. Unless I screwed something up in passing the data, it looks like the QR decomposition is failing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-615) OLSMultipleRegression seems to fail on the Filippelli Data
[ https://issues.apache.org/jira/browse/MATH-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082167#comment-13082167 ] greg sterijevski commented on MATH-615: --- Do check in the Filippelli test though. OLSMultipleRegression seems to fail on the Filippelli Data -- Key: MATH-615 URL: https://issues.apache.org/jira/browse/MATH-615 Project: Commons Math Issue Type: Bug Affects Versions: 3.0 Environment: Java Reporter: greg sterijevski Labels: Filippelli, NIST, OLSMultipleRegression, QR, data Attachments: filippelli2, tstdiff Running the Filippelli data results in an exception being thrown by OLSMultipleRegression. The exception states that the matrix is singular. http://www.itl.nist.gov/div898/strd/lls/data/Filip.shtml I have added the data to the OLSMultipleRegressionTest file. Unless I screwed something up in passing the data, it looks like the QR decomposition is failing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-608) Remove methods from RealMatrix Interface
[ https://issues.apache.org/jira/browse/MATH-608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082332#comment-13082332 ] greg sterijevski commented on MATH-608: --- This was not a popular suggestion and would be a major slash and burn operation. Postpone it, or kill it. -Greg Remove methods from RealMatrix Interface Key: MATH-608 URL: https://issues.apache.org/jira/browse/MATH-608 Project: Commons Math Issue Type: Improvement Affects Versions: 1.0, 1.1, 1.2, 2.0, 2.1, 2.2 Environment: Java Reporter: greg sterijevski Priority: Minor Labels: Matrices Fix For: 3.0 Original Estimate: 2h Remaining Estimate: 2h The RealMatrix interface describes several methods which take a RealMatrix and yield a RealMatrix return. They are: RealMatrix multiply(RealMatrix m); RealMatrix preMultiply(RealMatrix m); RealMatrix power(final int p); RealMatrix add(RealMatrix m); RealMatrix subtract(RealMatrix m). There is nothing inherently wrong with making all subclasses of RealMatrix implement these methods. However, as the number of subclasses of RealMatrix increases, the complexity of these methods will also increase. I think these methods should be part of a separate class of 'operators' which handle matrix multiplication, addition, subtraction and exponentiation. Say, for example, I implement SymmetricRealMatrix. I would like to store the data of a real symmetric matrix in compressed form, so that I only consume (nrow + 1)*nrow/2 space in memory. When it comes time to implement multiply (for example), I must test whether the RealMatrix given in the argument is also of type SymmetricRealMatrix, since that will affect the algorithm I use to do the multiplication. I could access each element of the argument matrix via its getter, but efficiency will suffer. One can think of cases where we might have a DiagonalRealMatrix times a DiagonalRealMatrix. One would not want to store the resultant diagonal in general matrix storage. 
Keeping track of all of the permutations of Symmetrics, Diagonals,..., and their resultants inside the body of a function makes for very brittle code. Furthermore, any time a new type of matrix is defined, all matrix multiplication routines would have to be updated. There are special types of operations which result in particular matrix patterns. A matrix times its transpose is itself symmetric. A general matrix sandwiched between another general matrix and its transpose is symmetric. Cholesky decompositions form upper and lower triangular matrices. These are common enough occurrences in statistical techniques that it makes sense to put them in their own class (perhaps as static methods). It would keep the contract of the RealMatrix classes very simple. The RealMatrix would be nothing more than: 1. Marker (is the matrix General, Symmetric, Banded, Diagonal, UpperTriangular..) 2. Opaque data store (except for the operator classes, no one would need to know how the data is actually stored). 3. Indexing scheme. The reason I bring this up is that I am attempting to write a SymmetricRealMatrix class to support variance-covariance matrices. I noticed that there are relatively few subclasses of RealMatrix. While it would be easy to hack it up for the handful of implementations that exist, that would probably create more problems as the number of types of matrices increases. Thank you, -Greg -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
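The operator-class idea can be illustrated with a toy dispatch: multiplication lives outside the matrix types, so a specialized case such as diagonal-times-diagonal keeps its compact storage in one place instead of inside every subclass. All names here are hypothetical, not a Commons Math API:

```java
// Hypothetical sketch of the operator-class idea: a MatrixOps class
// dispatches on the concrete matrix types to pick a specialized
// algorithm and result representation.
public class MatrixOps {
    interface Matrix {
        int dim();
        double get(int i, int j);
    }

    static final class DiagonalMatrix implements Matrix {
        final double[] d;  // compact storage: only the diagonal
        DiagonalMatrix(double[] d) { this.d = d; }
        public int dim() { return d.length; }
        public double get(int i, int j) { return i == j ? d[i] : 0.0; }
    }

    /** Dispatch point: diagonal x diagonal stays diagonal. */
    static Matrix multiply(Matrix a, Matrix b) {
        if (a instanceof DiagonalMatrix && b instanceof DiagonalMatrix) {
            double[] da = ((DiagonalMatrix) a).d;
            double[] db = ((DiagonalMatrix) b).d;
            double[] out = new double[da.length];
            for (int i = 0; i < da.length; i++) {
                out[i] = da[i] * db[i];  // O(n) instead of O(n^3)
            }
            return new DiagonalMatrix(out);  // result keeps compact storage
        }
        throw new UnsupportedOperationException("general case omitted in sketch");
    }
}
```

New matrix types then only require new cases in the operator class, not changes to every existing matrix implementation.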
[jira] [Updated] (MATH-602) Inverse condition number
[ https://issues.apache.org/jira/browse/MATH-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-602: -- Attachment: svdinvcond Patch Inverse condition number Key: MATH-602 URL: https://issues.apache.org/jira/browse/MATH-602 Project: Commons Math Issue Type: Improvement Affects Versions: 2.2 Environment: All Reporter: greg sterijevski Priority: Minor Labels: Condition, Inverse, Number Fix For: 3.0 Attachments: svdinvcond Original Estimate: 1h Remaining Estimate: 1h In SingularValueDecompositionImpl, the condition number is given as the ratio of the largest singular value to the smallest singular value. While this is the correct calculation, because of concerns over rank deficiency, researchers have traditionally used the inverse of the condition number as a more stable indicator of rank deficiency. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-601) SingularValueDecompositionImpl pseudoinverse is not consistent with Rank calculation
[ https://issues.apache.org/jira/browse/MATH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069982#comment-13069982 ] greg sterijevski commented on MATH-601: --- Axel, You are correct: while the getRank() method's criterion was changed to

    double tol = FastMath.max(m, n) * singularValues[0] * EPS;

there is nothing happening at line 591. The Moore-Penrose inverse will still not be consistent with the rank calculation. Line 591 and onwards:

    if (singularValues[i] > 0) {
        a = 1 / singularValues[i];
    } else {
        a = 0;
    }

So the change of the zero criterion is good, but there is one more spot to fix. I would also put a lower bound on tol:

    tol = FastMath.max(m, n) * singularValues[0] * EPS;
    if (FastMath.abs(tol) < FastMath.sqrt(MathUtils.SAFE_MIN)) {
    }

-Greg SingularValueDecompositionImpl pseudoinverse is not consistent with Rank calculation Key: MATH-601 URL: https://issues.apache.org/jira/browse/MATH-601 Project: Commons Math Issue Type: Bug Affects Versions: 2.2, 3.0 Environment: All Reporter: greg sterijevski Labels: Pseudoinverse Attachments: SingularValueDecompositionImpl.patch Original Estimate: 24h Remaining Estimate: 24h In the SingularValueDecompositionImpl's internal private class Solver, a pseudoinverse matrix is calculated. In lines 260-264 we have:

    if (singularValues[i] > 0) {
        a = 1 / singularValues[i];
    } else {
        a = 0;
    }

This is not consistent with the manner in which rank is determined (lines 225 to 233). That is to say, a matrix could potentially be rank deficient, yet the pseudoinverse would still include the redundant columns... Also, there is the problem of very small singular values which could result in overflow. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MATH-601) SingularValueDecompositionImpl pseudoinverse is not consistent with Rank calculation
[ https://issues.apache.org/jira/browse/MATH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069983#comment-13069983 ] greg sterijevski commented on MATH-601: --- Sorry, sent the previous inadvertently. The guard should be:

if (FastMath.abs(tol) < FastMath.sqrt(MathUtils.SAFE_MIN)) {
    tol = FastMath.sqrt(MathUtils.SAFE_MIN);
}

That should guard against the case of a small matrix with small singular values. -Greg
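Putting the two comments together, a minimal sketch of the fix (plain Java with hypothetical names; EPS and SAFE_MIN stand in for the Commons Math constants): singular values at or below the rank tolerance get a zero reciprocal, so the pseudoinverse drops exactly the columns the rank criterion treats as redundant, and the tolerance is floored at sqrt(SAFE_MIN) to guard small matrices with uniformly tiny singular values.

```java
// Sketch only; constant and method names are hypothetical.
public class PseudoInverseTolerance {

    static final double EPS = 0x1.0p-52;              // double-precision machine epsilon
    static final double SAFE_MIN = Double.MIN_NORMAL; // stand-in for MathUtils.SAFE_MIN

    // Reciprocals of singular values for the pseudoinverse, zeroing those at or
    // below the same tolerance used by the rank calculation. Assumes the values
    // are sorted in non-increasing order, so singularValues[0] is the largest.
    static double[] reciprocals(double[] singularValues, int m, int n) {
        double tol = Math.max(m, n) * singularValues[0] * EPS;
        if (tol < Math.sqrt(SAFE_MIN)) {
            // Lower bound keeps tiny-but-legitimate values from being inverted
            // into overflow on small matrices with small singular values.
            tol = Math.sqrt(SAFE_MIN);
        }
        double[] inv = new double[singularValues.length];
        for (int i = 0; i < singularValues.length; i++) {
            inv[i] = singularValues[i] > tol ? 1.0 / singularValues[i] : 0.0;
        }
        return inv;
    }
}
```

With this arrangement the pseudoinverse and getRank() cannot disagree, because both consult the same tol.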
[jira] [Commented] (MATH-601) SingularValueDecompositionImpl pseudoinverse is not consistent with Rank calculation
[ https://issues.apache.org/jira/browse/MATH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069628#comment-13069628 ] greg sterijevski commented on MATH-601: --- Patch looks good to me...
[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069322#comment-13069322 ] greg sterijevski commented on MATH-607: --- How do you propose to allow for the growth of global fit statistics? Keep the getter pattern? If we decide to keep the getter pattern, then for sure eliminate the array and the static int indices. What is the ML thread? -Greg

Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
Key: MATH-607 URL: https://issues.apache.org/jira/browse/MATH-607 Project: Commons Math Issue Type: New Feature Affects Versions: 3.0 Environment: Java Reporter: greg sterijevski Labels: Gentleman's, QR, Regression, Updating, decomposition, lemma Fix For: 3.0 Attachments: RegressResults2, millerreg, millerreg_take2, millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces Original Estimate: 840h Remaining Estimate: 840h

The current multiple regression class does a QR decomposition on the complete data set, which necessitates loading the complete dataset in core. For large datasets, or large datasets combined with a requirement to do data mining or stepwise regression, this is not practical. There are techniques which form the normal equations on the fly, as well as ones which update the QR decomposition incrementally. I am proposing, first, the specification of an UpdatingLinearRegression interface which defines the basic functionality all such techniques must fulfill. Related to this 'updating' regression, the results of running a regression on some subset of the data should be encapsulated in an immutable object, to ensure that subsequent additions of observations do not corrupt or render inconsistent parameter estimates. I am calling this interface RegressionResults. Once the community has reached a consensus on the interface, work on the concrete implementation of these techniques will take place. Thanks, -Greg
[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069330#comment-13069330 ] greg sterijevski commented on MATH-607: --- Thanks on both counts. I can understand your reticence on the array approach.
[jira] [Created] (MATH-624) Need a method to solve upper and lower triangular systems
Need a method to solve upper and lower triangular systems
Key: MATH-624 URL: https://issues.apache.org/jira/browse/MATH-624 Project: Commons Math Issue Type: New Feature Affects Versions: 3.0 Environment: Java Reporter: greg sterijevski Fix For: 3.0

I have run into a need to solve triangular systems. While (as Phil and Ted point out) I could use the LU and QR decompositions, it seems cleaner to have a couple of static functions which do this. I am including a patch to provide an implementation and the tests.
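The static functions proposed here amount to forward and back substitution. A minimal sketch (plain arrays; the actual patch attached to MATH-624 may differ in naming, pivoting checks, and error handling):

```java
// Sketch only; the names solveUpper/solveLower are hypothetical.
public class TriangularSolvers {

    // Back substitution for an upper-triangular system U x = b:
    // solve the last equation first, then substitute upward.
    static double[] solveUpper(double[][] u, double[] b) {
        int n = b.length;
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double sum = b[i];
            for (int j = i + 1; j < n; j++) {
                sum -= u[i][j] * x[j];
            }
            x[i] = sum / u[i][i];
        }
        return x;
    }

    // Forward substitution for a lower-triangular system L x = b:
    // solve the first equation first, then substitute downward.
    static double[] solveLower(double[][] l, double[] b) {
        int n = b.length;
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = b[i];
            for (int j = 0; j < i; j++) {
                sum -= l[i][j] * x[j];
            }
            x[i] = sum / l[i][i];
        }
        return x;
    }
}
```

Each solve is O(n^2) rather than the O(n^3) of a full decomposition, which is the point of having them as standalone helpers.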
[jira] [Updated] (MATH-624) Need a method to solve upper and lower triangular systems
[ https://issues.apache.org/jira/browse/MATH-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-624: --- Attachment: upperlowertests, upperlowermethods. Both patches pass the checkstyle and findbugs checks...
[jira] [Updated] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-607: --- Attachment: millerreg_take2. Attached patch should fix the checkstyle errors for the miller regression.
[jira] [Updated] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-607: --- Attachment: RegressResults2. This patch should fix the checkstyle errors for RegressionResults.
[jira] [Updated] (MATH-615) OLSMultipleRegression seems to fail on the Filippelli Data
[ https://issues.apache.org/jira/browse/MATH-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-615: --- Attachment: filippelli2. The situation is not as dire as I first thought. The original test I uploaded had a bug which (correctly) resulted in a singular matrix. This current test still fails, but the failure occurs with a tolerance of 1.0e-5 for the parameters.

OLSMultipleRegression seems to fail on the Filippelli Data
Key: MATH-615 URL: https://issues.apache.org/jira/browse/MATH-615 Project: Commons Math Issue Type: Bug Affects Versions: 3.0 Environment: Java Reporter: greg sterijevski Labels: Filippelli, NIST, OLSMutlipleRegression, QR, data Attachments: filippelli2, tstdiff

Running the Filippelli data results in an exception being thrown by OLSMultipleRegression; the exception states that the matrix is singular. http://www.itl.nist.gov/div898/strd/lls/data/Filip.shtml I have added the data to the OLSMultipleRegressionTest file. Unless I screwed something up in the passing of the data, it looks like the QR decomposition is failing.
[jira] [Updated] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-607: --- Attachment: millerregtest, millerreg. Attached is the Miller regression and tests.
[jira] [Updated] (MATH-619) ADJUSTED R SQUARED INCORRECT IN REGRESSION RESULTS
[ https://issues.apache.org/jira/browse/MATH-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-619: --- Attachment: regres

ADJUSTED R SQUARED INCORRECT IN REGRESSION RESULTS
Key: MATH-619 URL: https://issues.apache.org/jira/browse/MATH-619 Project: Commons Math Issue Type: Bug Affects Versions: 3.0 Environment: Java Reporter: greg sterijevski Attachments: regres

I forgot to cast to double when dividing two integers:

this.globalFitInfo[ADJRSQ_IDX] = 1.0 - (1.0 - this.globalFitInfo[RSQ_IDX]) * ( nobs / ( (nobs - rank)));

Should be:

this.globalFitInfo[ADJRSQ_IDX] = 1.0 - (1.0 - this.globalFitInfo[RSQ_IDX]) * ( (double) nobs / ( (double) (nobs - rank)));

Patch attached.
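To see why the cast matters: nobs / (nobs - rank) is an integer division, so the adjustment factor truncates (usually to 1) and the "adjusted" value collapses back to plain R squared. A small reproduction of the bug and the fix (method names here are illustrative, not the patched code):

```java
// Illustrative reproduction of the MATH-619 bug; names are hypothetical.
public class AdjustedRSquared {

    // Integer division: nobs / (nobs - rank) truncates, e.g. 10 / 7 == 1,
    // so the result silently equals the unadjusted R squared.
    static double adjustedRSquaredBuggy(double rSquared, int nobs, int rank) {
        return 1.0 - (1.0 - rSquared) * (nobs / (nobs - rank));
    }

    // Casting to double restores the intended ratio, e.g. 10.0 / 7.0.
    static double adjustedRSquaredFixed(double rSquared, int nobs, int rank) {
        return 1.0 - (1.0 - rSquared) * ((double) nobs / (double) (nobs - rank));
    }
}
```

With rSquared = 0.9, nobs = 10, rank = 3, the buggy version returns 0.9 while the fixed version returns about 0.857.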
[jira] [Created] (MATH-619) ADJUSTED R SQUARED INCORRECT IN REGRESSION RESULTS
ADJUSTED R SQUARED INCORRECT IN REGRESSION RESULTS Key: MATH-619 URL: https://issues.apache.org/jira/browse/MATH-619 Project: Commons Math Issue Type: Bug Affects Versions: 3.0 Environment: Java Reporter: greg sterijevski Attachments: regres (description as above)
[jira] [Created] (MATH-615) OLSMultipleRegression seems to fail on the Filippelli Data
OLSMultipleRegression seems to fail on the Filippelli Data Key: MATH-615 URL: https://issues.apache.org/jira/browse/MATH-615 Project: Commons Math Issue Type: Bug Affects Versions: 3.0 Environment: Java Reporter: greg sterijevski (description as above)
[jira] [Updated] (MATH-615) OLSMultipleRegression seems to fail on the Filippelli Data
[ https://issues.apache.org/jira/browse/MATH-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-615: --- Attachment: tstdiff. The OLSMultipleRegressionTest changes with Filippelli included...
[jira] [Created] (MATH-616) Wampler Test Data for OLSMultipleRegression
Wampler Test Data for OLSMultipleRegression
Key: MATH-616 URL: https://issues.apache.org/jira/browse/MATH-616 Project: Commons Math Issue Type: Test Affects Versions: 3.0 Environment: Java Reporter: greg sterijevski

The current tests for the OLSMultipleRegression class do not include the Wampler1-4 datasets. This patch (which I will attach) includes the Wampler data. The test passes on my box after I lower the tolerances from 1.0e-8 to 1.0e-6 for the parameter vector on wampler4 (and 1, I think).
[jira] [Updated] (MATH-616) Wampler Test Data for OLSMultipleRegression
[ https://issues.apache.org/jira/browse/MATH-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-616: --- Attachment: wamplerdiff. The wampler data...
[jira] [Updated] (MATH-616) Wampler Test Data for OLSMultipleRegression
[ https://issues.apache.org/jira/browse/MATH-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-616: --- Attachment: wamplerdiff2. Blame it on Netbeans!!! ;-) I must have hit format source... Hopefully this copy is better!
[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060774#comment-13060774 ] greg sterijevski commented on MATH-607: --- Phil, If the underlying solver is QR or Gaussian this info would exist. If the underlying method is SVD, then we would register the rank reduction, but we would not be able to attribute it to a particular column in the design matrix. I am probably in agreement with making RegressionResults concrete, but there were a couple of considerations which forced me to an interface. Say that I begin with the following augmented matrix:

| X'X  X'Y |
| Y'X  Y'Y |

where X is the design matrix (nobs x nreg) and Y is the dependent variable (nobs x 1). On a copy of the cross products matrix (the thing above), I get the following via Gaussian elimination:

| inv(X'X)  -beta |
| -beta'     e'e  |

inv(X'X) is the inverse of the X'X matrix, -beta is the OLS vector of slopes, and e'e is the sum of squared errors. Getting most of the info (that RegressionResults surfaces) is simply a matter of indexing. All I need to do in this case is write a wrapper around a symmetric matrix which implements the interface. I suppose that there could be a constructor which took the matrix above and did the indexing, but that seems too dirty. Furthermore, there are probably other optimized formats for OLS which have similar aspects. I wanted to keep the door open to other schemes, without making (potentially large) copies of variance matrices, standard errors and so forth a necessity. On the name of the getter for number of observations, I am okay with whatever you feel is a better name. So you are saying UpdatingOLSRegression should be an abstract class? There are not that many methods in the interface. That would be okay if we were sure that subclasses always overrode either the regress(...) methods or the addObservations(...) methods. I worry that you might end up with a base class full of nothing but abstract functions.
So, modulo the one name change, I propose to just change these to classes
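The Gaussian-elimination scheme on the augmented cross-product matrix described in the comment is usually implemented as the sweep operator. A minimal sketch on plain arrays (hypothetical names; note the sign convention here places +beta in the last column rather than -beta, and sweeping assumes nonzero pivots, i.e. a full-rank X'X):

```java
// Sketch of a Goodnight-style sweep operator; not the Commons Math API.
public class SweepRegression {

    // In-place sweep on pivot k of a square matrix.
    static void sweep(double[][] a, int k) {
        int n = a.length;
        double d = a[k][k];
        for (int j = 0; j < n; j++) {
            a[k][j] /= d;          // normalize the pivot row
        }
        for (int i = 0; i < n; i++) {
            if (i == k) continue;
            double b = a[i][k];
            for (int j = 0; j < n; j++) {
                a[i][j] -= b * a[k][j];  // eliminate the pivot column
            }
            a[i][k] = -b / d;      // record the elimination multiplier
        }
        a[k][k] = 1.0 / d;
    }

    // Sweep the augmented matrix [[X'X, X'y], [y'X, y'y]] on the nreg regressor
    // pivots. Afterwards the top-left block is inv(X'X), the last column of the
    // top rows holds the OLS slopes, and the bottom-right entry is e'e.
    static double[][] regress(double[][] augmented, int nreg) {
        for (int k = 0; k < nreg; k++) {
            sweep(augmented, k);
        }
        return augmented;
    }
}
```

For a single regressor with data x = {1, 2}, y = {2, 4} (so X'X = 5, X'y = 10, y'y = 20), sweeping yields inv(X'X) = 0.2, slope 2, and e'e = 0, matching the indexing scheme the comment describes.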
[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060782#comment-13060782 ] greg sterijevski commented on MATH-607: --- Sorry for duplicating part of my response, but gmail truncated it (maybe google is telling me something about my ideas... ;0 ) My complete response is: I agree on eliminating getRedundant() and isRedundant(int idx). If the underlying solver is QR or Gaussian this info would exist. If the underlying method is SVD, then we would register the rank reduction, but we would not be able to attribute it to a particular column in the design matrix. I am probably in agreement with making RegressionResults concrete, but there were a couple of considerations which forced me to an interface. Say that I begin with the following augmented matrix:

| X'X  X'Y |
| Y'X  Y'Y |

where X is the design matrix (nobs x nreg) and Y is the dependent variable (nobs x 1). On a copy of the cross products matrix (the thing above), I get the following via Gaussian elimination:

| inv(X'X)  -beta |
| -beta'     e'e  |

inv(X'X) is the inverse of the X'X matrix, -beta is the OLS vector of slopes, and e'e is the sum of squared errors. Getting most of the info (that RegressionResults surfaces) is simply a matter of indexing. All I need to do in this case is write a wrapper around a symmetric matrix which implements the interface. I suppose that there could be a constructor which took the matrix above and did the indexing, but that seems too dirty. Furthermore, there are probably other optimized formats for OLS which have similar aspects. I wanted to keep the door open to other schemes, without making (potentially large) copies of variance matrices, standard errors and so forth a necessity. On the name of the getter for number of observations, I am okay with whatever you feel is a better name. Regarding the model interface, I would again suggest that we just define this as a class, UpdatingOLSRegression. I suppose that if we end up implementing a weighted or other non-OLS version, we might want to factor out a common interface like what exists for MultipleLinearRegression, but in retrospect, I am not sure that interface was worth much. Note that all that we could factor out is essentially what is in MultivariateRegression, which is analogous to your RegressionResults. So you are saying UpdatingOLSRegression should be an abstract class? There are not that many methods in the interface. That would be okay if we were sure that subclasses always overrode either the regress(...) methods or the addObservations(...) methods. I worry that you might end up with a base class full of nothing but abstract functions.
[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060792#comment-13060792 ] greg sterijevski commented on MATH-607: --- One more thing, on the subject of the adjusted R squared. I am not sure I would include this, since it depends on knowing that a constant exists. I currently envision being handed some data. If the data has a column which is nothing but ones, great. If not, great again. I could not come up with an elegant way to handle constant detection, and therefore a clean way to determine the adjusted R squared. I guess we could keep a flag for each regressor: if the regressor ever takes a changed value, then we would say it is not a constant. The other approach is to test the residuals for bias: if there is no bias, then constant or not we are okay. Though that would be messy, since I do not keep the data around. Either way makes for a bit of unpleasantness that yields very little?
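The flag-per-regressor idea for constant detection could be sketched as follows; the class and method names are illustrative only, not from any attached patch:

```java
// Sketch of the flag-per-regressor idea: remember the first value seen
// in each column and clear the flag once any later row differs.
// Class and method names are illustrative only.
class ConstantDetector {
    private final double[] firstValue;
    private final boolean[] constant;
    private boolean seeded = false;

    ConstantDetector(int vars) {
        firstValue = new double[vars];
        constant = new boolean[vars];
        java.util.Arrays.fill(constant, true);
    }

    void addObservation(double[] x) {
        if (!seeded) {                       // first row seeds the reference values
            System.arraycopy(x, 0, firstValue, 0, x.length);
            seeded = true;
            return;
        }
        for (int i = 0; i < x.length; i++) {
            if (x[i] != firstValue[i]) {
                constant[i] = false;         // column has varied: not a constant
            }
        }
    }

    boolean isConstant(int column) { return constant[column]; }
}
```

Note this fits the updating style: it needs only O(vars) state and never keeps the data around, which was the constraint raised in the comment.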
[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060800#comment-13060800 ] greg sterijevski commented on MATH-607: --- On the results object: There are vars * (vars + 1) / 2 elements in the covariance matrix, vars parameter estimates, vars standard errors, and some other assorted stuff. Not terribly large at first. However, consider doing panel regression via dummy variables: the covariance matrix can get large very quickly. That being said, I don't think making RegressionResults a concrete class is a showstopper. Should I send a follow-up patch with the results object made concrete? On the regression object: Are you concerned that we will be removing methods from any interface we specify today? Or do you think the contract is too restrictive? The reason I am pushing for an interface is that I have two candidates for a concrete implementation of updating regression. The first implementation is based on Gentleman's lemma and is detailed in the following article: Algorithm AS 274: Least Squares Routines to Supplement Those of Gentleman, Alan J. Miller, Journal of the Royal Statistical Society, Series C (Applied Statistics), Vol. 41, No. 2 (1992). The second approach is detailed in this article by Goodnight: A Tutorial on the SWEEP Operator, James H. Goodnight, The American Statistician, Vol. 33, No. 3 (Aug., 1979), pp. 149-158. The first approach never forms the cross-products matrix; the second does. They are significantly different approaches to dealing with large data sets. How would I do this in the concrete class you propose? Thanks, -Greg
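The vars * (vars + 1) / 2 count corresponds to packed triangular storage of the symmetric covariance matrix. A minimal sketch of such an indexing scheme (hypothetical helper, not part of the Commons Math API):

```java
// Packed lower-triangular storage for a symmetric covariance matrix:
// only vars * (vars + 1) / 2 doubles are kept. Hypothetical helper,
// not part of the Commons Math API.
class PackedSymmetric {
    // number of doubles needed for a vars x vars symmetric matrix
    static int size(int vars) {
        return vars * (vars + 1) / 2;
    }

    // offset of element (row, col) in the packed array; symmetry lets
    // us swap the indices so that row >= col always holds
    static int index(int row, int col) {
        if (col > row) {
            int t = row; row = col; col = t;
        }
        return row * (row + 1) / 2 + col;
    }
}
```

For a 3x3 matrix this packs the lower triangle row by row into 6 slots, which is the saving that matters once dummy variables push vars into the hundreds.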
[jira] [Updated] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-607: -- Attachment: updating_reg_cut2 Phil, Attached is the patch based on your comments. Please review. -Greg
[jira] [Created] (MATH-608) Remove methods from RealMatrix Interface
Remove methods from RealMatrix Interface Key: MATH-608 URL: https://issues.apache.org/jira/browse/MATH-608 Project: Commons Math Issue Type: Improvement Affects Versions: 3.0 Environment: Java Reporter: greg sterijevski Priority: Minor The RealMatrix interface describes several methods which take a RealMatrix and yield a RealMatrix return. They are: RealMatrix multiply(RealMatrix m); RealMatrix preMultiply(RealMatrix m); RealMatrix power(final int p); RealMatrix add(RealMatrix m); RealMatrix subtract(RealMatrix m). There is nothing inherently wrong in making all subclasses of RealMatrix implement these methods. However, as the number of subclasses of RealMatrix increases, the complexity of these methods will also increase. I think these methods should be part of a separate class of 'operators' which handle matrix multiplication, addition, subtraction and exponentiation. Say, for example, I implement SymmetricRealMatrix. I would like to store the data of a real symmetric matrix in compressed form, so that I only consume (nrow + 1) * nrow / 2 space in memory. When it comes time to implement multiply (for example), I must test whether the RealMatrix given in the argument is also of type SymmetricRealMatrix, since that will affect the algorithm I use to do the multiplication. I could access each element of the argument matrix via its getter, but efficiency will suffer. One can think of cases where we might have a DiagonalRealMatrix times a DiagonalRealMatrix. One would not want to store the resulting diagonal in general matrix storage. Keeping track of all of the permutations of Symmetrics, Diagonals,..., and their resultants inside the body of a function makes for very brittle code. Furthermore, any time a new type of matrix is defined, all matrix multiplication routines would have to be updated. There are special types of operations which result in particular matrix patterns. A matrix times its transpose is itself symmetric. A general matrix sandwiched between another general matrix and its transpose is symmetric. Cholesky decompositions form upper and lower triangular matrices. These are common enough occurrences in statistical techniques that it makes sense to put them in their own class (perhaps as static methods). It would keep the contract of the RealMatrix classes very simple. The RealMatrix would be nothing more than: 1. Marker (is the matrix General, Symmetric, Banded, Diagonal, UpperTriangular..) 2. Opaque data store (except for the operator classes, no one would need to know how the data is actually stored). 3. Indexing scheme. The reason I bring this up is that I am attempting to write a SymmetricRealMatrix class to support variance-covariance matrices. I noticed that there are relatively few subclasses of RealMatrix. While it would be easy to hack it up for the handful of implementations that exist, that would probably create more problems as the number of types of matrices increases. Thank you, -Greg
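The operator-class idea can be sketched as follows; Matrix, Diagonal, and MatrixOps are hypothetical names used only to illustrate dispatching on the concrete types of both operands outside the matrix classes:

```java
// Sketch of the "operator class" idea: multiplication lives outside the
// matrix types and can dispatch on the concrete types of both operands.
// Matrix, Diagonal and MatrixOps are hypothetical illustration names.
interface Matrix {
    int dim();
    double get(int r, int c);
}

class Diagonal implements Matrix {
    final double[] d;                        // only the diagonal is stored
    Diagonal(double[] d) { this.d = d; }
    public int dim() { return d.length; }
    public double get(int r, int c) { return r == c ? d[r] : 0.0; }
}

class MatrixOps {
    // diagonal x diagonal stays diagonal: O(n) work, O(n) storage, and the
    // result keeps its special type, which a multiply method buried inside
    // a general matrix class could not easily promise
    static Diagonal multiply(Diagonal a, Diagonal b) {
        double[] out = new double[a.dim()];
        for (int i = 0; i < out.length; i++) {
            out[i] = a.d[i] * b.d[i];
        }
        return new Diagonal(out);
    }
}
```

Adding a new matrix type then means adding overloads to MatrixOps rather than revisiting every existing matrix class, which is the brittleness the ticket is arguing against.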
[jira] [Created] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets. - Key: MATH-607 URL: https://issues.apache.org/jira/browse/MATH-607 Project: Commons Math Issue Type: New Feature Affects Versions: 3.0 Environment: Java Reporter: greg sterijevski Fix For: 3.0
[jira] [Updated] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.
[ https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] greg sterijevski updated MATH-607: -- Attachment: updating_reg_ifaces This is the patch file with the proposed changes.
[jira] [Commented] (MATH-465) Incorrect matrix rank via SVD
[ https://issues.apache.org/jira/browse/MATH-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054046#comment-13054046 ] greg sterijevski commented on MATH-465: --- My apologies if I am missing something, but here is what I noticed about the SVD. On lines 124-127 of SingularValueDecompositionImpl we have: for (int i = 0; i < p; i++) { singularValues[i] = FastMath.sqrt(FastMath.abs(singularValues[i])); } This is potentially the offending line. First is the problem of negative eigenvalues. Negative variance in the principal components should probably be dealt with explicitly? Perhaps by throwing a MathException? Second, and the issue which this bug report deals with, is that taking the square root of a very small number (< 1) will return a larger number. If you apply the threshold test in getRank() (final double threshold = FastMath.max(m, n) * FastMath.ulp(singularValues[0])) prior to taking the square root, I believe this problem would be resolved. More importantly, philosophically, you would then be testing for zero variance, which is the appropriate test. Also, rank could be precalculated in the above loop. Incorrect matrix rank via SVD - Key: MATH-465 URL: https://issues.apache.org/jira/browse/MATH-465 Project: Commons Math Issue Type: Bug Affects Versions: 2.1 Environment: Windows XP Prof. Vs. 2002 Reporter: Marisa Thoma Fix For: 3.0 The getRank() function of SingularValueDecompositionImpl does not work properly. This problem is probably related to the numerical stability problems mentioned in [MATH-327|https://issues.apache.org/jira/browse/MATH-327] and [MATH-320|https://issues.apache.org/jira/browse/MATH-320].
Example call with the standard matrix from R (rank 2):
{code:title=TestSVDRank.java}
import org.apache.commons.math.linear.Array2DRowRealMatrix;
import org.apache.commons.math.linear.RealMatrix;
import org.apache.commons.math.linear.SingularValueDecomposition;
import org.apache.commons.math.linear.SingularValueDecompositionImpl;

public class TestSVDRank {
    public static void main(String[] args) {
        double[][] d = { { 1, 1, 1 }, { 0, 0, 0 }, { 1, 2, 3 } };
        RealMatrix m = new Array2DRowRealMatrix(d);
        SingularValueDecomposition svd = new SingularValueDecompositionImpl(m);
        int r = svd.getRank();
        System.out.println("Rank: " + r);
    }
}
{code}
The rank is computed as 3. This problem also occurs for larger matrices. I discovered the problem when trying to replace the corresponding JAMA method.
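The fix suggested in the comment on MATH-465, applying the relative threshold to the eigenvalues before taking square roots, could be sketched like this (illustrative only, not the actual SingularValueDecompositionImpl code):

```java
// Sketch of the suggested fix: apply the relative threshold to the
// eigenvalues before taking square roots, so tiny values are zeroed
// instead of being inflated by sqrt. Illustrative only, not the
// actual SingularValueDecompositionImpl code.
class SvdRank {
    static int rank(double[] eigenvalues, int m, int n) {
        double largest = 0.0;
        for (double e : eigenvalues) {
            largest = Math.max(largest, Math.abs(e));
        }
        // same style of relative cutoff as getRank() uses
        double threshold = Math.max(m, n) * Math.ulp(largest);
        int rank = 0;
        for (double e : eigenvalues) {
            if (Math.abs(e) > threshold) {
                rank++;                      // counts as nonzero variance
            }
        }
        return rank;
    }
}
```

This also shows the secondary point of the comment: the rank falls out of the same loop, so it can be precalculated rather than recomputed later.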
[jira] [Created] (MATH-601) SingularValueDecompositionImpl pseudoinverse is not consistent with Rank calculation
SingularValueDecompositionImpl pseudoinverse is not consistent with Rank calculation Key: MATH-601 URL: https://issues.apache.org/jira/browse/MATH-601 Project: Commons Math Issue Type: Bug Affects Versions: 2.2, 3.0 Environment: All Reporter: greg sterijevski In SingularValueDecompositionImpl's internal private class Solver, a pseudoinverse matrix is calculated. In lines 260-264 we have: if (singularValues[i] > 0) { a = 1 / singularValues[i]; } else { a = 0; } This is not consistent with the manner in which rank is determined (lines 225 to 233). That is to say, a matrix could potentially be rank deficient, yet the pseudoinverse would still include the redundant columns... Also, there is the problem of very small singular values, which could result in overflow.
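One way to make the pseudoinverse consistent with getRank(), as the ticket suggests, is to zero out reciprocals below the same relative threshold used for the rank test; a hedged sketch, not the actual Solver code:

```java
// Sketch of reciprocals made consistent with the rank test: any singular
// value at or below the same relative threshold contributes a zero,
// rather than a huge reciprocal. Illustrative only.
class PseudoInverseHelper {
    static double[] invertSingularValues(double[] sv, int m, int n) {
        // sv is assumed sorted in descending order, as the SVD returns it
        double threshold = Math.max(m, n) * Math.ulp(sv[0]);
        double[] inv = new double[sv.length];
        for (int i = 0; i < sv.length; i++) {
            inv[i] = sv[i] > threshold ? 1.0 / sv[i] : 0.0;
        }
        return inv;
    }
}
```

Zeroing rather than inverting also removes the overflow risk the ticket mentions for very small singular values.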
[jira] [Created] (MATH-602) Inverse condition number
Inverse condition number Key: MATH-602 URL: https://issues.apache.org/jira/browse/MATH-602 Project: Commons Math Issue Type: Improvement Affects Versions: 2.2, 3.0 Environment: All Reporter: greg sterijevski Priority: Minor In SingularValueDecompositionImpl, the condition number is given as the ratio of the largest singular value to the smallest singular value. While this is the correct calculation, researchers concerned with rank deficiency have traditionally used the inverse of the condition number as the more stable indicator.
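The reciprocal condition number proposed in the ticket is simply the smallest singular value over the largest; a one-method sketch with a hypothetical helper name:

```java
// The reciprocal condition number: smallest singular value over largest.
// It degrades toward 0 for rank-deficient matrices instead of blowing up
// toward infinity, which is why it is the more stable diagnostic.
class ConditionHelper {
    static double inverseConditionNumber(double[] sv) {
        // sv is assumed nonnegative and sorted in descending order
        return sv[sv.length - 1] / sv[0];
    }
}
```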
[jira] [Commented] (MATH-320) NaN singular value from SVD
[ https://issues.apache.org/jira/browse/MATH-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054069#comment-13054069 ] greg sterijevski commented on MATH-320: --- Did anyone notice that the 3rd eigenvalue is negative? On my box the eigenvalue is -2.1028862676867717E-14. I am not sure what the fix was, but whatever problems existed still persist. NaN singular value from SVD --- Key: MATH-320 URL: https://issues.apache.org/jira/browse/MATH-320 Project: Commons Math Issue Type: Bug Affects Versions: 2.0 Environment: Linux (Ubuntu 9.10) java version 1.6.0_16 Reporter: Dieter Vandenbussche Fix For: 2.1 The following jython code
{code}
from org.apache.commons.math.linear import *
Alist = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 5.0, 7.0]]
A = Array2DRowRealMatrix(Alist)
decomp = SingularValueDecompositionImpl(A)
print decomp.getSingularValues()
{code}
prints array('d', [11.218599757513008, 0.3781791648535976, nan]). The last singular value should be something very close to 0, since the matrix is rank deficient. When I use the result from getSolver() to solve a system, I end up with a bunch of NaNs in the solution. I assumed I would get back a least squares solution. Does this SVD implementation require that the matrix be full rank? If so, I would expect an exception to be thrown from the constructor or one of the methods.