[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

2011-09-23 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113431#comment-13113431
 ] 

greg sterijevski commented on MATH-607:
---

Yes, I concur. Besides, having such a long interface name is not a good idea
either.




 Current Multiple Regression Object does calculations with all data incore. 
 There are non incore techniques which would be useful with large datasets.
 -

 Key: MATH-607
 URL: https://issues.apache.org/jira/browse/MATH-607
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
  Labels: Gentleman's, QR, Regression, Updating, decomposition, 
 lemma
 Fix For: 3.0

 Attachments: RegressResults2, millerreg, millerreg_take2, 
 millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces

   Original Estimate: 840h
  Remaining Estimate: 840h

 The current multiple regression class performs a QR decomposition on the 
 complete data set, which requires loading the entire dataset in core. For 
 large datasets, especially when data mining or stepwise regression is also 
 required, this is not practical. Some techniques form the normal equations on 
 the fly; others update the QR decomposition as observations arrive. I propose, 
 first, specifying an UpdatingLinearRegression interface that defines the basic 
 functionality all such techniques must provide. 
 Related to this 'updating' regression, the results of running a regression on 
 some subset of the data should be encapsulated in an immutable object, so that 
 subsequently added observations cannot corrupt the parameter estimates or 
 render them inconsistent. I am calling this interface RegressionResults. 
 Once the community has reached a consensus on the interface, work on the 
 concrete implementations of these techniques will begin.
 Thanks,
 -Greg
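A minimal sketch of the first technique mentioned above (forming the normal equations on the fly), using hypothetical class names (NormalEquationsRegression, RegressionResultsSketch) rather than the interfaces under discussion. Each observation updates X'X and X'y, so memory is O(k^2) regardless of the number of observations, and regress() returns an immutable snapshot that later additions cannot alter. Solving the normal equations squares the condition number of X, so a QR-updating approach (presumably Gentleman's algorithm, per the issue labels) is numerically safer; it is not shown here.

```java
/** Immutable snapshot of estimates; hypothetical stand-in for the proposed RegressionResults. */
final class RegressionResultsSketch {
    private final double[] beta;
    private final long n;

    RegressionResultsSketch(double[] beta, long n) {
        this.beta = beta.clone(); // defensive copy keeps the snapshot immutable
        this.n = n;
    }

    double[] getParameterEstimates() { return beta.clone(); }

    long getN() { return n; }
}

/** Hypothetical updating regression that accumulates X'X and X'y one row at a time. */
final class NormalEquationsRegression {
    private final int k;          // number of regressors (caller supplies a 1 for an intercept)
    private final double[][] xtx; // running X'X
    private final double[] xty;   // running X'y
    private long n;

    NormalEquationsRegression(int k) {
        this.k = k;
        this.xtx = new double[k][k];
        this.xty = new double[k];
    }

    void addObservation(double[] x, double y) {
        if (x.length != k) {
            throw new IllegalArgumentException("expected " + k + " regressors");
        }
        for (int i = 0; i < k; i++) {
            xty[i] += x[i] * y;
            for (int j = 0; j < k; j++) {
                xtx[i][j] += x[i] * x[j];
            }
        }
        n++;
    }

    /** Solves (X'X) b = X'y with a Cholesky factorization X'X = L L'. */
    RegressionResultsSketch regress() {
        double[][] l = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = xtx[i][j];
                for (int p = 0; p < j; p++) {
                    sum -= l[i][p] * l[j][p];
                }
                if (i == j) {
                    if (sum <= 0) {
                        throw new ArithmeticException("X'X is not positive definite");
                    }
                    l[i][i] = Math.sqrt(sum);
                } else {
                    l[i][j] = sum / l[j][j];
                }
            }
        }
        // forward substitution: L z = X'y
        double[] z = new double[k];
        for (int i = 0; i < k; i++) {
            double sum = xty[i];
            for (int p = 0; p < i; p++) {
                sum -= l[i][p] * z[p];
            }
            z[i] = sum / l[i][i];
        }
        // back substitution: L' b = z
        double[] b = new double[k];
        for (int i = k - 1; i >= 0; i--) {
            double sum = z[i];
            for (int p = i + 1; p < k; p++) {
                sum -= l[p][i] * b[p];
            }
            b[i] = sum / l[i][i];
        }
        return new RegressionResultsSketch(b, n);
    }
}
```

Each x row must carry a leading 1 when an intercept is wanted; the sketch does not add it automatically.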

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MATH-675) MathUtils should have a static method which checks whether an array of doubles or Comparables is monotone

2011-09-22 Thread greg sterijevski (JIRA)
MathUtils should have a static method which checks whether an array of doubles 
or Comparables is monotone 
--

 Key: MATH-675
 URL: https://issues.apache.org/jira/browse/MATH-675
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
Assignee: greg sterijevski
Priority: Minor
 Fix For: 3.0


The static method checkOrder in MathUtils is a useful piece of code that 
checks whether the elements of an array are monotonically increasing or 
decreasing. It would be useful to have a similar method for arrays of 
Comparable. Furthermore, this new method would simply return true or false: 
unlike the current checkOrder, it would not throw an exception if the array is 
not monotone. 
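A sketch of the proposed boolean check for Comparable arrays. The name isMonotone and its two flags (direction and strictness) are hypothetical, modeled loosely on checkOrder; this is not an existing MathUtils method.

```java
/** Hypothetical helper: reports monotonicity instead of throwing. */
final class MonotoneCheck {
    private MonotoneCheck() {}

    /**
     * @param increasing true to test for increasing order, false for decreasing
     * @param strict     true to reject equal neighbouring elements
     */
    static <T extends Comparable<? super T>> boolean isMonotone(T[] values, boolean increasing, boolean strict) {
        for (int i = 1; i < values.length; i++) {
            // signum first so that negating the result can never overflow
            int cmp = Integer.signum(values[i - 1].compareTo(values[i]));
            if (!increasing) {
                cmp = -cmp; // flip so that "in order" always means cmp < 0
            }
            if (cmp > 0 || (strict && cmp == 0)) {
                return false;
            }
        }
        return true;
    }
}
```

A double[] overload would use the same loop with plain comparisons in place of compareTo.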





[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

2011-09-22 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113120#comment-13113120
 ] 

greg sterijevski commented on MATH-607:
---

I am pushing some changes to SimpleRegression which will allow it to support 
the UpdatingMultipleRegression interface. There are a couple of additions to 
Localizable. 






[jira] [Closed] (MATH-649) SimpleRegression needs the ability to suppress the intercept

2011-09-09 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski closed MATH-649.
-

Resolution: Fixed
  Assignee: greg sterijevski

commit - r1167451
I have pushed the cleaned-up code. The changes consist of the introduction of a 
boolean, hasIntercept, and changes to the slope and intercept calculations.  

 SimpleRegression needs the ability to suppress the intercept
 

 Key: MATH-649
 URL: https://issues.apache.org/jira/browse/MATH-649
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 1.2, 2.1, 2.2
 Environment: JAVA
Reporter: greg sterijevski
Assignee: greg sterijevski
Priority: Minor
  Labels: NOINTERCEPT, SIMPLEREGRESSION
 Fix For: 3.0

 Attachments: simplereg, simplereg2, simpleregtest

   Original Estimate: 2h
  Remaining Estimate: 2h

 The SimpleRegression class is a useful class for running regressions 
 involving one independent variable. It lacks the ability to constrain the 
 constant to be zero. I am attaching a patch which gives a constructor for 
 setting NOINT. I am also checking in two NIST data sets for noint estimation. 
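A sketch of how a hasIntercept flag changes the slope and intercept formulas, using a hypothetical class built on raw running sums. With an intercept the slope is Sxy / Sxx from centered sums; with the constant constrained to zero, the least-squares slope reduces to sum(x*y) / sum(x*x) and the intercept is fixed at 0. Raw sums are used only for brevity; they can lose precision on large or badly scaled data, so mean-updating formulas are preferable in a library.

```java
/** Hypothetical single-regressor least squares with an optional intercept. */
final class SimpleRegressionSketch {
    private final boolean hasIntercept;
    private long n;
    private double sumX, sumY, sumXX, sumXY;

    SimpleRegressionSketch(boolean hasIntercept) {
        this.hasIntercept = hasIntercept;
    }

    void addData(double x, double y) {
        n++;
        sumX += x;
        sumY += y;
        sumXX += x * x;
        sumXY += x * y;
    }

    double getSlope() {
        if (hasIntercept) {
            // centered sums: slope = Sxy / Sxx
            double sxx = sumXX - sumX * sumX / n;
            double sxy = sumXY - sumX * sumY / n;
            return sxy / sxx;
        }
        // regression through the origin: slope = sum(x*y) / sum(x*x)
        return sumXY / sumXX;
    }

    double getIntercept() {
        return hasIntercept ? (sumY - getSlope() * sumX) / n : 0.0;
    }
}
```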





[jira] [Updated] (MATH-649) SimpleRegression needs the ability to suppress the intercept

2011-09-08 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-649:
--

Attachment: simplereg2

Now without all the formatting changes!

 SimpleRegression needs the ability to suppress the intercept
 

 Key: MATH-649
 URL: https://issues.apache.org/jira/browse/MATH-649
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 1.2, 2.1, 2.2
 Environment: JAVA
Reporter: greg sterijevski
Priority: Minor
  Labels: NOINTERCEPT, SIMPLEREGRESSION
 Fix For: 3.0

 Attachments: simplereg, simplereg2, simpleregtest

   Original Estimate: 2h
  Remaining Estimate: 2h

 The SimpleRegression class is a useful class for running regressions 
 involving one independent variable. It lacks the ability to constrain the 
 constant to be zero. I am attaching a patch which gives a constructor for 
 setting NOINT. I am also checking in two NIST data sets for noint estimation. 





[jira] [Commented] (MATH-649) SimpleRegression needs the ability to suppress the intercept

2011-09-08 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100936#comment-13100936
 ] 

greg sterijevski commented on MATH-649:
---

I will check in both the source and test changes once I have a clean build 
(maven site) with no style or formatting errors. 



 SimpleRegression needs the ability to suppress the intercept
 

 Key: MATH-649
 URL: https://issues.apache.org/jira/browse/MATH-649
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 1.2, 2.1, 2.2
 Environment: JAVA
Reporter: greg sterijevski
Priority: Minor
  Labels: NOINTERCEPT, SIMPLEREGRESSION
 Fix For: 3.0

 Attachments: simplereg, simplereg2, simpleregtest

   Original Estimate: 2h
  Remaining Estimate: 2h

 The SimpleRegression class is a useful class for running regressions 
 involving one independent variable. It lacks the ability to constrain the 
 constant to be zero. I am attaching a patch which gives a constructor for 
 setting NOINT. I am also checking in two NIST data sets for noint estimation. 





[jira] [Commented] (MATH-196) add support to constrained parameter estimation

2011-09-01 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095734#comment-13095734
 ] 

greg sterijevski commented on MATH-196:
---

Luc, 

I am not sure if we are talking about the same implementation, but this paper 
argues differently:

http://www.damtp.cam.ac.uk/user/na/NA_papers/NA2009_06.pdf

-Greg

 add support to constrained parameter estimation
 ---

 Key: MATH-196
 URL: https://issues.apache.org/jira/browse/MATH-196
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 1.2
Reporter: Luc Maisonobe
Assignee: Luc Maisonobe
 Fix For: 3.0


 The current estimation package supports only unconstrained problems. It 
 should at least support simple bound constraints on parameters.





[jira] [Commented] (MATH-655) General framework for iterative algorithms

2011-09-01 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13095737#comment-13095737
 ] 

greg sterijevski commented on MATH-655:
---

In the IterativeAlgorithm class you declare the listeners as a generic 
Collection, which you instantiate with an ArrayList. Wouldn't it be better to 
use a class such as CopyOnWriteArraySet? That way listeners could attach and 
detach without explicit synchronization. 
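A minimal sketch of the suggestion, with a hypothetical IterationListener interface. Because CopyOnWriteArraySet iterates over a snapshot, a listener can detach itself (or another thread can register one) while events are being fired, with no lock and no ConcurrentModificationException. Each add or remove copies the backing array, so this suits listener sets that change rarely compared with how often events fire; as a Set, it also drops duplicate registrations.

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

/** Hypothetical listener type, for illustration only. */
interface IterationListener {
    void iterationPerformed(int iteration);
}

/** Listener registry backed by CopyOnWriteArraySet. */
final class ListenerRegistry {
    // Iteration works on a snapshot of the backing array, so concurrent
    // add/remove calls never disturb an event that is being fired.
    private final Set<IterationListener> listeners = new CopyOnWriteArraySet<>();

    void addListener(IterationListener l) { listeners.add(l); }

    void removeListener(IterationListener l) { listeners.remove(l); }

    void fireIteration(int iteration) {
        for (IterationListener l : listeners) {
            l.iterationPerformed(iteration);
        }
    }
}
```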

 General framework for iterative algorithms
 --

 Key: MATH-655
 URL: https://issues.apache.org/jira/browse/MATH-655
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
Reporter: Sébastien Brisard
Priority: Minor
  Labels: algorithm, events
 Attachments: iterative-algorithm.zip


 Following the thread [Monitoring iterative 
 algorithms|http://mail-archives.apache.org/mod_mbox/commons-dev/201108.mbox/%3CCAGRH7HrgcgoBA=jcoKovjiQU=TjpQHnspBkOGNCu7oDdKk=k...@mail.gmail.com%3E],
  here is a first attempt at defining a general enough framework for iterative 
 algorithms at large. At the moment, the classes provide support for
 * maximum number of iterations
 * events handling
 ** initialization event (prior to entering the main loop),
 ** iteration event (after completion of one iteration),
 ** termination event (after termination of the main loop).
 These classes do not yet provide support for a stopping criterion.
 Some points worth noting:
 * For the time being, the classes are part of the o.a.c.m.linear package.
 * For the time being, {{IterativeAlgorithm.incrementIterationCount()}} throws 
 a {{TooManyEvaluationsException}}. If the proposed new feature is integrated 
 into CM, then a proper {{TooManyIterationsException}} should be created, from 
 which the former could derive.
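A sketch of the framework described above, using hypothetical names throughout. The abstract step() method and its boolean convergence flag are assumptions added only so the loop can end normally; as noted, the proposed classes do not yet provide a stopping criterion. The placeholder exception stands in for the proposed TooManyIterationsException.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the proposed TooManyIterationsException. */
class TooManyIterationsSketchException extends RuntimeException {
    TooManyIterationsSketchException(int max) {
        super("maximal number of iterations (" + max + ") exceeded");
    }
}

/** Hypothetical callbacks mirroring the three events listed above. */
interface IterativeListenerSketch {
    void initializationPerformed();
    void iterationPerformed(int iteration);
    void terminationPerformed(int iterations);
}

/** Counts iterations, enforces the maximum, and fires the three events. */
abstract class IterativeAlgorithmSketch {
    private final int maxIterations;
    private final List<IterativeListenerSketch> listeners = new ArrayList<>();
    private int iterations;

    IterativeAlgorithmSketch(int maxIterations) { this.maxIterations = maxIterations; }

    void addListener(IterativeListenerSketch l) { listeners.add(l); }

    int getIterations() { return iterations; }

    /** One step of the concrete algorithm; returns true once it has converged (assumption). */
    protected abstract boolean step();

    final void run() {
        for (IterativeListenerSketch l : listeners) l.initializationPerformed();
        boolean done = false;
        while (!done) {
            incrementIterationCount();
            done = step();
            for (IterativeListenerSketch l : listeners) l.iterationPerformed(iterations);
        }
        for (IterativeListenerSketch l : listeners) l.terminationPerformed(iterations);
    }

    private void incrementIterationCount() {
        if (++iterations > maxIterations) {
            throw new TooManyIterationsSketchException(maxIterations);
        }
    }
}
```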





[jira] [Created] (MATH-651) eigendecompimpl allocates space for array imagEigenvalues when it is not needed

2011-08-24 Thread greg sterijevski (JIRA)
eigendecompimpl allocates space for array imagEigenvalues when it is not needed
---

 Key: MATH-651
 URL: https://issues.apache.org/jira/browse/MATH-651
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.1
 Environment: JAVA
Reporter: greg sterijevski
Priority: Minor
 Fix For: 3.1


The class variable imagEigenvalues is allocated even when there is no use for 
it. I propose leaving the reference null. Patch will follow. 
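A sketch of the proposal with hypothetical names: the imaginary-part array stays null when every eigenvalue is known to be real (as for a symmetric matrix), the per-index getter returns 0.0, and a zero-filled array is allocated only if a caller explicitly asks for the whole array.

```java
/** Hypothetical eigenvalue holder that avoids storing n zeros for the all-real case. */
final class EigenvaluesSketch {
    private final double[] realEigenvalues;
    private final double[] imagEigenvalues; // null when all eigenvalues are real

    EigenvaluesSketch(double[] real, double[] imagOrNull) {
        this.realEigenvalues = real.clone();
        this.imagEigenvalues = imagOrNull == null ? null : imagOrNull.clone();
    }

    double getRealEigenvalue(int i) { return realEigenvalues[i]; }

    double getImagEigenvalue(int i) {
        return imagEigenvalues == null ? 0.0 : imagEigenvalues[i];
    }

    double[] getImagEigenvalues() {
        // allocate a zero-filled array only on explicit request
        return imagEigenvalues == null ? new double[realEigenvalues.length] : imagEigenvalues.clone();
    }
}
```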





[jira] [Updated] (MATH-651) eigendecompimpl allocates space for array imagEigenvalues when it is not needed

2011-08-24 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-651:
--

Attachment: eigendecompimpl

The patch with proposed changes... 

 eigendecompimpl allocates space for array imagEigenvalues when it is not 
 needed
 ---

 Key: MATH-651
 URL: https://issues.apache.org/jira/browse/MATH-651
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.1
 Environment: JAVA
Reporter: greg sterijevski
Priority: Minor
  Labels: EIGENDECOMPOSITIONIMPL
 Fix For: 3.1

 Attachments: eigendecompimpl


 The class variable imagEigenvalues is allocated even when there is no use for 
 it. I propose leaving the reference null. Patch will follow. 





[jira] [Updated] (MATH-652) Tridiagonal QR decomposition has a faulty test for zero...

2011-08-24 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-652:
--

Attachment: tridiagonal

 Tridiagonal QR decomposition has a faulty test for zero... 
 ---

 Key: MATH-652
 URL: https://issues.apache.org/jira/browse/MATH-652
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.1
 Environment: JAVA
Reporter: greg sterijevski
  Labels: TriDiagonalTransformer
 Fix For: 3.1

 Attachments: tridiagonal

   Original Estimate: 1h
  Remaining Estimate: 1h

 In the method getQT() of TriDiagonalTransformer we have:
 public RealMatrix getQT() {
     if (cachedQt == null) {
         final int m = householderVectors.length;
         cachedQt = MatrixUtils.createRealMatrix(m, m);
         // build up first part of the matrix by applying Householder transforms
         for (int k = m - 1; k >= 1; --k) {
             final double[] hK = householderVectors[k - 1];
             cachedQt.setEntry(k, k, 1);
             final double inv = 1.0 / (secondary[k - 1] * hK[k]);
             if (hK[k] != 0.0) {
                 double beta = 1.0 / secondary[k - 1];
 The faulty line is: final double inv = 1.0 / (secondary[k - 1] * hK[k]);
 It should be moved after the test for zero, e.g.:
 public RealMatrix getQT() {
     if (cachedQt == null) {
         final int m = householderVectors.length;
         cachedQt = MatrixUtils.createRealMatrix(m, m);
         // build up first part of the matrix by applying Householder transforms
         for (int k = m - 1; k >= 1; --k) {
             final double[] hK = householderVectors[k - 1];
             cachedQt.setEntry(k, k, 1);
             if (hK[k] != 0.0) {
                 final double inv = 1.0 / (secondary[k - 1] * hK[k]);
                 double beta = 1.0 / secondary[k - 1];





[jira] [Created] (MATH-652) Tridiagonal QR decomposition has a faulty test for zero...

2011-08-24 Thread greg sterijevski (JIRA)
Tridiagonal QR decomposition has a faulty test for zero... 
---

 Key: MATH-652
 URL: https://issues.apache.org/jira/browse/MATH-652
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.1
 Environment: JAVA
Reporter: greg sterijevski
 Fix For: 3.1
 Attachments: tridiagonal

In the method getQT() of TriDiagonalTransformer we have:

public RealMatrix getQT() {
    if (cachedQt == null) {
        final int m = householderVectors.length;
        cachedQt = MatrixUtils.createRealMatrix(m, m);

        // build up first part of the matrix by applying Householder transforms
        for (int k = m - 1; k >= 1; --k) {
            final double[] hK = householderVectors[k - 1];
            cachedQt.setEntry(k, k, 1);
            final double inv = 1.0 / (secondary[k - 1] * hK[k]);
            if (hK[k] != 0.0) {
                double beta = 1.0 / secondary[k - 1];

The faulty line is: final double inv = 1.0 / (secondary[k - 1] * hK[k]);
Because it is evaluated before the zero test, the division is performed even 
when hK[k] is zero (typically yielding an infinite value, since floating-point 
division by zero does not throw). It should be moved after the test for zero, 
e.g.:

public RealMatrix getQT() {
    if (cachedQt == null) {
        final int m = householderVectors.length;
        cachedQt = MatrixUtils.createRealMatrix(m, m);

        // build up first part of the matrix by applying Householder transforms
        for (int k = m - 1; k >= 1; --k) {
            final double[] hK = householderVectors[k - 1];
            cachedQt.setEntry(k, k, 1);
            if (hK[k] != 0.0) {
                final double inv = 1.0 / (secondary[k - 1] * hK[k]);
                double beta = 1.0 / secondary[k - 1];








[jira] [Commented] (MATH-649) SimpleRegression needs the ability to suppress the intercept

2011-08-22 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088737#comment-13088737
 ] 

greg sterijevski commented on MATH-649:
---

Mea Culpa!




 SimpleRegression needs the ability to suppress the intercept
 

 Key: MATH-649
 URL: https://issues.apache.org/jira/browse/MATH-649
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 1.2, 2.1, 2.2
 Environment: JAVA
Reporter: greg sterijevski
Priority: Minor
  Labels: NOINTERCEPT, SIMPLEREGRESSION
 Fix For: 3.0

 Attachments: simplereg, simpleregtest

   Original Estimate: 2h
  Remaining Estimate: 2h

 The SimpleRegression class is a useful class for running regressions 
 involving one independent variable. It lacks the ability to constrain the 
 constant to be zero. I am attaching a patch which gives a constructor for 
 setting NOINT. I am also checking in two NIST data sets for noint estimation. 





[jira] [Created] (MATH-649) SimpleRegression needs the ability to suppress the intercept

2011-08-20 Thread greg sterijevski (JIRA)
SimpleRegression needs the ability to suppress the intercept


 Key: MATH-649
 URL: https://issues.apache.org/jira/browse/MATH-649
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.1
 Environment: JAVA
Reporter: greg sterijevski
Priority: Minor
 Attachments: simplereg, simpleregtest

The SimpleRegression class is a useful class for running regressions involving 
one independent variable. It lacks the ability to constrain the constant to be 
zero. I am attaching a patch which gives a constructor for setting NOINT. I am 
also checking in two NIST data sets for noint estimation. 





[jira] [Updated] (MATH-649) SimpleRegression needs the ability to suppress the intercept

2011-08-20 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-649:
--

Attachment: simpleregtest
simplereg

Simple regression updates... 

 SimpleRegression needs the ability to suppress the intercept
 

 Key: MATH-649
 URL: https://issues.apache.org/jira/browse/MATH-649
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.1
 Environment: JAVA
Reporter: greg sterijevski
Priority: Minor
  Labels: NOINTERCEPT, SIMPLEREGRESSION
 Attachments: simplereg, simpleregtest

   Original Estimate: 2h
  Remaining Estimate: 2h

 The SimpleRegression class is a useful class for running regressions 
 involving one independent variable. It lacks the ability to constrain the 
 constant to be zero. I am attaching a patch which gives a constructor for 
 setting NOINT. I am also checking in two NIST data sets for noint estimation. 





[jira] [Commented] (MATH-602) Inverse condition number

2011-08-13 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084667#comment-13084667
 ] 

greg sterijevski commented on MATH-602:
---

I agree. I have been trying to cook up a nice illustration, but nothing is
good enough yet.

In the meantime, the R manual has a good discussion that summarizes the
usefulness of the inverse condition number far more eloquently than I
can.

http://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/rcond.html

Suffice it to say, an index bounded in [0, 1] is a bit more useful for
comparing matrices than an unbounded number.

-Greg

PS Will post a better example after I have concocted it.







 Inverse condition number
 

 Key: MATH-602
 URL: https://issues.apache.org/jira/browse/MATH-602
 Project: Commons Math
  Issue Type: Improvement
Affects Versions: 2.2
 Environment: All
Reporter: greg sterijevski
Priority: Minor
  Labels: Condition, Inverse, Number
 Fix For: 3.0

 Attachments: svdinvcond, tstsvd

   Original Estimate: 1h
  Remaining Estimate: 1h

 In SingularValueDecompositionImpl, the condition number is given as the ratio 
 of the largest singular value to the smallest singular value. While this is 
 the correct calculation, because of concerns over rank deficiency, 
 researchers have traditionally used the inverse of the condition number as a 
 more stable indicator of rank deficiency.
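A sketch contrasting the two quantities, assuming the singular values are already available and sorted in non-increasing order (class and method names are hypothetical). As the smallest singular value approaches zero, the condition number blows up to infinity, while its inverse decreases smoothly toward 0, which is what makes it convenient for rank-deficiency thresholds. R's rcond() may use a different norm, so its values need not match these 2-norm ratios exactly.

```java
/** Hypothetical helpers; s must be sorted in non-increasing order. */
final class ConditionSketch {
    private ConditionSketch() {}

    /** 2-norm condition number sigma_max / sigma_min; infinite when sigma_min is 0. */
    static double conditionNumber(double[] s) {
        return s[0] / s[s.length - 1];
    }

    /** Inverse condition number sigma_min / sigma_max, bounded in [0, 1] (NaN for a zero matrix). */
    static double inverseConditionNumber(double[] s) {
        return s[s.length - 1] / s[0];
    }
}
```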





[jira] [Commented] (MATH-602) Inverse condition number

2011-08-12 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084253#comment-13084253
 ] 

greg sterijevski commented on MATH-602:
---

Certainly; I did not want to clutter the ticket. I will submit it later in the
day.

-Greg




 Inverse condition number
 

 Key: MATH-602
 URL: https://issues.apache.org/jira/browse/MATH-602
 Project: Commons Math
  Issue Type: Improvement
Affects Versions: 2.2
 Environment: All
Reporter: greg sterijevski
Priority: Minor
  Labels: Condition, Inverse, Number
 Fix For: 3.0

 Attachments: svdinvcond

   Original Estimate: 1h
  Remaining Estimate: 1h

 In SingularValueDecompositionImpl, the condition number is given as the ratio 
 of the largest singular value to the smallest singular value. While this is 
 the correct calculation, because of concerns over rank deficiency, 
 researchers have traditionally used the inverse of the condition number as a 
 more stable indicator of rank deficiency.





[jira] [Updated] (MATH-602) Inverse condition number

2011-08-12 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-602:
--

Attachment: tstsvd

The first trivial test

 Inverse condition number
 

 Key: MATH-602
 URL: https://issues.apache.org/jira/browse/MATH-602
 Project: Commons Math
  Issue Type: Improvement
Affects Versions: 2.2
 Environment: All
Reporter: greg sterijevski
Priority: Minor
  Labels: Condition, Inverse, Number
 Fix For: 3.0

 Attachments: svdinvcond, tstsvd

   Original Estimate: 1h
  Remaining Estimate: 1h

 In SingularValueDecompositionImpl, the condition number is given as the ratio 
 of the largest singular value to the smallest singular value. While this is 
 the correct calculation, because of concerns over rank deficiency, 
 researchers have traditionally used the inverse of the condition number as a 
 more stable indicator of rank deficiency.





[jira] [Commented] (MATH-615) OLSMultipleRegression seems to fail on the Filippelli Data

2011-08-10 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082163#comment-13082163
 ] 

greg sterijevski commented on MATH-615:
---

Yes, the major singularity issue was caused by a bug in my test.





 OLSMultipleRegression seems to fail on the Filippelli Data
 --

 Key: MATH-615
 URL: https://issues.apache.org/jira/browse/MATH-615
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
  Labels: Filippelli, NIST, OLSMutlipleRegression, QR, data
 Attachments: filippelli2, tstdiff


 Running the Filippelli data results in an exception being thrown by 
 OLSMultipleRegression. The exception states that the matrix is singular. 
 http://www.itl.nist.gov/div898/strd/lls/data/Filip.shtml
 I have added the data to the OLSMultipleRegressionTest file. 
 Unless I screwed something up in passing the data, it looks like the QR 
 decomposition is failing.





[jira] [Commented] (MATH-615) OLSMultipleRegression seems to fail on the Filippelli Data

2011-08-10 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082167#comment-13082167
 ] 

greg sterijevski commented on MATH-615:
---

Do check in the Filippelli test, though.









[jira] [Commented] (MATH-608) Remove methods from RealMatrix Interface

2011-08-10 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082332#comment-13082332
 ] 

greg sterijevski commented on MATH-608:
---

This was not a popular suggestion and would be a major slash and burn
operation. Postpone it, or kill it.

-Greg




 Remove methods from RealMatrix Interface
 

 Key: MATH-608
 URL: https://issues.apache.org/jira/browse/MATH-608
 Project: Commons Math
  Issue Type: Improvement
Affects Versions: 1.0, 1.1, 1.2, 2.0, 2.1, 2.2
 Environment: Java
Reporter: greg sterijevski
Priority: Minor
  Labels: Matrices
 Fix For: 3.0

   Original Estimate: 2h
  Remaining Estimate: 2h

 The RealMatrix interface declares several methods that take a RealMatrix (or, 
 for power, an integer exponent) and return a RealMatrix. They are:
 RealMatrix multiply(RealMatrix m);
 RealMatrix preMultiply(RealMatrix m);
 RealMatrix power(final int p);
 RealMatrix add(RealMatrix m);
 RealMatrix subtract(RealMatrix m);
 There is nothing inherently wrong in making all subclasses of RealMatrix 
 implement these methods. However, as the number of subclasses of RealMatrix 
 increases, the complexity of these methods will also increase. I think these 
 methods should be part of a separate class of 'operators' that handles matrix 
 multiplication, addition, subtraction and exponentiation.
 Say, for example, that I implement SymmetricRealMatrix. I would like to store 
 the data of a real symmetric matrix in packed form, so that it consumes only 
 (nrow + 1) * nrow / 2 entries of memory. When it comes time to implement 
 multiply (for example), I must test whether the RealMatrix argument is also 
 of type SymmetricRealMatrix, since that will affect the algorithm I use to do 
 the multiplication. I could access each element of the argument matrix via 
 its getter, but efficiency would suffer. One can think of cases where we might 
 have a DiagonalRealMatrix times a DiagonalRealMatrix. One would not want to 
 store the resulting diagonal in general matrix storage. Keeping track of all 
 of the combinations of symmetric, diagonal, ... matrices and their results 
 inside the body of a function makes for very brittle code. Furthermore, any 
 time a new type of matrix is defined, all matrix multiplication routines 
 would have to be updated.  
 There are special types of operations that result in particular matrix 
 patterns. A matrix times its transpose is itself symmetric. A symmetric 
 matrix sandwiched between another matrix and its transpose (B * A * B^T) is 
 also symmetric. Cholesky decompositions form upper and lower triangular 
 matrices. These are common enough occurrences in statistical techniques that 
 it makes sense to put them in their own class (perhaps as static methods). It 
 would keep the contract of the RealMatrix classes very simple. The RealMatrix 
 would be nothing more than:
 1. Marker (is the matrix General, Symmetric, Banded, Diagonal, UpperTriangular, ...)
 2. Opaque data store (except for the operator classes, no one would need to 
 know how the data is actually stored).
 3. Indexing scheme. 
 The reason I bring this up is that I am attempting to write a 
 SymmetricRealMatrix class to support variance-covariance matrices. I noticed 
 that there are relatively few subclasses of RealMatrix. While it would be 
 easy to hack it up for the handful of implementations that exist, that would 
 probably create more problems as the number of matrix types increases.
 Thank you,
 -Greg 
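A sketch of the storage and operator split described above, with hypothetical class names: a symmetric matrix kept in packed form using (n + 1) * n / 2 slots, and a static operator that knows X'X is symmetric and therefore computes and stores only the lower triangle. A real operator class would need one such method per result pattern; this shows a single case.

```java
/** Hypothetical packed storage for a symmetric n x n matrix (lower triangle only). */
final class PackedSymmetricMatrix {
    private final int n;
    private final double[] data; // (n + 1) * n / 2 entries, stored row by row

    PackedSymmetricMatrix(int n) {
        this.n = n;
        this.data = new double[(n + 1) * n / 2];
    }

    int getDimension() { return n; }

    // Row i of the lower triangle starts after 1 + 2 + ... + i = i(i+1)/2 entries.
    private static int index(int i, int j) {
        if (i < j) { int t = i; i = j; j = t; } // (i, j) and (j, i) share one slot
        return i * (i + 1) / 2 + j;
    }

    double getEntry(int i, int j) { return data[index(i, j)]; }

    void setEntry(int i, int j, double v) { data[index(i, j)] = v; }
}

/** Hypothetical operator class: results with a known pattern go straight into matching storage. */
final class MatrixOperatorsSketch {
    private MatrixOperatorsSketch() {}

    /** X'X is always symmetric, so only its lower triangle is computed. */
    static PackedSymmetricMatrix transposeTimesSelf(double[][] x) {
        int cols = x[0].length;
        PackedSymmetricMatrix result = new PackedSymmetricMatrix(cols);
        for (int i = 0; i < cols; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = 0;
                for (double[] row : x) {
                    sum += row[i] * row[j];
                }
                result.setEntry(i, j, sum);
            }
        }
        return result;
    }
}
```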





[jira] [Updated] (MATH-602) Inverse condition number

2011-08-10 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-602:
--

Attachment: svdinvcond

Patch

 Inverse condition number
 

 Key: MATH-602
 URL: https://issues.apache.org/jira/browse/MATH-602
 Project: Commons Math
  Issue Type: Improvement
Affects Versions: 2.2
 Environment: All
Reporter: greg sterijevski
Priority: Minor
  Labels: Condition, Inverse, Number
 Fix For: 3.0

 Attachments: svdinvcond

   Original Estimate: 1h
  Remaining Estimate: 1h

 In SingularValueDecompositionImpl, the condition number is given as the ratio 
 of the largest singular value to the smallest singular value. While this is 
 the correct calculation, it becomes infinite (or overflows) as the matrix 
 approaches rank deficiency, so researchers have traditionally used the inverse 
 of the condition number, which lies between 0 and 1, as a more stable 
 indicator of rank deficiency.
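A minimal sketch of the inverse condition number, assuming the singular values arrive sorted in non-increasing order (as an SVD typically returns them). The class and method names are illustrative only. The point is that sigma_min / sigma_max stays in [0, 1] and is exactly 0 for a rank-deficient matrix, where the classical ratio becomes infinite.

```java
/** Hypothetical sketch comparing the condition number with its inverse. */
public final class InverseConditionSketch {

    /**
     * sigma_min / sigma_max, assuming singular values sorted in non-increasing order.
     * Returns 0 for a rank-deficient matrix instead of dividing by zero.
     */
    static double inverseConditionNumber(double[] singularValues) {
        double largest = singularValues[0];
        double smallest = singularValues[singularValues.length - 1];
        if (largest == 0.0) {
            return 0.0; // zero matrix: treat it as maximally ill-conditioned
        }
        return smallest / largest;
    }

    /** Classical condition number, for comparison; infinite when sigma_min is 0. */
    static double conditionNumber(double[] singularValues) {
        return singularValues[0] / singularValues[singularValues.length - 1];
    }

    public static void main(String[] args) {
        double[] wellConditioned = {10.0, 2.0, 0.5};
        double[] rankDeficient = {10.0, 2.0, 0.0};
        System.out.println(inverseConditionNumber(wellConditioned)); // 0.05
        System.out.println(conditionNumber(rankDeficient));          // Infinity
        System.out.println(inverseConditionNumber(rankDeficient));   // 0.0
    }
}
```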





[jira] [Commented] (MATH-601) SingularValueDecompositionImpl psuedoinverse is not consistent with Rank calculation

2011-07-23 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069982#comment-13069982
 ] 

greg sterijevski commented on MATH-601:
---

Axel,

You are correct. While the criterion in the getRank() method was changed to:
double tol = FastMath.max(m, n) * singularValues[0] * EPS;
nothing has changed at line 591, so the Moore-Penrose pseudo-inverse will
still not be consistent with the rank calculation.

Line 591 and onwards:

if (singularValues[i] > 0) {
    a = 1 / singularValues[i];
} else {
    a = 0;
}

So the change to the zero criterion is good, but there is one more spot to fix.
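The fix being described can be sketched as follows: derive one tolerance and use it both for the rank count and for deciding which reciprocals enter the pseudo-inverse. This is an illustrative sketch, not the patched SingularValueDecompositionImpl; the class name is hypothetical and `EPS` is assumed to be 2^-53.

```java
import java.util.Arrays;

/** Hypothetical sketch: rank and pseudo-inverse reciprocals share one tolerance. */
public final class ConsistentPseudoInverseSketch {

    // Assumed to mirror the machine epsilon used by the SVD code (2^-53).
    private static final double EPS = 0x1.0p-53;

    /** Rank tolerance from the comment: max(m, n) * sigma_max * EPS. */
    static double tolerance(int m, int n, double[] singularValues) {
        return Math.max(m, n) * singularValues[0] * EPS;
    }

    /** Rank: number of singular values strictly above the tolerance. */
    static int rank(double[] singularValues, double tol) {
        int r = 0;
        for (double s : singularValues) {
            if (s > tol) {
                r++;
            }
        }
        return r;
    }

    /** Reciprocals for the pseudo-inverse, zeroing exactly the values rank() discards. */
    static double[] reciprocals(double[] singularValues, double tol) {
        double[] a = new double[singularValues.length];
        for (int i = 0; i < singularValues.length; i++) {
            a[i] = singularValues[i] > tol ? 1.0 / singularValues[i] : 0.0;
        }
        return a;
    }

    public static void main(String[] args) {
        double[] s = {3.0, 1.0, 1.0e-20};
        double tol = tolerance(3, 3, s);
        System.out.println(rank(s, tol)); // 2
        System.out.println(Arrays.toString(reciprocals(s, tol)));
    }
}
```

With a single threshold, a singular value that does not count toward the rank can never contribute a huge reciprocal to the pseudo-inverse.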

I would also put a lower bound on tol:

tol = FastMath.max(m, n) * singularValues[0] * EPS;

if (FastMath.abs(tol) < FastMath.sqrt(MathUtils.SAFE_MIN)) {

}


-Greg




 SingularValueDecompositionImpl psuedoinverse is not consistent with Rank 
 calculation
 

 Key: MATH-601
 URL: https://issues.apache.org/jira/browse/MATH-601
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 2.2, 3.0
 Environment: All
Reporter: greg sterijevski
  Labels: Pseudoinverse
 Attachments: SingularValueDecompositionImpl.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In the SingularValueDecompositionImpl's internal private class Solver, a 
 pseudo inverse matrix is calculated:
 In lines 260-264 we have:
 if (singularValues[i] > 0) {
  a = 1 / singularValues[i];
 } else {
  a = 0;
 }
 This is not consistent with the manner in which rank is determined (lines 225 
 to 233). That is to say a matrix could potentially be rank deficient, yet the 
 pseudoinverse would still include the redundant columns... 
 Also, there is the problem of very small singular values which could result 
 in overflow.  





[jira] [Commented] (MATH-601) SingularValueDecompositionImpl psuedoinverse is not consistent with Rank calculation

2011-07-23 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069983#comment-13069983
 ] 

greg sterijevski commented on MATH-601:
---

Sorry, I sent the previous message inadvertently. The complete check is:

if (FastMath.abs(tol) < FastMath.sqrt(MathUtils.SAFE_MIN)) {
    tol = FastMath.sqrt(MathUtils.SAFE_MIN);
}

That should guard against the case of a small matrix with very small singular values.
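Putting the two messages together, the floored tolerance might look like the sketch below. It assumes `MathUtils.SAFE_MIN` is the smallest normal double (2^-1022), so `Double.MIN_NORMAL` stands in for it here, and `EPS` is assumed to be 2^-53; the class name is hypothetical.

```java
/** Hypothetical sketch of a rank tolerance with a lower bound. */
public final class FlooredToleranceSketch {

    private static final double EPS = 0x1.0p-53;              // assumed machine epsilon
    private static final double SAFE_MIN = Double.MIN_NORMAL; // assumed stand-in for MathUtils.SAFE_MIN

    static double tolerance(int m, int n, double largestSingularValue) {
        double tol = Math.max(m, n) * largestSingularValue * EPS;
        double floor = Math.sqrt(SAFE_MIN);
        if (Math.abs(tol) < floor) {
            tol = floor; // tiny matrices must not get a tolerance that lets 1/s overflow
        }
        return tol;
    }

    public static void main(String[] args) {
        double floor = Math.sqrt(Double.MIN_NORMAL);
        System.out.println(tolerance(3, 3, 1.0e-200) == floor); // true: the floor applies
        System.out.println(tolerance(3, 3, 3.0) > floor);       // true: the scaled value is kept
    }
}
```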

-Greg

On Sat, Jul 23, 2011 at 10:31 AM, Greg Sterijevski



 SingularValueDecompositionImpl psuedoinverse is not consistent with Rank 
 calculation
 

 Key: MATH-601
 URL: https://issues.apache.org/jira/browse/MATH-601
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 2.2, 3.0
 Environment: All
Reporter: greg sterijevski
  Labels: Pseudoinverse
 Attachments: SingularValueDecompositionImpl.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In the SingularValueDecompositionImpl's internal private class Solver, a 
 pseudo inverse matrix is calculated:
 In lines 260-264 we have:
 if (singularValues[i] > 0) {
  a = 1 / singularValues[i];
 } else {
  a = 0;
 }
 This is not consistent with the manner in which rank is determined (lines 225 
 to 233). That is to say a matrix could potentially be rank deficient, yet the 
 pseudoinverse would still include the redundant columns... 
 Also, there is the problem of very small singular values which could result 
 in overflow.  





[jira] [Commented] (MATH-601) SingularValueDecompositionImpl psuedoinverse is not consistent with Rank calculation

2011-07-22 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069628#comment-13069628
 ] 

greg sterijevski commented on MATH-601:
---

Patch looks good to me... 

 SingularValueDecompositionImpl psuedoinverse is not consistent with Rank 
 calculation
 

 Key: MATH-601
 URL: https://issues.apache.org/jira/browse/MATH-601
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 2.2, 3.0
 Environment: All
Reporter: greg sterijevski
  Labels: Pseudoinverse
 Attachments: SingularValueDecompositionImpl.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 In the SingularValueDecompositionImpl's internal private class Solver, a 
 pseudo inverse matrix is calculated:
 In lines 260-264 we have:
 if (singularValues[i] > 0) {
  a = 1 / singularValues[i];
 } else {
  a = 0;
 }
 This is not consistent with the manner in which rank is determined (lines 225 
 to 233). That is to say a matrix could potentially be rank deficient, yet the 
 pseudoinverse would still include the redundant columns... 
 Also, there is the problem of very small singular values which could result 
 in overflow.  





[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

2011-07-21 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069322#comment-13069322
 ] 

greg sterijevski commented on MATH-607:
---

How do you propose to allow for the growth of global fit statistics? Keep
the getter pattern?
If we decide to keep the getter pattern, then we should definitely eliminate
the array and the static int indices.
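For comparison with the array-plus-static-indices approach, a getter-based immutable results object might look like this. The class and getter names are hypothetical, not the RegressionResults API under discussion; each new global fit statistic would become one more named getter rather than one more array slot.

```java
import java.util.Arrays;

/** Hypothetical immutable results object using named getters instead of an indexed array. */
public final class RegressionResultsSketch {
    private final double[] parameters;
    private final double errorSumSquares;
    private final double rSquared;
    private final long n;

    public RegressionResultsSketch(double[] parameters, double errorSumSquares,
                                   double rSquared, long n) {
        this.parameters = parameters.clone(); // defensive copy: later updates cannot leak in
        this.errorSumSquares = errorSumSquares;
        this.rSquared = rSquared;
        this.n = n;
    }

    /** Returns a copy, so callers cannot mutate the stored estimates. */
    public double[] getParameterEstimates() {
        return parameters.clone();
    }

    public double getErrorSumSquares() {
        return errorSumSquares;
    }

    public double getRSquared() {
        return rSquared;
    }

    public long getN() {
        return n;
    }

    public static void main(String[] args) {
        double[] beta = {5.0 / 6.0, 1.5};
        RegressionResultsSketch results = new RegressionResultsSketch(beta, 1.0 / 6.0, 27.0 / 28.0, 3);
        beta[0] = 99.0; // the caller's array changes...
        System.out.println(Arrays.toString(results.getParameterEstimates())); // ...the snapshot does not
    }
}
```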

What is the ML thread?

-Greg




 Current Multiple Regression Object does calculations with all data incore. 
 There are non incore techniques which would be useful with large datasets.
 -

 Key: MATH-607
 URL: https://issues.apache.org/jira/browse/MATH-607
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
  Labels: Gentleman's, QR, Regression, Updating, decomposition, 
 lemma
 Fix For: 3.0

 Attachments: RegressResults2, millerreg, millerreg_take2, 
 millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces

   Original Estimate: 840h
  Remaining Estimate: 840h

 The current multiple regression class does a QR decomposition on the complete 
 data set. This necessitates the loading incore of the complete dataset. For 
 large datasets, or large datasets and a requirement to do datamining or 
 stepwise regression this is not practical. There are techniques which form 
 the normal equations on the fly, as well as ones which form the QR 
 decomposition on an update basis. I am proposing, first, the specification of 
 an UpdatingLinearRegression interface which defines basic functionality all 
 such techniques must fulfill. 
 Related to this 'updating' regression, the results of running a regression on 
 some subset of the data should be encapsulated in an immutable object. This 
 is to ensure that subsequent additions of observations do not corrupt or 
 render inconsistent parameter estimates. I am calling this interface 
 RegressionResults.  
 Once the community has reached a consensus on the interface, work on the 
 concrete implementation of these techniques will take place.
 Thanks,
 -Greg
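Forming the normal equations on the fly only requires keeping X'X, X'Y, Y'Y and the observation count, so memory depends on the number of regressors rather than the number of rows. The class below is a hypothetical sketch of that accumulation step only; it does not solve the system.

```java
import java.util.Arrays;

/** Hypothetical accumulator forming the normal equations one observation at a time. */
public final class NormalEquationsAccumulator {
    private final int p;
    private final double[][] xtx;
    private final double[] xty;
    private double yty;
    private long nobs;

    public NormalEquationsAccumulator(int p) {
        this.p = p;
        this.xtx = new double[p][p];
        this.xty = new double[p];
    }

    /** Adds one design-matrix row and its response; only O(p^2) state is kept. */
    public void addObservation(double[] x, double y) {
        if (x.length != p) {
            throw new IllegalArgumentException("expected " + p + " regressors, got " + x.length);
        }
        for (int i = 0; i < p; i++) {
            for (int j = 0; j < p; j++) {
                xtx[i][j] += x[i] * x[j];
            }
            xty[i] += x[i] * y;
        }
        yty += y * y;
        nobs++;
    }

    public double[][] getXtX() {
        double[][] copy = new double[p][];
        for (int i = 0; i < p; i++) {
            copy[i] = xtx[i].clone();
        }
        return copy;
    }

    public double[] getXtY() {
        return xty.clone();
    }

    public double getYtY() {
        return yty;
    }

    public long getN() {
        return nobs;
    }

    public static void main(String[] args) {
        NormalEquationsAccumulator acc = new NormalEquationsAccumulator(2);
        double[][] rows = {{1, 0}, {1, 1}, {1, 2}};
        double[] ys = {1, 2, 4};
        for (int k = 0; k < rows.length; k++) {
            acc.addObservation(rows[k], ys[k]); // rows could just as well be streamed from disk
        }
        System.out.println(Arrays.deepToString(acc.getXtX())); // [[3.0, 3.0], [3.0, 5.0]]
    }
}
```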





[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

2011-07-21 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069330#comment-13069330
 ] 

greg sterijevski commented on MATH-607:
---

Thanks on both counts. I can understand your reticence on the array
approach.




 Current Multiple Regression Object does calculations with all data incore. 
 There are non incore techniques which would be useful with large datasets.
 -

 Key: MATH-607
 URL: https://issues.apache.org/jira/browse/MATH-607
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
  Labels: Gentleman's, QR, Regression, Updating, decomposition, 
 lemma
 Fix For: 3.0

 Attachments: RegressResults2, millerreg, millerreg_take2, 
 millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces

   Original Estimate: 840h
  Remaining Estimate: 840h

 The current multiple regression class does a QR decomposition on the complete 
 data set. This necessitates the loading incore of the complete dataset. For 
 large datasets, or large datasets and a requirement to do datamining or 
 stepwise regression this is not practical. There are techniques which form 
 the normal equations on the fly, as well as ones which form the QR 
 decomposition on an update basis. I am proposing, first, the specification of 
 an UpdatingLinearRegression interface which defines basic functionality all 
 such techniques must fulfill. 
 Related to this 'updating' regression, the results of running a regression on 
 some subset of the data should be encapsulated in an immutable object. This 
 is to ensure that subsequent additions of observations do not corrupt or 
 render inconsistent parameter estimates. I am calling this interface 
 RegressionResults.  
 Once the community has reached a consensus on the interface, work on the 
 concrete implementation of these techniques will take place.
 Thanks,
 -Greg





[jira] [Created] (MATH-624) Need a method to solve upper and lower triangular systems

2011-07-20 Thread greg sterijevski (JIRA)
Need a method to solve upper and lower triangular systems
-

 Key: MATH-624
 URL: https://issues.apache.org/jira/browse/MATH-624
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
 Fix For: 3.0


I have run into a need to solve triangular systems. While (as Phil and Ted 
point out) I could use the LU and QR decompositions, it seems cleaner to have a 
couple of static functions which do this. I am including a patch to provide an 
implementation and the tests.
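A minimal sketch of the two static functions being proposed: forward substitution for lower-triangular systems and back substitution for upper-triangular ones. The class and method names are hypothetical, and the exact-zero diagonal check is a simplification; a library version would likely compare against a tolerance.

```java
/** Hypothetical static helpers for triangular systems. */
public final class TriangularSolveSketch {

    /** Solves L x = b for lower-triangular L by forward substitution. */
    static double[] solveLower(double[][] l, double[] b) {
        int n = b.length;
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = b[i];
            for (int j = 0; j < i; j++) {
                sum -= l[i][j] * x[j]; // subtract contributions of already-solved unknowns
            }
            if (l[i][i] == 0.0) {
                throw new ArithmeticException("singular: zero diagonal at row " + i);
            }
            x[i] = sum / l[i][i];
        }
        return x;
    }

    /** Solves U x = b for upper-triangular U by back substitution. */
    static double[] solveUpper(double[][] u, double[] b) {
        int n = b.length;
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double sum = b[i];
            for (int j = i + 1; j < n; j++) {
                sum -= u[i][j] * x[j];
            }
            if (u[i][i] == 0.0) {
                throw new ArithmeticException("singular: zero diagonal at row " + i);
            }
            x[i] = sum / u[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        double[] lower = solveLower(new double[][] {{2, 0}, {1, 3}}, new double[] {4, 5});
        double[] upper = solveUpper(new double[][] {{2, 1}, {0, 3}}, new double[] {5, 3});
        System.out.println(java.util.Arrays.toString(lower)); // [2.0, 1.0]
        System.out.println(java.util.Arrays.toString(upper)); // [2.0, 1.0]
    }
}
```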







[jira] [Updated] (MATH-624) Need a method to solve upper and lower triangular systems

2011-07-20 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-624:
--

Attachment: upperlowertests
upperlowermethods

Both patches pass the Checkstyle and FindBugs checks.



 Need a method to solve upper and lower triangular systems
 -

 Key: MATH-624
 URL: https://issues.apache.org/jira/browse/MATH-624
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
  Labels: Backsolve, Forwardsolve, LowerTriangular, UpperTriangular
 Fix For: 3.0

 Attachments: upperlowermethods, upperlowertests

   Original Estimate: 0h
  Remaining Estimate: 0h

 I have run into a need to solve triangular systems. While (as Phil and Ted 
 point out) I could use the LU and QR decompositions, it seems cleaner to have 
 a couple of static functions which do this. I am including a patch to provide 
 an implementation and the tests.





[jira] [Updated] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

2011-07-20 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-607:
--

Attachment: millerreg_take2

The attached patch should fix the Checkstyle errors in the Miller regression.

 Current Multiple Regression Object does calculations with all data incore. 
 There are non incore techniques which would be useful with large datasets.
 -

 Key: MATH-607
 URL: https://issues.apache.org/jira/browse/MATH-607
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
  Labels: Gentleman's, QR, Regression, Updating, decomposition, 
 lemma
 Fix For: 3.0

 Attachments: millerreg, millerreg_take2, millerregtest, 
 regres_change1, updating_reg_cut2, updating_reg_ifaces

   Original Estimate: 840h
  Remaining Estimate: 840h

 The current multiple regression class does a QR decomposition on the complete 
 data set. This necessitates the loading incore of the complete dataset. For 
 large datasets, or large datasets and a requirement to do datamining or 
 stepwise regression this is not practical. There are techniques which form 
 the normal equations on the fly, as well as ones which form the QR 
 decomposition on an update basis. I am proposing, first, the specification of 
 an UpdatingLinearRegression interface which defines basic functionality all 
 such techniques must fulfill. 
 Related to this 'updating' regression, the results of running a regression on 
 some subset of the data should be encapsulated in an immutable object. This 
 is to ensure that subsequent additions of observations do not corrupt or 
 render inconsistent parameter estimates. I am calling this interface 
 RegressionResults.  
 Once the community has reached a consensus on the interface, work on the 
 concrete implementation of these techniques will take place.
 Thanks,
 -Greg





[jira] [Updated] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

2011-07-20 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-607:
--

Attachment: RegressResults2

This patch should fix the Checkstyle errors in RegressionResults. 

 Current Multiple Regression Object does calculations with all data incore. 
 There are non incore techniques which would be useful with large datasets.
 -

 Key: MATH-607
 URL: https://issues.apache.org/jira/browse/MATH-607
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
  Labels: Gentleman's, QR, Regression, Updating, decomposition, 
 lemma
 Fix For: 3.0

 Attachments: RegressResults2, millerreg, millerreg_take2, 
 millerregtest, regres_change1, updating_reg_cut2, updating_reg_ifaces

   Original Estimate: 840h
  Remaining Estimate: 840h

 The current multiple regression class does a QR decomposition on the complete 
 data set. This necessitates the loading incore of the complete dataset. For 
 large datasets, or large datasets and a requirement to do datamining or 
 stepwise regression this is not practical. There are techniques which form 
 the normal equations on the fly, as well as ones which form the QR 
 decomposition on an update basis. I am proposing, first, the specification of 
 an UpdatingLinearRegression interface which defines basic functionality all 
 such techniques must fulfill. 
 Related to this 'updating' regression, the results of running a regression on 
 some subset of the data should be encapsulated in an immutable object. This 
 is to ensure that subsequent additions of observations do not corrupt or 
 render inconsistent parameter estimates. I am calling this interface 
 RegressionResults.  
 Once the community has reached a consensus on the interface, work on the 
 concrete implementation of these techniques will take place.
 Thanks,
 -Greg





[jira] [Updated] (MATH-615) OLSMultipleRegression seems to fail on the Filippelli Data

2011-07-19 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-615:
--

Attachment: filippelli2

The situation is not as dire as I first thought. The original test I uploaded 
had a bug which (correctly) resulted in a singular matrix. The current test 
still fails, but only when the parameters are checked to a tolerance of 
1.0e-5. 

 OLSMultipleRegression seems to fail on the Filippelli Data
 --

 Key: MATH-615
 URL: https://issues.apache.org/jira/browse/MATH-615
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
  Labels: Filippelli, NIST, OLSMutlipleRegression, QR, data
 Attachments: filippelli2, tstdiff


 Running the Filippelli data results in an exception being thrown by 
 OLSMultipleRegression. The exception states that the matrix is singular. 
 http://www.itl.nist.gov/div898/strd/lls/data/Filip.shtml
 I have added the data to the OLSMultipleRegressionTest file. 
 Unless I screwed something up in the passing of the data, it looks like the 
 QR decomposition is failing.





[jira] [Updated] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

2011-07-18 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-607:
--

Attachment: millerregtest
millerreg

Attached are the Miller regression and its tests. 
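The idea behind updating QR can be illustrated with plain Givens rotations: each new row is rotated into a p x p upper-triangular factor R, so the data never has to be held in memory. This is a hypothetical sketch using ordinary (square-root) Givens rotations. It is not meant to reproduce the attached Miller regression, which (if it follows Miller's AS 274) uses Gentleman's square-root-free rotations and also handles weights and singularity tolerances.

```java
/** Hypothetical updating least-squares solver based on ordinary Givens rotations. */
public final class GivensUpdatingRegression {
    private final int p;
    private final double[][] r;   // upper-triangular factor R
    private final double[] qty;   // rotated response Q'y, first p components
    private double sse;           // running residual sum of squares

    public GivensUpdatingRegression(int p) {
        this.p = p;
        this.r = new double[p][p];
        this.qty = new double[p];
    }

    /** Rotates one new observation into R without storing the observation. */
    public void addObservation(double[] xIn, double yIn) {
        double[] x = xIn.clone(); // work on a copy; the caller's row is not modified
        double y = yIn;
        for (int i = 0; i < p; i++) {
            if (x[i] == 0.0) {
                continue; // nothing to eliminate in this column
            }
            double h = Math.hypot(r[i][i], x[i]);
            double c = r[i][i] / h;
            double s = x[i] / h;
            r[i][i] = h;
            for (int j = i + 1; j < p; j++) {
                double rij = r[i][j];
                r[i][j] = c * rij + s * x[j];
                x[j] = -s * rij + c * x[j];
            }
            double qi = qty[i];
            qty[i] = c * qi + s * y;
            y = -s * qi + c * y;
        }
        sse += y * y; // what remains of y after all rotations is a residual component
    }

    /** Solves R beta = Q'y by back substitution; assumes R has a non-zero diagonal. */
    public double[] estimate() {
        double[] beta = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double sum = qty[i];
            for (int j = i + 1; j < p; j++) {
                sum -= r[i][j] * beta[j];
            }
            beta[i] = sum / r[i][i];
        }
        return beta;
    }

    public double getErrorSumSquares() {
        return sse;
    }

    public static void main(String[] args) {
        GivensUpdatingRegression reg = new GivensUpdatingRegression(2);
        double[][] rows = {{1, 0}, {1, 1}, {1, 2}};
        double[] ys = {1, 2, 4};
        for (int k = 0; k < rows.length; k++) {
            reg.addObservation(rows[k], ys[k]);
        }
        double[] beta = reg.estimate();
        System.out.println(beta[0] + " " + beta[1] + " " + reg.getErrorSumSquares());
    }
}
```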



 Current Multiple Regression Object does calculations with all data incore. 
 There are non incore techniques which would be useful with large datasets.
 -

 Key: MATH-607
 URL: https://issues.apache.org/jira/browse/MATH-607
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
  Labels: Gentleman's, QR, Regression, Updating, decomposition, 
 lemma
 Fix For: 3.0

 Attachments: millerreg, millerregtest, regres_change1, 
 updating_reg_cut2, updating_reg_ifaces

   Original Estimate: 840h
  Remaining Estimate: 840h

 The current multiple regression class does a QR decomposition on the complete 
 data set. This necessitates the loading incore of the complete dataset. For 
 large datasets, or large datasets and a requirement to do datamining or 
 stepwise regression this is not practical. There are techniques which form 
 the normal equations on the fly, as well as ones which form the QR 
 decomposition on an update basis. I am proposing, first, the specification of 
 an UpdatingLinearRegression interface which defines basic functionality all 
 such techniques must fulfill. 
 Related to this 'updating' regression, the results of running a regression on 
 some subset of the data should be encapsulated in an immutable object. This 
 is to ensure that subsequent additions of observations do not corrupt or 
 render inconsistent parameter estimates. I am calling this interface 
 RegressionResults.  
 Once the community has reached a consensus on the interface, work on the 
 concrete implementation of these techniques will take place.
 Thanks,
 -Greg





[jira] [Updated] (MATH-619) ADJUSTED R SQUARED INCORRECT IN REGRESSION RESULTS

2011-07-13 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-619:
--

Attachment: regres

 ADJUSTED R SQUARED INCORRECT IN REGRESSION RESULTS
 --

 Key: MATH-619
 URL: https://issues.apache.org/jira/browse/MATH-619
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
 Attachments: regres


 I forgot to cast to double when dividing two integers:
 this.globalFitInfo[ADJRSQ_IDX] = 1.0 - 
 (1.0 - this.globalFitInfo[RSQ_IDX]) *
 (  nobs / ( (nobs - rank)));
 Should be
 this.globalFitInfo[ADJRSQ_IDX] = 1.0 - 
 (1.0 - this.globalFitInfo[RSQ_IDX]) *
 ( (double) nobs / ( (double) (nobs - rank)));
 Patch attached.





[jira] [Created] (MATH-619) ADJUSTED R SQUARED INCORRECT IN REGRESSION RESULTS

2011-07-13 Thread greg sterijevski (JIRA)
ADJUSTED R SQUARED INCORRECT IN REGRESSION RESULTS
--

 Key: MATH-619
 URL: https://issues.apache.org/jira/browse/MATH-619
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
 Attachments: regres

I forgot to cast to double when dividing two integers:

this.globalFitInfo[ADJRSQ_IDX] = 1.0 - 
(1.0 - this.globalFitInfo[RSQ_IDX]) *
(  nobs / ( (nobs - rank)));
Should be
this.globalFitInfo[ADJRSQ_IDX] = 1.0 - 
(1.0 - this.globalFitInfo[RSQ_IDX]) *
( (double) nobs / ( (double) (nobs - rank)));

Patch attached.
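The pitfall is easy to reproduce in isolation. In the sketch below (hypothetical class name, illustrative numbers), nobs / (nobs - rank) with nobs = 10 and rank = 3 truncates to 1 under integer division, so the adjustment silently disappears; the cast version follows the formula exactly as written in the patch.

```java
/** Hypothetical sketch of the integer-division bug in the adjusted R-squared. */
public final class AdjustedRSquaredSketch {

    /** Buggy form: nobs / (nobs - rank) is evaluated in integer arithmetic. */
    static double adjustedRSquaredIntDivision(double rSquared, int nobs, int rank) {
        return 1.0 - (1.0 - rSquared) * (nobs / (nobs - rank));
    }

    /** Fixed form from the patch: cast to double before dividing. */
    static double adjustedRSquared(double rSquared, int nobs, int rank) {
        return 1.0 - (1.0 - rSquared) * ((double) nobs / (double) (nobs - rank));
    }

    public static void main(String[] args) {
        System.out.println(adjustedRSquaredIntDivision(0.5, 10, 3)); // 0.5: no adjustment at all
        System.out.println(adjustedRSquared(0.5, 10, 3));            // about 0.2857, i.e. 2/7
    }
}
```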





[jira] [Created] (MATH-615) OLSMultipleRegression seems to fail on the Filippelli Data

2011-07-12 Thread greg sterijevski (JIRA)
OLSMultipleRegression seems to fail on the Filippelli Data
--

 Key: MATH-615
 URL: https://issues.apache.org/jira/browse/MATH-615
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski


Running the Filippelli data results in an exception being thrown by 
OLSMultipleRegression. The exception states that the matrix is singular. 
http://www.itl.nist.gov/div898/strd/lls/data/Filip.shtml

I have added the data to the OLSMultipleRegressionTest file. 

Unless I screwed something up in the passing of the data, it looks like the QR 
decomposition is failing.





[jira] [Updated] (MATH-615) OLSMultipleRegression seems to fail on the Filippelli Data

2011-07-12 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-615:
--

Attachment: tstdiff

The OLSMultipleRegressionTest changes with Filippelli included... 

 OLSMultipleRegression seems to fail on the Filippelli Data
 --

 Key: MATH-615
 URL: https://issues.apache.org/jira/browse/MATH-615
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
  Labels: Filippelli, NIST, OLSMutlipleRegression, QR, data
 Attachments: tstdiff


 Running the Filippelli data results in an exception being thrown by 
 OLSMultipleRegression. The exception states that the matrix is singular. 
 http://www.itl.nist.gov/div898/strd/lls/data/Filip.shtml
 I have added the data to the OLSMultipleRegressionTest file. 
 Unless I screwed something up in the passing of the data, it looks like the 
 QR decomposition is failing.





[jira] [Created] (MATH-616) Wampler Test Data for OLSMultipleRegression

2011-07-12 Thread greg sterijevski (JIRA)
Wampler Test Data for OLSMultipleRegression
---

 Key: MATH-616
 URL: https://issues.apache.org/jira/browse/MATH-616
 Project: Commons Math
  Issue Type: Test
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski


The current tests for the OLSMultipleRegression class do not include the 
Wampler1-4 datasets. This patch (which I will attach) includes the Wampler data.

The test passes on my box after I loosen the tolerances from 1.0e-8 to 1.0e-6 
for the parameter vector on wampler4 (and, I think, wampler1). 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MATH-616) Wampler Test Data for OLSMultipleRegression

2011-07-12 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-616:
--

Attachment: wamplerdiff

The Wampler data...

 Wampler Test Data for OLSMultipleRegression
 ---

 Key: MATH-616
 URL: https://issues.apache.org/jira/browse/MATH-616
 Project: Commons Math
  Issue Type: Test
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
  Labels: Data, NIST, QR, Wampler
 Attachments: wamplerdiff


 The current tests for the OLSMultipleRegression class do not include the 
 Wampler1-4 datasets. This patch (which I will attach) includes the Wampler 
 data.
 The test passes on my box after I loosen the tolerances from 1.0e-8 to 1.0e-6 
 for the parameter vector on wampler4 (and, I think, wampler1). 





[jira] [Updated] (MATH-616) Wampler Test Data for OLSMultipleRegression

2011-07-12 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-616:
--

Attachment: wamplerdiff2

Blame it on NetBeans!!! ;-) I must have hit format source... 

Hopefully this copy is better! 

 Wampler Test Data for OLSMultipleRegression
 ---

 Key: MATH-616
 URL: https://issues.apache.org/jira/browse/MATH-616
 Project: Commons Math
  Issue Type: Test
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
  Labels: Data, NIST, QR, Wampler
 Attachments: wamplerdiff, wamplerdiff2


 The current tests for the OLSMultipleRegression class do not include the 
 Wampler1-4 datasets. This patch (which I will attach) includes the Wampler 
 data.
 The test passes on my box after I loosen the tolerances from 1.0e-8 to 1.0e-6 
 for the parameter vector on wampler4 (and, I think, wampler1). 





[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

2011-07-06 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060774#comment-13060774
 ] 

greg sterijevski commented on MATH-607:
---

Phil,

underlying solver is QR or Gaussian elimination, this info would exist. If the underlying
method is SVD, then we would register the rank reduction, but we would not
be able to attribute it to a particular column in the design matrix.

I am probably in agreement with making RegressionResults concrete, but
there were a couple of considerations which pushed me toward an interface.

Say that I begin with the following augmented matrix:
$$\begin{pmatrix} X'X & X'Y \\ Y'X & Y'Y \end{pmatrix}$$
where X is the design matrix (nobs x nreg) and Y is the dependent variable
(nobs x 1).

On a copy of the cross-products matrix (the thing above), I get the
following via Gaussian elimination:

$$\begin{pmatrix} \operatorname{inv}(X'X) & -\beta \\ -\beta' & e'e \end{pmatrix}$$

inv(X'X) is the inverse of the X'X matrix, -beta is the negated OLS vector of
slopes, and e'e is the sum of squared errors.

Getting most of the info (that RegressionResults surfaces) is simply a
matter of indexing. All I need to do in this case is write a wrapper around
a symmetric matrix which implements the interface.
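The elimination step described above is essentially the SWEEP operator. The sketch below uses Goodnight's formulation, which is an assumption about the exact variant: with that sign convention the swept matrix holds inv(X'X) in the top-left block, beta (not -beta) in the last column, -beta' in the last row, and e'e in the corner, so the signs differ slightly from the layout above. Class and method names are hypothetical.

```java
/** Hypothetical sketch of Goodnight's SWEEP operator on an augmented cross-products matrix. */
public final class SweepSketch {

    /** Sweeps pivot k of the square matrix a, in place. */
    static void sweep(double[][] a, int k) {
        int n = a.length;
        double d = a[k][k];
        if (d == 0.0) {
            // a real implementation would compare against a tolerance to flag collinear columns
            throw new ArithmeticException("zero pivot at index " + k);
        }
        for (int j = 0; j < n; j++) {
            a[k][j] /= d; // scale the pivot row
        }
        for (int i = 0; i < n; i++) {
            if (i == k) {
                continue;
            }
            double b = a[i][k];
            for (int j = 0; j < n; j++) {
                a[i][j] -= b * a[k][j]; // eliminate column k from row i
            }
            a[i][k] = -b / d;
        }
        a[k][k] = 1.0 / d;
    }

    /** Copies the (p + 1) x (p + 1) matrix [X'X X'Y; Y'X Y'Y] and sweeps the first p pivots. */
    static double[][] sweepRegressors(double[][] crossProducts) {
        int n = crossProducts.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) {
            a[i] = crossProducts[i].clone(); // the original cross products stay untouched
        }
        for (int k = 0; k < n - 1; k++) {
            sweep(a, k);
        }
        return a;
    }

    public static void main(String[] args) {
        // X = [1 0; 1 1; 1 2] (intercept and slope), Y = [1; 2; 4]
        double[][] crossProducts = {
            {3, 3, 7},
            {3, 5, 10},
            {7, 10, 21},
        };
        double[][] swept = sweepRegressors(crossProducts);
        int p = swept.length - 1;
        System.out.println("intercept = " + swept[0][p]); // about 0.8333
        System.out.println("slope     = " + swept[1][p]); // about 1.5
        System.out.println("e'e       = " + swept[p][p]); // about 0.1667
    }
}
```

Reading beta and e'e back out is then just indexing into the swept copy, which is what a wrapper implementing the results interface would do.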

I suppose that there could be a constructor which took the matrix above and
did the indexing, but that seems too dirty. Furthermore, there are probably
other optimized formats for OLS which have similar aspects. I wanted to keep
the door open to other schemes, without making (potentially large) copies of
variance matrices, standard errors and so forth a necessity.


On the name of the getter for number of observations, I am okay with
whatever you feel is a better name.



So you are saying that UpdatingOLSRegression should be an abstract class? There
are not that many methods in the interface. That would be okay if we were sure
that subclasses always overrode either the regress(...) methods or the
addObservations(...) methods. I worry that we might end up with a base class
full of nothing but abstract functions.

So, modulo the one name change, I propose to just change these to classes.


 Current Multiple Regression Object does calculations with all data incore. 
 There are non incore techniques which would be useful with large datasets.
 -

 Key: MATH-607
 URL: https://issues.apache.org/jira/browse/MATH-607
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
  Labels: Gentleman's, QR, Regression, Updating, decomposition, 
 lemma
 Fix For: 3.0

 Attachments: updating_reg_ifaces

   Original Estimate: 840h
  Remaining Estimate: 840h

 The current multiple regression class does a QR decomposition on the complete 
 data set. This necessitates the loading incore of the complete dataset. For 
 large datasets, or large datasets and a requirement to do datamining or 
 stepwise regression this is not practical. There are techniques which form 
 the normal equations on the fly, as well as ones which form the QR 
 decomposition on an update basis. I am proposing, first, the specification of 
 an UpdatingLinearRegression interface which defines basic functionality all 
 such techniques must fulfill. 
 Related to this 'updating' regression, the results of running a regression on 
 some subset of the data should be encapsulated in an immutable object. This 
 is to ensure that subsequent additions of observations do not corrupt or 
 render inconsistent parameter estimates. I am calling this interface 
 RegressionResults.  
 Once the community has reached a consensus on the interface, work on the 
 concrete implementation of these techniques will take place.
 Thanks,
 -Greg





[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

2011-07-06 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060782#comment-13060782
 ] 

greg sterijevski commented on MATH-607:
---

Sorry for duplicating part of my response, but Gmail truncated it (maybe 
Google is telling me something about my ideas... ;-) )

My complete response is:

I agree on eliminating getRedundant() and isRedundant(int idx). If the 
underlying solver is QR or Gaussian elimination, this information would exist. 
If the underlying method is SVD, then we would register the rank reduction, but 
we would not be able to attribute it to a particular column in the design 
matrix.

I am probably in agreement with making RegressionResults concrete, but there 
were a couple of considerations that pushed me toward an interface.

Say that I begin with the following augmented matrix:

$$\begin{pmatrix} X'X & X'Y \\ Y'X & Y'Y \end{pmatrix}$$

where X is the design matrix (nobs x nreg) and Y is the dependent variable 
(nobs x 1).

On a copy of this cross-products matrix, Gaussian elimination on the X'X block 
gives:

$$\begin{pmatrix} (X'X)^{-1} & -\hat\beta \\ -\hat\beta' & e'e \end{pmatrix}$$

where (X'X)^-1 is the inverse of X'X, beta is the vector of OLS coefficients 
(the off-diagonal blocks hold -beta), and e'e is the sum of squared errors.
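The elimination described above can be sketched in Java with Goodnight's SWEEP operator (the article cited later in this thread). This is a minimal illustration under stated assumptions, not code from the attached patch; `SweepSketch`, `crossProducts` and `sweep` are hypothetical names. One caveat on signs: Goodnight's variant leaves +beta in the last column and -beta' in the last row, and other elimination conventions move the minus sign around, so a wrapper must know which convention produced the matrix.

```java
/** Minimal sketch: sweep the regressor pivots of an augmented cross-products matrix. */
public class SweepSketch {

    /** Accumulates [X'X X'Y; Y'X Y'Y] row by row; only these sums are kept between rows. */
    static double[][] crossProducts(double[][] x, double[] y) {
        int p = x[0].length;
        double[][] c = new double[p + 1][p + 1];
        double[] row = new double[p + 1];
        for (int r = 0; r < y.length; r++) {
            System.arraycopy(x[r], 0, row, 0, p);
            row[p] = y[r];
            for (int i = 0; i <= p; i++) {
                for (int j = 0; j <= p; j++) {
                    c[i][j] += row[i] * row[j];
                }
            }
        }
        return c;
    }

    /** Goodnight's SWEEP on pivot k, in place (assumes a[k][k] is nonzero). */
    static void sweep(double[][] a, int k) {
        int n = a.length;
        double d = a[k][k];
        for (int j = 0; j < n; j++) {
            a[k][j] /= d;                 // divide row k by the pivot
        }
        for (int i = 0; i < n; i++) {
            if (i == k) {
                continue;
            }
            double b = a[i][k];
            for (int j = 0; j < n; j++) {
                a[i][j] -= b * a[k][j];   // eliminate column k from row i
            }
            a[i][k] = -b / d;
        }
        a[k][k] = 1.0 / d;
    }

    public static void main(String[] args) {
        // Intercept column plus one regressor; OLS gives y = 1.1 + 1.1 x for these points.
        double[][] x = {{1, 0}, {1, 1}, {1, 2}, {1, 3}};
        double[] y = {1, 3, 2, 5};
        double[][] a = crossProducts(x, y);
        for (int k = 0; k < x[0].length; k++) {
            sweep(a, k);
        }
        // a[0..1][0..1] = inv(X'X), a[0..1][2] = beta, a[2][0..1] = -beta', a[2][2] = e'e
        System.out.printf("beta = (%.4f, %.4f), e'e = %.4f%n", a[0][2], a[1][2], a[2][2]);
    }
}
```

Running `main` prints `beta = (1.1000, 1.1000), e'e = 2.7000`. Everything a results object needs is then a fixed index into `a`.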

Getting most of the info (that RegressionResults surfaces) is simply a matter 
of indexing. All I need to do in this case is write a wrapper around a 
symmetric matrix which implements the interface.

I suppose there could be a constructor that took the matrix above and did the 
indexing, but that seems too dirty. Furthermore, there are probably other 
optimized formats for OLS with similar properties. I wanted to keep the door 
open to other schemes without making (potentially large) copies of variance 
matrices, standard errors and so forth a necessity.
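A minimal sketch of such a wrapper, assuming the swept matrix follows Goodnight's SWEEP layout (inverse in the top-left block, +beta in the last column, e'e in the corner). If the elimination leaves -beta in the last column, as in the layout written above, the estimate getter would negate it. `SweptResults` and its method names are hypothetical and loosely modeled on what a RegressionResults object might expose. This is not the interface from the attached patch.

```java
/**
 * Hypothetical immutable results object that answers queries by indexing into a
 * swept cross-products matrix instead of storing separate copies of beta, the
 * covariance matrix and the standard errors.
 */
public final class SweptResults {
    private final double[][] swept; // (p+1) x (p+1): inv(X'X) | beta ; -beta' | e'e
    private final int p;            // number of regressors
    private final long nobs;        // number of observations

    public SweptResults(double[][] sweptMatrix, long nobs) {
        this.p = sweptMatrix.length - 1;
        this.nobs = nobs;
        // One defensive deep copy keeps the snapshot immutable even if the caller
        // keeps updating its own matrix; everything else is derived on demand.
        this.swept = new double[p + 1][];
        for (int i = 0; i <= p; i++) {
            this.swept[i] = sweptMatrix[i].clone();
        }
    }

    /** OLS coefficient i, read from the last column (Goodnight sign convention). */
    public double getParameterEstimate(int i) {
        return swept[i][p];
    }

    /** Residual sum of squares e'e, read from the bottom-right corner. */
    public double getErrorSumSquares() {
        return swept[p][p];
    }

    /** Unbiased error variance estimate e'e / (n - p). */
    public double getSigmaSquared() {
        return getErrorSumSquares() / (nobs - p);
    }

    /** Cov(beta_i, beta_j) = sigma^2 * [inv(X'X)]_ij. */
    public double getCovariance(int i, int j) {
        return getSigmaSquared() * swept[i][j];
    }

    /** Standard error of coefficient i. */
    public double getStdError(int i) {
        return Math.sqrt(getCovariance(i, i));
    }
}
```

The single copy of the swept matrix is the only storage. Standard errors and covariances are computed when asked for. Whether even that copy is needed depends on whether the regression object keeps mutating its own matrix after `regress()` returns.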


On the name of the getter for number of observations, I am okay with whatever 
you feel is a better name.
 

Regarding the model interface, I would again suggest that we just define 
this as a class, UpdatingOLSRegression.  I suppose that if we end up 
implementing a weighted or other non-OLS version, we might want to factor out a 
common interface like what exists for MultipleLinearRegression, but in 
retrospect, I am not sure that interface was worth much.  Note that all that we 
could factor out is essentially what is in MultivariateRegression, which is 
analogous to your RegressionResults.


So you are saying that UpdatingOLSRegression should be an abstract class? There 
are not that many methods in the interface. That would be okay if we were sure 
that subclasses always overrode either the regress(...) methods or the 
addObservations(...) methods. I worry that we might end up with a base class 
full of nothing but abstract functions.





[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

2011-07-06 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060792#comment-13060792
 ] 

greg sterijevski commented on MATH-607:
---

One more thing, on the subject of the adjusted R squared: I am not sure I would 
include it, since it depends on knowing whether the model contains a constant. 
I currently envision being handed some data. If the data has a column which is 
nothing but ones, great. If not, great again. I could not come up with an 
elegant way to handle constant detection, and therefore a clean way to 
determine the Busse R squared.

I guess we could keep a flag for each regressor: if the regressor's value ever 
changes, we would say it is not a constant. The other approach is to test the 
residuals for bias: if there is no bias, then constant or not, we are okay. 
That would be messy, though, since I do not keep the data around. Either way, 
doesn't it make for a bit of unpleasantness that yields very little?
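The first option, a flag per regressor, only needs the first row plus one boolean per column. Here is a minimal sketch under that assumption; `ConstantTracker` and its methods are hypothetical. It has two caveats. The answer covers only the observations seen so far, so a column can stop being constant later. Also, after a single observation every nonzero column looks constant. When a constant is detected, the usual centered R squared and its adjusted form apply. Without one, an uncentered R squared is the common choice.

```java
/**
 * Hypothetical per-regressor flag: a column is treated as a candidate constant until
 * it takes a second distinct value. Only the first row is remembered, not the data.
 */
public class ConstantTracker {
    private double[] first;   // values from the first observation
    private boolean[] varies; // varies[j] becomes true once column j changes

    public void addObservation(double[] x) {
        if (first == null) {
            first = x.clone();
            varies = new boolean[x.length];
            return;
        }
        for (int j = 0; j < x.length; j++) {
            if (!varies[j] && x[j] != first[j]) {
                varies[j] = true;
            }
        }
    }

    /** Index of a nonzero column that has not varied so far, or -1 if there is none. */
    public int constantColumn() {
        if (first == null) {
            return -1;
        }
        for (int j = 0; j < first.length; j++) {
            if (!varies[j] && first[j] != 0.0) {
                return j;
            }
        }
        return -1;
    }
}
```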





[jira] [Commented] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

2011-07-06 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060800#comment-13060800
 ] 

greg sterijevski commented on MATH-607:
---

On the results object:

There are vars * (vars + 1) / 2 elements in the covariance matrix, vars 
parameters, vars standard errors and some other assorted values. Not terribly 
large at first. However, consider doing panel regression via dummy variables: 
the covariance matrix can grow very quickly. That being said, I don't think 
making RegressionResults a concrete class is a showstopper. Should I send a 
follow-up patch with the results made concrete?
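As a rough illustration of the size argument, take v = 1000 regressors (a hypothetical count, e.g. a panel with many dummy variables). The packed covariance matrix then needs

$$\frac{v(v+1)}{2}\bigg|_{v=1000} = \frac{1000 \cdot 1001}{2} = 500{,}500 \text{ entries} \approx 4.0\ \text{MB as 8-byte doubles},$$

compared with only 1000 parameter estimates and 1000 standard errors.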

On the regression object:

Are you concerned that we will be removing methods from any interface we 
specify today? Or do you think the contract is too restrictive? The reason I am 
pushing for an interface is that I have two candidates for a concrete 
implementation of updating regression. The first implementation is based on 
Gentleman's lemma and is detailed in the following article:

Alan J. Miller, "Algorithm AS 274: Least Squares Routines to Supplement Those 
of Gentleman," Journal of the Royal Statistical Society, Series C (Applied 
Statistics), Vol. 41, No. 2 (1992).

The second approach is the one detailed in this article by Goodnight:

James H. Goodnight, "A Tutorial on the SWEEP Operator," The American 
Statistician, Vol. 33, No. 3 (Aug. 1979), pp. 149-158.

The first approach never forms the cross products matrix, the second does.
They are significantly different approaches to dealing with large data sets.
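To show how the first approach avoids the cross-products matrix, here is a minimal sketch of the square-root-free Givens update that Miller's AS 274 builds on (its INCLUD step). Each observation is rotated into a factorization X'X = R' D R with R unit upper triangular. `GivensUpdateSketch` and its methods are hypothetical. This is a simplified illustration without singularity tolerances or packed storage, not a port of the published Fortran.

```java
/** Sketch of a Gentleman-style square-root-free Givens update for least squares. */
public class GivensUpdateSketch {
    private final int p;
    private final double[] d;       // diagonal scale factors D
    private final double[][] rbar;  // strictly upper triangle of the unit triangular R
    private final double[] thetab;  // rotated right-hand side
    private double sserr;           // accumulated residual sum of squares

    public GivensUpdateSketch(int p) {
        this.p = p;
        d = new double[p];
        rbar = new double[p][p];
        thetab = new double[p];
    }

    /** Rotates one observation (weight w) into the factorization; xrow is not modified. */
    public void include(double[] xrow, double y, double w) {
        double[] x = xrow.clone();
        for (int i = 0; i < p && w != 0.0; i++) {
            double xi = x[i];
            if (xi == 0.0) {
                continue;                     // nothing to rotate into row i
            }
            double di = d[i];
            double dpi = di + w * xi * xi;
            double cbar = di / dpi;
            double sbar = w * xi / dpi;
            w = cbar * w;
            d[i] = dpi;
            for (int k = i + 1; k < p; k++) {
                double xk = x[k];
                x[k] = xk - xi * rbar[i][k];
                rbar[i][k] = cbar * rbar[i][k] + sbar * xk;
            }
            double yk = y;
            y = yk - xi * thetab[i];
            thetab[i] = cbar * thetab[i] + sbar * yk;
        }
        sserr += w * y * y;                   // whatever is left is pure residual
    }

    /** Back-substitution R beta = thetab (assumes full rank, every d[i] > 0). */
    public double[] coefficients() {
        double[] beta = new double[p];
        for (int i = p - 1; i >= 0; i--) {
            double s = thetab[i];
            for (int k = i + 1; k < p; k++) {
                s -= rbar[i][k] * beta[k];
            }
            beta[i] = s;
        }
        return beta;
    }

    public double errorSumSquares() {
        return sserr;
    }
}
```

Feeding the observations (0, 1), (1, 3), (2, 2), (3, 5) with an intercept column and unit weights gives coefficients of approximately (1.1, 1.1) and a residual sum of squares of about 2.7. Only the p x p triangle is stored, never X'X.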


How would I do this in the concrete class you propose?

Thanks,

-Greg









[jira] [Updated] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

2011-07-06 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-607:
--

Attachment: updating_reg_cut2

Phil,

Attached is the patch based on your comments. Please review.

-Greg





[jira] [Created] (MATH-608) Remove methods from RealMatrix Interface

2011-06-30 Thread greg sterijevski (JIRA)
Remove methods from RealMatrix Interface


 Key: MATH-608
 URL: https://issues.apache.org/jira/browse/MATH-608
 Project: Commons Math
  Issue Type: Improvement
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
Priority: Minor


The RealMatrix interface describes several methods which take a RealMatrix and 
yield a RealMatrix return. They are:

RealMatrix multiply(RealMatrix m);
RealMatrix preMultiply(RealMatrix m);
RealMatrix power(final int p);
RealMatrix add(RealMatrix m);
RealMatrix subtract(RealMatrix m);

There is nothing inherently wrong in making all subclasses of RealMatrix 
implement these methods. However, as the number of subclasses of RealMatrix 
increases, the complexity of these methods will also increase. I think these 
methods should be part of a separate class of 'operators' which handle matrix 
multiplication, addition, subtraction and exponentiation.

Say, for example, I implement SymmetricRealMatrix. I would like to store the 
data of a real symmetric matrix in compressed form, so that I only consume 
(nrow + 1) * nrow / 2 space in memory. When it comes time to implement multiply 
(for example), I must test whether the RealMatrix given as the argument is also 
of type SymmetricRealMatrix, since that will affect the algorithm I use to do 
the multiplication. I could access each element of the argument matrix via its 
getter, but efficiency would suffer. One can think of cases where we might have 
a DiagonalRealMatrix times a DiagonalRealMatrix; one would not want to store 
the resulting diagonal in general matrix storage. Keeping track of all of the 
permutations of symmetric, diagonal, ... matrices and their products inside the 
body of a function makes for very brittle code. Furthermore, any time a new 
type of matrix is defined, all matrix multiplication routines would have to be 
updated.
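The compressed storage described above can be sketched as follows. This assumes the lower triangle is packed row by row, so entry (i, j) with j <= i lives at i(i+1)/2 + j. `PackedSymmetricMatrix` is a hypothetical standalone class. It borrows the `getEntry`/`setEntry` naming of RealMatrix but does not implement that interface.

```java
/**
 * Hypothetical packed storage for a real symmetric n x n matrix: only the lower
 * triangle is kept, row by row, so storage is n(n+1)/2 doubles instead of n^2.
 */
public final class PackedSymmetricMatrix {
    private final int n;
    private final double[] data;

    public PackedSymmetricMatrix(int n) {
        this.n = n;
        this.data = new double[n * (n + 1) / 2];
    }

    /** Maps (i, j) and (j, i) to the same slot in the packed array. */
    private int index(int i, int j) {
        if (i < 0 || j < 0 || i >= n || j >= n) {
            throw new IndexOutOfBoundsException("(" + i + ", " + j + ") outside " + n + "x" + n);
        }
        int row = Math.max(i, j);
        int col = Math.min(i, j);
        return row * (row + 1) / 2 + col;
    }

    public double getEntry(int i, int j) {
        return data[index(i, j)];
    }

    public void setEntry(int i, int j, double value) {
        data[index(i, j)] = value;
    }

    public int getStorageSize() {
        return data.length;
    }
}
```

A general `multiply(RealMatrix m)` that receives one of these must detect its type to use the packed layout efficiently. That per-type branching is exactly what the proposal wants to move out of the matrix classes.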

There are special types of operations which result in particular matrix 
patterns. A matrix times its transpose is itself a symmetric. A general matrix 
sandwiched between another general matrix and its transpose is a symmetric. 
Cholesky decompositions form upper and lower triangular matrices. These are 
common enough occurrences in statistical techniques that it makes sense to put 
them in their own class (perhaps as static methods). It would keep the contract 
of the RealMatrix classes very simple. The RealMatrix would be nothing more than:

1. Marker (is the matrix General, Symmetric, Banded, Diagonal, 
UpperTriangular..)
2. Opaque data store (except for the operator classes, no one would need to 
know how the data is actually stored).
3. Indexing scheme. 
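A minimal sketch of the "operators" idea, assuming plain arrays stand in for the matrix types. Each static method fixes its result structure from the operation itself, with no type-testing of arguments. `MatrixOperators` and its method names are hypothetical and are not part of Commons Math.

```java
/**
 * Hypothetical operators class: the operation decides the structure of the result,
 * so the matrix classes only need to store data and expose an indexing scheme.
 */
public final class MatrixOperators {
    private MatrixOperators() {
    }

    /**
     * A'A for a general m x n matrix. The result is symmetric, so only its lower
     * triangle is returned, packed row by row: entry (i, j), j <= i, is at i(i+1)/2 + j.
     */
    public static double[] transposeTimesSelf(double[][] a) {
        int m = a.length;
        int n = a[0].length;
        double[] packed = new double[n * (n + 1) / 2];
        int k = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                double s = 0.0;
                for (int r = 0; r < m; r++) {
                    s += a[r][i] * a[r][j];
                }
                packed[k++] = s;
            }
        }
        return packed;
    }

    /** Diagonal times diagonal stays diagonal: multiply the stored diagonals elementwise. */
    public static double[] diagonalTimesDiagonal(double[] d1, double[] d2) {
        if (d1.length != d2.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] out = new double[d1.length];
        for (int i = 0; i < d1.length; i++) {
            out[i] = d1[i] * d2[i];
        }
        return out;
    }
}
```

Adding a new matrix type then means adding new operator methods, rather than editing every existing `multiply` implementation.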

The reason I bring this up is that I am attempting to write a 
SymmetricRealMatrix class to support variance-covariance matrices. I noticed 
that there are relatively few subclasses of RealMatrix. While it would be easy 
to hack it up for the handful of implementations that exist, that would 
probably create more problems as the number of types of matrices increases.

Thank you,

-Greg 






[jira] [Created] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

2011-06-29 Thread greg sterijevski (JIRA)
Current Multiple Regression Object does calculations with all data incore. 
There are non incore techniques which would be useful with large datasets.
-

 Key: MATH-607
 URL: https://issues.apache.org/jira/browse/MATH-607
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
 Environment: Java
Reporter: greg sterijevski
 Fix For: 3.0


The current multiple regression class does a QR decomposition on the complete 
data set. This requires loading the complete dataset in core. For large 
datasets, particularly when data mining or stepwise regression is required, 
this is not practical. There are techniques which form the normal equations on 
the fly, as well as ones which update the QR decomposition one observation at 
a time. I am proposing, first, the specification of an 
UpdatingLinearRegression interface which defines the basic functionality all 
such techniques must fulfill. 

Related to this 'updating' regression, the results of running a regression on 
some subset of the data should be encapsulated in an immutable object. This is 
to ensure that subsequent additions of observations do not corrupt or render 
inconsistent parameter estimates. I am calling this interface 
RegressionResults.  

Once the community has reached a consensus on the interface, work on the 
concrete implementation of these techniques will take place.
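To make the proposal concrete, here is a hypothetical sketch of the two abstractions, an updating regression and an immutable results snapshot. The method names and signatures are illustrative only, not the ones in the attached patch. The toy implementation covers a single regressor with an intercept, using running sums. Raw sums lose precision on large or badly scaled data, so a real implementation would use an updating QR or the sweep operator instead.

```java
/** Hypothetical sketch of the proposed UpdatingLinearRegression / RegressionResults split. */
public class UpdatingRegressionSketch {

    /** Regression that is fed observations incrementally and never stores them. */
    interface UpdatingLinearRegression {
        void addObservation(double[] x, double y);
        long getN();
        RegressionResults regress();
    }

    /** Immutable snapshot: adding observations later cannot change it. */
    static final class RegressionResults {
        private final double[] beta;
        private final double sse;
        private final long n;

        RegressionResults(double[] beta, double sse, long n) {
            this.beta = beta.clone(); // defensive copy
            this.sse = sse;
            this.n = n;
        }

        double getParameterEstimate(int i) { return beta[i]; }
        double[] getParameterEstimates() { return beta.clone(); }
        double getErrorSumSquares() { return sse; }
        long getN() { return n; }
    }

    /** Toy implementation for y = b0 + b1 * x (needs n >= 2 and x not constant). */
    static final class SimpleUpdatingRegression implements UpdatingLinearRegression {
        private long n;
        private double sx, sy, sxx, sxy, syy;

        @Override
        public void addObservation(double[] x, double y) {
            double xi = x[0];
            n++;
            sx += xi;
            sy += y;
            sxx += xi * xi;
            sxy += xi * y;
            syy += y * y;
        }

        @Override
        public long getN() { return n; }

        @Override
        public RegressionResults regress() {
            double cxx = sxx - sx * sx / n;  // centered sums of squares and cross products
            double cxy = sxy - sx * sy / n;
            double cyy = syy - sy * sy / n;
            double b1 = cxy / cxx;
            double b0 = (sy - b1 * sx) / n;
            return new RegressionResults(new double[] {b0, b1}, cyy - b1 * cxy, n);
        }
    }
}
```

Because `regress()` returns a fresh immutable object, results obtained before further calls to `addObservation` stay consistent, which is the property the issue asks for.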

Thanks,

-Greg





[jira] [Updated] (MATH-607) Current Multiple Regression Object does calculations with all data incore. There are non incore techniques which would be useful with large datasets.

2011-06-29 Thread greg sterijevski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

greg sterijevski updated MATH-607:
--

Attachment: updating_reg_ifaces

This is the patch file with the proposed changes.





[jira] [Commented] (MATH-465) Incorrect matrix rank via SVD

2011-06-23 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054046#comment-13054046
 ] 

greg sterijevski commented on MATH-465:
---

My apologies if I am missing something, but here is what I noticed about the 
SVD. 

On lines 124-127 of SingularValueDecompositionImpl we have:

for (int i = 0; i < p; i++) {
    singularValues[i] = FastMath.sqrt(FastMath.abs(singularValues[i]));
}

This is potentially the offending line. First, there is the problem of negative 
eigenvalues. Negative variance in the principal components should probably be 
dealt with explicitly? Perhaps by throwing a MathException? Second, and this is 
the issue this bug report deals with, taking the square root of a very small 
number (< 1) returns a larger number. If you apply the threshold test in 
getRank() (final double threshold = FastMath.max(m, n) * 
FastMath.ulp(singularValues[0])) prior to taking the square root, I believe 
this problem would be resolved. More importantly, philosophically, you would 
then be testing for zero variance, which is the appropriate test.

Also, rank could be precalculated in the above loop. 
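A minimal sketch of the suggestion. It assumes the values being rooted are eigenvalues of A'A (squared singular values), sorted in decreasing order with a positive largest value. It contrasts thresholding after the square root, as described above, with thresholding before it while counting the rank in the same loop. `RankBeforeSqrt` is hypothetical. `IllegalArgumentException` stands in for the MathException mentioned above, and rejecting only clearly negative values is an assumed policy.

```java
/** Sketch: apply getRank()'s threshold before, rather than after, the square root. */
public class RankBeforeSqrt {

    /** Behaviour as described above: sqrt(|ev|) first, then the threshold test. */
    static int rankAfterSqrt(double[] eigenvalues, int m, int n) {
        double[] sv = new double[eigenvalues.length];
        for (int i = 0; i < sv.length; i++) {
            sv[i] = Math.sqrt(Math.abs(eigenvalues[i]));
        }
        double threshold = Math.max(m, n) * Math.ulp(sv[0]);
        int rank = 0;
        for (double s : sv) {
            if (s > threshold) {
                rank++;
            }
        }
        return rank;
    }

    /** Proposed: test the variances themselves; the rank falls out of the same loop. */
    static int rankBeforeSqrt(double[] eigenvalues, int m, int n) {
        double threshold = Math.max(m, n) * Math.ulp(eigenvalues[0]);
        int rank = 0;
        for (double ev : eigenvalues) {
            if (ev < -threshold) {
                // Assumed policy: a clearly negative variance is an error, not a zero.
                throw new IllegalArgumentException("negative eigenvalue " + ev);
            }
            if (ev > threshold) {
                rank++;
            }
        }
        return rank;
    }
}
```

With eigenvalues {1.0, 0.5, 1e-20} for a 3 x 3 matrix, the square root turns 1e-20 into 1e-10. That is far above 3 * ulp(1.0), so the first method reports rank 3, while the second reports rank 2.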

 Incorrect matrix rank via SVD
 -

 Key: MATH-465
 URL: https://issues.apache.org/jira/browse/MATH-465
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 2.1
 Environment: Windows XP Prof. Vs. 2002
Reporter: Marisa Thoma
 Fix For: 3.0


 The getRank() function of SingularValueDecompositionImpl does not work 
 properly. This problem is probably related to the numerical stability 
 problems mentioned in 
 [MATH-327|https://issues.apache.org/jira/browse/MATH-327] and 
 [MATH-320|https://issues.apache.org/jira/browse/MATH-320].
 Example call with the standard matrix from R (rank 2):
 {code:title=TestSVDRank.java}
 import org.apache.commons.math.linear.Array2DRowRealMatrix;
 import org.apache.commons.math.linear.RealMatrix;
 import org.apache.commons.math.linear.SingularValueDecomposition;
 import org.apache.commons.math.linear.SingularValueDecompositionImpl;
 public class TestSVDRank {
   public static void main(String[] args) {
   double[][] d = { { 1, 1, 1 }, { 0, 0, 0 }, { 1, 2, 3 } };
   RealMatrix m = new Array2DRowRealMatrix(d);
   SingularValueDecomposition svd = new 
 SingularValueDecompositionImpl(m);
   int r = svd.getRank();
   System.out.println("Rank: " + r);
   }
 }
 {code} 
 The rank is computed as 3. This problem also occurs for larger matrices. I 
 discovered the problem when trying to replace the corresponding JAMA method.





[jira] [Created] (MATH-601) SingularValueDecompositionImpl pseudoinverse is not consistent with Rank calculation

2011-06-23 Thread greg sterijevski (JIRA)
SingularValueDecompositionImpl pseudoinverse is not consistent with Rank 
calculation


 Key: MATH-601
 URL: https://issues.apache.org/jira/browse/MATH-601
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 2.2, 3.0
 Environment: All
Reporter: greg sterijevski


In the SingularValueDecompositionImpl's internal private class Solver, a pseudo 
inverse matrix is calculated:

In lines 260-264 we have:

if (singularValues[i] > 0) {
 a = 1 / singularValues[i];
} else {
 a = 0;
}

This is not consistent with the manner in which rank is determined (lines 225 
to 233). That is to say, a matrix could be rank deficient, yet the 
pseudoinverse would still include the redundant columns.

Also, there is the problem of very small singular values, whose reciprocals 
could overflow.
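A minimal sketch of a consistent rule. The reciprocals used to build the pseudoinverse are zeroed with the same threshold getRank() uses (max(m, n) * ulp(largest singular value), as quoted in MATH-465), so the rank and the number of nonzero reciprocals always agree and tiny values cannot overflow. `PseudoInverseSketch` is hypothetical and assumes singular values sorted in decreasing order.

```java
/** Sketch: one threshold shared by the rank and the pseudoinverse reciprocals. */
public class PseudoInverseSketch {

    static double threshold(double[] sv, int m, int n) {
        return Math.max(m, n) * Math.ulp(sv[0]);
    }

    /** Reciprocals for the pseudoinverse; values at or below the threshold map to 0. */
    static double[] reciprocals(double[] sv, int m, int n) {
        double t = threshold(sv, m, n);
        double[] inv = new double[sv.length];
        for (int i = 0; i < sv.length; i++) {
            inv[i] = sv[i] > t ? 1.0 / sv[i] : 0.0;
        }
        return inv;
    }

    /** Rank with the same test, so it always equals the number of nonzero reciprocals. */
    static int rank(double[] sv, int m, int n) {
        double t = threshold(sv, m, n);
        int r = 0;
        for (double s : sv) {
            if (s > t) {
                r++;
            }
        }
        return r;
    }
}
```

For singular values {4.0, 0.5, 1e-320}, the `> 0` test keeps 1e-320, whose reciprocal overflows to infinity. The shared threshold instead yields reciprocals {0.25, 2.0, 0.0} and rank 2.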





[jira] [Created] (MATH-602) Inverse condition number

2011-06-23 Thread greg sterijevski (JIRA)
Inverse condition number


 Key: MATH-602
 URL: https://issues.apache.org/jira/browse/MATH-602
 Project: Commons Math
  Issue Type: Improvement
Affects Versions: 2.2, 3.0
 Environment: All
Reporter: greg sterijevski
Priority: Minor


In SingularValueDecompositionImpl, the condition number is given as the ratio 
of the largest singular value to the smallest singular value. While this is the 
correct calculation, it becomes infinite (or overflows) as the matrix approaches 
rank deficiency, so researchers have traditionally used the inverse of the 
condition number, which stays between 0 and 1, as a more stable indicator of 
rank deficiency.
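A minimal sketch of the two quantities, assuming singular values sorted in decreasing order with a positive largest value. `ConditionSketch` is hypothetical, not a Commons Math API. The inverse form stays finite and lands exactly on 0 for a rank-deficient matrix.

```java
/** Sketch: condition number versus its reciprocal, from sorted singular values. */
public class ConditionSketch {

    /** Largest / smallest: infinite for a rank-deficient matrix, may overflow near it. */
    static double conditionNumber(double[] sv) {
        return sv[0] / sv[sv.length - 1];
    }

    /** Smallest / largest: always in [0, 1], and exactly 0 when rank deficient. */
    static double inverseConditionNumber(double[] sv) {
        return sv[sv.length - 1] / sv[0];
    }
}
```

For singular values {4, 2, 0}, the condition number is infinite while the inverse is 0.0. For {1, 1e-320}, the ratio 1e320 overflows to infinity, but the inverse is the representable value 1e-320.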





[jira] [Commented] (MATH-320) NaN singular value from SVD

2011-06-23 Thread greg sterijevski (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054069#comment-13054069
 ] 

greg sterijevski commented on MATH-320:
---

Did anyone notice that the 3rd eigenvalue is negative? On my box the eigenvalue 
is -2.1028862676867717E-14. I am not sure what the fix was, but whatever 
problems existed still persist. 

 NaN singular value from SVD
 ---

 Key: MATH-320
 URL: https://issues.apache.org/jira/browse/MATH-320
 Project: Commons Math
  Issue Type: Bug
Affects Versions: 2.0
 Environment: Linux (Ubuntu 9.10) java version 1.6.0_16
Reporter: Dieter Vandenbussche
 Fix For: 2.1


 The following jython code
 Start code
 from org.apache.commons.math.linear import *
  
 Alist = [[1.0, 2.0, 3.0],[2.0,3.0,4.0],[3.0,5.0,7.0]]
  
 A = Array2DRowRealMatrix(Alist)
  
 decomp = SingularValueDecompositionImpl(A)
  
 print decomp.getSingularValues()
 End code
 prints
 array('d', [11.218599757513008, 0.3781791648535976, nan])
 The last singular value should be something very close to 0 since the matrix
 is rank deficient.  When I use the result from getSolver() to solve a system, 
 I end up with a bunch of NaNs in the solution.  I assumed I would get back a 
 least squares solution.
 Does this SVD implementation require that the matrix be full rank?  If so, 
 then I would expect an exception to be thrown from the constructor or one of 
 the methods.
