Re: [R] package:Matrix handling of data with identical indices

2006-07-10 Thread Martin Maechler
>>>>> "roger" == roger koenker [EMAIL PROTECTED]
>>>>>     on Sun, 9 Jul 2006 12:31:16 -0500 writes:

    roger> On Jul 9, 2006, at 11:06 AM, Douglas Bates wrote:

    >> Your matrix Mc should be flagged as invalid.  Martin and
    >> I should discuss whether we want to add such a test to
    >> the validity method.  It is not difficult to add the test
    >> but there will be a penalty in that it will slow down all
    >> operations on such matrices and I'm not sure if we want
    >> to pay that price to catch a rather infrequently occurring
    >> problem.

    roger> Elaborating the validity procedure to flag such
    roger> instances seems to be well worth the speed penalty in
    roger> my view.  Of course, anticipating every such misstep
    roger> imposes a heavy burden on developers and constitutes
    roger> the real cost of more elaborate validity checking.

As I found, we already *have* a validate_dgCMatrix in C code,
and adding an improved test for the validity of the 'p' slot
solves ``all problems'' mentioned above, without any
performance penalty.
Hence, in the upcoming version of 'Matrix' (0.995-12),
John will get a proper error message immediately from
calling new(...) with the wrong 'p' (or 'Dim').
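
To illustrate the kind of test on the 'p' slot described here, the following is a minimal base-R sketch.  It is not the C code in Matrix; the helper name check_p_slot and its messages are hypothetical, chosen only for this example.  It checks 'p' against 'Dim' and the number of stored entries, using the slots from the example that started the thread.

```r
# Hypothetical helper, not part of Matrix: check the column-pointer slot 'p'
# of a compressed-column matrix against 'Dim' and the number of stored entries.
# Returns TRUE if all checks pass, otherwise a character string describing
# the first problem found.
check_p_slot <- function(p, n_entries, Dim) {
  if (length(p) != Dim[2] + 1L)
    return("length of 'p' must equal Dim[2] + 1")
  if (p[1] != 0L)
    return("first element of 'p' must be 0")
  if (any(diff(p) < 0L))
    return("'p' must be non-decreasing")
  if (p[length(p)] != n_entries)
    return("last element of 'p' must equal the number of stored entries")
  TRUE
}

# Slots from the example in this thread: 5 columns would need 6 pointers,
# but 'p' has only 5 elements.
p   <- as.integer(c(0, 1, 2, 4, 5))
bad <- check_p_slot(p, n_entries = 5L, Dim = c(5L, 5L))

# Appending one more pointer gives a 'p' that is consistent with Dim = c(5, 5).
ok  <- check_p_slot(c(p, 5L), n_entries = 5L, Dim = c(5L, 5L))

print(bad)
print(ok)
```

In this sketch the checks only look at 'p', whose length is the number of columns plus one, so the work does not grow with the number of stored entries.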

Martin

    roger> [My 2cents based on experience with SparseM.]

    roger> url:   www.econ.uiuc.edu/~roger   Roger Koenker
    roger> email: [EMAIL PROTECTED]          Department of Economics
    roger> vox:   217-333-4558               University of Illinois
    roger> fax:   217-244-6678               Champaign, IL 61820

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] package:Matrix handling of data with identical indices

2006-07-09 Thread Douglas Bates
The apparent contradiction here is again a case of inadequate
documentation and checking.

Because the order of the triplets in the triplet form of a sparse
matrix is essentially random it is permissible to have repeated
indices.  As you have seen, the interpretation of repeated indices is
that the value at any index is the sum of the values in the triplets
corresponding to that index.
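
As a small illustration of that interpretation, here is a base-R sketch (deliberately not using Matrix itself) that accumulates the triplets from the example in this thread into a dense matrix.  The variable names are only for illustration, and the indices are 0-based as in the triplet slots.

```r
# Triplets from the example in this thread (0-based row and column indices).
i <- c(0, 0, 1, 1, 4)
j <- c(0, 1, 2, 2, 4)
x <- as.double(1:5)

# Accumulate into a dense 5 x 5 matrix; triplets that share an (i, j)
# pair are added together, in whatever order they happen to appear.
M <- matrix(0, nrow = 5, ncol = 5)
for (k in seq_along(x)) {
  M[i[k] + 1, j[k] + 1] <- M[i[k] + 1, j[k] + 1] + x[k]
}
print(M)
```

The two triplets at (1, 2) end up as a single entry holding 3 + 4.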

It is not permissible to have repeated indices in the compressed
form.  In the compressed form there is a well-defined ordering of the
indices, first by columns then by row within column and the row
indices must be increasing within columns.
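
The ordering requirement can be sketched in base R as follows.  This is only an illustration under the assumption that 'i' holds 0-based row indices and 'p' the 0-based column pointers; the helper name rows_strictly_increasing is hypothetical and is not a Matrix function.

```r
# Hypothetical helper: TRUE if the row indices strictly increase within
# every column of a compressed-column representation, FALSE otherwise.
rows_strictly_increasing <- function(i, p) {
  for (col in seq_len(length(p) - 1L)) {
    n_in_col <- p[col + 1L] - p[col]
    if (n_in_col > 1L) {
      # 1-based positions in 'i' of the entries stored for this column
      idx <- (p[col] + 1L):p[col + 1L]
      if (any(diff(i[idx]) <= 0L)) return(FALSE)
    }
  }
  TRUE
}

# Slots from the example in this thread: the third column stores row 1 twice.
i   <- as.integer(c(0, 0, 1, 1, 4))
p   <- as.integer(c(0, 1, 2, 4, 5))
dup <- rows_strictly_increasing(i, p)

# With distinct, increasing rows in that column the check passes.
fixed <- rows_strictly_increasing(as.integer(c(0, 0, 1, 2, 4)), p)

print(dup)
print(fixed)
```

Note that, unlike a test on the column pointers alone, this loop has to visit every stored entry.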

Your matrix Mc should be flagged as invalid.  Martin and I should
discuss whether we want to add such a test to the validity method.  It
is not difficult to add the test but there will be a penalty in that
it will slow down all operations on such matrices and I'm not sure if
we want to pay that price to catch a rather infrequently occurring
problem.

There is some documentation of the internal representations of these
formats in the directory $R_LIB/Matrix/doc/UFSparse/.  The User Guides
in that directory are taken directly from the various sparse matrix
packages that Tim Davis at the University of Florida has written.  He
has a book that is scheduled for publication this September:

Tim Davis (2006), Direct Methods for Sparse Linear Systems, SIAM,
Philadelphia, PA

I hope we will be able to refer to that book for details of the
representation and algorithms.

On 7/8/06, Thaden, John J [EMAIL PROTECTED] wrote:
 In the Matrix package v. 0.995-11 I see that the dgTMatrix
 class for compressed, sparse, triplet-form matrices handles
 identically indexed data instances by summing their values,
 e.g.,

 library(Matrix)
 (Mt <- new("dgTMatrix",
    i = as.integer(c(0,0,1,1,4)),
    j = as.integer(c(0,1,2,2,4)),
    x = as.double(1:5),
    Dim = as.integer(c(5,5))))
 ## 5 x 5 sparse Matrix of class "dgTMatrix"
 ## [1,] 1 2 . . .
 ## [2,] . . 7 . .  <--- 7 = 3 + 4.
 ## [3,] . . . . .
 ## [4,] . . . . .
 ## [5,] . . . . 5

 # If instead I make a dgCMatrix-class matrix, the first
 # instance is overwritten by the second, e.g.,

 library(Matrix)
 (Mc <- new("dgCMatrix",
    i = as.integer(c(0,0,1,1,4)),
    p = as.integer(c(0,1,2,4,5)),
    x = as.double(1:5),
    Dim = as.integer(c(5,5))))
 ## 5 x 5 sparse Matrix of class "dgCMatrix"
 ##
 ## [1,] 1 2 . .
 ## [2,] . . 4 .   <-- the datum '3' has been lost.
 ## [3,] . . . .
 ## [4,] . . . .
 ## [5,] . . . 5

 # If one arrives at the dgCMatrix via the dgTMatrix class,
 # the summed value is of course preserved, e.g.,

 (Mtc <- as(Mt, "dgCMatrix"))
 ## 5 x 5 sparse Matrix of class dgCMatrix
 ##
 ## [1,] 1 2 . . .
 ## [2,] . . 7 . .
 ## [3,] . . . . .
 ## [4,] . . . . .
 ## [5,] . . . . 5

 As there is nothing inherent in either compressed, sparse,
 format that would prevent recognition and handling of
 duplicated index pairs, I'm curious why the dgCMatrix
 class doesn't also add x values in those instances?
 I wonder also if others might benefit also by being able
 to choose how these instances are handled, i.e.,
 whether they are summed, averaged or overwritten?

 -John Thaden, Ph.D.
 Research Assistant Professor of Geriatrics
 University of Arkansas for Medical Sciences
 Little Rock AR, USA




Re: [R] package:Matrix handling of data with identical indices

2006-07-09 Thread roger koenker


On 7/8/06, Thaden, John J [EMAIL PROTECTED] wrote:

 As there is nothing inherent in either compressed, sparse,
 format that would prevent recognition and handling of
 duplicated index pairs, I'm curious why the dgCMatrix
 class doesn't also add x values in those instances?

Why not multiply them?  Or take the larger one, or ...?  I would
interpret this as a case of user negligence -- there is no
natural default behavior for such cases.

On Jul 9, 2006, at 11:06 AM, Douglas Bates wrote:

 Your matrix Mc should be flagged as invalid.  Martin and I should
 discuss whether we want to add such a test to the validity method.  It
 is not difficult to add the test but there will be a penalty in that
 it will slow down all operations on such matrices and I'm not sure if
 we want to pay that price to catch a rather infrequently occurring
 problem.

Elaborating the validity procedure to flag such instances seems
to be well worth the speed penalty in my view.  Of course,
anticipating every such misstep imposes a heavy burden
on developers and constitutes the real cost of more elaborate
validity checking.

[My 2cents based on experience with SparseM.]

url:    www.econ.uiuc.edu/~roger    Roger Koenker
email:  [EMAIL PROTECTED]           Department of Economics
vox:    217-333-4558                University of Illinois
fax:    217-244-6678                Champaign, IL 61820



Re: [R] package:Matrix handling of data with identical indices

2006-07-09 Thread Thaden, John J
On Sunday, July 09, 2006 12:31 PM, Roger Koenker = RK
[EMAIL PROTECTED] wrote 

RK On 7/8/06, Thaden, John J [EMAIL PROTECTED] wrote:

JT As there is nothing inherent in either compressed, sparse,
JT format that would prevent recognition and handling of
JT duplicated index pairs, I'm curious why the dgCMatrix
JT class doesn't also add x values in those instances?

RK why not multiply them?  or take the larger one, 
RK or ...?  I would interpret this as a case of user
RK negligence -- there is no natural default behavior
RK for such cases.

This user created example data to illustrate his question, but
of course he faces real data, from analytical chemistry in this case,
data that happen to come with an 8.4% occurrence of non-unique
index pairs, and also, quite literally, with a natural way
to treat such cases (the ~nature~ of the assay makes it correct to
sum them).  I can think of other natural data sets where
averaging would be the natural behavior.  So you are right
that there is no natural default behavior; hence my
suggestion to leave that to user choice via a function argument
or class slot, defaulting to summing.
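
A rough base-R sketch of what such a user choice could look like is given below.  The function combine_duplicates and its argument 'fun' are hypothetical names used only for illustration; nothing like this is claimed to exist in Matrix.  It works on plain triplet vectors and defaults to summing.

```r
# Hypothetical helper: combine the values that share an (i, j) index pair
# using a function chosen by the caller (sum by default).
# Returns a data frame with one row per distinct (i, j) pair.
combine_duplicates <- function(i, j, x, fun = sum) {
  aggregate(x ~ i + j, data = data.frame(i = i, j = j, x = x), FUN = fun)
}

# Triplets from the example in this thread (0-based indices).
i <- c(0, 0, 1, 1, 4)
j <- c(0, 1, 2, 2, 4)
x <- as.double(1:5)

summed   <- combine_duplicates(i, j, x)              # default: sum
averaged <- combine_duplicates(i, j, x, fun = mean)  # alternative: average

print(summed)
print(averaged)
```

The resulting triplets have unique index pairs, so they could then be handed to whichever sparse class is wanted.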

Actually in this case there ~is~ one behavior superior to
summing -- abstracting one member of each data pair (that share indices)
into a second (very sparse) overlay matrix.  Perhaps it is
my negligence not to have done this instead of querying the list :-)
I am doing it now.
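
For the record, a minimal base-R sketch of that splitting step, on the example triplets rather than the real data; the names trip, base_part and overlay_part are made up for this illustration.

```r
# Triplets from the example in this thread (0-based indices).
i <- c(0, 0, 1, 1, 4)
j <- c(0, 1, 2, 2, 4)
x <- as.double(1:5)
trip <- data.frame(i, j, x)

# duplicated() marks the second and later occurrences of an (i, j) pair.
is_repeat <- duplicated(trip[, c("i", "j")])

base_part    <- trip[!is_repeat, ]  # first occurrence of every index pair
overlay_part <- trip[is_repeat, ]   # repeats, destined for the overlay matrix

print(base_part)
print(overlay_part)
```

This assumes each index pair occurs at most twice; with three or more occurrences the overlay part would itself still contain repeated pairs.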

Regards,
-John Thaden 




Re: [R] package:Matrix handling of data with identical indices

2006-07-09 Thread Thaden, John J
Thanks. I see in the UMFPACK manual that the convention for CSC matrices,
with monotonically increasing row indices within each column, is as you say.

I notice UMFPACK flags errors liberally. What if Matrix or some other
package were made to interface with UMFPACK directly, to tap these and
other features?

-John Thaden




[R] package:Matrix handling of data with identical indices

2006-07-08 Thread Thaden, John J
In the Matrix package v. 0.995-11 I see that the dgTMatrix
class for compressed, sparse, triplet-form matrices handles
identically indexed data instances by summing their values,
e.g.,

library(Matrix)
(Mt <- new("dgTMatrix",
   i = as.integer(c(0,0,1,1,4)),
   j = as.integer(c(0,1,2,2,4)),
   x = as.double(1:5),
   Dim = as.integer(c(5,5))))
## 5 x 5 sparse Matrix of class "dgTMatrix"
## [1,] 1 2 . . .
## [2,] . . 7 . .  <--- 7 = 3 + 4.
## [3,] . . . . .
## [4,] . . . . .
## [5,] . . . . 5

# If instead I make a dgCMatrix-class matrix, the first
# instance is overwritten by the second, e.g.,

library(Matrix)
(Mc <- new("dgCMatrix",
   i = as.integer(c(0,0,1,1,4)),
   p = as.integer(c(0,1,2,4,5)),
   x = as.double(1:5),
   Dim = as.integer(c(5,5))))
## 5 x 5 sparse Matrix of class "dgCMatrix"
##
## [1,] 1 2 . .
## [2,] . . 4 .   <-- the datum '3' has been lost.
## [3,] . . . .
## [4,] . . . .
## [5,] . . . 5

# If one arrives at the dgCMatrix via the dgTMatrix class,
# the summed value is of course preserved, e.g.,

(Mtc <- as(Mt, "dgCMatrix"))
## 5 x 5 sparse Matrix of class dgCMatrix
##   
## [1,] 1 2 . . .
## [2,] . . 7 . .
## [3,] . . . . .
## [4,] . . . . .
## [5,] . . . . 5

As there is nothing inherent in either compressed, sparse,
format that would prevent recognition and handling of
duplicated index pairs, I'm curious why the dgCMatrix
class doesn't also add x values in those instances?
I wonder also if others might benefit also by being able
to choose how these instances are handled, i.e.,
whether they are summed, averaged or overwritten?  

-John Thaden, Ph.D.
Research Assistant Professor of Geriatrics
University of Arkansas for Medical Sciences
Little Rock AR, USA

