[R] help usin scan on large matrix (caveats to what has been discussed before)

2010-08-12 Thread Martin Tomko

Dear all,
I have a few points that I am unsure about using scan. I know that it is 
covered in the intro to R, and also has been discussed here: 
http://www.mail-archive.com/r-help@r-project.org/msg04869.html

but nevertheless, I cannot get it to work.

I have a potentially very large matrix that I need to read in (35MB). I 
am about to run it on a server with 16G of memory etc, so I hope it will 
work. I ultimately only need to run image() on it, producing a heatmap.


read.table crashes on it, and is slow, so I would like to read it using 
scan.


The file where I store it has the following format:
V1 V2 V3 V4 V5
1 508 424 208 111 66
2 59 101 95 113 81
3 26 30 24 17 18
4 4 0 8 3 9
5 0 0 0 0 0
6 0 0 0 0 0

where the first line holds the column names and the first column the row names. 
read.table works perfectly on this without any parameters (the file was 
written with write.table). I use:

rows <- length(R)
cols <- max(unlist(lapply(R, function(x) length(unlist(gregexpr(" ", x, fixed=TRUE, useBytes=TRUE))))))


c <- scan(file=f, what=list(c("",(rep(integer(0),cols)))), skip=1)
m <- matrix(c, nrow = rows, ncol=cols, byrow=TRUE);

For some reason I end up with a character matrix, which I don't want. Is 
this the proper way to skip the first column (this is not documented 
anywhere - how does one skip the first column in scan?)? Is my way of 
specifying integer(0) correct?


And finally - would any sparse matrix package be more appropriate, and 
can I use a sparse matrix with the image() function to produce typical 
heatmaps? I have seen that some sparse matrix packages produce different 
looking outputs, which would not be appropriate.


Thanks
Martin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help usin scan on large matrix (caveats to what has been discussed before)

2010-08-12 Thread peter dalgaard

On Aug 12, 2010, at 11:30 AM, Martin Tomko wrote:

 
> c <- scan(file=f, what=list(c("",(rep(integer(0),cols)))), skip=1)
> m <- matrix(c, nrow = rows, ncol=cols, byrow=TRUE);
> 
> for some reason I end up with a character matrix, which I don't want. Is this 
> the proper way to skip the first column (this is not documented anywhere - 
> how does one skip the first column in scan???). is my way of specifying 
> integer(0) correct?

No. Well, integer(0) is just superfluous where 0L would do, since scan only 
looks at the types, not the contents. More importantly, what= wants a list 
of as many elements as there are columns, and you gave it 

> list(c("",(rep(integer(0),5))))
[[1]]
[1] ""

I think what you actually meant was

c(list(NULL),rep(list(0L),5))
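
A minimal sketch of this suggestion, assuming a small made-up file in the same layout as the example in the original post. The temporary file and the names tmp, ncols, spec and res are illustrative and not from the thread.

```r
# Write a small table in the same layout as the example:
# a header line, then a row name followed by five integers per line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("V1 V2 V3 V4 V5",
             "1 508 424 208 111 66",
             "2 59 101 95 113 81",
             "3 26 30 24 17 18"), tmp)

ncols <- 5L   # number of data columns, not counting the row names

# One list element per field on a line:
# NULL skips the field, 0L marks the field as integer.
spec <- c(list(NULL), rep(list(0L), ncols))

# skip = 1 drops the header line.
res <- scan(tmp, what = spec, skip = 1, quiet = TRUE)

# res has one component per field; the first (skipped) one is NULL.
# Binding the column vectors side by side gives an integer matrix.
m <- do.call(cbind, res)
dim(m)
```

The specification describes a single line of the file, so it does not change with the number of rows.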



 

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] help usin scan on large matrix (caveats to what has been discussed before)

2010-08-12 Thread Martin Tomko

Hi Peter,
thank you for your reply. I still cannot get it to work.
I have modified your code as follows:
rows <- length(R)
cols <- max(unlist(lapply(R, function(x) length(unlist(gregexpr(" ", x, fixed=TRUE, useBytes=TRUE))))))

c <- scan(file=f, what=rep(c(list(NULL),rep(list(0L),cols-1),rows-1)), skip=1)
m <- matrix(c, nrow = rows-1, ncol=cols+1, byrow=TRUE);

The list c seems OK, with all the values I would expect. Still, 
length(c) gives me cols+1, which I find odd (I would expect cols).
I then repeated it rows-1 times (to account for the header row). The 
values seem OK.
Anyway, I tried to construct the matrix, but when I print it, the values 
are odd:

> m[1:10,1:10]
      [,1] [,2]       [,3]       [,4]       [,5]       [,6]       [,7]
 [1,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
 [2,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
 [3,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
 [4,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
 [5,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
 [6,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
 [7,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
 [8,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
 [9,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15
[10,] NULL Integer,15 Integer,15 Integer,15 Integer,15 Integer,15 Integer,15


Any idea where the values are gone?
Thanks
Martin

Hence, I filled it into the matrix of dimensions

 
   



--
Martin Tomko
Postdoctoral Research Assistant

Geographic Information Systems Division
Department of Geography
University of Zurich - Irchel
Winterthurerstr. 190
CH-8057 Zurich, Switzerland

email:  martin.to...@geo.uzh.ch
site:   http://www.geo.uzh.ch/~mtomko
mob:+41-788 629 558
tel:+41-44-6355256
fax:+41-44-6356848



Re: [R] help usin scan on large matrix (caveats to what has been discussed before)

2010-08-12 Thread baptiste Auguié
Hi,

I don't know if this can be useful to you, but I recently wrote a small 
function to read a large datafile like yours in a number of steps, with the 
possibility to save each intermediate block as .Rdata. This is based on 
read.table --- not as efficient as lower-level scan() but it might be good 
enough,

file <- 'test.txt'
## write.table(matrix(rnorm(1e6*14), ncol=14), file=file, row.names = F,
## col.names = F )

## count the lines with wc -l (Unix only) and keep only the digits of its
## output; this assumes the file name itself contains no digits
n <- as.numeric(gsub("[^0123456789]", "", system(paste("wc -l ", file), 
intern=TRUE)))
n

## split n lines into blocks of at most 'size' lines
blocks <- function(n=18, size=5){
res <- c(replicate(n%/%size, size))
if(n%%size) res <- c(res, n%%size)
if(!sum(res) == n) stop("ERROR!!!")
res
}
## blocks(1003, 500)


## read 'file' in blocks of nbk lines; columns with class "NULL" are skipped
readBlocks <- function(file, nbk=1e5, out="tmp", save.inter=TRUE, 
   classes= c("numeric", "numeric", rep("NULL", 6),
 "numeric", "numeric", rep("NULL", 4))){
  
  n <- as.numeric(gsub("[^0123456789]", "", system(paste("wc -l ", file), 
intern=TRUE)))

  ncols <- length(grep("NULL", classes, invert=TRUE))
  results <- matrix(0, nrow=n, ncol=ncols)
  Nb <- blocks(n, nbk)
  skip <- c(0, cumsum(Nb))
  for(ii in seq_along(Nb)){
    d <- read.table(file, colClasses = classes, nrows=Nb[ii], skip=skip[ii], 
comment.char = "")
    if(save.inter){
      save(d, file=paste(out, ".", ii, ".rda", sep=""))
    }
    print(ii)
    results[seq(1+skip[ii], skip[ii]+Nb[ii]), ] <- as.matrix(d)
    rm(d) ; gc() 
  }
  save(results, file=paste(out, ".rda", sep=""))
  invisible(results)
}

## test <- readBlocks(file)
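
A small sketch of how the block sizes and skip offsets above work out, assuming 1003 lines read in blocks of 500. The helper is restated here so the snippet runs on its own; it uses rep() in place of replicate(), which is a substitution made for this illustration and not the code as posted.

```r
# Same splitting idea as blocks() above, restated to be self-contained.
blocks <- function(n = 18, size = 5) {
  res <- rep(size, n %/% size)               # the full blocks
  if (n %% size) res <- c(res, n %% size)    # one shorter block for the remainder
  stopifnot(sum(res) == n)
  res
}

Nb <- blocks(1003, 500)
Nb                          # 500 500 3

# Number of lines to skip before reading each block.
# The final entry (the total line count) is not used by the loop.
skip <- c(0, cumsum(Nb))
skip                        # 0 500 1000 1003
```

The total number of lines does not have to be a multiple of the block size, since the remainder becomes a final, shorter block.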

HTH,

baptiste



 
 

Re: [R] help usin scan on large matrix (caveats to what has been discussed before)

2010-08-12 Thread Martin Tomko

Hi baptiste,
thanks a lot. Could you please comment on that code? I cannot figure out 
what it does. Apart from the file name, what parameters does it need? 
It seems to me that you need to know the size of the table a priori. Is 
that right? Do you have to set the block size depending on that (so 
that you get full multiples of the block to form the resulting frame)?

Cheers
Martin


Re: [R] help usin scan on large matrix (caveats to what has been discussed before)

2010-08-12 Thread peter dalgaard

On Aug 12, 2010, at 1:34 PM, Martin Tomko wrote:

> Hi Peter,
> thank you for your reply. I still cannot get it to work.
> I have modified your code as follows:
> rows <- length(R)
> cols <- max(unlist(lapply(R, function(x) length(unlist(gregexpr(" ", x, fixed=TRUE, useBytes=TRUE))))))

Notice that the above is completely useless to the reader unless you tell us 
what R is (except for a statistical programming language ;-)) 

> c <- scan(file=f, what=rep(c(list(NULL),rep(list(0L),cols-1),rows-1)), skip=1)

What's the outer rep() and rows-1 doing in there???! Notice that the 
parentheses don't match up as I think you think they do, so there's really only 
one argument to rep(), making it a no-op. The rows-1 is going inside the c, 
which might be causing the apparent extra column. And the number of rows should 
not affect 'what=' anyway. Now if you had done what I wrote...
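
A minimal sketch of the parenthesis problem described above, assuming made-up values for cols and rows. The names posted and inner are illustrative.

```r
cols <- 5L    # hypothetical column count
rows <- 16L   # hypothetical line count, header included

# As posted: rows - 1 ends up inside c(), and rep() gets a single argument.
posted <- rep(c(list(NULL), rep(list(0L), cols - 1), rows - 1))

# The same expression without the outer rep().
inner <- c(list(NULL), rep(list(0L), cols - 1), rows - 1)

identical(posted, inner)   # rep() with only one argument returns it unchanged
length(posted)             # cols + 1: NULL, four 0L, and the value of rows - 1
posted[[length(posted)]]   # the stray last element
```

The stray last element becomes one more field in the template, which is consistent with the extra column mentioned in the reply.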

> m <- matrix(c, nrow = rows-1, ncol=cols+1, byrow=TRUE);

If you make a matrix from a list, odd things will happen. You need an 
unlist(c). And more than likely NOT byrow=TRUE. However, I think 
do.call(cbind,c) should do the trick more easily. 
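
A short sketch of the difference between these routes, using a hand-built list as a stand-in for what scan() returns when what= is a list (one vector per column, NULL for the skipped field). The values and names are illustrative.

```r
# Stand-in for a scan() result: skipped first field, then two integer columns.
res <- list(NULL, c(508L, 59L, 26L), c(424L, 101L, 30L))

# matrix() on a list keeps the list structure: each cell holds a whole vector,
# which is why the printout shows entries such as "Integer,15".
bad <- matrix(res, nrow = 1)
typeof(bad)                  # "list"

# unlist() flattens the list column by column, so the matrix has to be
# filled column-wise, which is the default (byrow = FALSE).
m1 <- matrix(unlist(res), nrow = 3)

# do.call(cbind, ...) binds the column vectors directly; NULL is dropped.
m2 <- do.call(cbind, res)

all(m1 == m2)                # both routes give the same values
m2[1, ]                      # first row of the data: 508 424
```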

 
 

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] help usin scan on large matrix (caveats to what has been discussed before)

2010-08-12 Thread Martin Tomko

Hi Peter,
apologies, too fast copying and pasting.
So, here is the explanation:
f <- "C:/test/mytab.txt";
R <- readLines(con=f);

where mytab.txt is a table formatted as noted in previous post (space 
delimited, with header, rownames, containing integers).


Now, my understanding of scan was that I have to specify the FULL number 
of values in it (examples specify things like 200*2000 for a matrix 
etc). That's why I thought that I need to do cols*rows as well. Avoiding 
the first line with headers is simple, avoiding the first column is not 
- hence my questions.
Sorry, the corrected version with matching parentheses is here - why the 
previous one executed is a wonder...

c <- scan(file=f, what=rep(c(list(NULL),rep(list(0L),cols-1)),rows-1), skip=1)
here, my reasoning was:

* c(list(NULL),rep(list(0L),cols-1)) specifies a template for any line 
(first element to be ignored = NULL, as it is a string in the table 
specified, and then a repetition of integers - I am still not sure how 
you derived 0L, what it means, and where to find documentation for it);
* the previous needs to be repeated rows-1 times, hence 
what=rep(c(list(NULL),rep(list(0L),cols-1)),rows-1)
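
A minimal sketch touching on the two points in the list above, assuming a reasonably recent R (the text= argument of scan() and lengths() are used for brevity). The three-line input is made up for illustration.

```r
# The L suffix makes an integer constant; scan() only looks at the type.
typeof(0L)           # "integer"
typeof(0)            # "double"
typeof(integer(0))   # "integer" as well, just with length zero

# 'what' describes a single record. scan() reuses it for every line
# until the input ends, so it is not repeated once per row.
res <- scan(text = "a 1 2\nb 3 4\nc 5 6",
            what = list(NULL, 0L, 0L), quiet = TRUE)

lengths(res)         # 0 3 3: skipped field, then three values per column
res[[2]]             # 1 3 5
```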


I do not understand the following:

> You need an unlist(c). And more than likely NOT byrow=TRUE. However, I think 
> do.call(cbind,c) should do the trick more easily.

What will unlist(c) do, why should it not be byrow=TRUE, and how would 
you go about integrating do.call(cbind,c) with matrix? Apologies for the 
naive questions, I am a newbie, in principle.


Cheers
Martin





Re: [R] help usin scan on large matrix (caveats to what has been discussed before)

2010-08-12 Thread Peter Dalgaard

At this point I think you need to actually try my suggestions, and maybe
read the documentation again. Explaining how you have misunderstood the
documentation is not going to help...

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] help usin scan on large matrix (caveats to what has been discussed before)

2010-08-12 Thread Martin Tomko
I did. Did not work. Did you try your code? The matrix did not result into
integer numbers as expected. MY approach resulted in a correct scan
result, at least.

M.
