Thanks for the illustration of xtabs.

A quibble: wouldn't the following work just as well, substituting
as.matrix() for matrix()?
(It does seem to preserve the dimensions and the dimension names.)

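# same as the original, but as.matrix() carries the dimensions and dimnames over
matrify <- function(datatable, formula = units ~ site + spp, relativize = FALSE) {
  tbl <- xtabs(formula, data = datatable)
  mx <- as.matrix(tbl)
  # optionally rescale each row (site) to proportions
  if (relativize) mx <- mx / rowSums(mx)
  return(mx)
}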
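
For what it's worth, here is a small check of the claim above on made-up data (the data frame `long` and its values are purely illustrative, not from the thread). It builds the matrix both ways and compares shape, labels, and values. One caveat, as far as I can tell: because a two-way table already is a matrix, as.matrix() appears to hand it back with its "xtabs"/"table" class still attached, so unclass() is shown as one way to get a bare matrix if that matters downstream.

```r
# Toy long-format data; the column names match the default formula.
long <- data.frame(
  site  = c("A", "A", "B", "B", "B"),
  spp   = c("sp1", "sp2", "sp1", "sp1", "sp3"),
  units = c(2, 1, 4, 3, 5)
)

tbl <- xtabs(units ~ site + spp, data = long)

# Original construction: rebuild the matrix, then copy the names across.
mx_manual <- matrix(tbl, ncol = ncol(tbl))
colnames(mx_manual) <- colnames(tbl)
rownames(mx_manual) <- rownames(tbl)

# The as.matrix() shortcut.
mx_short <- as.matrix(tbl)

# Compare shape, row/column labels, and cell values.
same_dim    <- identical(dim(mx_manual), dim(mx_short))
same_labels <- identical(rownames(mx_manual), rownames(mx_short)) &&
  identical(colnames(mx_manual), colnames(mx_short))
same_values <- all(mx_manual == mx_short)

# unclass() drops the class attribute if a bare matrix is needed.
mx_plain <- unclass(mx_short)
```

If the three `same_*` flags are all TRUE, the two constructions agree on this example; that is not a proof for every input, only a sanity check.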

"Christian A. Parker" wrote on 10/07/2008 11:04 AM
(to "ONKELINX, Thierry", cc r-sig-ecology@r-project.org;
subject: Re: [R-sig-eco] Clustering large data):

This method for converting long to wide format seems to work well with 
pretty large datasets and it uses only base functions.

# This function returns a site*species matrix based on the
# formula argument. The data do not need to be grouped
# beforehand: xtabs() sums any rows that share the same
# combination of formula variables.
### units are the cell value
### site is the row value
### spp is the column value
matrify <- function(datatable, formula = units ~ site + spp, relativize = FALSE) {
  tbl <- xtabs(formula, data = datatable)
  # rebuild a plain matrix and copy the labels from the table
  mx <- matrix(tbl, ncol = ncol(tbl))
  colnames(mx) <- colnames(tbl)
  rownames(mx) <- rownames(tbl)
  # optionally rescale each row (site) to proportions
  if (relativize) mx <- mx / rowSums(mx)
  return(mx)
}
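
A short usage sketch may help here. The data frame `long` is invented for illustration, and the function is repeated (with TRUE/FALSE spelled out) only so the block runs on its own. The site B / sp1 combination appears twice on purpose, to show the summing that the comments describe.

```r
# Copy of the function above so the example is self-contained.
matrify <- function(datatable, formula = units ~ site + spp, relativize = FALSE) {
  tbl <- xtabs(formula, data = datatable)
  mx <- matrix(tbl, ncol = ncol(tbl))
  colnames(mx) <- colnames(tbl)
  rownames(mx) <- rownames(tbl)
  if (relativize) mx <- mx / rowSums(mx)
  mx
}

# Hypothetical long-format data: one row per record.
long <- data.frame(
  site  = c("A", "A", "B", "B", "B"),
  spp   = c("sp1", "sp2", "sp1", "sp1", "sp3"),
  units = c(2, 1, 4, 3, 5)
)

abund <- matrify(long)                     # totals per site and species
rel   <- matrify(long, relativize = TRUE)  # each row rescaled to sum to 1

abund["B", "sp1"]   # 7: the two B/sp1 records (4 and 3) are summed
abund["A", "sp3"]   # 0: combinations that never occur are filled with zero
rowSums(rel)        # every site sums to 1
```

For presence-only records with no value column, a one-sided formula such as `~ site + spp` should make xtabs() count rows instead of summing a variable.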



ONKELINX, Thierry wrote:
> Dear all,
>
> We have a problem with a large dataset that we want to cluster. The
> dataset is in a long format: 1154024 rows with presence data. Each row
> has the name of the species and the location. We have 1381 species and
> 6354 locations.
> The main problem is that we need the data in wide format (one row for
> each location, one column for each species) for the clustering
> algorithms. But the 6354 x 1381 data frame is too big to fit into
> memory, at least when we use cast from the reshape package to convert
> the data frame from long to wide format.
>
> Are there any clustering tools available that can work with the data in
> long format or with sparse matrices (only 13% of the matrix is
> non-zero)? If they work with sparse matrices: how can we convert our
> dataset to a sparse matrix? Other suggestions are welcome.
>
> We are working with R 2.7.2 on WinXP with 2 GB RAM. --max-mem-size is
> set to 2047M.
>
> Thanks,
>
> Thierry
>
>

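On the sparse-matrix part of the question, which the replies above do not address: one possible sketch, assuming the Matrix package (a recommended package distributed with R) is available, is to index the long-format records directly so that only the non-zero cells are stored. The data frame `obs` and its column names are hypothetical stand-ins for the presence records. Whether a particular clustering function accepts such a matrix would still need to be checked.

```r
library(Matrix)

# Hypothetical presence records: one row per (location, species) observation.
obs <- data.frame(
  location = c("loc1", "loc1", "loc2", "loc3", "loc3"),
  species  = c("sp1", "sp2", "sp1", "sp2", "sp3")
)

# Factors give each location and each species an integer index.
loc <- factor(obs$location)
spp <- factor(obs$species)

# Location x species sparse matrix; only non-zero cells are stored.
# Repeated (location, species) pairs would be added together.
pres <- sparseMatrix(
  i = as.integer(loc),
  j = as.integer(spp),
  x = 1,
  dims = c(nlevels(loc), nlevels(spp)),
  dimnames = list(levels(loc), levels(spp))
)

# Alternative with the same formula interface as matrify(); the sparse
# argument exists in current versions of R and also relies on Matrix.
pres2 <- xtabs(~ location + species, data = obs, sparse = TRUE)
```

Either object can be turned into an ordinary matrix with as.matrix() when a dense copy does fit in memory.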
_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

