What is the data type for an expression value? Is it assumed that double precision will be needed?
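
To put rough numbers on that question, here is a back-of-the-envelope sketch of the dense footprint of the 5000 x 1 million matrix from the original post. It assumes 4 bytes per value for 32-bit integer counts and 8 bytes per value for doubles, which is my guess at where the quoted 20-40 GB range comes from. The helper name `dense_bytes` is made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed to hold a dense nrow x ncol matrix
// whose elements each occupy elem_size bytes.
constexpr std::uint64_t dense_bytes(std::uint64_t nrow, std::uint64_t ncol,
                                    std::uint64_t elem_size) {
    return nrow * ncol * elem_size;
}

// 5000 genes x 1 million cells, as in the original post.
// Assumption: counts stored as 32-bit integers take 4 bytes each.
static_assert(dense_bytes(5000, 1000000, 4) == 20000000000ULL,
              "integer storage: 20 GB");
// Assumption: double-precision values take 8 bytes each.
static_assert(dense_bytes(5000, 1000000, 8) == 40000000000ULL,
              "double storage: 40 GB");
```

So if raw counts can stay as integers, the dense footprint is roughly halved before any sparse or disk-backed representation comes into play.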
On Fri, Feb 24, 2017 at 4:50 PM, Aaron Lun <a...@wehi.edu.au> wrote:

> It's a good place to start, though it would be very handy to have a
> C(++) API that can be linked against. I'm not sure how much work that
> would entail but it would give downstream developers a lot more
> options. Sort of like how we can link to Rhtslib, which speeds up a
> lot of BAM file processing, instead of just relying on Rsamtools.
>
> -Aaron
>
> ________________________________
> From: Tim Triche, Jr. <tim.tri...@gmail.com>
> Sent: Saturday, 25 February 2017 8:34:58 AM
> To: Aaron Lun
> Cc: bioc-devel@r-project.org
> Subject: Re: [Bioc-devel] any interest in a BiocMatrix core package?
>
> yes
>
> the DelayedArray framework that handles HDF5Array, etc. seems like the
> right choice?
>
> --t
>
> On Fri, Feb 24, 2017 at 1:26 PM, Aaron Lun <a...@wehi.edu.au> wrote:
>
> Hi everyone,
>
> I just attended the Human Cell Atlas meeting in Stanford, and people
> were talking about gene expression matrices for >1 million cells. If
> we assume that we can get non-zero expression profiles for ~5000
> genes, we’d be talking about a 5000 x 1 million matrix for the raw
> count data. This would be 20-40 GB in size, which would clearly
> benefit from sparse (via Matrix) or disk-backed representations
> (bigmatrix, BufferedMatrix, rhdf5, etc.).
>
> I’m wondering whether there is any appetite amongst us for making a
> consistent BioC API to handle these matrices, sort of like what
> BiocParallel does for multicore and snow. It goes without saying that
> the different matrix representations should have consistent functions
> at the R level (rbind/cbind, etc.) but it would also be nice to have
> an integrated C/C++ API (accessible via LinkingTo). There are many
> non-trivial things that can be done with this type of data, and it is
> often faster and more memory efficient to do these complex operations
> in compiled code.
>
> I was thinking of something that you could supply any supported matrix
> representation to a registered function via .Call; the C++ constructor
> would recognise the type of matrix during class instantiation; and
> operations (row/column/random read access, also possibly various ways
> of writing a matrix) would be overloaded and behave as required for
> the class. Only the implementation of the API would need to care about
> the nitty gritty of each representation, and we would all be free to
> write code that actually does the interesting analytical stuff.
>
> Anyway, just throwing some thoughts out there. Any comments
> appreciated.
>
> Cheers,
>
> Aaron
>
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
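
To make the proposed design a bit more concrete, here is a minimal sketch of what such an API might look like: one abstract interface, one class per representation, and a factory that picks the class from the type of the input. Everything here is hypothetical. The names (`any_matrix`, `dense_matrix`, `csc_matrix`, `r_object`, `make_matrix`, `col_sum`) are invented for illustration, and `r_object` is only a stand-in for the R object that `.Call` would hand over; real code would inspect a SEXP instead. The sparse layout follows the column-compressed i/p/x slots of Matrix's dgCMatrix.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical common interface that analysis code would be written against.
class any_matrix {
public:
    virtual ~any_matrix() = default;
    virtual std::size_t nrow() const = 0;
    virtual std::size_t ncol() const = 0;
    // Random read access to a single entry.
    virtual double get(std::size_t r, std::size_t c) const = 0;
    // Column access; the default goes through get(), and a representation
    // can override it with something faster.
    virtual std::vector<double> get_col(std::size_t c) const {
        std::vector<double> out(nrow());
        for (std::size_t r = 0; r < nrow(); ++r) out[r] = get(r, c);
        return out;
    }
};

// Ordinary dense matrix stored column-major, as R does.
class dense_matrix : public any_matrix {
public:
    dense_matrix(std::size_t nr, std::size_t nc, std::vector<double> x)
        : nr_(nr), nc_(nc), x_(std::move(x)) {
        if (x_.size() != nr_ * nc_) throw std::invalid_argument("dense: wrong length");
    }
    std::size_t nrow() const override { return nr_; }
    std::size_t ncol() const override { return nc_; }
    double get(std::size_t r, std::size_t c) const override { return x_[c * nr_ + r]; }
private:
    std::size_t nr_, nc_;
    std::vector<double> x_;
};

// Column-compressed sparse matrix: i holds 0-based row indices, p holds
// ncol + 1 column pointers into i and x.
class csc_matrix : public any_matrix {
public:
    csc_matrix(std::size_t nr, std::size_t nc, std::vector<std::size_t> i,
               std::vector<std::size_t> p, std::vector<double> x)
        : nr_(nr), nc_(nc), i_(std::move(i)), p_(std::move(p)), x_(std::move(x)) {
        if (p_.size() != nc_ + 1 || i_.size() != x_.size() || p_.back() != x_.size())
            throw std::invalid_argument("csc: inconsistent slots");
    }
    std::size_t nrow() const override { return nr_; }
    std::size_t ncol() const override { return nc_; }
    double get(std::size_t r, std::size_t c) const override {
        for (std::size_t k = p_[c]; k < p_[c + 1]; ++k)
            if (i_[k] == r) return x_[k];
        return 0.0;  // entries that are not stored are zero
    }
    std::vector<double> get_col(std::size_t c) const override {
        std::vector<double> out(nr_, 0.0);
        for (std::size_t k = p_[c]; k < p_[c + 1]; ++k) out[i_[k]] = x_[k];
        return out;
    }
private:
    std::size_t nr_, nc_;
    std::vector<std::size_t> i_, p_;
    std::vector<double> x_;
};

// Stand-in for the object passed through .Call (assumption, not a real type).
struct r_object {
    std::string cls;                // e.g. "matrix" or "dgCMatrix"
    std::size_t nrow, ncol;
    std::vector<double> x;
    std::vector<std::size_t> i, p;  // only used for sparse input
};

// The factory is the only place that needs to know which representations exist.
std::unique_ptr<any_matrix> make_matrix(const r_object& obj) {
    if (obj.cls == "matrix")
        return std::make_unique<dense_matrix>(obj.nrow, obj.ncol, obj.x);
    if (obj.cls == "dgCMatrix")
        return std::make_unique<csc_matrix>(obj.nrow, obj.ncol, obj.i, obj.p, obj.x);
    throw std::invalid_argument("unsupported matrix class: " + obj.cls);
}

// Example of analysis code that never sees the underlying representation.
double col_sum(const any_matrix& m, std::size_t c) {
    double total = 0.0;
    for (double v : m.get_col(c)) total += v;
    return total;
}
```

A disk-backed representation would be one more subclass plus one more branch in `make_matrix`, and `col_sum` would not change. The interface returns doubles throughout only to keep the sketch short; whether counts should be exposed as integers instead is exactly the data type question above.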