Davide Cittaro wrote:
> Actually I was having issues of reading stdin, but I found that was not 
> related to wigToBigWig.
> I'm interested in memory issues, of course, so please keep us updated :-)

Good Afternoon Davide:

I tried a couple of different encodings.  The memory consumed depends upon
the type of input.  I worked with the phyloP data for the 46-way vertebrate
track on hg19, which is a data set that covers 2,845,303,719 bases of hg19.

A worst case is a variableStep wiggle file, where the coordinates specified
happen to be consecutive.  Normally the best encoding for this would be
fixedStep.
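
To illustrate the point about consecutive coordinates, here is a minimal
Python sketch that rewrites such a variableStep file as fixedStep.  It is
not part of the kent tools; the function name is made up for this example,
and it assumes span=1 and skips track/browser lines.

```python
def variable_to_fixed_step(lines):
    """Yield fixedStep lines for variableStep input (hypothetical helper).

    Assumes span=1.  A new fixedStep declaration is started whenever the
    position is not exactly one past the previous position.
    """
    chrom = None
    prev_pos = None
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and track/browser lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith("variableStep"):
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            if fields.get("span", "1") != "1":
                raise ValueError("this sketch only handles span=1")
            chrom = fields["chrom"]
            prev_pos = None  # force a new fixedStep declaration
            continue
        pos_text, value = line.split()
        pos = int(pos_text)
        if prev_pos is None or pos != prev_pos + 1:
            yield f"fixedStep chrom={chrom} start={pos} step=1"
        yield value
        prev_pos = pos


if __name__ == "__main__":
    example = [
        "variableStep chrom=chr1",
        "101 0.5",
        "102 0.25",
        "200 0.1",
    ]
    for out in variable_to_fixed_step(example):
        print(out)
```

Note that data with many isolated positions would get one fixedStep
declaration per point, so this only pays off when the runs are long.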

This phyloP data set, in its original fixedStep ascii encoding, takes
32 Gb of memory and 35 minutes of run time with wigToBigWig.  In
variableStep format, wigToBigWig takes 60 Gb of memory and 2 hours
20 minutes of run time.  In bedGraph format, the bedGraphToBigWig converter
takes 3 Gb of memory and 1 hour 40 minutes of run time.
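
For comparison across the three runs, the figures above can be scaled to
memory per base covered.  This is only a rough Python sketch of the
arithmetic; it assumes "Gb" means 10**9 bytes, which the numbers above do
not specify.

```python
BASES = 2_845_303_719  # bases covered by the phyloP data set

# (memory in Gb, run time in minutes), taken from the figures quoted above
RUNS = {
    "fixedStep -> wigToBigWig": (32, 35),
    "variableStep -> wigToBigWig": (60, 140),
    "bedGraph -> bedGraphToBigWig": (3, 100),
}


def bytes_per_base(gigabytes, bases=BASES):
    # Assumption: 1 Gb = 10**9 bytes.
    return gigabytes * 1e9 / bases


for name, (mem_gb, minutes) in RUNS.items():
    print(f"{name}: {bytes_per_base(mem_gb):.1f} bytes/base, {minutes} min")
```

Under that assumption this works out to roughly 11, 21, and 1 bytes of
memory per base for the three runs.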

As an aside, using that bedGraph file as an ordinary bed file, the
bedToBigBed converter takes 19 Gb of memory and 1 hour 15 minutes of run
time to produce a bigBed file.

--Hiram

File sizes, input files:

hg19.phyloP.wig.fixedStep.txt - 17 Gb
hg19.phyloP.wig.variableStep.txt - 42 Gb
hg19.phyloP.bedGraph - 71 Gb
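
One plausible reason the bedGraph input is the largest is that every
bedGraph line repeats the chromosome name and both coordinates, while a
fixedStep line holds only the value.  The Python sketch below shows that
expansion.  It is an illustration, not the procedure used to make the files
above; the function name is made up, and it writes one bedGraph line per
value without merging equal neighbours.

```python
def fixed_step_to_bedgraph(lines):
    """Yield bedGraph lines for fixedStep input (hypothetical helper).

    Wiggle positions are 1-based; bedGraph uses 0-based, half-open
    intervals, so the start coordinate is shifted down by one.
    """
    chrom = None
    pos = step = span = None
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and track/browser lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith("fixedStep"):
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields["step"])
            span = int(fields.get("span", "1"))
            continue
        start0 = pos - 1  # convert 1-based position to 0-based start
        yield f"{chrom}\t{start0}\t{start0 + span}\t{line}"
        pos += step


if __name__ == "__main__":
    example = ["fixedStep chrom=chr1 start=101 step=1", "0.5", "0.25"]
    for out in fixed_step_to_bedgraph(example):
        print(out)
```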

Resulting converted files:

hg19.phyloP.from.fixedStep.bw - 8.2 Gb
hg19.phyloP.from.variableStep.bw - 14 Gb
hg19.phyloP.from.bedGraph.bw - 15 Gb

hg19.phyloP.from.bedGraph.bb - 14 Gb

