Hi Sonja,

Here is some information about your recent question that I received from
one of our ENCODE staff:

The contents of the downloads-only files are left to the discretion of
the contributing laboratories. In this case, the contributor is Barbara
Wold of Caltech. You may contact Georgi Marinov <[email protected]>
about the specifics of the files. To aid you in your communication with
the lab, below is a key to what the Wold lab calls RawData5, 6, and 7.

RawData5 = final.rpkm
RawData6 = gencode_exon
RawData7 = accepted.rpkm
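
If you are sorting through many of these downloads, the key can also be
applied programmatically. The Python sketch below assumes the view name
always appears as `RawData<N>` directly after the `wgEncodeCaltechRnaSeq`
prefix, as it does in the three hg18 file names you sent. The helper name
`view_for_file` is hypothetical, chosen only for illustration.

```python
import re
from typing import Optional

# Key from the Wold lab relating each view name to the file's contents.
VIEW_KEY = {
    "RawData5": "final.rpkm",
    "RawData6": "gencode_exon",
    "RawData7": "accepted.rpkm",
}


def view_for_file(filename: str) -> Optional[str]:
    """Return the Wold-lab label for a Caltech RNA-Seq download, or None.

    Assumption: the view appears as 'RawData<N>' immediately after the
    'wgEncodeCaltechRnaSeq' prefix, as in the hg18 file names in this thread.
    """
    match = re.search(r"wgEncodeCaltechRnaSeq(RawData\d+)", filename)
    if match is None:
        return None
    # Views not covered by the key also yield None.
    return VIEW_KEY.get(match.group(1))


name = "wgEncodeCaltechRnaSeqRawData6Rep1Gm12878CellLongpolyaErng32x75.rpkm.gz"
print(view_for_file(name))  # gencode_exon
```

Returning None for unknown views keeps the lookup from silently mislabeling
files from other submissions.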

The Wold lab has submitted data for hg19 that has not yet been reviewed
for public release. The downloads they are currently generating are
listed below; you may wish to inquire about these data as well.

Expression Estimates and Transcript Models (Cufflinks):

.junctions - a BED12 file containing TopHat-defined splice junctions.
GeneModel.gtf - a GTF file containing gene models produced by Cufflinks
in de novo mode.
GeneDeNovo.fpkm - FPKM expression level estimates at the gene level for
de novo assembled transcripts. FPKM (Fragments Per Kilobase per Million
mapped fragments, where a fragment is the nucleic acid fragment from
which reads originate, so a pair of reads counts as one fragment) is a
metric analogous to the widely used RPKM (Reads Per Kilobase per Million
reads); both normalize for transcript length and sequencing depth.
TranscriptDeNovo.fpkm - FPKM expression level estimates at the 
transcript level for de novo assembled transcripts.
GeneGencV3c.fpkm - FPKM expression level estimates at the gene level for
the GENCODE GRCh37.v3c annotation.
TranscriptGencV3c.fpkm - FPKM expression level estimates at the
transcript level for the GENCODE GRCh37.v3c annotation.
GeneGencV4.fpkm - FPKM expression level estimates at the gene level for
the GENCODE GRCh37.v4 annotation.
TranscriptGencV4.fpkm - FPKM expression level estimates at the
transcript level for the GENCODE GRCh37.v4 annotation.
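
For a single feature, the FPKM definition above reduces to simple
arithmetic: the number of fragments assigned to the feature, divided by
its length in kilobases and by the total number of mapped fragments in
millions. Here is a minimal Python sketch of that arithmetic; the function
name and the numbers are hypothetical. Cufflinks itself estimates FPKM with
a statistical model that apportions fragments among overlapping isoforms,
so this shows only the normalization, not how Cufflinks computes its values.

```python
def fpkm(fragments_in_feature: int, feature_length_bp: int,
         total_mapped_fragments: int) -> float:
    """Fragments per kilobase of feature per million mapped fragments."""
    length_kb = feature_length_bp / 1_000                  # normalize for length
    millions_mapped = total_mapped_fragments / 1_000_000   # normalize for depth
    return fragments_in_feature / (length_kb * millions_mapped)


# Hypothetical numbers: 500 fragments on a 2,000 bp transcript,
# 20 million mapped fragments in the library.
print(fpkm(500, 2_000, 20_000_000))  # 12.5
```

The same arithmetic gives RPKM if you pass read counts instead of fragment
counts.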


I hope that is helpful. If you have any additional questions, please
contact us again at [email protected].


-
Greg Roe
UCSC Genome Bioinformatics Group


On 2/28/11 1:50 AM, Sonja Althammer wrote:
> Good morning,
>
> I wanted to download the RPKM-files from the RNA-Seq experiment in the
> cell-line GM12878 (ENCODE).
> Surprisingly, I found 3 files that differ only in view=data5, data6 or data7.
> What is this supposed to mean, and how should I treat them? Are they supposed
> to be combined? If so, why are they in separate files?
> The links are below.
>
> Thanks a lot in advance and have a nice day!
> Sonja
>
>
> http://hgdownload.cse.ucsc.edu/goldenPath/hg18/encodeDCC/wgEncodeCaltechRnaSeq/
>
>    2009-12-06  wgEncodeCaltechRnaSeqRawData5Rep1Gm12878CellLongpolyaErng32x75.rpkm.gz
>    538K  2009-03-06  dataType=RnaSeq; cell=GM12878; rnaExtract=longPolyA;
>    localization=cell; replicate=1; subId=266; dataVersion=ENCODE Feb 2009
>    Freeze; grant=Myers; lab=Caltech; labVersion=erange3.0.1;
>    mapAlgorithm=erng3; view=RawData5; type=rpkm; insertLength=200;
>    readType=2x75
>
>    2009-12-06  wgEncodeCaltechRnaSeqRawData6Rep1Gm12878CellLongpolyaErng32x75.rpkm.gz
>    3.9M  2009-03-06  dataType=RnaSeq; cell=GM12878; rnaExtract=longPolyA;
>    localization=cell; replicate=1; subId=266; dataVersion=ENCODE Feb 2009
>    Freeze; grant=Myers; lab=Caltech; labVersion=erange3.0.1;
>    mapAlgorithm=erng3; view=RawData6; type=rpkm; insertLength=200;
>    readType=2x75
>
>    2009-12-06  wgEncodeCaltechRnaSeqRawData7Rep1Gm12878CellLongpolyaErng32x75.rpkm.gz
>    219K  2009-03-06  dataType=RnaSeq; cell=GM12878; rnaExtract=longPolyA;
>    localization=cell; replicate=1; subId=266; dataVersion=ENCODE Feb 2009
>    Freeze; grant=Myers; lab=Caltech; labVersion=erange3.0.1;
>    mapAlgorithm=erng3; view=RawData7; type=rpkm; insertLength=200;
>    readType=2x75
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome