Hi Carsten,

Thank you for your interest in ENCODE data. The transfrag data in this
composite track are submitted in BED6+ format. The first 6 columns
are standard BED6, with the score being anything the lab would like to
use as a scoring function. In this case, they specify the score as
1000 * [# reads in transfrag] / [# reads in most abundant transfrag in
this dataset]. The remaining fields are specified by the submitting
laboratory, usually in a Readme of some sort. I am sorry that the link
did not work for you. We will attempt to fix that as soon as possible.
In the meantime, you may find the lab's description of their data
here: 
http://genome-preview.cse.ucsc.edu/ENCODE/transfragBed6CshlDiscription.html.
If you have further questions, you may contact the contributor
directly: Jon Preall in the Hannon Lab at CSHL, [email protected].
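
As a rough illustration of the score formula above, here is a minimal
Python sketch. The function name and the rounding to an integer are
assumptions made for this example; the lab's actual script may differ.

```python
def transfrag_scores(read_counts):
    """Scale per-transfrag read counts to a 0-1000 score.

    score = 1000 * reads_in_transfrag / reads_in_most_abundant_transfrag
    Rounding to the nearest integer is an assumption of this sketch.
    """
    if not read_counts:
        return []
    max_reads = max(read_counts)
    if max_reads <= 0:
        raise ValueError("the most abundant transfrag must have at least one read")
    return [round(1000 * n / max_reads) for n in read_counts]


# Hypothetical read counts for three transfrags in one dataset.
scores = transfrag_scores([200, 50, 1])
```

With these made-up counts, the most abundant transfrag gets a score of
1000 and the others are scaled relative to it.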

Vanessa Kirkup Swing
UCSC Genome Bioinformatics Group

---------- Forwarded message ----------
From: Carsten Raabe <[email protected]>
Date: Thu, Jul 7, 2011 at 8:19 AM
Subject: [Genome] format questions
To: [email protected]


Dear all,

My name is Carsten Raabe and I am working at the Institute of
Experimental Pathology in Münster, Germany.

I am having a hard time understanding what the columns in the
wgEncodeCshlShortRnaSeqGm12878CytosolShortTransfrags.shortFrags file
stand for.
I went to the Table Browser to see the table fields it lists, and I
assume these are the same fields as in the file specified above. I have
a number of questions about what the specific columns stand for. Please
find my doubts in detail below...

bin
chrom       Reference sequence chromosome or scaffold
chromStart  Start position in chromosome
chromEnd    End position in chromosome
name        Name of item
score       Score from 0-1000 >>>>> what does the score indicate
strand    + or -
length >>>>> Difference between end and start position ??
numUnique >>>> unique reads in the contig ?
numReads >>>> all reads forming the contig ?
minSeqCount >>>> what is the difference between seq and read ?
maxSeqCount
aveSeqCount
firstSeqCount >>>> Besides the question above, I am not clear on what
"first" indicates here.
medSeqCount
thirdSeqCount >>>> The same question as with  "first" in the above row.
minReadCount >>>> Again what would be the difference between read and seq.
maxReadCount
aveReadCount
firstReadCount
medReadCount
thirdReadCount >>>> What would be the meaning of "third" here?
numRegions >>>> I assume it refers to significant region within the
contig, how would these (>>> regions) be defined??
regStart
regLength
seqCount
regCount
sumCount


I checked the mailing list and didn't find much information on these
questions. I also saw the description of transfrags on the
browser page:

"Small RNA reads were assembled into "Transfrags" by merging reads with
one or more overlapping nucleotides. In order to minimize ambiguity from
reads that have the potential to map to multiple genomic loci, only the
uniquely mapping reads were used to generate Transfrags. The BED6+
format of the transfrag files are created from "intervals-to-contigs"
Galaxy tool written by Assaf Gordon in the Hannon lab at CSHL. A
complete description of the columns in this format can be found here.
The Transfrags view includes all transfrags before filtering."
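
The merging step described in the quoted passage can be sketched roughly
as follows. This is only an illustration of overlap-based merging, not
the "intervals-to-contigs" tool itself; the function name, the half-open
BED-style coordinates, and the restriction to a single chromosome and
strand are assumptions of the example.

```python
def merge_reads(intervals):
    """Merge read intervals that share at least one nucleotide.

    intervals: (start, end) pairs, half-open as in BED, all assumed to be
    on the same chromosome and strand.
    Returns (start, end, number_of_reads) for each merged region.
    """
    merged = []
    for start, end in sorted(intervals):
        # With half-open coordinates, start < previous end means the two
        # intervals overlap by one or more nucleotides.
        if merged and start < merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
            merged[-1][2] += 1
        else:
            merged.append([start, end, 1])
    return [tuple(region) for region in merged]


# Hypothetical read positions; the third read only touches the second.
regions = merge_reads([(10, 30), (25, 40), (40, 60), (100, 120)])
```

Under these assumptions, reads that merely abut (end of one equals start
of the next) are not merged, because they share no nucleotide.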


However, the BED6+ link doesn't work. I am also confused about the term
"unique": if only uniquely mapping reads (>>> unique in position, as
stated in the quote above) were used to form the corresponding
transfrags, why is numUnique not always identical to numReads?


I am sorry for troubling you a lot with all these questions, however I
wouldn't know whom else to ask.

Keep up the beautiful work.

Cheers,

Carsten
_______________________________________________
Genome maillist  -  [email protected]
https://lists.soe.ucsc.edu/mailman/listinfo/genome
