Hi Carlos,
Thank you very much for this explanation.

The format of my intervals file is:

chr13  32890597  32890664  NM_000059_cds_1_0_chr13_32890598_f  0  +
chr13  32893213  32893462  NM_000059_cds_2_0_chr13_32893214_f  0  +
chr13  32899212  32899321  NM_000059_cds_3_0_chr13_32899213_f  0  +
chr13  32900237  32900287  NM_000059_cds_4_0_chr13_32900238_f  0  +
etc...

Can you please explain how to change this format so that I can give it
as input to DepthOfCoverage?

Thanks,
   Lilach
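
[Editor's sketch] The conversion discussed in the quoted reply below (dropping the "chr" prefix and re-sorting the intervals into 1, 2, ..., 22, X, Y order) could look roughly like the following. This is only an illustration under stated assumptions: the input is assumed to be tab-separated BED, the function name convert_bed_lines is hypothetical, and only chromosomes 1-22, X and Y are handled. chrM and the unplaced or haplotype contigs are dropped, because they have no simple one-to-one rename between hg19 and hg_g1k_v37 (the mitochondrial sequences in particular are not the same). Whether the Galaxy DepthOfCoverage wrapper accepts the result as-is is not something this sketch can confirm.

```python
# Primary contigs in b37-style order: 1..22, X, Y (an assumption of this
# sketch; chrM and unplaced contigs are deliberately left out).
B37_ORDER = [str(n) for n in range(1, 23)] + ["X", "Y"]
RANK = {name: i for i, name in enumerate(B37_ORDER)}


def convert_bed_lines(lines):
    """Rename chrN -> N and sort by (contig rank, start, end).

    Hypothetical helper: expects tab-separated BED lines and drops any
    line whose contig is not one of 1-22, X, Y after renaming.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        contig = fields[0][3:] if fields[0].startswith("chr") else fields[0]
        if contig not in RANK:
            continue
        fields[0] = contig
        records.append((RANK[contig], int(fields[1]), int(fields[2]), fields))
    # Sort numerically by contig rank, then start, then end.
    records.sort(key=lambda r: r[:3])
    return ["\t".join(r[3]) for r in records]


example = [
    "chr13\t32893213\t32893462\tNM_000059_cds_2_0_chr13_32893214_f\t0\t+",
    "chr13\t32890597\t32890664\tNM_000059_cds_1_0_chr13_32890598_f\t0\t+",
]
for row in convert_bed_lines(example):
    print(row)
```

The interval names (fourth column) are left untouched, so they still carry the original chr13 labels; only the first column and the line order change.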

2012/6/21 Carlos Borroto <carlos.borr...@gmail.com>

> On Thu, Jun 21, 2012 at 10:50 AM, Lilach Friedman <lilac...@gmail.com>
> wrote:
> > Hi Jennifer,
> > Thank you for this reply.
> >
> > I made a new BWA file, this time using the hg19(full) genome.
> > However, when I try to use DepthOfCoverage, the reference genome is
> > stuck on hg_g1k_v37 (this is the only option to select), and I cannot
> > change it to hg19(full). Most probably this is because I selected
> > hg_g1k_v37 the previous time I tried to use DepthOfCoverage.
> > Is this a bug? How can I change it?
> >
>
> Hi Lilach,
>
> I have been dealing with these issues for some time now.
>
> The only genome you can use with Picard and GATK tools in Galaxy is
> hg_g1k_v37. I think this is why.
>
> From GATK Wiki[1]:
> "If you are using human data, your reads must be aligned to one of the
> official b3x (e.g. b36, b37) or hg1x (e.g. hg18, hg19) references. The
> contig ordering in the reference you used must exactly match that of
> one of the official references canonical orderings. These are defined
> by historical karyotyping of largest to smallest chromosomes, followed
> by the X, Y, and MT. The order is thus 1, 2, 3, ..., 10, 11, 12, ...
> 20, 21, 22, X, Y, MT. The GATK will detect misordered contigs (for
> example, lexicographically sorted) and throw an error. This draconian
> approach, though unnecessary technically, ensures that all
> supplementary data provided with the GATK works correctly. You can use
> ReorderSam to fix a BAM file aligned to a missorted reference
> sequence."
>
> [1]
> http://www.broadinstitute.org/gsa/wiki/index.php/Input_files_for_the_GATK
>
> So far, when presented with a BAM file produced with a reference
> that uses lexicographic chromosome ordering, I have used Picard's
> ReorderSam tool, also in Galaxy, selecting hg_g1k_v37 as the
> reference. You might not be able to do this since, if I recall
> correctly, hg19 also uses chr1, chr2, ... instead of 1, 2, ... In that
> case more work needs to be done, and at that point it is almost
> easier to just remap with the correct reference for use with GATK. In
> your case it seems you already have it. What you might need to do is
> re-sort your intervals file and probably change the chromosome
> identifiers; this, I think, can be done inside Galaxy.
>
> I would love to hear comments about this approach, as I sometimes
> worry, as Hiram's comment hints, that hg19 and hg_g1k_v37 might not
> be completely identical apart from the chromosome ordering. In that
> case my re-sorted BAM or intervals files might be incorrect.
>
> Hope it helps,
> Carlos
>
> > Thanks,
> >   Lilach
> >
> >
> >
> > 2012/6/18 Jennifer Jackson <j...@bx.psu.edu>
> >>
> >> Hi Lilach,
> >>
> >> The problem with this analysis probably has to do with a mismatch
> >> between the genomes: the intervals obtained from UCSC (hg19) and the
> >> BAM from your BWA (hg_g1k_v37) run.
> >>
> >> UCSC does not contain the genome 'hg_g1k_v37' - the genome available
> >> from UCSC is 'hg19'.
> >>
> >> Even though these are technically the same human release, on a practical
> >> level, they have a different arrangement for some of the chromosomes.
> >> You can compare NCBI GRCh37 with UCSC hg19 for an explanation. Reference
> >> genomes must be exact in order to be used with tools - base for base.
> >> When they are exact, the identifier will be exact between Galaxy and
> >> the source (UCSC, Ensembl) or the full Build name will provide enough
> >> information to make a connection to NCBI or other.
> >>
> >> Sometimes genomes are similar enough that a dataset sourced from one can
> >> be used with another, if the database attribute is changed and the data
> >> from the regions that differ is removed. This may be possible in your
> >> case; only trying will let you know how difficult it actually is with
> >> your analysis. The GATK pipeline is very sensitive to exact inputs. You
> >> will need to be careful with genome database assignments, etc.
> >> Following the links on the tool forms to the GATK help pages can provide
> >> some more detail about expected inputs, if this is something that you
> >> are going to try.
> >>
> >> Good luck with the re-run!
> >>
> >> Jen
> >> Galaxy team
> >>
> >>
> >> On 6/18/12 4:42 AM, Lilach Friedman wrote:
> >>
> >> Hi,
> >> I am trying to use Depth of Coverage to see the coverage in specific
> >> intervals.
> >> The intervals were taken from UCSC (exons of 2 genes), loaded into Galaxy,
> >> and the file type was changed to intervals.
> >>
> >> I gave Depth of Coverage two BAM files (produced by BWA, then selecting
> >> only rows matching the pattern XT:A:U, and then SAM-to-BAM)
> >> and the intervals file (in advanced GATK options).
> >> The consensus genome is hg_g1k_v37.
> >>
> >> I got the following error message:
> >>
> >> An error occurred running this job: Picked up _JAVA_OPTIONS:
> >> -Djava.io.tmpdir=/space/g2main
> >> ##### ERROR ------------------------------------------------------------------------------------------
> >> ##### ERROR A USER ERROR has occurred (version 1.4-18-g80a4ce0):
> >> ##### ERROR The invalid argume
> >>
> >>
> >> Is it a bug, or did I do anything wrong?
> >>
> >> I will be grateful for any help.
> >>
> >> Thanks!
> >>    Lilach
> >>
> >>
> >> ___________________________________________________________
> >> The Galaxy User list should be used for the discussion of
> >> Galaxy analysis and other features on the public server
> >> at usegalaxy.org.  Please keep all replies on the list by
> >> using "reply all" in your mail client.  For discussion of
> >> local Galaxy instances and the Galaxy source code, please
> >> use the Galaxy Development list:
> >>
> >>   http://lists.bx.psu.edu/listinfo/galaxy-dev
> >>
> >> To manage your subscriptions to this and other Galaxy lists,
> >> please use the interface at:
> >>
> >>   http://lists.bx.psu.edu/
> >>
> >>
> >> --
> >> Jennifer Jackson
> >> http://galaxyproject.org
> >