Re: [galaxy-user] Get only repeatmasked exons

2011-10-20 Thread Anton Nekrutenko
David:
> 
> Question I: what are the small letters and what are the capitals here?
> Are these already masked, exons/introns or what?
> (I downloaded some of these sequences and repeatmasked them myself. My masked 
> sequences overlap with some of "yours" written in small letters.)
> 

In the case of mouse, the sequences are extracted from soft-masked genomic builds 
retrieved from UCSC. So small letters = repeats, capital letters = no repeats.
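
For illustration, the soft-masking convention can be checked with a few lines of Python. This is only a minimal sketch: the function names and the example sequence are made up for the illustration and are not Galaxy code.

```python
import re

def repeat_intervals(seq):
    """Return 0-based, half-open (start, end) spans of lowercase (soft-masked) bases."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]

def hard_mask(seq):
    """Replace every soft-masked (lowercase) base with N, leaving capitals untouched."""
    return re.sub(r"[a-z]", "N", seq)

# Hypothetical extracted sequence: capitals are unique DNA, small letters are repeats.
seq = "ACGTacgtnnACGTTTggca"
print(repeat_intervals(seq))  # [(4, 10), (16, 20)]
print(hard_mask(seq))         # ACGTNNNNNNACGTTTNNNN
```

The intervals are relative to the start of the extracted sequence, so they would need to be offset by the BED start to get genomic coordinates.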

> Question II: Is the strand "honored" by this tool?
> I seem to remember from past experience that there was an issue, although I 
> cannot recall what exactly.

Yes, if the strand is explicitly specified. If it is not specified it is 
assumed to be +.
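
As a rough sketch of what honoring the strand means (an illustration under the assumption that the strand lives in the sixth BED column, not the tool's actual code): a minus-strand interval is returned as the reverse complement, and a missing strand is treated as +. The helper names are hypothetical.

```python
# Complement table that preserves case, so soft-masking survives the flip.
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")

def reverse_complement(seq):
    """Reverse-complement a DNA string, keeping upper/lower case per base."""
    return seq.translate(COMPLEMENT)[::-1]

def oriented_sequence(plus_strand_seq, bed_fields):
    """Return the sequence as read on the interval's strand.

    bed_fields is one BED line split on tabs; a missing strand column is
    treated as '+'.
    """
    strand = bed_fields[5] if len(bed_fields) > 5 else "+"
    if strand == "-":
        return reverse_complement(plus_strand_seq)
    return plus_strand_seq

print(oriented_sequence("AACGtt", ["chr1", "0", "6"]))                      # AACGtt
print(oriented_sequence("AACGtt", ["chr1", "0", "6", "region1", "0", "-"]))  # aaCGTT
```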

Thanks for using Galaxy.

anton

> 
> Thank you in advance,
> 
> David
> 
> 
> ___
> The Galaxy User list should be used for the discussion of
> Galaxy analysis and other features on the public server
> at usegalaxy.org.  Please keep all replies on the list by
> using "reply all" in your mail client.  For discussion of
> local Galaxy instances and the Galaxy source code, please
> use the Galaxy Development list:
> 
>  http://lists.bx.psu.edu/listinfo/galaxy-dev
> 
> To manage your subscriptions to this and other Galaxy lists,
> please use the interface at:
> 
>  http://lists.bx.psu.edu/




Re: [galaxy-user] save/export image from workflow editor?

2011-10-20 Thread Anton Nekrutenko
Casey:

Not really, except by taking a screenshot.

Tx,

anton


Anton Nekrutenko
http://nekrut.bx.psu.edu
http://usegalaxy.org



On Oct 20, 2011, at 3:15 PM, Casey Bergman wrote:

> Dear Galaxy Team -
> 
> [Apologies if this has been answered on this list before... I couldn't 
> find anything related via Google]
> 
> Is there a way to save/export an image of the workflow shown in the workflow 
> editor?  
> 
> Best regards,
> Casey
> 
> 
> 
> 
> 
> 




Re: [galaxy-user] Issue about uploading data

2011-10-20 Thread Dannon Baker
Nandan,

Have you tried using our FTP upload process?  This is much more robust than the 
standard upload, especially for large files.  See 
http://wiki.g2.bx.psu.edu/Learn/Upload%20via%20FTP
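
If a script is preferred over a desktop FTP client, the transfer could be sketched with Python's standard ftplib, as below. The host, user name, and password are placeholders to be taken from the wiki page and your own account; nothing here is specific to Galaxy, and the helper names are made up.

```python
import os
from ftplib import FTP

def upload(ftp, path):
    """Send one local file over an already-open FTP connection in binary mode."""
    with open(path, "rb") as handle:
        ftp.storbinary("STOR " + os.path.basename(path), handle)

def upload_all(host, user, password, paths):
    """Log in once and upload every file in paths.

    host, user and password are placeholders supplied by the caller.
    """
    with FTP(host) as ftp:
        ftp.login(user, password)
        for path in paths:
            upload(ftp, path)
```

Binary mode matters for compressed files; the files should then be importable from the upload tool once the transfer finishes.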

-Dannon

On Oct 20, 2011, at 7:05 PM, Nandan Deshpande wrote:

> Hi,
> 
> My user id is "deshpandenandan1...@gmail.com". I have been trying to upload 2 
> files, each around 650 MB in size, but the upload has been in progress for the 
> last 15 hrs. I had a similar problem uploading data to Galaxy a few months back 
> and had to give up using the tool. Can you suggest a way out?
> 
> cheers,
> 
> Nandan
>  



[galaxy-user] Issue about uploading data

2011-10-20 Thread Nandan Deshpande
Hi,

My user id is "deshpandenandan1...@gmail.com". I have been trying to upload
2 files, each around 650 MB in size, but the upload has been in progress for
the last 15 hrs. I had a similar problem uploading data to Galaxy a few months
back and had to give up using the tool. Can you suggest a way out?

cheers,

Nandan

[galaxy-user] Get only repeatmasked exons

2011-10-20 Thread Managadze, David (NIH/NLM/NCBI) [F]
Dear Galaxy expert(s),

I have a .BED file of regions from mouse. I guess many of them can span whole 
genes, i.e. many exons; they might even span the gene flanks.
I need to get the REPEATMASKED sequences of only the annotated exons of these 
regions.

I see that if I use the tool "Fetch Sequences->Extract Genomic DNA" on these 
regions, it returns sequences with mixed small and capital letters.

Question I: what are the small letters and what are the capitals here?
Are these already masked, exons/introns or what?
(I downloaded some of these sequences and repeatmasked them myself. My masked 
sequences overlap with some of "yours" written in small letters.)

Question II: Is the strand "honored" by this tool?
I seem to remember from past experience that there was an issue, although I 
cannot recall what exactly.

Thank you in advance,

David




[galaxy-user] data format

2011-10-20 Thread Klaudyna Borewicz
Hi,
I would like to use Galaxy to run LEfSe, but I don't know how to get
the data into tabular format that is required
(http://huttenhower.org/galaxy/tool_runner?tool_id=LEfSe_for). My data
is 454 FASTA files that I was analyzing with RDP to get the
classification. It works fine; I get a .txt file that I can load into
Galaxy, and it looks like this:
norank  Root                                           37646
        unclassified_Root                              9
domain  Bacteria                                       37637
        unclassified_Bacteria                          5998
phylum  OD1                                            0
        unclassified_OD1                               0
genus   OD1_genera_incertae_sedis                      0
phylum  BRC1                                           0
        unclassified_BRC1                              0
genus   BRC1_genera_incertae_sedis                     0
phylum  Deferribacteres                                0
        unclassified_"Deferribacteres"                 0
class   Deferribacteres                                0
        unclassified_Deferribacteres                   0
order   Deferribacterales                              0
        unclassified_Deferribacterales                 0
family  Deferribacterales_incertae_sedis               0
        unclassified_Deferribacterales_incertae_sedis  0
genus   Caldithrix                                     0
family  Deferribacteraceae                             0
        unclassified_Deferribacteraceae                0
genus   Calditerrivibrio                               0
genus   Mucispirillum                                  0
.

but I need to have the labels in a hierarchical organization and I
cannot find a way to get it to work. Please let me know if you have
any suggestions, or maybe RDP is just not the way to go.
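
One possible way to build hierarchical labels from a listing like the one above is sketched below. It rests on assumptions: that the rows come in depth-first order, that rows without a rank ("unclassified_*") belong to the taxon listed just before them, and that a "|"-joined lineage is an acceptable label (LEfSe's help should be checked for the exact input it expects, including the per-sample columns this sketch does not address). The function name and rank list are made up for the illustration.

```python
# Rank order assumed for the RDP hierarchy listing, from shallowest to deepest.
RANKS = ["norank", "domain", "phylum", "class", "order", "family", "genus"]

def lineages(rows, sep="|"):
    """Yield (lineage, count) for rows given as (rank, name, count) tuples."""
    stack = []  # (depth, name) pairs for the current path from the root
    for rank, name, count in rows:
        if rank:
            # A ranked taxon: unwind to its parent, then descend into it.
            depth = RANKS.index(rank)
            while stack and stack[-1][0] >= depth:
                stack.pop()
            stack.append((depth, name))
            path = [n for _, n in stack]
        else:
            # An unranked "unclassified_*" row hangs off the taxon just above it.
            path = [n for _, n in stack] + [name]
        yield sep.join(path), count

rows = [
    ("norank", "Root", 37646),
    ("", "unclassified_Root", 9),
    ("domain", "Bacteria", 37637),
    ("", "unclassified_Bacteria", 5998),
    ("phylum", "OD1", 0),
    ("genus", "OD1_genera_incertae_sedis", 0),
    ("phylum", "BRC1", 0),
]
for lineage, count in lineages(rows):
    print(lineage, count, sep="\t")
```

The last row, for example, comes out as Root|Bacteria|BRC1 because the OD1 branch is unwound before BRC1 is added.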

Thank you and hope to hear from you soon,
Klaudyna

-- 

Klaudyna Borewicz  M.Sc, B.Sc
Department of Veterinary and Biomedical Sciences
University of Minnesota
1971 Commonwealth Ave.
St. Paul, Minnesota 55108
Phone: (612)624-6226
FAX: (612)625-5203



[galaxy-user] Creating or Downloading GTF files for use with Cuffcompare

2011-10-20 Thread Joe Harrison

Hello Olivier, Emilie and the Galaxy community -

I have run into a similar problem with my RNA-seq analysis, in that I 
can run the analysis up to the point of Cufflinks producing a list of 
FPKM values for my genome of interest (in this case, Staphylococcus 
aureus strain Newman).  However, I cannot find a place to download a 
compatible .GTF file with the reference annotation.  Would you or anyone 
else in the community know of a tool or database where .GTF files could be 
created from another input file (such as GFF3), or better yet, just 
downloaded?
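
To make the conversion question concrete, here is a minimal sketch of what rewriting a GFF3 feature line as GTF could look like. It is built on assumptions that may not hold for a given file: that CDS or exon lines carry an ID and either a locus_tag or a Parent attribute, and that the downstream tool reads "exon" records with gene_id and transcript_id. The function name is hypothetical, and a dedicated converter is likely to be more robust.

```python
def gff3_to_gtf_line(line, feature_types=("CDS", "exon")):
    """Convert one GFF3 feature line to a GTF line, or return None to skip it."""
    if line.startswith("#") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] not in feature_types:
        return None
    # GFF3 attributes are key=value pairs separated by semicolons.
    attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
    # Assumption: locus_tag (or, failing that, Parent or ID) is a usable gene label.
    gene_id = attrs.get("locus_tag") or attrs.get("Parent") or attrs.get("ID")
    if gene_id is None:
        return None
    transcript_id = attrs.get("ID", gene_id)
    cols[2] = "exon"  # assumption: the downstream tool expects exon records
    # GTF attributes are key "value"; pairs. Coordinates are 1-based in both formats.
    cols[8] = 'gene_id "%s"; transcript_id "%s";' % (gene_id, transcript_id)
    return "\t".join(cols)

# Made-up example line, not taken from any real annotation.
example = "contig1\tsrc\tCDS\t100\t400\t.\t+\t0\tID=cds1;Parent=gene1;locus_tag=geneA"
print(gff3_to_gtf_line(example))
```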


As for possibilities with file conversion, most microbial genomes are 
available from NCBI in a variety of formats (but not GTF).  For S. 
aureus Newman, these files can be found at the following link:


ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Staphylococcus_aureus_Newman_uid18801

Many thanks for your help!

Joe

---
Joe J. Harrison
Senior Fellow
Department of Microbiology
University of Washington
1705 NE Pacific Street, HSB J181
Seattle, WA USA 98195


On 10/20/2011 11:26 AM, Emilie Chautard wrote:

Hi Olivier,

Did you try to run Cuffcompare (part of Cufflinks) on your results?
According to the Cufflinks manual 
(http://cufflinks.cbcb.umd.edu/manual.html):


>Cufflinks includes a program that you can use to help analyze the 
transfrags you assemble. The program cuffcompare helps you:

>  - Compare your assembled transcripts to a reference annotation
>  [...]

In the Galaxy version of Cuffcompare, I think that you can provide a 
reference annotation file using "Use Reference Annotation:", which 
will be compared to your results with Cufflinks.
It makes a "union" of the transcripts obtained with Cufflinks and 
the annotation file (both in *.gtf format). You can then obtain a 
transcript identifier for those already annotated.
It also provides a class code for the transcripts, which can inform 
about a potential isoform for example.

Hope this helps.

Emilie
--
Emilie Chautard, PhD
Postdoctoral Fellow

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3

Tel: 416-673-8518
Toll-free: 1-866-678-6427
www.oicr.on.ca 



Message: 7
Date: Thu, 20 Oct 2011 15:12:45 +0200
From: GANDRILLON OLIVIER <olivier.gandril...@univ-lyon1.fr>
To: "galaxy-u...@bx.psu.edu" <galaxy-u...@bx.psu.edu>
Subject: [galaxy-user] Names for genes in RNA-Seq analysis
Message-ID: <cac5eaed.8e99%25olivier.gandril...@univ-lyon1.fr>
Content-Type: text/plain; charset="windows-1252"

Hello

I am using Galaxy to analyse RNA-seq libraries made from chicken
cells.

I just groomed my sequences, passed them through TopHat and then
Cufflinks.

This worked well and in the end I get a list of genes and their
respective FPKM values.

My only problem is that the names of the genes do not appear in
the listing; they are simply referenced as "CUFF.1, CUFF.2, " etc…

Could you please tell me how I could obtain gene names? (I went
through the FAQ and could not get the answer).

Sincerely

Olivier






[galaxy-user] save/export image from workflow editor?

2011-10-20 Thread Casey Bergman
Dear Galaxy Team -

[Apologies if this has been answered on this list before... I couldn't 
find anything related via Google]

Is there a way to save/export an image of the workflow shown in the workflow 
editor?  

Best regards,
Casey








Re: [galaxy-user] Names for genes in RNA-Seq analysis (Emilie Chautard)

2011-10-20 Thread Emilie Chautard
Hi Olivier,

Did you try to run Cuffcompare (part of Cufflinks) on your results?
According to the Cufflinks manual
(http://cufflinks.cbcb.umd.edu/manual.html):

>Cufflinks includes a program that you can use to help analyze the
transfrags you assemble. The program cuffcompare helps you:
>  - Compare your assembled transcripts to a reference annotation
>  [...]

In the Galaxy version of Cuffcompare, I think that you can provide a
reference annotation file using "Use Reference Annotation:", which will be
compared to your results with Cufflinks.
It makes a "union" of the transcripts obtained with Cufflinks and the
annotation file (both in *.gtf format). You can then obtain a transcript
identifier for those already annotated.
It also provides a class code for the transcripts, which can inform about a
potential isoform for example.
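
As a rough sketch of how the names could then be pulled out of the Cuffcompare GTF output: the code below assumes the combined GTF carries `oId` (the original CUFF.* transcript id), `gene_name`, and `class_code` attributes. Those attribute names should be checked against the actual output file, and the function name is made up.

```python
import re

# Matches GTF attributes of the form: key "value";
ATTR = re.compile(r'(\S+) "([^"]*)";')

def gene_names_by_cuff_id(gtf_lines):
    """Map each original Cufflinks id to (gene_name, class_code)."""
    names = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = dict(ATTR.findall(cols[8]))
        # Only transcripts matched to the reference annotation carry a gene name.
        if "oId" in attrs and "gene_name" in attrs:
            names[attrs["oId"]] = (attrs["gene_name"], attrs.get("class_code"))
    return names

# Made-up example line in the assumed layout.
example = ('chr1\tCufflinks\texon\t100\t200\t.\t+\t.\t'
           'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; '
           'gene_name "GENE_A"; oId "CUFF.1.1"; class_code "=";')
print(gene_names_by_cuff_id([example]))
```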
Hope this helps.

Emilie
--
Emilie Chautard, PhD
Postdoctoral Fellow

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3

Tel: 416-673-8518
Toll-free: 1-866-678-6427
www.oicr.on.ca



> Message: 7
> Date: Thu, 20 Oct 2011 15:12:45 +0200
> From: GANDRILLON OLIVIER 
> To: "galaxy-u...@bx.psu.edu" 
> Subject: [galaxy-user] Names for genes in RNA-Seq analysis
> Message-ID: 
> Content-Type: text/plain; charset="windows-1252"
>
> Hello
>
> I am using Galaxy to analyse RNA-seq libraries made from chicken cells.
>
> I just groomed my sequences, passed them through TopHat and then Cufflinks.
>
> This worked well and in the end I get a list of genes and their respective
> FPKM values.
>
> My only problem is that the names of the genes do not appear in the
> listing; they are simply referenced as "CUFF.1, CUFF.2, " etc…
>
> Could you please tell me how I could obtain gene names? (I went through the
> FAQ and could not get the answer).
>
> Sincerely
>
> Olivier
>

Re: [galaxy-user] Zoom function on workflow editor?

2011-10-20 Thread Dannon Baker
Arthur,

There is no zoom for the main editor window, though you can use (and resize) the 
little mini view in the bottom right corner to move around the workflow 
quickly.  You could also use the browser-level zoom (in Chrome and Firefox, at 
least) to see much more at a time.

-Dannon



On Oct 20, 2011, at 10:26 AM, Arthur Goldberg wrote:

> Hi 
> 
> Galaxy's cool!
> Is there a zoom function on the workflow editor? (I know about hiding nodes 
> from the tutorial.)
> 
> Thanks
> Arthur
> 
> -- 
> Senior Research Scientist
> Computational Biology
> Memorial Sloan-Kettering Cancer Center
> cBio Cancer Genomics Portal



[galaxy-user] Zoom function on workflow editor?

2011-10-20 Thread Arthur Goldberg

Hi

Galaxy's cool!
Is there a zoom function on the workflow editor? (I know about hiding 
nodes from the tutorial.)


Thanks
Arthur

--
Senior Research Scientist
Computational Biology
Memorial Sloan-Kettering Cancer Center
cBio Cancer Genomics Portal 

Re: [galaxy-user] Names for genes in RNA-Seq analysis

2011-10-20 Thread Carl Schmidt

I am also using Galaxy to analyze RNA-seq libraries from chicken.
While the names of the genes appear in the Cufflinks output, the FPKM  
values are all zero.



On Oct 20, 2011, at 9:12 AM, GANDRILLON OLIVIER wrote:


Hello

I am using Galaxy to analyse RNA-seq libraries made from chicken  
cells.


I just groomed my sequences, passed them through TopHat and then  
Cufflinks.


This worked well and in the end I get a list of genes and their  
respective FPKM values.


My only problem is that the names of the genes do not appear in the  
listing; they are simply referenced as "CUFF.1, CUFF.2, " etc…


Could you please tell me how I could obtain gene names? (I went  
through the FAQ and could not get the answer).


Sincerely

Olivier



-
New mail address: olivier.gandril...@univ-lyon1.fr

Dr Olivier Gandrillon
Centre de Génétique et de Physiologie Moléculaires et Cellulaires
UMR CNRS 5534
Université Claude Bernard Lyon I
Bat Gregor Mendel (ex 741)
16, rue Raphaël Dubois
69622 Villeurbanne Cedex
Phone : 04-72-44-81-90
Fax : 04-72-43-26-85
Web address:
Lab: http://cgphimc.univ-lyon1.fr/spip.php?rubrique33&lang=en
Perso: http://www.cgmc.univ-lyon1.fr/Gandrillon/OG/OG1.html

"How did he win the people's assent to the new lies he invented each  
day? Precisely because they were lies, and precisely because they  
were an insult to perception. The people were hypnotized by his nerve,  
by the right he granted himself to contradict the evidence. People  
stared, transfixed, at an unleashed Goebbels. They could see, in  
transparency, his enormous wish to deny the limping dwarf"


Tobie Nathan, in "Qui a tué Arlozoroff"



Carl Schmidt
Associate Professor
Animal & Food Sciences
University of Delaware
Newark, DE 19716
051 Townsend Hall
schmi...@udel.edu
302-831-1334







[galaxy-user] Names for genes in RNA-Seq analysis

2011-10-20 Thread GANDRILLON OLIVIER
Hello

I am using Galaxy to analyse RNA-seq libraries made from chicken cells.

I just groomed my sequences, passed them through TopHat and then Cufflinks.

This worked well and in the end I get a list of genes and their respective FPKM 
values.

My only problem is that the names of the genes do not appear in the listing; 
they are simply referenced as "CUFF.1, CUFF.2, " etc…

Could you please tell me how I could obtain gene names? (I went through the FAQ 
and could not get the answer).

Sincerely

Olivier



-
New mail address: olivier.gandril...@univ-lyon1.fr

Dr Olivier Gandrillon
Centre de Génétique et de Physiologie Moléculaires et Cellulaires
UMR CNRS 5534
Université Claude Bernard Lyon I
Bat Gregor Mendel (ex 741)
16, rue Raphaël Dubois
69622 Villeurbanne Cedex
Phone : 04-72-44-81-90
Fax : 04-72-43-26-85
Web address:
Lab: http://cgphimc.univ-lyon1.fr/spip.php?rubrique33&lang=en
Perso: http://www.cgmc.univ-lyon1.fr/Gandrillon/OG/OG1.html

"How did he win the people's assent to the new lies he invented each 
day? Precisely because they were lies, and precisely because they were an 
insult to perception. The people were hypnotized by his nerve, by the right 
he granted himself to contradict the evidence. People stared, transfixed, 
at an unleashed Goebbels. They could see, in transparency, his enormous 
wish to deny the limping dwarf"

Tobie Nathan, in "Qui a tué Arlozoroff"


Re: [galaxy-user] Patch for better FASTQ description handling

2011-10-20 Thread Peter Cock
On Thu, Oct 20, 2011 at 2:15 PM, Eric Cabot  wrote:
>> I was not aware of this new naming. It seems like a terrible decision from
>> Illumina because now both reads in a pair technically have the same ID (but
>> a different description).
>
> This is not quite the case. Here are two fastq header lines for a pair of
> reads produced by Illumina's CASAVA 1.8:
>
> @XYZZY:123:D0ABCDEFG:7:1101:1445:2057 1:N:0:CTTGTA
> @XYZZY:123:D0ABCDEFG:7:1101:1445:2057 2:N:0:CTTGTA

Yes, Illumina gives both read 1 and read 2 the same template ID
of XYZZY:123:D0ABCDEFG:7:1101:1445:2057 (much like the
two reads would have the same ID in a SAM/BAM file).

> The two key things to note, relevant to this discussion are:
>
> 1. A space character is used to split the fields into two groups.
> This is actually a good thing, because that particular character can NEVER
> appear in either a sequence or a quality line. This makes it easy to detect
> name lines as those beginning with "@" (a valid quality character) and also
> having a space. If you are writing a parser for the new Illumina fastq
> format, please don't break the names on spaces!

Yes, you could use the space as a sanity test for *this* style Illumina
FASTQ, and have a bespoke parser which treats this all specially.
But for a generic FASTQ parser you *should* split at the space.

The point is Illumina have changed the meaning of their FASTQ
identifier, it used to be the template ID plus a /1 or /2 suffix, but
now it is just the common template ID used for both parts.

> 2. Apart from the read number, encoded as the digit immediately following
> the space, the two lines are identical--as they were with earlier CASAVA
> versions.  Why is this worse than two lines differing by "/1" vs. "/2"?

Because it is a change from the existing well-established convention,
which will require changes to hundreds of scripts and tools
(guessed number including users' bespoke scripts).
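
To illustrate the kind of change involved, here is a minimal sketch of a helper that recovers the template ID and mate number under both conventions. The function name is made up and the old-style example name is hypothetical; a real tool may want stricter checks.

```python
def template_and_mate(header):
    """Return (template_id, mate) where mate is 1, 2, or None if unknown."""
    name, _, description = header.lstrip("@").rstrip("\n").partition(" ")
    # Older convention: the mate number is a /1 or /2 suffix on the identifier.
    if name.endswith(("/1", "/2")):
        return name[:-2], int(name[-1])
    # CASAVA 1.8 convention: the mate number is the first colon-separated
    # field of the description that follows the space.
    first = description.split(":", 1)[0]
    if first in ("1", "2"):
        return name, int(first)
    return name, None

print(template_and_mate("@XYZZY:123:D0ABCDEFG:7:1101:1445:2057 2:N:0:CTTGTA"))
# ('XYZZY:123:D0ABCDEFG:7:1101:1445:2057', 2)
print(template_and_mate("@READ_A/1"))
# ('READ_A', 1)
```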

> An additional improvement with the new naming convention is that flowcell
> and run IDs, as well as a flag for not passing filters (where N means the
> read did pass), are now included.

Yes, that is good.

Peter



Re: [galaxy-user] Patch for better FASTQ description handling

2011-10-20 Thread Eric Cabot

Florent Angly wrote:

Peter and Daniel, thanks for the comments.

On 19/10/11 23:49, Peter Cock wrote:
On Wed, Oct 19, 2011 at 2:31 PM, Daniel Blankenberg wrote:

Hi Florent,
Sorry for the delay.  I did try the patch out shortly after you contributed
it, but it caused the functional tests to fail.  I was able to fix the issue
and allow the existing tests to start passing, but I've been bogged down
lately and haven't been able to perform a more thorough review of the code.
If you could provide tests with files (e.g. for the tools affected) that test
the new functionality, that would be a great help.

I'll have a look at that.

The use of partition removes Python compatibility for <2.5, although this is
a lesser/non-concern.

I guess you could use split, but special case on there being no space.
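
A minimal sketch of that fallback, using split with a special case for a header that has no space. The function name is hypothetical and this is not the code from the patch.

```python
def split_header(line):
    """Split a FASTQ '@' line into (identifier, description).

    Uses str.split with maxsplit=1 rather than str.partition; a header
    without whitespace simply yields an empty description.
    """
    fields = line[1:].rstrip("\n").split(None, 1)  # at most one split, on whitespace
    identifier = fields[0] if fields else ""
    description = fields[1] if len(fields) > 1 else ""
    return identifier, description

print(split_header("@XYZZY:123:D0ABCDEFG:7:1101:1445:2057 1:N:0:CTTGTA"))
# ('XYZZY:123:D0ABCDEFG:7:1101:1445:2057', '1:N:0:CTTGTA')
print(split_header("@Read A"))
# ('Read', 'A')
```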

Also, I'm not entirely sold on having the "Identifier line" being parsed as
"identifier" +  + "description" instead of a single identifier line.

That is the normal convention, just like with FASTA.
http://dx.doi.org/10.1093/nar/gkp1137
The Bioperl and Biopython projects use this convention for FASTA and FASTQ
files.

This would mean that identifiers could not themselves contain spaces,
but "There is no standardization for identifiers" (so they could technically
have spaces?). Two different reads could be identified as "Read A" and
"Read B", but they would then no longer be uniquely identifiable, as each
would be identified as "Read".  If this added functionality were introduced as
optional behavior (e.g. a user needs to click a checkbox on the tools to
apply the id line splitting), these concerns can be mitigated.

That is expected, "@Read A" and "@Read B" have the same identifier, "Read".

Peter, Florent, anyone else: I'd be very interested to hear your thoughts on
the above, particularly in respect to known real-world data. For now, let's
discount SRA data from this discussion.

See also the new Illumina 1.8 naming convention where they dropped
the /1 and /2 and hid it in the description. It should be tested, but I think
Florent's patch will work here (while the current Galaxy behaviour won't).

Peter

I was not aware of this new naming. It seems like a terrible decision
from Illumina because now both reads in a pair technically have the same
ID (but a different description).


This is not quite the case. Here are two fastq header lines for a pair of 
reads produced by Illumina's CASAVA 1.8:


@XYZZY:123:D0ABCDEFG:7:1101:1445:2057 1:N:0:CTTGTA
@XYZZY:123:D0ABCDEFG:7:1101:1445:2057 2:N:0:CTTGTA

The two key things to note, relevant to this discussion are:

1. A space character is used to split the fields into two groups.
This is actually a good thing, because that particular character can NEVER 
appear in either a sequence or a quality line. This makes it easy to detect 
name lines as those beginning with "@" (a valid quality character) and 
also having a space. If you are writing a parser for the new Illumina 
fastq format, please don't break the names on spaces!


2. Apart from the read number, encoded as the digit immediately following 
the space, the two lines are identical--as they were with earlier CASAVA 
versions.  Why is this worse than two lines differing by "/1" vs. "/2"?


An additional improvement with the new naming convention is that flowcell 
and run IDs, as well as a flag for not passing filters (where N means 
the read did pass), are now included.
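
For illustration, the fields in the two example headers above can be unpacked as follows. This is a sketch only: the field names follow my understanding of the CASAVA 1.8 layout (instrument, run, flowcell, lane, tile, x, y, then read, filter flag, control number, index) and are an assumption rather than something stated in this thread; the function and tuple names are made up.

```python
from collections import namedtuple

Casava18 = namedtuple(
    "Casava18",
    "instrument run flowcell lane tile x y read is_filtered control index",
)

def parse_casava18(header):
    """Parse one CASAVA 1.8 style FASTQ header line into named fields."""
    name, description = header.lstrip("@").rstrip("\n").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read, is_filtered, control, index = description.split(":")
    return Casava18(
        instrument, int(run), flowcell, int(lane), int(tile), int(x), int(y),
        int(read),
        is_filtered == "Y",  # Y: read did NOT pass filter; N: it passed
        int(control),
        index,
    )

rec = parse_casava18("@XYZZY:123:D0ABCDEFG:7:1101:1445:2057 1:N:0:CTTGTA")
print(rec.flowcell, rec.run, rec.read, rec.is_filtered)
# D0ABCDEFG 123 1 False
```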




Eric L. Cabot
Biotechnology Center
University of Wisconsin-Madison






Re: [galaxy-user] bed12 format for stitching blocks given a set of coding exon intervals

2011-10-20 Thread Daniel Blankenberg
Hi Amit,

Without taking a look at your history, I'll have to make a guess. When you 
retrieve regions from UCSC, on the second step, right before you click "Send 
query to Galaxy", make sure that you have  "Whole Gene" selected under "One 
record per", and that you are looking at a gene track and that the format was 
set to "BED" on the first page. Also be sure not to include a track header.
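
For reference, the difference between the two layouts can be sketched as follows: a BED12 line describes a whole gene, with its exons stored as block sizes and block starts relative to the gene start, whereas one-record-per-exon output gives only separate intervals. The helper below is a hypothetical illustration (made-up name and example values), with thickStart/thickEnd simply set to the gene bounds.

```python
def exons_to_bed12(chrom, name, strand, exons):
    """Collapse (start, end) exon intervals for one gene into a BED12 line.

    Coordinates are 0-based, half-open, as in BED.
    """
    exons = sorted(exons)
    start, end = exons[0][0], exons[-1][1]
    sizes = ",".join(str(e - s) for s, e in exons)       # blockSizes
    starts = ",".join(str(s - start) for s, _ in exons)  # blockStarts, relative
    fields = [chrom, start, end, name, 0, strand,
              start, end, "0", len(exons), sizes, starts]
    return "\t".join(map(str, fields))

print(exons_to_bed12("chr2L", "geneA", "+", [(100, 200), (300, 450)]))
# chr2L  100  450  geneA  0  +  100  450  0  2  100,150  0,200   (tab-separated)
```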

Thanks for using Galaxy,

Dan


On Oct 19, 2011, at 11:10 PM, Amit Indap wrote:

> Hi Galaxy,
> 
> I am trying to stitch together MAF alignments for the coding sequence
> for a few genes of interest in Drosophila. I used UCSC to send the bed
> intervals of the coding exons of my gene and sent the output to
> Galaxy. But when I try and use the tool "Stitch Gene blocks" it
> complains that my bed is a bed3 and not a bed12.
> 
> I'm a bit rusty with my browser skills, but how can I send my output
> to Galaxy as a bed12 format so I can stitch my MAF blocks together?
> 
> Thanks for your help!
> 
> Amit
> 
> -- 
> Amit Indap

