Re: [galaxy-user] Problem loading BAM into IGV browser - invalid GZIP header error message

2013-11-29 Thread Jim Robinson

Hi Sebastian,

Is it possible to share an example BAM that exhibits this problem on a 
Galaxy server I can reach? Also, which version of IGV are you using 
(select Help > About... to see the version)?


-- Jim


Dear all,


Sometimes I encounter a problem when trying to load BAM files directly from Galaxy into the IGV 
browser. I first start the IGV browser locally, then click on the appropriate 
BAM file and on display with IGV _local_ in Galaxy. In most cases it works, 
but for some reason not with specific files. The error message says

Error loading http://_URL-to-file_/galaxy_example.bam: An error occured while 
accessing http://_URL-to-file_/galaxy_example.bam
Invalid GZIP header

What does it mean? And why am I able to download the BAM file and load it from 
HDD into the IGV?
The problem comes with all BAM files of one sample cohort, but not with another 
(but same sample design and workflow used). Rerunning the workflow doesn't 
help...
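
One way to narrow down an "Invalid GZIP header" message is to look at the first bytes the server actually returns for the BAM URL and compare them with the first bytes of the downloaded copy. BAM files are BGZF-compressed, so they begin with the gzip magic bytes 1f 8b and carry a "BC" extra subfield; a response that starts with anything else (an HTML page, for example) would be rejected by a gzip reader. The following is only a minimal sketch of such a check, not what IGV does internally, and the function names are made up for illustration.

```python
import struct

GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(head: bytes) -> bool:
    """True if the data starts with the two gzip magic bytes."""
    return head[:2] == GZIP_MAGIC


def looks_like_bgzf(head: bytes) -> bool:
    """True if the data starts with a BGZF block header (as used by BAM).

    Hypothetical helper: it only inspects the header of the first block.
    """
    if len(head) < 12 or not looks_like_gzip(head):
        return False
    method, flags = head[2], head[3]
    # BGZF uses deflate (8) and sets the FEXTRA flag (0x04).
    if method != 8 or not flags & 0x04:
        return False
    (xlen,) = struct.unpack_from("<H", head, 10)
    extra = head[12:12 + xlen]
    pos = 0
    # Walk the extra subfields looking for the 'B','C' subfield of length 2.
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (66, 67, 2):
            return True
        pos += 4 + slen
    return False
```

Passing the first few dozen bytes of the HTTP response and of the local file to the same function shows whether the two sources differ at the very start.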


I would be very thankful for every kind of help!


Best,
Sebastian

Helmholtz Zentrum München
Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH)
Ingolstädter Landstr. 1
85764 Neuherberg
www.helmholtz-muenchen.de
Aufsichtsratsvorsitzende: MinDir´in Bärbel Brumme-Bothe
Geschäftsführer: Prof. Dr. Günther Wess, Dr. Nikolaus Blum, Dr. Alfons Enhsen
Registergericht: Amtsgericht München HRB 6466
USt-IdNr: DE 129521671

___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using reply all in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

   http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

   http://galaxyproject.org/search/mailinglists/


Re: [galaxy-user] Problems with large gzipped fasta files

2013-02-13 Thread Jim Robinson
Sorry Nate, I misunderstood at first. You want a URL to the dataset here 
on my server? I can definitely copy one up to an HTTP server; I still 
have Ricardo's files on a hard disk. I'll start the copy now and let 
you know when it's ready.


Jim


Hi Jim,

Could you send me a URL to the dataset so I can grab a copy and try to 
reproduce this problem?  Sorry for the trouble you've been having with the 
upload functionality and the delay in getting back to you.

--nate

On Feb 5, 2013, at 8:48 AM, Jim Robinson wrote:




[galaxy-user] Problems with large gzipped fasta files

2013-02-05 Thread Jim Robinson

Hi,

I am having a lot of difficulty uploading some large gzipped fastqs (~10 GB) 
to the public server. I have tried both FTP and pulling by 
HTTP URL. The upload succeeds, but I get an error as it tries to 
gunzip the file. I have tried more than 10 times now and succeeded once. 
These files are correct and complete, and gunzip properly locally. The 
error shown is usually this:


empty
format: txt, database: ?
Problem decompressing gzipped data

However on 2 occasions (both ftp uploads) I got the traceback below.   
Am I missing some obvious trick?   I searched the archives and see 
references to problems with large gzipped files but no solutions.
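
For anyone wanting to rule out a damaged file before or after a transfer, a check along the following lines may help. It is a minimal sketch, not part of Galaxy's upload code: it streams the whole gzip file so that the trailing CRC is verified, and computes an MD5 that can be compared between the local copy and a copy fetched back from the server. The function names are hypothetical.

```python
import gzip
import hashlib
import zlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of the raw (still compressed) file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress the file in chunks; False if it is truncated or corrupt."""
    try:
        with gzip.open(path, "rb") as handle:
            # Reading to the end makes the gzip module check CRC and length.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True
```

Reading in fixed-size chunks keeps memory use flat, which matters for files in the 10 GB range.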


Thanks

Jim


Traceback (most recent call last):
  File "/galaxy/home/g2main/galaxy_main/tools/data_source/upload.py", line 384, in <module>
    __main__()
  File "/galaxy/home/g2main/galaxy_main/tools/data_source/upload.py", line 373, in __main__
    add_file( dataset, registry, json_file, output_path )
  File "/galaxy/home/g2main/galaxy_main/tools/data_source/upload.py", line 270, in add_file
    line_count, converted_path = sniff.convert_newlines( dataset.path, in_place=in_place )
  File "/galaxy/home/g2main/galaxy_main/lib/galaxy/datatypes/sniff.py", line 106, in convert_newlines
    shutil.move( temp_name, fname )
  File "/usr/lib/python2.7/shutil.py", line 299, in move
    copy2(src, real_dst)
  File "/usr/lib/python2.7/shutil.py", line 128, in copy2
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 84, in copyfile
    copyfileobj(fsrc, fdst)
  File "/usr/lib/python2.7/shutil.py", line 49, in copyfileobj
    buf = fsrc.read(length)
IOError: [Errno 5] Input/output error

Re: [galaxy-user] Problem with bam and/or bai files

2011-10-27 Thread Jim Robinson
It's possible the sorting problem was specific to one samtools version and 
that samtools now gives an error.  The incorrect index caused by bad sequence lengths is a 
recurrent problem, but I do not know what tool produces such headers.  
Perhaps someone who has experienced this can chime in.
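
To spot the bad-sequence-length case, the @SQ lines of the BAM header (for instance the text printed by `samtools view -H`) can be compared against the lengths of the reference that was actually used. The sketch below assumes the header is available as text lines and that the expected lengths are supplied as a plain dictionary; both function names are invented for this example.

```python
def parse_sq_lengths(header_lines):
    """Collect {sequence name: length} from the @SQ lines of a SAM header."""
    lengths = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Each tag looks like "SN:chr1" or "LN:1000"; split on the first colon.
        fields = dict(
            field.split(":", 1) for field in line.rstrip("\n").split("\t")[1:]
        )
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def mismatched_lengths(header_lines, reference_lengths):
    """Return {name: (header length, expected length)} for every disagreement.

    The expected length is None when the name is missing from the reference.
    """
    problems = {}
    for name, length in parse_sq_lengths(header_lines).items():
        expected = reference_lengths.get(name)
        if expected != length:
            problems[name] = (length, expected)
    return problems
```

An empty result means the sequence dictionary agrees with the reference; any entry points at a header that would likely lead to an invalid index.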


I'm not a samtools expert, just sharing my experience of what has caused 
this error in the past. It does seem that, as a general rule, 
these index problems result in errors from Picard (which the GATK uses), 
while samtools can fail silently and sometimes give you an unrelated 
query region.


Jim


Sending to galaxy-dev ...

On Thu, Oct 27, 2011 at 5:51 AM, Jim Robinson
jrobi...@broadinstitute.org  wrote:

Hi Mike,

Someone from the Galaxy team can perhaps give some insight on
what went wrong,  I can comment on the error message from IGV.
That error is thrown from Picard, in every case I've investigated so
far it was traced to a problem with the index.

Useful background re: Error reading bam file. This usually indicates
a problem with the index (bai) file. ArrayIndexOutofBoundsException:
4682 (4682).


The most common causes are (1) a problem with the sequence
dictionary in the BAM header itself, specifically incorrect sequence
lengths,

Any idea what tools produce that kind of thing?


and (2) indexing an un-sorted BAM.  Apparently samtools will
make invalid indexes from such files without any complaints in
both cases.  You can even use samtools tview on such files,
it happily will show you some random region when you query.

That is news to me - I recall samtools index being recommended
as a way to determine if a BAM file was sorted or not (error on
unsorted, you get an index if it was sorted) and again from
memory this is what Galaxy uses internally as part of preparing
BAM files on upload.

Might this be tied to a specific version of samtools? e.g. a
possible regression?




I don't see a Sort step in your workflow, maybe that's the problem?

Please CC me on any reply,  I might miss it in the list.

Jim

Thanks,

Peter



Re: [galaxy-user] Problem with bam and/or bai files

2011-10-26 Thread Jim Robinson

 Hi Mike,

Someone from the Galaxy team can perhaps give some insight on what went 
wrong,  I can comment on the error message from IGV.   That error is 
thrown from Picard, in every case I've investigated so far it was traced 
to a problem with the index.  The most common causes are (1) a problem 
with the sequence dictionary in the BAM header itself, specifically 
incorrect sequence lengths, and (2) indexing an un-sorted BAM.  
Apparently samtools will make invalid indexes from such files without 
any complaints in both cases.  You can even use samtools tview on such 
files,  it happily will show you some random region when you query.
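
Since the indexer may not complain, it can be worth checking the sort order independently before indexing. The following is a minimal sketch that works on SAM text lines (header plus records) and tests whether the records are in coordinate order, meaning ordered by the position of the reference in the @SQ header and then by POS. It is an illustration only, not how samtools or Picard perform the check, and the function name is made up.

```python
def is_coordinate_sorted(sam_lines):
    """True if the SAM records are ordered by (@SQ order, POS)."""
    ref_order = {}
    previous = None
    for line in sam_lines:
        if line.startswith("@"):
            if line.startswith("@SQ"):
                for field in line.rstrip("\n").split("\t")[1:]:
                    if field.startswith("SN:"):
                        # Rank references in the order the header lists them.
                        ref_order[field[3:]] = len(ref_order)
            continue
        fields = line.rstrip("\n").split("\t")
        rname, pos = fields[2], int(fields[3])
        # Reads with no reference ("*") or an unknown name rank after the rest.
        key = (ref_order.get(rname, len(ref_order)), pos)
        if previous is not None and key < previous:
            return False
        previous = key
    return True
```

A False result before indexing suggests adding a sort step to the workflow.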


I don't see a Sort step in your workflow, maybe that's the problem?

Please CC me on any reply,  I might miss it in the list.

Jim




Hello Galaxy Team,
I have been using Galaxy for SNP detection with great success. 
Basically, I followed the screen-cast from Anton without any problems. 
The only change was to use the BWA instead of Bowtie. Until now, I 
have always assigned my raw read files to the hg19 format. Now I want 
to try the GATK pipeline to analyze my samples but I am running into a 
problem with the bam/bai files.
Here is what I did. I imported my Illumina paired end reads into 
Galaxy and assigned them to the hg_g1k_v37 format instead of the Hg19 
format. From there, I again followed the exact same process: FastQ 
Groomer, Summary Statistics, Boxplots, Align with BWA, filter on SAM, 
SAM-to-BAM, generate bai file. I made sure that hg_g1k_v37 was chosen 
for the format for all of the steps that required that information.
Everything seemed to run successfully, as all of the boxes turned 
green. When I tried to view the bam file in IGV (as a QC step before 
the GATK pipeline), I received the following error: Error reading bam 
file. This usually indicates a problem with the index (bai) file. 
ArrayIndexOutofBoundsException: 4682 (4682).
I did the exact same analysis using the Hg19 format and my bam/bai 
files worked perfectly fine in the IGV viewer. Can anyone tell me what 
the problem is and how to fix it?

Thanks,
Mike Dufault



Re: [galaxy-user] RNA seq analysis

2011-05-06 Thread Jim Robinson

Hi Vasu,

I'm going to add the function to index BAM files soon, using Picard.
In the beginning there was no Java BAM reader, only SAM, and I 
added the SAM index then. Indexed BAMs came along later, but that's 
probably more than you want to know... I think most people will 
still use Galaxy to index as it can take a long time, but I agree with 
you on the convenience factor.


Jim


On May 6, 2011, at 9:36 PM, vasu punj wrote:

One of the problems is that IGV doesn't have an option for creating an index 
file, so one has to create the index file in Galaxy first to view it in IGV. 
Jim, I have been using the IGV 2 beta version and it is great work, but how 
hard would it be to include index functionality within IGV? I know we can 
also use samtools, but it would be convenient if it is not that much work.

Vasu

--- On Fri, 5/6/11, Sean Davis ssdav...@mail.nih.gov wrote:

From: Sean Davis sdav...@mail.nih.gov
Subject: Re: [galaxy-user] RNA seq analysis
To: Austin Paul austi...@usc.edu
Cc: galaxy-user@lists.bx.psu.edu galaxy-user@lists.bx.psu.edu, puvan...@umn.edu 
 puvan...@umn.edu

Date: Friday, May 6, 2011, 8:02 PM

IGV reads BAM files just fine; no need to convert to SAM.
Sean

On Fri, May 6, 2011 at 8:45 PM, Austin Paul austi...@usc.edu wrote:
There are many ways.  I typically use IGV.  It needs a sam file, so  
I first convert the bam to sam in galaxy, then download the sam  
file.  In IGV, I upload the reference and the sam file, then use  
IGVtools to index the sam file, then I can visualize the data.


Austin
On Fri, May 6, 2011 at 5:30 PM, puvan...@umn.edu wrote:
Hello

I was able to run RNA seq data against a custom build genome. How  
can I visualize the results. I tried via trackster and unfortunately  
I couldn't. Can you help me?



Thanks

Sumathy



Re: [galaxy-user] get wig file after tophat

2011-04-20 Thread Jim Robinson
I can answer IGV questions,  sadly I'm still coming up to speed on  
Galaxy.


I've lost track of the original question, but IGV computes a coverage 
histogram on the fly, a bit like the Galaxy Track Browser, but you 
have to be zoomed in. However, you can also precompute a coverage 
histogram for the whole genome with igvtools, a command-line 
package. It's in a binary format (TDF) that can support viewing at 
any resolution in IGV. Finally, you can use igvtools to compute 
this as a standard wig file; just supply .wig as the extension 
instead of .tdf. This is not as efficient as TDF but you can use 
it in other browsers,  such as Galaxy and IGB.
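
To make the idea of a precomputed coverage histogram concrete, here is a small sketch that bins read intervals into fixed windows and writes the result as a fixedStep wig track. It is an illustration of the general technique only, not igvtools' implementation; the function names, the 1-based inclusive read intervals, and the choice of mean depth per window are all assumptions made for this example.

```python
def coverage_per_window(reads, window):
    """Mean read depth per fixed-size window.

    reads:  iterable of (start, end) pairs, 1-based and inclusive.
    window: window size in bases.
    """
    reads = list(reads)
    if not reads:
        return []
    max_end = max(end for _, end in reads)
    totals = [0] * ((max_end + window - 1) // window)
    for start, end in reads:
        for w in range((start - 1) // window, (end - 1) // window + 1):
            win_start = w * window + 1
            win_end = win_start + window - 1
            # Number of bases of this read that fall inside window w.
            totals[w] += min(end, win_end) - max(start, win_start) + 1
    return [total / window for total in totals]


def to_fixed_step_wig(chrom, values, window):
    """Format per-window values as a fixedStep wig block starting at base 1."""
    lines = [f"fixedStep chrom={chrom} start=1 step={window} span={window}"]
    lines.extend(f"{value:.2f}" for value in values)
    return "\n".join(lines) + "\n"
```

Larger windows give a smaller file at the cost of resolution, which is the trade-off a multi-resolution format such as TDF avoids.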


Best,

Jim



Hi Ying,

You're in luck because I've been working with genome browsers  
lately, so I think I can help you address your problem. What you're  
looking for is a visualization of a coverage histogram for the BAM  
reads produced by Tophat, yes?


It turns out that some genome browsers provide this automatically as  
part of their solution for visualizing BAM files b/c BAM files tend  
to be very large and hence visualizing aggregated data is often the  
best solution. Both IGV and the Galaxy Trackster Browser support  
this functionality. I think you'll have to do some simple file  
conversions to get the display you want in IGV; you can check out  
the IGV documentation or perhaps Jim can help. I'm not sure if IGB  
supports this visualization mode for BAM; Ann can chime in with  
additional information.


The Galaxy Track Browser supports coverage histograms when viewing  
large regions. When zoomed in, the reads are typically displayed  
individually, although there is a (very beta) option to create a  
histogram for the visible set of reads; this option may not work  
well (yet!) as Tophat reads often have large gaps.


The top track in this visualization shows a coverage histogram for a  
set of Tophat reads:


http://test.g2.bx.psu.edu/u/jeremy.goecks/v/assembly-of-h1-hesc-rna-seq-data

Please see my previous email to Vasu for details about setting up a  
visualization in the Galaxy Track Browser.


Best,
J.



On 4/20/11 5:16 PM, Ying Zhang ying.zhang.yz...@yale.edu wrote:


Dear Ann and Jeremy:

We had this discussion a long time ago, and I am sorry to bring it up here
again. I am just wondering whether, as Ann said, we can add this tool, which
converts BAM into a wig file, to Galaxy? Or make a workflow to generate a wig
file from a BAM file generated by Tophat? That way we could easily get a wig
file from Galaxy and view it in IGB. I know this may seem unnecessary for
the purpose of statistical analysis, but if we can see the coverage with IGB,
it is sometimes helpful for quickly picking out interesting points for specific
genes. This may seem an old-fashioned approach, but my boss is a big fan of
using IGB to view expression files (wig or sgr) and do some analysis.
Thanks a lot!


Best

Ying

Quoting Jeremy Goecks jeremy.goe...@emory.edu:


Hi all,

Ann is correct - Tophat no longer produces .wig files when run.

However, it's fairly easy to use Galaxy to make a wiggle-like
coverage file from a BAM file:

(a) run the pileup tool on your BAM to create a pileup file;
(b) cut columns 1 and 4 to get your coverage file.
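
Step (b) amounts to a column cut on tab-separated text. A minimal sketch of that step outside Galaxy, assuming a pileup layout in which column 4 holds the read depth (this depends on which pileup variant produced the file), might look like this; the function name and the `columns` parameter are invented for the example.

```python
def cut_columns(lines, columns=(1, 4)):
    """Yield the selected 1-based columns of each tab-separated line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields[c - 1] for c in columns)
```

The `columns` parameter is there so that other columns, such as the position in column 2, can be kept as well if needed.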

A final note: it's often difficult to visualize coverage files
because they're so large. You might be better off visualizing the BAM
file and using the coverage file for statistics.

Best,
J.


Hello,

I think I know the answer (sort of) to this question.

This may be because newer versions of tophat stopped running the wiggles
program, which is still part of the tophat distribution and is the program
that makes the coverage.wig file.

A later version of tophat might bring this back, however - there's a note to
this effect in the tophat python code.

So if you can run wiggles, you can make the coverage.wig file on your own.

A student here at UNC Charlotte (Adam Baxter) made a few changes to the
wiggles source code that would allow you to use it with samtools to make a
coverage.wig file from the accepted_hits.bam file that TopHat creates.

If you (or anyone else) would like a copy, please email Adam, who is cc'ed
on this email.

We would be happy to help add it to Galaxy if this would be of interest to
you or other Galaxy users.

If there is any way we can be of assistance, please let us know!

Very best wishes,

Ann Loraine


On 2/21/11 3:39 PM, Ying Zhang ying.zhang.yz...@yale.edu wrote:



Hi:

I am using tophat in Galaxy to analyze my paired-end RNA-seq data and have
found that, after the tophat analysis, we can no longer get the wig file from
it as we used to. Do you have any idea how we can still
get the wig file after tophat analysis? Thanks a lot!

Best

Ying Zhang, M.D., Ph.D.
Postdoctoral Associate
Department of Genetics,
Yale University School of Medicine
300 Cedar Street,S320
New Haven, CT 06519
Tel: 

Re: [galaxy-user] Sort index SAM-files automatically

2011-03-25 Thread Jim Robinson

Hi Jo,

To short-circuit confusion I'll jump in here.  I'm the developer of  
IGV and igvtools,  the sorting and indexing for SAM files was added  
long ago, even before indexed BAM files were possible from Java  
programs.   The recommendation now is to convert to BAM and index  
that,  although SAM files still work.   If the galaxy community would  
like the SAM option I'm happy to have igvtools wrapped as a module,  
and will help with that.


BTW,  I also have some code (xml) to wrap IGV itself as a Galaxy  
visualizer,  contributed to me by a user.  As I don't have a private  
Galaxy installation I'm unable to test it myself, but can make it  
available if anyone is interested.


Jim



Hello!

I convert SOLiD csfasta- and qual-files to fastq-files and map those  
against Hg19 (Bowtie).
I would like to use the resulting sam-files in the IGV browser  
(Broad Institute).
Therefore, the sam-file needs to be sorted and indexed. This could be  
done using the “igvtools”.
However, it would be nicer if this sorting and indexing could be  
done automatically using GALAXY.

I guess that it is certainly possible – but I do not know how.
Could anybody let me know how it works and what function I have to  
use, respectively?

How can I sort the sam-file the correct way?
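
As an illustration of what "sorted the correct way" means for indexing, the sketch below orders SAM records by coordinate: first by the position of the reference sequence in the @SQ header, then by POS, with header lines kept at the top. It holds everything in memory, so it is only suitable for small files, and it does not update the @HD sort-order tag; for real data a dedicated tool such as igvtools or samtools is the better choice. The function name is hypothetical.

```python
def sort_sam_by_coordinate(sam_lines):
    """Return header lines followed by records sorted by (@SQ order, POS)."""
    header, records, ref_order = [], [], {}
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
            if line.startswith("@SQ"):
                for field in line.rstrip("\n").split("\t")[1:]:
                    if field.startswith("SN:"):
                        ref_order[field[3:]] = len(ref_order)
        else:
            records.append(line)

    def sort_key(line):
        fields = line.split("\t")
        # Reads with no reference ("*") or an unknown name go to the end.
        return (ref_order.get(fields[2], len(ref_order)), int(fields[3]))

    # sorted() is stable, so records with equal keys keep their input order.
    return header + sorted(records, key=sort_key)
```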

Thank you in advance.

Best regards

Jo