Re: [galaxy-user] Problems with large gzipped fasta files
Hi Jim,

Your message was misthreaded (perhaps a reply to another thread, with just the subject line changed?), but I was able to dig it out.

At this time, there are no known issues with FTP Upload to the public Main server. Any issues you have found previously were either related to a problem with the original file content (a compression problem) or a transitory issue with the FTP server that has since been resolved (there have been a handful in the last few years). The instructions to follow are here:
http://wiki.galaxyproject.org/FTPUpload

I am not exactly sure what your issue is, but is there any chance that you have more than one file per archive? That will certainly cause an issue, although usually only the first file loads and the remainder do not.

Please send more details if this continues. Does the failure occur at the FTP stage or at the point where you move the file from the FTP holding area into a history?

Thanks!

Jen
Galaxy team

On 2/5/13 5:48 AM, Jim Robinson wrote:
> Hi,
>
> I am having a lot of difficulty uploading some large gzipped fastqs (~ 10GB) to the public server. I have tried both ftp and "pulling" by http URL. The upload succeeds, however I get an error as it tries to gunzip it. I have tried more than 10 times now and succeeded once. These files are correct and complete, and gunzip properly locally.
>
> The error shown is usually this:
>
>   empty
>   format: txt, database: ?
>   Problem decompressing gzipped data
>
> However on 2 occasions (both ftp uploads) I got the traceback below. Am I missing some obvious trick? I searched the archives and see references to problems with large gzipped files but no solutions.
>
> Thanks
>
> Jim
>
> Traceback (most recent call last):
>   File "/galaxy/home/g2main/galaxy_main/tools/data_source/upload.py", line 384, in <module>
>     __main__()
>   File "/galaxy/home/g2main/galaxy_main/tools/data_source/upload.py", line 373, in __main__
>     add_file( dataset, registry, json_file, output_path )
>   File "/galaxy/home/g2main/galaxy_main/tools/data_source/upload.py", line 270, in add_file
>     line_count, converted_path = sniff.convert_newlines( dataset.path, in_place=in_place )
>   File "/galaxy/home/g2main/galaxy_main/lib/galaxy/datatypes/sniff.py", line 106, in convert_newlines
>     shutil.move( temp_name, fname )
>   File "/usr/lib/python2.7/shutil.py", line 299, in move
>     copy2(src, real_dst)
>   File "/usr/lib/python2.7/shutil.py", line 128, in copy2
>     copyfile(src, dst)
>   File "/usr/lib/python2.7/shutil.py", line 84, in copyfile
>     copyfileobj(fsrc, fdst)
>   File "/usr/lib/python2.7/shutil.py", line 49, in copyfileobj
>     buf = fsrc.read(length)
> IOError: [Errno 5] Input/output error
>
> ___
> The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
> http://lists.bx.psu.edu/listinfo/galaxy-dev
> To manage your subscriptions to this and other Galaxy lists, please use the interface at:
> http://lists.bx.psu.edu/

--
Jennifer Hillman-Jackson
Galaxy Support and Training
http://galaxyproject.org
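One way to check locally for the "more than one file per archive" case raised above, assuming it means several gzip streams concatenated into one file, is to count the gzip members before uploading. The following is only a minimal Python sketch and is not part of Galaxy; the function name count_gzip_members and the default chunk size are assumptions made for illustration. It streams the file through zlib, so a ~10GB file is never held in memory, and it fails if the stream is corrupt or truncated.

```python
import zlib


def count_gzip_members(path, chunk_size=1 << 20):
    """Stream-decompress a gzip file; return (member_count, uncompressed_bytes)."""
    members = 0
    total = 0
    decomp = None
    buf = b""
    with open(path, "rb") as fh:
        while True:
            if not buf:
                buf = fh.read(chunk_size)
                if not buf:
                    break  # end of file
            if decomp is None:
                # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
                decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
            # zlib.error is raised here if the data is corrupt.
            total += len(decomp.decompress(buf))
            if decomp.eof:
                # One member is finished; leftover bytes belong to the next one.
                members += 1
                buf = decomp.unused_data
                decomp = None
            else:
                buf = b""
    if decomp is not None:
        raise ValueError("gzip stream ends before the member is complete")
    return members, total


# Example call (the file name is hypothetical):
# print(count_gzip_members("reads.fastq.gz"))
```

A member count of 1 means the file is a single gzip stream; a count above 1 means several compressed streams were concatenated into one file.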
Re: [galaxy-user] Error / Nebula and RepeatMasker
Hi all,

> * For the tool FilterControl, our cluster is configured to kill jobs that use
> more than 4 GB of memory.
> I have not managed to modify the qsub options in my Galaxy instance, so I've
> changed the option -Xmx6g to "-Xmx4g". Some processing may fail for lack of
> memory.

The value you specify with -Xmx6g is the maximum heap size of the Java virtual machine and NOT the memory requested for the job launched by Galaxy on the cluster (see http://www.auditmypc.com/java-memory-xmx512.asp). So the minimum and maximum memory usable by Java have to be set in your Java configuration.

> If you have any idea on how to add options to the Galaxy qsub call, could you
> please help me?
> I would like to add these options to qsub : qsub -l mem=6G -l h_vmem=8G

Try specifying this in universe_wsgi.ini:

runner:///queue/-l nodes=1:ppn=1,mem=6gb,h_vmem=8gb/

If Galaxy does not understand it, you will have to modify the source code to add new qsub options:

1. Identify your scheduler (pbs, drmaa, sge)
2. Edit the python script which creates jobs for your scheduler:
   GALAXY_INSTALL_DIR/galaxy-dist/lib/galaxy/jobs/runners/pbs.py
   GALAXY_INSTALL_DIR/galaxy-dist/lib/galaxy/jobs/runners/drmaa.py
   GALAXY_INSTALL_DIR/galaxy-dist/lib/galaxy/jobs/runners/sge.py
3. Find the function in that script which parses your scheduler options (e.g. in pbs.py, the function is named "def determine_pbs_options( self, url ):")
4. Modify the parsing step so that this function understands the h_vmem option

Hope this helps,
++,
Alban

--
Alban Lermine
Unité 900: INSERM - Mines ParisTech - Institut Curie
"Bioinformatics and Computational Systems Biology of Cancer"
11-13 rue Pierre et Marie Curie (1er étage) - 75005 Paris - France
Tel: +33 (0) 1 56 24 69 84

On 5 Feb 2013, at 14:53, Sarah Maman wrote:

> Hi Alban, Marie-Stephane and Bjoern,
>
> * For the IntersectBed tool, running is OK. I just deleted the ";" in the xml
> file and your tool runs. My cluster adds one ";", so a double ";;" gives an
> error.
>
> I think that this new error is due to the files I tested:
>
>   Differing number of VCF fields encountered at line: 38. Exiting...
>
> * For the RepeatMasker tool, thanks to Marie-Stephane (CONGRATULATIONS
> MARIE-STEPHANE!), we found that the bash commands need to be wrapped in
> Python calls. So the xml file has been modified in this way:
> - Specify "perl" before the RepeatMasker command.
> - Specify the RepeatMasker path.
> - Wrap every bash command in a Python call:
>   #os.system("cp $gff_file $output_gff;") instead of cp $gff_file $output_gff;
>
> So the code is:
>
> ## The command is a Cheetah template which allows some Python based syntax.
> ## Lines starting hash hash are comments. Galaxy will turn newlines into spaces
> ## create temp directory
> #import tempfile, os
> #set $dirname = os.path.abspath(tempfile.mkdtemp())
> #set $input_filename = os.path.split(str($query))[-1]
> #set $output_basename = os.path.join($dirname, $input_filename)
> perl /usr/local/bioinfo/bin/RepeatMasker -parallel 8 $nolow $noint $norna
> #if str($species)!="all":
> $species
> #end if
> -dir $dirname
> #if $adv_opts.adv_opts_selector=="advanced":
> #if str($adv_opts.gc)!="0":
> -gc $adv_opts.gc
> #end if
> $adv_opts.gccalc
> #set $output_files_list = str($adv_opts.output_files).split(',')
> #if "gff" in $output_files_list:
> -gff
> #end if
> #if "html" in $output_files_list:
> -html
> #end if
> $adv_opts.slow_search
> $adv_opts.quick_search
> $adv_opts.rush_search
> $adv_opts.only_alus
> $adv_opts.is_only
> #else:
> ## Set defaults
> -gff
> ## End of advanced options:
> #end if
> $query > /dev/null 2> /dev/null;
> ## Copy the output files to galaxy
> #if $adv_opts.adv_opts_selector=="advanced":
> #if "summary" in $output_files_list:
> ## Write out the summary file (default)
> #set $summary_file = $output_basename + '.tbl'
> #os.system("cp $summary_file $output_summary;")
> #end if
> #if "gff" in $output_files_list:
> ## Write out the gff file (default)
> #set $gff_file = $output_basename + '.out.gff'
> #os.system("cp $gff_file $output_gff;")
> #end if
> #if "html" in $output_files_list:
> ## Write out the html file
> #set $html_file = $output_basename + '.out.html'
> #os.system("cp $html_file $output_html;")
> #end if
> #else:
> ## Write out the summary file (default)
> #set $summary_file = $output_basename + '.tbl'
> #os.system("cp $summary_file $output_summary;")
> ## Write out the gff file (default)
> #set $gff_file = $output_basename + '.out.gff'
> #os.system("cp $gff_file $output_gff;")
> ## End of advanced options:
> #end if
> ## Write out mask sequence file
> #set $mask_sequence_file = $output_basename + '.masked'
> #os.system("cp $mask_sequence_file $output_mask;")
>
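Steps 3 and 4 in the reply above amount to teaching the option parser about one more resource name. As a rough illustration only, the options field of a runner URL such as the one suggested for universe_wsgi.ini could be split as follows. This is not Galaxy's code: parse_runner_options is a hypothetical helper, and the URL layout (scheme, empty server, queue, options, separated by "/") is assumed from that one example.

```python
def parse_runner_options(url):
    """Split the options field of '<runner>://<server>/<queue>/<options>/'.

    Returns a list of (flag, resource, value) tuples. Hypothetical helper,
    written for illustration; it is not taken from Galaxy's pbs.py.
    """
    parts = url.split("/")
    # 'runner:///queue/-l mem=6gb/' -> ['runner:', '', '', 'queue', '-l mem=6gb', '']
    if len(parts) < 5 or not parts[4].strip():
        return []
    parsed = []
    # Drop the first dash, then split on ' -' to separate the individual flags.
    for opt in parts[4].strip().lstrip("-").split(" -"):
        flag, _, value = opt.partition(" ")
        if flag == "l":
            # '-l' carries a comma-separated list of resource=value pairs.
            for item in value.split(","):
                resource, _, amount = item.partition("=")
                parsed.append((flag, resource.strip(), amount.strip()))
        else:
            parsed.append((flag, None, value.strip()))
    return parsed


print(parse_runner_options("runner:///queue/-l nodes=1:ppn=1,mem=6gb,h_vmem=8gb/"))
```

Because the URL is split on "/", an option value that itself contains a slash (a path, for example) would not survive this kind of parsing.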
Re: [galaxy-user] Error / Nebula and RepeatMasker
Hi Alban, Marie-Stephane and Bjoern,

* For the IntersectBed tool, running is OK. I just deleted the ";" in the xml file and your tool runs. My cluster adds one ";", so a double ";;" gives an error.

I think that this new error is due to the files I tested:

  Differing number of VCF fields encountered at line: 38. Exiting...

* For the RepeatMasker tool, thanks to Marie-Stephane (CONGRATULATIONS MARIE-STEPHANE!), we found that the bash commands need to be wrapped in Python calls. So the xml file has been modified in this way:
- Specify "perl" before the RepeatMasker command.
- Specify the RepeatMasker path.
- Wrap every bash command in a Python call:
  #os.system("cp $gff_file $output_gff;") instead of cp $gff_file $output_gff;

So the code is:

## The command is a Cheetah template which allows some Python based syntax.
## Lines starting hash hash are comments. Galaxy will turn newlines into spaces
## create temp directory
#import tempfile, os
#set $dirname = os.path.abspath(tempfile.mkdtemp())
#set $input_filename = os.path.split(str($query))[-1]
#set $output_basename = os.path.join($dirname, $input_filename)
perl /usr/local/bioinfo/bin/RepeatMasker -parallel 8 $nolow $noint $norna
#if str($species)!="all":
$species
#end if
-dir $dirname
#if $adv_opts.adv_opts_selector=="advanced":
#if str($adv_opts.gc)!="0":
-gc $adv_opts.gc
#end if
$adv_opts.gccalc
#set $output_files_list = str($adv_opts.output_files).split(',')
#if "gff" in $output_files_list:
-gff
#end if
#if "html" in $output_files_list:
-html
#end if
$adv_opts.slow_search
$adv_opts.quick_search
$adv_opts.rush_search
$adv_opts.only_alus
$adv_opts.is_only
#else:
## Set defaults
-gff
## End of advanced options:
#end if
$query > /dev/null 2> /dev/null;
## Copy the output files to galaxy
#if $adv_opts.adv_opts_selector=="advanced":
#if "summary" in $output_files_list:
## Write out the summary file (default)
#set $summary_file = $output_basename + '.tbl'
#os.system("cp $summary_file $output_summary;")
#end if
#if "gff" in $output_files_list:
## Write out the gff file (default)
#set $gff_file = $output_basename + '.out.gff'
#os.system("cp $gff_file $output_gff;")
#end if
#if "html" in $output_files_list:
## Write out the html file
#set $html_file = $output_basename + '.out.html'
#os.system("cp $html_file $output_html;")
#end if
#else:
## Write out the summary file (default)
#set $summary_file = $output_basename + '.tbl'
#os.system("cp $summary_file $output_summary;")
## Write out the gff file (default)
#set $gff_file = $output_basename + '.out.gff'
#os.system("cp $gff_file $output_gff;")
## End of advanced options:
#end if
## Write out mask sequence file
#set $mask_sequence_file = $output_basename + '.masked'
#os.system("cp $mask_sequence_file $output_mask;")
## Write out standard file (default)
## The default '.out' file from RepeatMasker has a 3-line header and spaces rather
## than tabs. Remove the header and replace the whitespaces with tab
#set $standard_file = $output_basename + '.out'
#os.system("tail -n +4 $standard_file | tr -s ' ' '\t' > $output_std;")
## Delete all temporary files
#os.system("rm $dirname -r;")

* For the tool FilterControl, our cluster is configured to kill jobs that use more than 4 GB of memory. I have not managed to modify the qsub options in my Galaxy instance, so I've changed the option -Xmx6g to "-Xmx4g". Some processing may fail for lack of memory.

If you have any idea on how to add options to the Galaxy qsub call, could you please help me? I would like to add these options to qsub : qsub -l mem=6G -l h_vmem=8G

* ChIPMunk: Sorry, none of the ChIPMunk files (xml, pl, sh) had execute permission. I simply ran "chmod a+x" on these files, and the ChIPMunk tool now works in my Galaxy instance.

Thanks a lot for all your explanations, Alban, Marie-Stephane and Bjoern.
Thanks in advance for your help with qsub and the IntersectBed tool (test file),

Sarah

alermine wrote:
> Hi Sarah,
>
> I'll try to debug, point by point:
>
> > - Here is the error I get when running the tool FilterControl
> > *** glibc detected *** java: double free or corruption (!prev): 0x7fe56800ecd0 ***
>
> I think the memory of your Java install is misconfigured here (relative to what the tool needs). If you look at the FilterControlPeaks.sh file, java is called with the option -Xmx6g. So your Java install has to be allowed to use 6G of memory (by default it's 1024M).
>
> > - Here is the error I get when running the tool IntersectBed
> > /work/galaxy/database/pbs/galaxy_4129.sh: line 13: Erreur de syntaxe près du symbole inattendu « ;; » [syntax error near unexpected token ";;"]
> > /work/galaxy/database/pbs/galaxy_4129.sh: line 13: `bedtools intersect -f 0.05
>
> I don't understand this one. The intersectBed tool consists only of an xml file which simply calls bedtools. Check the command by typing 'bedtools inters
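
The last post-processing step in the RepeatMasker template (tail -n +4 on the '.out' file, piped through tr -s ' ' '\t') can also be written in plain Python, which avoids a shell call. A minimal sketch, assuming a hypothetical helper name repeatmasker_out_to_tabular that is not part of the wrapper: it skips the 3-line header and collapses each run of spaces or tabs into a single tab, which is what the tail/tr pipeline does.

```python
import itertools
import re


def repeatmasker_out_to_tabular(src_path, dst_path, header_lines=3):
    """Copy src_path to dst_path without its header, using tabs as separators."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        # islice(..., header_lines, None) skips the header, like `tail -n +4`.
        for line in itertools.islice(src, header_lines, None):
            # Turn every run of spaces/tabs into one tab, like `tr -s ' ' '\t'`.
            dst.write(re.sub(r"[ \t]+", "\t", line))
```

As with the tr command, leading spaces on a line become a leading tab, so the first column of the output may be empty.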
[galaxy-user] Problems with large gzipped fasta files
Hi,

I am having a lot of difficulty uploading some large gzipped fastqs (~ 10GB) to the public server. I have tried both ftp and "pulling" by http URL. The upload succeeds, however I get an error as it tries to gunzip it. I have tried more than 10 times now and succeeded once. These files are correct and complete, and gunzip properly locally.

The error shown is usually this:

  empty
  format: txt, database: ?
  Problem decompressing gzipped data

However on 2 occasions (both ftp uploads) I got the traceback below. Am I missing some obvious trick? I searched the archives and see references to problems with large gzipped files but no solutions.

Thanks

Jim

Traceback (most recent call last):
  File "/galaxy/home/g2main/galaxy_main/tools/data_source/upload.py", line 384, in <module>
    __main__()
  File "/galaxy/home/g2main/galaxy_main/tools/data_source/upload.py", line 373, in __main__
    add_file( dataset, registry, json_file, output_path )
  File "/galaxy/home/g2main/galaxy_main/tools/data_source/upload.py", line 270, in add_file
    line_count, converted_path = sniff.convert_newlines( dataset.path, in_place=in_place )
  File "/galaxy/home/g2main/galaxy_main/lib/galaxy/datatypes/sniff.py", line 106, in convert_newlines
    shutil.move( temp_name, fname )
  File "/usr/lib/python2.7/shutil.py", line 299, in move
    copy2(src, real_dst)
  File "/usr/lib/python2.7/shutil.py", line 128, in copy2
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 84, in copyfile
    copyfileobj(fsrc, fdst)
  File "/usr/lib/python2.7/shutil.py", line 49, in copyfileobj
    buf = fsrc.read(length)
IOError: [Errno 5] Input/output error