Re: [galaxy-dev] Creating a galaxy tool in R - You must not use 8-bit bytestrings
Hi Dan

> I added this to my Rscript_wrapper.sh script and all is well.

I am happy to hear that, and don't worry about answering your own question. There are always other people on the list who will learn from the comments and/or find the solution later in the mail archive ;)

regards, Hans

___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at: http://lists.bx.psu.edu/
Re: [galaxy-dev] Creating a galaxy tool in R - You must not use 8-bit bytestrings
Hi Dan

There seem to be several issues, which may or may not be connected with each other; I don't know.

- Let's start with the 'curiosity': Do you get this problem with any tool? Does it also happen with a 'simple' (i.e. not using R) tool you add?

- When you execute your R script on the command line, are you running it as the same user that Galaxy executes the job as?

- We execute R scripts the following way:

    <command>Rscript --vanilla /full path to script/script.R -n $name -i $infile -o $outfile</command>

  and in the R script we use the library 'getopt'.

I hope this gets you a little bit further.

Regards, Hans

On 04/24/2012 11:56 PM, Dan Tenenbaum wrote:
> Apologies for originally posting this to galaxy-user; now I realize it belongs here.
> [...]
Re: [galaxy-dev] Creating a galaxy tool in R - You must not use 8-bit bytestrings
Hello Hans-Rudolf,

On Wed, Apr 25, 2012 at 1:37 AM, Hans-Rudolf Hotz h...@fmi.ch wrote:
> Hi Dan
>
> There seem to be several issues, which may or may not be connected with each other; I don't know.
>
> - Let's start with the 'curiosity': Do you get this problem with any tool?

No. And I should be a bit more specific about what happens. When I click Execute I see this:

    The following job has been successfully added to the queue:
    36: RNASeq tool on data 3 and data 4
    37: RNASeq tool on data 3 and data 4

> Does it also happen with a 'simple' (i.e. not using R) tool you add?

No.

> - When you execute your R script on the command line, are you running it as the same user that Galaxy executes the job as?

Yes.

> - We execute R scripts the following way:
>   <command>Rscript --vanilla /full path to script/script.R -n $name -i $infile -o $outfile</command>
>   and in the R script we use the library 'getopt'.

Good idea. Taking a closer look, I found my R script was not expecting arguments in the same order in which the XML file provided them. I fixed that, and now the R script runs without warning or error, but I still get the 8-bit bytestrings error and the error "Unable to finish job". However, by printing out the arguments within R, I can see the names of the output files that are supposed to be generated, and it looks like they are generated. So why doesn't the job finish?

I think the 8-bit bytestrings error might be because R outputs backticks (`), and they show up escaped in the Galaxy error like this: \xe2\x80\x98. Is there some way that I can set my locale so that R uses an encoding that Galaxy is happier with? Or perhaps I can get R to stop outputting those backticks somehow.

Thanks much,
Dan
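The escaped sequence in the Galaxy error, \xe2\x80\x98, is the UTF-8 encoding of U+2018 LEFT SINGLE QUOTATION MARK (a curly quote), not a backtick. A minimal POSIX shell sketch for confirming this and for counting non-ASCII bytes in captured tool output; hex_bytes and count_non_ascii are hypothetical helper names, not part of Galaxy or R.

```shell
#!/bin/sh
# Sketch: inspect the bytes behind an escape such as \xe2\x80\x98 and
# check whether captured output contains any non-ASCII bytes.

# Print the hex bytes of stdin with no spaces or newlines (POSIX od).
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# Count the bytes on stdin that fall outside 7-bit ASCII (octal 000-177).
count_non_ascii() {
    n=$(LC_ALL=C tr -d '\000-\177' | wc -c)
    echo $((n))
}

# U+2018 written as octal escapes, which POSIX printf supports.
printf '\342\200\230' | hex_bytes; echo                              # e28098
printf "plain 'ASCII' quotes\n" | count_non_ascii                    # 0
printf 'fancy \342\200\230quotes\342\200\231\n' | count_non_ascii    # 6
```

Running a tool's captured stderr through count_non_ascii is one quick way to see whether curly quotes (or other non-ASCII text) are still being emitted after a change such as options(useFancyQuotes = FALSE).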
Re: [galaxy-dev] Creating a galaxy tool in R - You must not use 8-bit bytestrings
On Wed, Apr 25, 2012 at 9:53 AM, Dan Tenenbaum dtene...@fhcrc.org wrote:
> I think the 8-bit bytestrings error might be because R outputs backticks (`), and they show up escaped in the Galaxy error like this: \xe2\x80\x98. Is there some way that I can set my locale so that R uses an encoding that Galaxy is happier with? Or perhaps I can get R to stop outputting those backticks somehow.

Answering my own question here (but following up with more questions). R was outputting not backticks but curly quotes, and I turned these off by putting

    options(useFancyQuotes = FALSE)

in my R script.

However, my R script is still failing:

    Dataset generation errors

    Dataset 46: RNASeq tool on data 3 and data 4

    Tool execution generated the following error message:

    BiocInstaller version 1.5.7, ?biocLite for help
    Loading required package: org.Dm.eg.db
    Loading required package: AnnotationDbi
    Loading required package: BiocGenerics

    Attaching package: 'BiocGenerics'

    The following object(s) are masked from 'package:stats':

        xtabs

    The following object(s) are masked from 'package:base':

        anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find, get,
        intersect, lapply, Map, mapply, mget, order, paste, pmax, pmax.int,
        pmin, pmin.int, Position, rbind, Reduce, rep.int, rownames, sapply,
        setdiff, table, tapply, union, unique

    Loading required package: Biobase
    Welcome to Bioconductor

        Vignettes contain introductory material; view with
        'browseVignettes()'. To cite Bioconductor, see
        'citation("Biobase")', and for packages 'citation("pkgname")'.

    Loading required package: DBI
    Calculating library sizes from column totals.
    Running estimateCommonDisp() on DGEList object before proceeding with estimateTagwiseDisp().

    The tool produced the following additional output:

    pdf_outfile= /Users/dtenenba/dev/galaxy-dist/database/files/000/dataset_46.dat
    csv_outfile= /Users/dtenenba/dev/galaxy-dist/database/files/000/dataset_47.dat
    pdf
      2

The thing is, none of this is an error. This is exactly the output the script is supposed to produce: it prints out the filenames of the two output files it creates, and I can verify that those files exist and have the correct contents. So why does it say there is an error?

Could it be the return code? I added

    q(save="no", status=0)

to the end of the R script to ensure that the script exits with status code 0, but that did not change the error I got.

So I am not sure how to solve this. The script works exactly as it's supposed to and creates the desired output, but Galaxy tells me there is an error and gives me no way through the web interface to retrieve the created output file.

Also, I found out why two jobs were started each time I click Execute. It's because I had two files in my outputs section:

    <outputs>
        <data format="pdf" name="out_file1" />
        <data format="csv" name="out_file2" />
    </outputs>

I had that because my script generates two output files, a CSV and a PDF. I took out the first <data> line above and stopped passing the corresponding argument to my script, so the script no longer creates the PDF. Now the job only runs once each time I click Execute. But I would like the script to once again produce both output files.

So perhaps it was doing the right thing all along, showing one dataset in the history panel for each output file produced. Looking through the Galaxy distribution, I see at least one other tool that does the same thing. So, sorry for the noise on this. In any case, I still get the same error described above even when only one output file is produced.

Thanks,
Dan
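In the log above, the "Loading required package: ..." lines are reported under the error message while the pdf_outfile= and csv_outfile= lines are reported as additional output. That split is consistent with R's package start-up messages going to stderr and cat()/print() output going to stdout. The sketch below approximates the rule discussed later in this thread (a job counts as failed if anything was written to stderr, whatever the exit status); check_job and noisy_success are hypothetical names, and this is an illustration, not Galaxy's actual code.

```shell
#!/bin/sh
# Approximation of the stderr-based failure rule: a job is reported as
# failed when it wrote anything to stderr, even if it exited with status 0.

# Run a command, capture its stderr to a temp file, and report the verdict.
check_job() {
    err_file=$(mktemp) || return 2
    "$@" 2>"$err_file"
    status=$?
    if [ -s "$err_file" ]; then
        verdict="failed (stderr not empty)"
    else
        verdict="ok"
    fi
    rm -f "$err_file"
    echo "exit=$status verdict=$verdict"
}

# Stand-in for the R script: succeeds, but prints a message on stderr.
noisy_success() {
    echo "Loading required package: DBI" >&2
    return 0
}

check_job noisy_success   # exit=0 verdict=failed (stderr not empty)
check_job true            # exit=0 verdict=ok
```

Under this rule, q(save="no", status=0) cannot help on its own, because the status is already 0; only the stderr content differs between the two calls.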
Re: [galaxy-dev] Creating a galaxy tool in R - You must not use 8-bit bytestrings
On Wed, Apr 25, 2012 at 11:48 AM, Dan Tenenbaum dtene...@fhcrc.org wrote:
> On Wed, Apr 25, 2012 at 9:53 AM, Dan Tenenbaum dtene...@fhcrc.org wrote:
>> However, by printing out the arguments within R, I can see the names of the output files that are supposed to be generated, and it looks like they are generated. So why doesn't the job finish?

Sorry for the noise of answering my own question, which I find has been asked and answered before, but perhaps this will help someone. Here's the answer:

http://lists.bx.psu.edu/pipermail/galaxy-dev/2010-August/003161.html

Basically, if any output goes to STDERR, Galaxy considers that the job failed. The link above suggests doing this:

    jobcommand ARGS 2>/dev/null

But I changed that to:

    jobcommand ARGS 2>&1

That way, STDERR is redirected to STDOUT, and any important diagnostic information originally printed to STDERR is not lost. I added this to my Rscript_wrapper.sh script and all is well.

Thanks,
Dan
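A minimal sketch of the difference between the two redirections: 2>/dev/null discards stderr, while 2>&1 points file descriptor 2 at wherever descriptor 1 already goes, so nothing reaches stderr but the diagnostics are kept. run_merged and fake_tool are hypothetical stand-ins (fake_tool plays the role of the R script).

```shell
#!/bin/sh
# Sketch of the 2>&1 approach: merge stderr into stdout so that no output
# reaches stderr, without discarding the diagnostic messages.

# Run "$@" with file descriptor 2 redirected to wherever fd 1 points.
run_merged() {
    "$@" 2>&1
}

# Stand-in for the R script: one line on stdout, one line on stderr.
fake_tool() {
    echo "csv_outfile= dataset_47.dat"
    echo "Loading required package: DBI" >&2
}

fake_tool 2>/dev/null     # only the stdout line survives
run_merged fake_tool      # both lines appear on stdout

# In Rscript_wrapper.sh the same idea would be a single line such as:
#   exec Rscript "$@" 2>&1
```

Order matters when combining redirections: "cmd >file 2>&1" sends both streams to file, whereas "cmd 2>&1 >file" sends stderr to the original stdout and only stdout to file.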
[galaxy-dev] Creating a galaxy tool in R - You must not use 8-bit bytestrings
Apologies for originally posting this to galaxy-user; now I realize it belongs here.

Hello,

I'm a Galaxy newbie and am running into several issues trying to adapt an R script to be a Galaxy tool.

I'm looking at the XY plotting tool for guidance (tools/plot/xy_plot.xml), but I decided not to embed my script in the XML and instead to keep it in a separate script file; that way I can still run it from the command line and make sure it works as I make incremental changes. (So my script starts with args <- commandArgs(TRUE).) Also, if it works on the command line but not in Galaxy, that suggests to me that there is a problem with my Galaxy configuration.

First, I tried using the r_wrapper.sh script that comes with the XY plotting tool, but it threw away my arguments:

    An error occurred running this job: ARGUMENT '/Users/dtenenba/dev/galaxy-dist/database/files/000/dataset_4.dat' __ignored__
    ARGUMENT '/Users/dtenenba/dev/galaxy-dist/database/files/000/dataset_3.dat' __ignored__
    ARGUMENT 'Fly' __ignored__
    ARGUMENT 'Tagwise' __ignored__
    etc.

So then I tried just switching to Rscript:

    <command interpreter="bash">Rscript RNASeq.R $countsTsv $designTsv $organism $dispersion $minimumCountsPerMillion $minimumSamplesPerTranscript $out_file1 $out_file2</command>

(My script produces a CSV file and a PDF file as output. The final two arguments I'm passing are the names of those files.)

But then I got an error that Rscript can't be found. So I wrote a little wrapper script, Rscript_wrapper.sh:

    #!/bin/sh
    Rscript $*

and called that:

    <command interpreter="bash">Rscript_wrapper.sh RNASeq.R $countsTsv $designTsv $organism $dispersion $minimumCountsPerMillion $minimumSamplesPerTranscript $out_file1 $out_file2</command>

Then I got an error that RNASeq.R could not be found. So then I added the absolute path to my R script to the <command> tag.
This seemed to work (that is, it got me further, to the next error), but I'm not sure why I had to do this: in all the other tools I'm looking at, the directory of the script to run does not have to be specified, and I assumed that the command would run in the appropriate directory. So now I've specified the full path to my R script:

    <command interpreter="bash">Rscript_wrapper.sh /Users/dtenenba/dev/galaxy-dist/tools/bioc/RNASeq.R $countsTsv $designTsv $organism $dispersion $minimumCountsPerMillion $minimumSamplesPerTranscript $out_file1 $out_file2</command>

And I get the following long error, which includes all of the output of my R script:

    Traceback (most recent call last):
      File "/Users/dtenenba/dev/galaxy-dist/lib/galaxy/jobs/runners/local.py", line 133, in run_job
        job_wrapper.finish( stdout, stderr )
      File "/Users/dtenenba/dev/galaxy-dist/lib/galaxy/jobs/__init__.py", line 725, in finish
        self.sa_session.flush()
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/scoping.py", line 127, in do
        return getattr(self.registry(), name)(*args, **kwargs)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/session.py", line 1356, in flush
        self._flush(objects)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/session.py", line 1434, in _flush
        flush_context.execute()
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/unitofwork.py", line 261, in execute
        UOWExecutor().execute(self, tasks)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/unitofwork.py", line 753, in execute
        self.execute_save_steps(trans, task)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/unitofwork.py", line 768, in execute_save_steps
        self.save_objects(trans, task)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/unitofwork.py", line 759, in save_objects
        task.mapper._save_obj(task.polymorphic_tosave_objects, trans)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/orm/mapper.py", line 1413, in _save_obj
        c = connection.execute(statement.values(value_params), params)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/base.py", line 824, in execute
        return Connection.executors[c](self, object, multiparams, params)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/base.py", line 874, in _execute_clauseelement
        return self.__execute_context(context)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/base.py", line 896, in __execute_context
        self._cursor_execute(context.cursor, context.statement, context.parameters[0], context=context)
      File "/Users/dtenenba/dev/galaxy-dist/eggs/SQLAlchemy-0.5.6_dev_r6498-py2.7.egg/sqlalchemy/engine/base.py", line 950, in _cursor_execute
        self._handle_dbapi_exception(e, statement, parameters,
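A possible explanation for both "not found" errors, assuming the legacy behaviour of the interpreter attribute: Galaxy prepends the interpreter and the tool's own directory only to the first word of the command, while the job itself runs in a separate working directory. Under that assumption Galaxy would have looked for a file named Rscript in tools/bioc/, and the bare name RNASeq.R would have been resolved relative to the job directory. Separately, the unquoted $* in the wrapper re-splits any argument that contains spaces; "$@" forwards arguments unchanged. A hedged sketch of both ideas follows; count_args, demo and resolve_script are hypothetical helpers, and the argument "Tagwise dispersion" is made up for illustration.

```shell
#!/bin/sh
# Sketch (hypothetical, not shipped with Galaxy) of two fixes for a wrapper:
# 1) forward arguments with "$@" rather than unquoted $*;
# 2) resolve the R script relative to the wrapper's own directory.

count_args() { echo $#; }

# Show how unquoted $* re-splits an argument that contains a space.
demo() {
    echo "unquoted \$*: $(count_args $*)"
    echo "quoted \"\$@\": $(count_args "$@")"
}
demo "Tagwise dispersion" Fly   # prints 3 for $* and 2 for "$@"

# Directory containing this wrapper script, as an absolute path.
script_dir=$(CDPATH= cd -- "$(dirname -- "$0")" && pwd)

# Turn a script name into a path that does not depend on the job's cwd.
resolve_script() {
    case $1 in
        /*) echo "$1" ;;               # already absolute: use as-is
        *)  echo "$script_dir/$1" ;;   # relative: look next to the wrapper
    esac
}

# A wrapper built on these helpers might end with:
#   script=$(resolve_script "$1"); shift
#   exec Rscript "$script" "$@" 2>&1
```

With this design the <command> line could keep the short script name, since the wrapper, not the job's working directory, decides where RNASeq.R is found.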