Re: [galaxy-dev] data_columns

2014-11-07 Thread Jorrit Boekel
No, I am not subclassing from Tabular. But when I do, it suddenly works. Thanks 
loads Dan!
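
For anyone who finds this thread later, a minimal sketch of such a subclass (module, class and extension names here are made up for illustration, they are not the ones from this thread):

# Hypothetical datatype module, e.g. lib/galaxy/datatypes/mytsv.py: by
# subclassing Tabular, the datatype inherits Tabular's set_meta(), which
# fills in metadata.columns / metadata.column_types, and those are what the
# data_column parameters build their legal values from.
from galaxy.datatypes.tabular import Tabular

class MyTsv( Tabular ):
    file_ext = "mytsv"

The class then gets registered in datatypes_conf.xml like any other datatype.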

cheers,
— 
Jorrit Boekel
Proteomics systems developer
BILS / Lehtiö lab
Scilifelab Stockholm, Sweden



On 07 Nov 2014, at 19:07, Daniel Blankenberg  wrote:

> Hi Jorrit,
> 
> Are you subclassing your tsv datatypes from tabular?
> 
> If you can post your tool xml and datatypes_conf.xml additions then we should 
> be able to provide more assistance.
> 
> 
> Thanks for using Galaxy,
> 
> Dan
> 
> 
> On Nov 7, 2014, at 12:27 PM, Jorrit Boekel  
> wrote:
> 
>> Dear list,
>> 
>> I tried to make a tool that takes tsv input and has some data_columns 
>> selection parameters in the xml definition. It seems to work, but I get 
>> ‘invalid option was selected’ in the browser interface for each of the 
>> data_columns. No logging errors or stack traces. 
>> 
>> Digging a bit, I found out that legal_values of the ColumnParameter (in 
>> lib/galaxy/tools/parameters/basic.py) is empty. It is returned empty by 
>> get_column_list when the check "if not dataset.metadata.columns:" on line 
>> 1184 of the above mentioned file evaluates to true.
>> 
>> Now I’m a bit at a loss: how do I set these columns, or where are those 
>> columns set on the metadata? Are they stored, or generated from the input 
>> file? My files are not official formats, just tsv subclasses.
>> 
>> 
>> I’m on changeset 14567:007f6a80629a in galaxy-dist. 
>> 
>> cheers,
>> — 
>> Jorrit Boekel
>> Proteomics systems developer
>> BILS / Lehtiö lab
>> Scilifelab Stockholm, Sweden
>> 
>> 
>> 
>> 


___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


[galaxy-dev] data_columns

2014-11-07 Thread Jorrit Boekel
Dear list,

I tried to make a tool that takes tsv input and has some data_columns 
selection parameters in the xml definition. It seems to work, but I get 
‘invalid option was selected’ in the browser interface for each of the 
data_columns. No logging errors or stack traces. 

Digging a bit, I found out that legal_values of the ColumnParameter (in 
lib/galaxy/tools/parameters/basic.py) is empty. It is returned empty by 
get_column_list when the check "if not dataset.metadata.columns:" on line 1184 
of the above mentioned file evaluates to true.

Now I’m a bit at a loss: how do I set these columns, or where are those columns 
set on the metadata? Are they stored, or generated from the input file? My files 
are not official formats, just tsv subclasses.
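
For reference, the behaviour described above boils down to something like this simplified sketch (a paraphrase, not Galaxy's actual code): without columns metadata there is nothing to build options from, so every submitted column counts as an invalid option.

# Simplified paraphrase of the check described above (not the actual code in
# lib/galaxy/tools/parameters/basic.py): if the tabular metadata was never
# set, there are no legal values, so any selected column is rejected.
def get_column_list( dataset ):
    if not dataset.metadata.columns:
        return []
    # otherwise roughly one option per column
    return [ str( col + 1 ) for col in range( dataset.metadata.columns ) ]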


I’m on changeset 14567:007f6a80629a in galaxy-dist. 
 
cheers,
— 
Jorrit Boekel
Proteomics systems developer
BILS / Lehtiö lab
Scilifelab Stockholm, Sweden






Re: [galaxy-dev] tool dependencies fail

2014-05-31 Thread Jorrit Boekel
Hi Isabelle,

Not sure about specifying the absolute path for the tool dependency dir, but I 
always use it. Also, my env.sh has neither a #!/bin/bash shebang line nor 
quote marks around the tool path.
If you are in the hhsuite/default/ dir and source env.sh, can you then run 
hhblits yourself (as galaxy user, or whoever owns the tools)?
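
If it helps to script that check, something along these lines does the same thing non-interactively (the env.sh path is assumed from the configuration quoted below; this is just an illustration, not a Galaxy utility):

# Quick sanity check, run as the user that owns the Galaxy jobs: source the
# dependency's env.sh in a shell and see whether hhblits resolves on PATH.
# The env.sh path here is assumed from the setup described in this thread.
import subprocess

env_sh = "/opt/galaxy-dist/tool_dependencies/hhsuite/default/env.sh"
check = ". %s && command -v hhblits" % env_sh
print( subprocess.check_output( [ "bash", "-c", check ] ) )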

cheers,
— 
Jorrit Boekel
Proteomics systems developer
BILS / Lehtiö lab
Scilifelab Stockholm, Sweden



On 31 May 2014, at 01:53, Isabelle Phan  wrote:

> Hello,
> 
> I'm following "Managed Tool Dependencies" to the letter and still can't
> get Galaxy to find the executable. Is this page
> https://wiki.galaxyproject.org/Admin/Config/ToolDependencies up to date?
> If yes, what am I doing wrong?
> 
> Any help greatly appreciated!
> 
> 
> # universe_wsgi.ini
> # I also tried the absolute path, it made no difference
> tool_dependency_dir = tool_dependencies
> 
> # tool_dependencies is set like instructed:
> 
> $ ls tool_dependencies/hhsuite/
> 2.0.16/  default@
> $ ls tool_dependencies/hhsuite/2.0.16/bin
> 
> ffindex_build* ffindex_get*   hhalign*   hhblits*   hhconsensus*
> hhfilter*  hhmake*hhsearch*
> 
> 
> My tool:
> 
>   hhsuite
>   Searches single fasta iteratively against hmm db to build a
> MSA
>   hhblits -i $input_file -d $database -oa3m $outfile
> etc...
> 
> 
> Galaxy throws Error: hhblits not found.
> 
> 
> I've tried setting the PATH in the tool_dependencies/hhsuite/2.0.16/env.sh:
> #!/bin/bash
> # configure PATH to hhsuite binaries
> PATH="/opt/galaxy-dist/tool_dependencies/hhsuite/default/bin:$PATH"
> export PATH
> 
> 
> 
> still getting hhblits not found :-(
> What am I doing wrong?
> 
> Isabelle
> 
> 




Re: [galaxy-dev] stdout in history

2014-05-31 Thread Jorrit Boekel
Hi Neil, 

I had some problems with this before too, so I created a tool config option a 
long time ago in my own galaxy fork. Never really bothered with submitting it, 
but it lives in these commits:

https://bitbucket.org/glormph/adapt/commits/4ba256a9b8782642429ecb5472a456584bed86d5
https://bitbucket.org/glormph/adapt/commits/86ec8b1cfeb737dfa31a1b6faf18d918bea7b4c3

I started implementing it in datatypes (first commit), but moved it to tool 
config instead. If this is really interesting I guess I can submit a pull 
request, but it felt a bit hacky at the time.

cheers,
— 
Jorrit Boekel
Proteomics systems developer
BILS / Lehtiö lab
Scilifelab Stockholm, Sweden



On 30 May 2014, at 14:35, neil.burd...@csiro.au wrote:

> Hi,
>  It seems that the first 4-5 lines printed by the tool's source code 
> appear in the history (when expanded) and in the stdout link. Is there 
> any way to stop "print" statements from appearing in the history panel?
> 
> Thanks
> Neil


Re: [galaxy-dev] galaxy on ubuntu 14.04: hangs on metadata cleanup

2014-05-08 Thread Jorrit Boekel
It seems to be an NFS related issue. When I run a separate VM as an NFS server 
that hosts the galaxy data (files, job workdir, tmp, ftp), problems are gone. 
There’s probably an explanation for that, but I’m going to leave it at this.

cheers,
— 
Jorrit Boekel
Proteomics systems developer
BILS / Lehtiö lab
Scilifelab Stockholm, Sweden



On 07 May 2014, at 16:03, Jorrit Boekel  wrote:

> I should probably mention that the data filesystem is NFS, exported by the 
> master from /mnt/galaxy/data and mounted on the worker. No separate 
> fileserver. Master is the one that hangs.
> 
> 
> cheers,
> — 
> Jorrit Boekel
> Proteomics systems developer
> BILS / Lehtiö lab
> Scilifelab Stockholm, Sweden
> 
> 
> 
> On 07 May 2014, at 15:57, Jorrit Boekel  wrote:
> 
>> Dear all,
>> 
>> Has anyone tried running Galaxy on Ubuntu 14.04?
>> 
>> I’m trying a test setup on two virtual machines (worker+master) with a SLURM 
>> queue. Getting in strange problems when jobs finish, the master hangs, 
>> completely unresponsive with CPU at 100% (as reported by virt-manager, not 
>> by top). Only drmaa jobs seem to be affected. After hanging, a reboot shows 
>> the job is finished (and green in history).
>> 
>> It took me some debugging to figure out where things go wrong, but it seems 
>> it goes wrong when os.remove is called in lib/galaxy/datatypes/metadata.py 
>> in method cleanup_external_metadata. I can reproduce the problem by calling 
>> os.remove(metadatafile) by hand (in an interactive python shell) when using 
>> pdb to create a breakpoint just before the call. If I comment out the 
>> os.remove it runs on until it hits another delete call in 
>> lib/galaxy/jobs/__init__.py:
>> self.app.object_store.delete(self.get_job(), base_dir='job_work', 
>> entire_dir=True, dir_only=True, extra_dir=str(self.job_id))
>> It’s in the JobWrapper class in the cleanup() method. I should mention here 
>> that my galaxy version is a bit old since I’m running my own fork with local 
>> modifications on datatypes.
>> 
>> This object_store.delete also leads to a shutil.rmtree and os.remove 
>> function. So, remove calls to the filesystem seem to hang the whole thing, 
>> but only at this point in time. Rebooting and removing by hand is no 
>> problem, pdb-stepping also sometimes fixes it (but if I just press continue 
>> it hangs). I don’t know where to go from here with debugging, but has anyone 
>> seen anything similar? Right now it feels like it may be caused by timing 
>> rather than actual code problems.
>> 
>> cheers,
>> — 
>> Jorrit Boekel
>> Proteomics systems developer
>> BILS / Lehtiö lab
>> Scilifelab Stockholm, Sweden
>> 
>> 
>> 
> 




Re: [galaxy-dev] galaxy on ubuntu 14.04: hangs on metadata cleanup

2014-05-07 Thread Jorrit Boekel
I should probably mention that the data filesystem is NFS, exported by the 
master from /mnt/galaxy/data and mounted on the worker. No separate fileserver. 
Master is the one that hangs.


cheers,
— 
Jorrit Boekel
Proteomics systems developer
BILS / Lehtiö lab
Scilifelab Stockholm, Sweden



On 07 May 2014, at 15:57, Jorrit Boekel  wrote:

> Dear all,
> 
> Has anyone tried running Galaxy on Ubuntu 14.04?
> 
> I’m trying a test setup on two virtual machines (worker+master) with a SLURM 
> queue. Getting in strange problems when jobs finish, the master hangs, 
> completely unresponsive with CPU at 100% (as reported by virt-manager, not by 
> top). Only drmaa jobs seem to be affected. After hanging, a reboot shows the 
> job is finished (and green in history).
> 
> It took me some debugging to figure out where things go wrong, but it seems 
> it goes wrong when os.remove is called in lib/galaxy/datatypes/metadata.py in 
> method cleanup_external_metadata. I can reproduce the problem by calling 
> os.remove(metadatafile) by hand (in an interactive python shell) when using 
> pdb to create a breakpoint just before the call. If I comment out the 
> os.remove it runs on until it hits another delete call in 
> lib/galaxy/jobs/__init__.py:
> self.app.object_store.delete(self.get_job(), base_dir='job_work', 
> entire_dir=True, dir_only=True, extra_dir=str(self.job_id))
> It’s in the JobWrapper class in the cleanup() method. I should mention here 
> that my galaxy version is a bit old since I’m running my own fork with local 
> modifications on datatypes.
> 
> This object_store.delete also leads to a shutil.rmtree and os.remove 
> function. So, remove calls to the filesystem seem to hang the whole thing, 
> but only at this point in time. Rebooting and removing by hand is no problem, 
> pdb-stepping also sometimes fixes it (but if I just press continue it hangs). 
> I don’t know where to go from here with debugging, but has anyone seen 
> anything similar? Right now it feels like it may be caused by timing rather 
> than actual code problems.
> 
> cheers,
> — 
> Jorrit Boekel
> Proteomics systems developer
> BILS / Lehtiö lab
> Scilifelab Stockholm, Sweden
> 
> 
> 




[galaxy-dev] galaxy on ubuntu 14.04: hangs on metadata cleanup

2014-05-07 Thread Jorrit Boekel
Dear all,

Has anyone tried running Galaxy on Ubuntu 14.04?

I’m trying a test setup on two virtual machines (worker + master) with a SLURM 
queue. I’m getting strange problems when jobs finish: the master hangs, 
completely unresponsive, with CPU at 100% (as reported by virt-manager, not by 
top). Only drmaa jobs seem to be affected. After hanging, a reboot shows the 
job has finished (and is green in the history).

It took some debugging to figure out where things go wrong: it seems to happen 
when os.remove is called in lib/galaxy/datatypes/metadata.py, in the method 
cleanup_external_metadata. I can reproduce the problem by calling 
os.remove(metadatafile) by hand (in an interactive python shell) after using pdb 
to create a breakpoint just before the call. If I comment out the os.remove, it 
runs on until it hits another delete call in lib/galaxy/jobs/__init__.py:
self.app.object_store.delete(self.get_job(), base_dir='job_work', 
entire_dir=True, dir_only=True, extra_dir=str(self.job_id))
It’s in the JobWrapper class, in the cleanup() method. I should mention here 
that my galaxy version is a bit old since I’m running my own fork with local 
modifications to the datatypes.

This object_store.delete also ends up in shutil.rmtree and os.remove calls. 
So remove calls to the filesystem seem to hang the whole thing, but only at 
this point in time. Rebooting and removing by hand is no problem, and pdb-stepping 
also sometimes fixes it (but if I just press continue it hangs). I don’t know 
where to go from here with debugging, but has anyone seen anything similar? 
Right now it feels like it may be caused by timing rather than actual code 
problems.
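
For anyone who wants to reproduce it the same way, the debugging step amounts to roughly this (an illustration only, not the actual Galaxy code; the real call sits in cleanup_external_metadata):

# Rough illustration of the reproduction described above: break just before
# the removal, then issue os.remove by hand from the (Pdb) prompt. On the
# affected NFS setup that call is the one that hangs.
import os
import pdb

def remove_metadata_file( metadata_file_name ):
    pdb.set_trace()                   # stop here, then try it interactively
    os.remove( metadata_file_name )   # this is the call that hangs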

cheers,
— 
Jorrit Boekel
Proteomics systems developer
BILS / Lehtiö lab
Scilifelab Stockholm, Sweden






[galaxy-dev] local_task_queue_workers

2013-11-07 Thread Jorrit Boekel

Hi list,

I would need a memory refresher about tasked jobs. When testing some 
larger analyses on a local installation, I thought the 
local_task_queue_workers setting in universe_wsgi.ini would be the 
limiting factor for how many tasks can be executed at the same time. In 
our setup, it is currently set to 2. However, 5 tasks are run 
simultaneously, leading to memory problems.


Am I overlooking something that anyone knows of?

cheers,
jorrit boekel

--
Scientific programmer
Mass spec analysis support @ BILS
Janne Lehtiö / Lukas Käll labs
SciLifeLab Stockholm



Re: [galaxy-dev] Appending _task_%d suffix to multi files

2013-08-01 Thread Jorrit Boekel

Hi Piotr,

In our proteomics lab, a protein sample is fractionated (by e.g. pH) into a 
number of sample fractions before analysis. The fractions are then run 
through the mass spectrometer one at a time. Each fraction yields a data 
file.


The mass spec data is then matched to peptides by searching a FASTA 
file, termed target, with protein sequences. Afterwards the matches are 
statistically scored by machine learning. To do this, the data is also 
matched with a scrambled FASTA file, termed decoy. Each fraction is 
matched to a target and decoy file, which yields two match-files per 
fraction.


The machine learning tool thus picks a target and a decoy matchfile and 
puts statistical significances on the matches. In order for this to be 
correct, it needs to pick matchfiles that correspond, ie that are 
derived from the same fraction.


In our lab, we have not yet looked at John Chilton's (I think) work with 
the m: data sets, and our parallel processing is done inside galaxy, 
using its split and merge functions to divide a job into tasks. Each 
task is sent as a separate job to sge, I think, but others may know more 
about this than I.


I really have to get back to my holiday now, cheers,
jorrit

On 08/01/2013 04:17 AM, piotr.s...@csiro.au wrote:


Hi Jorrit,

Thank you for your explanation. Would you be able to give us an 
example of what you mean by fractions, and of when the task_%d suffixes are 
used to pick files? Just want to make sure we have a good 
understanding of the problem that you solved.


Also, I vaguely remember seeing "data parallelism" mentioned somewhere 
in relation to the m: data sets. Do you currently support in any 
way automatic distribution of processing of such datasets to parallel 
environments (e.g. array jobs in sge or such)?


Cheers,

-Piotr

*From:*Jorrit Boekel [mailto:jorrit.boe...@scilifelab.se]
*Sent:* Wednesday, July 31, 2013 8:18 PM
*To:* Khassapov, Alex (CSIRO IM&T, Clayton)
*Cc:* p.j.a.c...@googlemail.com; jmchil...@gmail.com; 
galaxy-dev@lists.bx.psu.edu; Szul, Piotr (ICT Centre, Marsfield); 
Burdett, Neil (ICT Centre, Herston - RBWH)

*Subject:* Re: Appending _task_%d suffix to multi files

Hi Alex,

In our lab, files are often fractions of an experiment, but they are 
named by their creators in whatever way they like. I put that code in 
to standardize fraction naming, in case a tool needs input from two 
files that originate from the same fraction (but have been treated in 
different ways). In those cases, in my fork, Galaxy always picks the 
files with the same task_%d numbers.


I can't help you very much right now, as I'm currently away from work 
until October, but I hope this explains why it's in there.


cheers,
jorrit

On 07/31/2013 04:15 AM, alex.khassa...@csiro.au wrote:


Hi guys,

We've been using Galaxy for a year now; we created our own Galaxy
fork where we were making changes to adapt Galaxy to our
requirements. As we need "multiple file datasets", we were using
John's fork for that initially.

Now we are trying to use "The most updated version of the multiple
file dataset stuff" https://bitbucket.org/msiappdev/galaxy-extras/
directly as we don't want to maintain our own version.

One of the problems we have: when we upload multiple files,
their file names are changed (a _task_%d suffix is added to their
names).

On our branch we simply removed the code which does it, but now we
wonder if it is possible to avoid this renaming somehow? I.e. make
it configurable?

Is it really necessary to change the file names?

-Alex

-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu
    [mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Jorrit Boekel
Sent: Thursday, 25 October 2012 8:35 PM
To: Peter Cock
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] the multi job splitter

I keep the files matched by keeping a _task_%d suffix to their
names. So each task is matched with its correct counterpart with
the same number.

cheers,

jorrit




Re: [galaxy-dev] Appending _task_%d suffix to multi files

2013-07-31 Thread Jorrit Boekel

Hi Alex,

In our lab, files are often fractions of an experiment, but they are 
named by their creators in whatever way they like. I put that code in to 
standardize fraction naming, in case a tool needs input from two files 
that originate from the same fraction (but have been treated in 
different ways). In those cases, in my fork, Galaxy always picks the 
files with the same task_%d numbers.
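
As a concrete illustration of that pairing (a simplified sketch, not the code from the fork): given a target and a decoy directory whose files carry the _task_%d suffix, corresponding fractions can be matched on the number alone.

# Simplified sketch (not the code from the fork): pair target and decoy
# match-files that carry the same _task_%d number, whatever the rest of the
# original file names looks like.
import os
import re

def task_number( filename ):
    match = re.search( r'_task_(\d+)', filename )
    return int( match.group( 1 ) ) if match else None

def pair_by_task( target_dir, decoy_dir ):
    targets = dict( ( task_number( f ), f ) for f in os.listdir( target_dir ) )
    decoys = dict( ( task_number( f ), f ) for f in os.listdir( decoy_dir ) )
    # only keep fractions that are present on both sides
    return [ ( targets[ n ], decoys[ n ] ) for n in sorted( targets ) if n in decoys ]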


I can't help you very much right now, as I'm currently away from work 
until October, but I hope this explains why it's in there.


cheers,
jorrit

On 07/31/2013 04:15 AM, alex.khassa...@csiro.au wrote:


Hi guys,

We've been using Galaxy for a year now; we created our own Galaxy fork 
where we were making changes to adapt Galaxy to our requirements. As 
we need "multiple file datasets", we were using John's fork for that 
initially.


Now we are trying to use "The most updated version of the multiple 
file dataset stuff" https://bitbucket.org/msiappdev/galaxy-extras/ 
directly as we don't want to maintain our own version.


One of the problems we have: when we upload multiple files, their 
file names are changed (a _task_%d suffix is added to their names).


On our branch we simply removed the code which does it, but now we 
wonder if it is possible to avoid this renaming somehow? I.e. make it 
configurable?


Is it really necessary to change the file names?

-Alex

-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of Jorrit Boekel

Sent: Thursday, 25 October 2012 8:35 PM
To: Peter Cock
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] the multi job splitter

I keep the files matched by keeping a _task_%d suffix to their names. 
So each task is matched with its correct counterpart with the same number.


cheers,

jorrit




Re: [galaxy-dev] Improved Workflows through Multiple File Datasets

2012-12-06 Thread Jorrit Boekel

If one could upvote pull requests, I'd be doing that now!

thanks loads John,
jorrit

On 12/05/2012 07:14 PM, John Chilton wrote:

Here is my video-based sales pitch for multiple file datasets
(specifically pull request 86 and 87 and subsequent future pull
requests).

http://www.youtube.com/watch?v=DxJzEkOasu4

I am still committed to helping develop a comprehensive tag-based
approach to enhanced workflows with normal datasets, but that seems
months off at best and this current approach has the great feature of
tracking "sample" names throughout complex workflows.

Thanks for your time,
-John


John Chilton
Senior Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
Bitbucket: https://bitbucket.org/jmchilton
Github: https://github.com/jmchilton
Web: http://jmchilton.net



--
Scientific programmer
Mass spec analysis support @ BILS
Janne Lehtiö / Lukas Käll labs
SciLifeLab Stockholm



Re: [galaxy-dev] Working with files outside galaxy without copying?

2012-11-14 Thread Jorrit Boekel
This may already be implemented (I believe some people run Galaxy on the cluster),
but if not:

Galaxy normally operates on files stored in the database/files folder. If
your Galaxy instance has access to the files you need on your parallel fs,
I guess you could start by writing a tool that creates links to your files
when given the path to the input file. Another tool may then move the
dataset into the desired place while creating a link in the database folder.

Sounds like a hack to me, but may work.
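
A bare-bones sketch of that first 'linking' tool could look like this (illustrative only: paths are taken on trust here, so the real thing would need path validation and probably admin-only access):

# Sketch of the hack described above: a tool script that, given an absolute
# path on the shared filesystem, turns the Galaxy output dataset into a
# symlink to it instead of a copy. Needs path checks and restricted access.
import os
import sys

def link_external_file( source_path, galaxy_output_path ):
    if not os.path.isfile( source_path ):
        raise ValueError( "no such file: %s" % source_path )
    # Galaxy has already created an (empty) output dataset file; replace it
    if os.path.exists( galaxy_output_path ):
        os.remove( galaxy_output_path )
    os.symlink( source_path, galaxy_output_path )

if __name__ == "__main__":
    link_external_file( sys.argv[ 1 ], sys.argv[ 2 ] )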

cheers,
jorrit




On Wed, Nov 14, 2012 at 2:52 PM, Samuel Lampa wrote:

> Hi,
>
> We are looking into ways to integrate galaxy into the workflows of users
> at our cluster, with lots of NGS users running any and all kinds of analyses
> on their typically huge amounts of data. For this we use a parallel file
> system, available on all compute nodes.
>
> This file system, although approx 1PB in size, is constantly filling up,
> and thus we are not very attracted by the idea of copying files into/out of
> galaxy for each analysis.
>
> Thus, we would be interested to know what are the options for working with
> existing/external(to galaxy) file systems?
>
> E.g. would it be possible to link files into some kind of galaxy file
> system (I'm not totally clear about how galaxy stores its data, although I
> found out that stuff is created in database/files), from outside?
>
>  ... or is there any work going on for selecting any file system path as
> input in galaxy workflows?
>
> ... or any other hints?
>
> As said, I'm quite new to galaxy, trying to get my head around how we can
> use it, so all hints are welcome.
>
> Best Regards
> // Samuel
>
>
> --
> Developer at SNIC-UPPMAX www.uppmax.uu.se
> Developer at Dept of Pharm Biosciences www.farmbio.uu.se
> Twitter - twitter.com/samuellampa
> Blog - saml.rilspace.org
> G+ - gplus.to/saml
>

[galaxy-dev] galaxy/cloudman failure handling

2012-11-12 Thread Jorrit Boekel
Dear list,

I would like to start using Amazon's spot pricing model for my
Galaxy/Cloudman instances (e.g. an on-demand master node with spot instance
worker nodes). However, this means that at times when the spot price is
higher than my set price limit, Amazon will shut down my instances without notice.

I was therefore looking for fault tolerance mechanisms in the galaxy
project, which I seem to remember existed. Somehow I can't find anything
about it right now though.

I've tested a little bit, and it seems that as soon as one reboots
instances or manually kills a job or task, the whole job is deleted and set
to error state. I am not that knowledgeable in cluster computing, so I
don't really know what handles what here, but this would be an ideal
starting point to learn something about SGE and queue handling. Is there
any mechanism in place that deals with node failure, network problems, etc?
If not, would it be hard to implement?

cheers,
jorrit boekel

Re: [galaxy-dev] the multi job splitter

2012-10-25 Thread Jorrit Boekel

On 10/25/2012 12:02 PM, Peter Cock wrote:

On Thu, Oct 25, 2012 at 10:35 AM, Jorrit Boekel
 wrote:

My question is still though if it would be bad to not raise an exception
when different filetypes are split in the same job.


In general splitting multiple files of different types seems dangerous.
That is presumably the point of the Galaxy exception.

In my example of splitting a pair of FASTQ files, they are the same
format, so Galaxy can make assumptions about how they will be
split. Note splitting into chunks based on the size on disk would
be wrong (e.g. if the forward reads in the first file are all longer
than the reverse reads in the second file).

In the case of splitting a paired FASTA + QUAL file, these are now
different file formats, so more caution is required. In fact both
can be split at the sequence/read level, so they can be processed.

I think the key requirement here for 'matched' splitting is each
file must have the same number of 'records' (in my example,
sequencing reads, in your case sub-files), and can be split into
a chunks of the same number of 'records'.

Perhaps different file type combinations could be special cases
in the splitter code? Then if there is no dedicated splitter for a
given combination, then that combination cannot be split.

Peter


I could imagine the multi splitter calling some sort of validating 
method of the different datatypes to gather information about the 
different datasets, e.g. split size, split numbers, matching file types, 
before executing a split. There may be more and better ways around it 
though. I'll settle for disabling the check for now; if mainline 
galaxy would be interested, we could look at it further I guess.
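
To make the idea concrete, the validation could look roughly like this (a sketch of the suggestion only; nothing like it exists in multi.py today):

# Sketch of the validation idea above: before splitting, ask every input how
# many parts it would produce and refuse the paired split if they disagree.
def validate_paired_split( input_part_counts ):
    """input_part_counts: dict mapping input name -> number of parts."""
    counts = set( input_part_counts.values() )
    if len( counts ) != 1:
        raise Exception( "Refusing paired split, inputs disagree on part count: %r"
                         % input_part_counts )
    return counts.pop()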


cheers,
jorrit


--
Scientific programmer
Mass spec analysis support @ BILS
Janne Lehtiö / Lukas Käll labs
SciLifeLab Stockholm



Re: [galaxy-dev] the multi job splitter

2012-10-25 Thread Jorrit Boekel

On 10/25/2012 11:25 AM, Peter Cock wrote:

On Thu, Oct 25, 2012 at 10:00 AM, Jorrit Boekel
 wrote:

On 10/25/2012 10:54 AM, Peter Cock wrote:

On Thu, Oct 25, 2012 at 9:36 AM, Jorrit Boekel
 wrote:

Dear list,

In my galaxy fork, I extensively use the job splitters. Sometimes though,
I
have to split to different file types for the same job. That raises an
exception in the lib/galaxy/jobs/splitters/multi.py module.

I have turned this behaviour off for my own work, but am now wondering
whether this is very bad practice. In other words, does somebody know why
the multi splitter does not support multiple file type splitting?

cheers,
jorrit

Could you clarify what you mean by showing some of your tool's XML
file. i.e. How is the input and its splitting defined.

Are you asking about splitting two input files at the same time?

Peter


Hi Peter,

Something like the following:

<command interpreter="python">bullseye.py $hardklor_results
$ms2_in.extension $ms2_in $output $use_nonmatch</command>
<parallelism method="multi" split_inputs="hardklor_results,ms2_in"
shared_inputs="config_file" split_mode="from_composite"
merge_outputs="output"/>

The tool takes two datasets of different formats, which are to be split into
the same number of files, which belong together as pairs.

So the inputs are $hardklor_results and $ms2_in (which should be split
in a paired manner) and there is one output $output to merge?

What is shared_inputs="config_file" for, as that isn't in the
<command> tag anywhere?
Exactly. The tool uses results from a tool called hardklor to adjust the 
mass spectra contained in the ms2_input.
And whoops, I haven't taken out the now obsolete config file. Thanks for 
spotting that.



Note that I have implemented an odd way of splitting, which is from a number
of files in the dataset.extra_files_path to symlinks in the task working
dirs. The number of files is thus equal to the number of parts resulting
from a split, and I have ensured that each part is paired correctly. I
assume this hasn't been necessary in the genomics field, but for proteomics,
at least in our lab, multiple-file datasets are the standard.

My fork is at http://bitbucket.org/glormph/adapt if you want to check more
closely.

I don't quite follow your example, but I can see some (simpler?) cases
for sequencing data - paired splitting of a FASTA + QUAL file, or
paired splitting of two FASTQ files (forward and reverse reads). Here
the sequence files can be broken up into any size (e.g. split in four,
or divided into batches of 1, but not split based on size on disk),
as long as the pairing is preserved.

i.e. Given FASTA and QUAL for read1, read2, , read10 then
if the FASTA file is split into read1, read2, , read1000 as the first
chunk, then the first QUAL chunk must also have the same one
thousand reads.

(In these examples the pairing should be verifiable via the read
names, so errors should be easy to catch - I don't know if you have
that luxury in your situation).

What you describe is pretty much the same as my situation, except that I 
don't have two large single input files like your fastq files, but two 
sets of the same number of files stored in the composite file 
directories (galaxy/database/files/000/dataset_x_files). I keep the 
files matched by adding a _task_%d suffix to their names, so each task 
is matched with its correct counterpart with the same number.


My question is still, though, whether it would be bad not to raise an exception 
when different filetypes are split in the same job.


cheers,
jorrit

--
Scientific programmer
Mass spec analysis support @ BILS
Janne Lehtiö / Lukas Käll labs
SciLifeLab Stockholm



Re: [galaxy-dev] the multi job splitter

2012-10-25 Thread Jorrit Boekel

On 10/25/2012 10:54 AM, Peter Cock wrote:

On Thu, Oct 25, 2012 at 9:36 AM, Jorrit Boekel
 wrote:

Dear list,

In my galaxy fork, I extensively use the job splitters. Sometimes though, I
have to split to different file types for the same job. That raises an
exception in the lib/galaxy/jobs/splitters/multi.py module.

I have turned this behaviour off for my own work, but am now wondering
whether this is very bad practice. In other words, does somebody know why
the multi splitter does not support multiple file type splitting?

cheers,
jorrit

Could you clarify what you mean by showing some of your tool's XML
file. i.e. How is the input and its splitting defined.

Are you asking about splitting two input files at the same time?

Peter


Hi Peter,

Something like the following:

<command interpreter="python">bullseye.py $hardklor_results
$ms2_in.extension $ms2_in $output $use_nonmatch</command>
<parallelism method="multi" split_inputs="hardklor_results,ms2_in"
shared_inputs="config_file" split_mode="from_composite"
merge_outputs="output"/>

The tool takes two datasets of different formats, which are to be split 
into the same number of files, which belong together as pairs.


Note that I have implemented an odd way of splitting: from a number of 
files in the dataset.extra_files_path to symlinks in the task 
working dirs. The number of files is thus equal to the number of parts 
resulting from a split, and I have ensured that each part is paired 
correctly. I assume this hasn't been necessary in the genomics field, 
but for proteomics, at least in our lab, multiple-file datasets are the 
standard.
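
The split itself then amounts to roughly the following (simplified from what the fork does; directory handling here is illustrative):

# Roughly what the from_composite split boils down to (simplified): every
# file under the dataset's extra_files_path becomes one task, and each task
# working dir just gets a symlink to its own part.
import os

def split_from_composite( extra_files_path, task_working_dirs ):
    parts = sorted( os.listdir( extra_files_path ) )
    assert len( parts ) == len( task_working_dirs ), "one task dir per part"
    for part, task_dir in zip( parts, task_working_dirs ):
        os.symlink( os.path.join( extra_files_path, part ),
                    os.path.join( task_dir, part ) )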


My fork is at http://bitbucket.org/glormph/adapt if you want to check 
more closely.


cheers,
jorrit

--
Scientific programmer
Mass spec analysis support @ BILS
Janne Lehtiö / Lukas Käll labs
SciLifeLab Stockholm



[galaxy-dev] the multi job splitter

2012-10-25 Thread Jorrit Boekel

Dear list,

In my galaxy fork, I extensively use the job splitters. Sometimes, 
though, I have to split two different file types in the same job. That 
raises an exception in the lib/galaxy/jobs/splitters/multi.py module.


I have turned this behaviour off for my own work, but am now wondering 
whether this is very bad practice. In other words, does somebody know 
why the multi splitter does not support multiple file type splitting?


cheers,
jorrit

--
Scientific programmer
Mass spec analysis support @ BILS
Janne Lehtiö / Lukas Käll labs
SciLifeLab Stockholm



[galaxy-dev] windows nodes on cloudman/ec2

2012-10-19 Thread Jorrit Boekel

Dear list,

Our analysis pipeline is normally fed with proprietary raw file types 
from mass spectrometry instruments. These can currently only be read on 
windows systems where vendor DLLs are installed.


Since Cloudman/Galaxy does not interface with Windows nodes out of the 
box, I implemented a conversion tool wrapper that instantiates windows 
nodes on EC2 and lets them convert the raw files. It currently only does 
this, but there would of course be room for a number of other tools on 
the same windows image.


I know there are limitations, such as Amazon's 20-node limit and the 
nodes not being controlled by Cloudman, but I wonder if my 
implementation would cause any serious problems when running this. 
Problems like security, etc. (after all, the tool wrapper curls the user 
data to get the AWS access/secret keys). If anyone has comments on it, 
I'd love to hear them.


Current code for the windows image can be found on:
https://bitbucket.org/glormph/galaxy-ec2-windows

And the tool that creates the windows node is on:
https://bitbucket.org/glormph/adapt/changeset/ca13e548b85132049a918af84cfb063e62615bcb

cheers,
jorrit

--
Scientific programmer
Mass spec analysis support @ BILS
Janne Lehtiö / Lukas Käll labs
SciLifeLab Stockholm



Re: [galaxy-dev] pass more information on a dataset merge

2012-10-16 Thread Jorrit Boekel
cker,
proteinpilot, Ira Cooke has X! Tandem, OMSSA, TPP, and NBIC has an
entire suite of label free quant tools). A third benefit would be that
people working in other -omicses could make use of the homogenous
composite datatype implementation without needing to rewrite their
wrappers and datatypes.

There is probably something that I am missing that makes this very
difficult, let me know if you think this is a good idea and what its
feasibility might be. I forked your repo and set off to try to
implement some of this stuff last week and I ended up with my galaxy
pull requests to improve batching workflows and multi-input tool
parameters instead, but I hope to eventually get around to it.

-John


John Chilton
Senior Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
Bitbucket: https://bitbucket.org/jmchilton
Github: https://github.com/jmchilton
Web: http://jmchilton.net

On Mon, Oct 1, 2012 at 8:24 AM, Jorrit Boekel
 wrote:

Dear list,

I thought I was working with fairly large datasets, but they have recently
started to include ~2Gb files in sets of >50. I have ran these sort of
things before as merged data by using tar to roll them up in one set, but
when dealing with >100Gb tarfiles, Galaxy on EC2 seems to get very slow,
although that's probably because of my implementation of dataset type
detection (untar and read through files).

Since tarring/untarring isn't very clean, I want to switch from tarring to
creating composite files on merge by putting a tool's results into the
dataset.extra_files_path. This doesn't seem to be supported yet, because we
currently pass in do_merge the output dataset.filename to the respective
datatype's merge method.

I would like to pass more data to the merge method (let's say the whole
dataset object) to be able to get the composite files directory and 'merge'
the files in there. Good idea, bad idea? If anyone has views on this, I'd
love to hear them.

cheers,
jorrit



Re: [galaxy-dev] Gestion d'un nombre indéfini de fichiers output / Management of undefined outputs

2012-10-02 Thread Jorrit Boekel

Hello-,

I have had similar problems, and you can possibly solve them by using 
composite datasets, which allow you to pass files to a directory whilst 
having a primary file that represents the dataset (I believe Rgenetics 
uses a primary HTML file containing hyperlinks to the files in the dir).


For example, instead of this in your tool's command line:
script.pl $input1 $input2 $output
specify:
script.pl $input1 $input2 $output.extra_files_dir

The extra_files_dir attribute lets the script output files to a 
directory (don't change its name). In a next step the dir can be reached 
by specifying:

script2.pl $input.extra_files_dir

I am not quite in a position to explain everything, since so far I only use 
my composite files in in-between steps. In other words, I don't know 
how to create the primary file and let users download their composite files.
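
For what it's worth, the Rgenetics-style primary file mentioned above can be as small as this (a sketch, assuming the tool has already written its real outputs into the dataset's extra files directory):

# Sketch only: make the primary dataset file a little HTML page that links to
# the files the tool placed in the dataset's extra files directory, so users
# can browse and download the composite parts from the history.
import os

def write_primary_html( primary_file_path, extra_files_dir ):
    names = sorted( os.listdir( extra_files_dir ) )
    links = [ '<li><a href="%s">%s</a></li>' % ( name, name ) for name in names ]
    with open( primary_file_path, 'w' ) as primary:
        primary.write( '<html><body><ul>%s</ul></body></html>' % ''.join( links ) )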


cheers,
jorrit




On 10/02/2012 05:05 PM, Sarah Maman wrote:

Hello,

I would like to integrate a perl script (the quantifier script from mirdeep2) 
into a new tool in my local Galaxy instance.
The problem is that this script generates several outputs (image files), but 
the number of outputs varies from one run to the next.
It seems to me that the number of outputs has to be defined and fixed in the 
tool's xml file? Or is there a way for the number of outputs not to be 
specified in advance?


Thank you in advance for your help,
Sarah Maman


**

Hello,

I would like to integrate a perl script (the quantifier script of 
mirdeep2) into a new tool in my local instance of Galaxy.
The problem is that this script generates several outputs (image 
files), but the number of outputs varies from one run to another.
It seems that the number of outputs must be defined and fixed in the 
tool xml file? Or is there a way not to specify the number of outputs?


Thank you in advance for your help,
Sarah Maman


[galaxy-dev] pass more information on a dataset merge

2012-10-01 Thread Jorrit Boekel

Dear list,

I thought I was working with fairly large datasets, but they have 
recently started to include ~2Gb files in sets of >50. I have run these 
sorts of things before as merged data by using tar to roll them up into one 
set, but when dealing with >100Gb tarfiles, Galaxy on EC2 seems to get 
very slow, although that's probably because of my implementation of 
dataset type detection (untar and read through the files).


Since tarring/untarring isn't very clean, I want to switch from tarring 
to creating composite files on merge by putting a tool's results into 
the dataset.extra_files_path. This doesn't seem to be supported yet, 
because we currently pass, in do_merge, only the output dataset.filename to 
the respective datatype's merge method.


I would like to pass more data to the merge method (let's say the whole 
dataset object) to be able to get the composite files directory and 
'merge' the files in there. Good idea, bad idea? If anyone has views on 
this, I'd love to hear them.


cheers,
jorrit



Re: [galaxy-dev] DRMAA: TypeError: check_tool_output() takes exactly 5 arguments (4 given)

2012-09-19 Thread Jorrit Boekel

Odd, it works for me on EC2/Cloudman.

jorrit

On 09/19/2012 03:29 PM, Peter Cock wrote:

On Tue, Sep 18, 2012 at 7:11 PM, Scott McManus  wrote:

Sorry - that's changeset 7714:3f12146d6d81

-Scott

Hi Scott,

The good news is this error does seem to be fixed as of that commit:

TypeError: check_tool_output() takes exactly 5 arguments (4 given)

The bad news is my cluster jobs still aren't working properly (using
a job splitter). The jobs seem to run, get submitted to the cluster,
and finish, and the data looks OK via the 'eye' view icon, but is
red in the history with:

0 bytes
An error occurred running this job: info unavailable

I will investigate - it is likely due to another change... perhaps in
the new stdout/stderr/return code support?

Peter


Re: [galaxy-dev] python egg cache exists error

2012-09-19 Thread Jorrit Boekel

I added this snippet to the top of my extract_dataset_part.py:

pkg_resources.require("simplejson")

# wait until this process' PID is the first PID of all processes with
# the same name, then import
while True:
    with os.popen("ps ax|grep extract_dataset_part.py |grep -v grep|awk '{print $1}'") as allpids:
        if os.getpid() == int(allpids.readline().strip()):
            break

import simplejson


The file will wait its turn based on its PID (lower PIDs show up higher 
in the table). Problems may however arise when an 
extract_dataset_part.py becomes a zombie or something, but since it's a 
small script, this may do the job. If anyone sees more problems, I'd be 
happy to know.


cheers,
jorrit

On 09/19/2012 09:16 AM, Jorrit Boekel wrote:
For completeness, here's two tracebacks (there were more similar ones) 
from the same job:


/mnt/galaxyData/tmp/job_working_directory/000/75/task_4:
Traceback (most recent call last):
  File "./scripts/extract_dataset_part.py", line 25, in 
import galaxy.model.mapping #need to load this before we unpickle, 
in order to setup properties assigned by the mappers
  File "/mnt/galaxyTools/galaxy-central/lib/galaxy/model/__init__.py", 
line 13, in 

import galaxy.datatypes.registry
  File 
"/mnt/galaxyTools/galaxy-central/lib/galaxy/datatypes/registry.py", 
line 8, in 

from display_applications.application import DisplayApplication
  File 
"/mnt/galaxyTools/galaxy-central/lib/galaxy/datatypes/display_applications/application.py", 
line 9, in 

from util import encode_dataset_user
  File 
"/mnt/galaxyTools/galaxy-central/lib/galaxy/datatypes/display_applications/util.py", 
line 3, in 

from Crypto.Cipher import Blowfish
  File 
"/mnt/galaxyTools/galaxy-central/eggs/pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg/Crypto/Cipher/Blowfish.py", 
line 7, in 
  File 
"/mnt/galaxyTools/galaxy-central/eggs/pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg/Crypto/Cipher/Blowfish.py", 
line 4, in __bootstrap__
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 
882, in resource_filename

self, resource_name
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 
1351, in get_resource_filename

self._extract_resource(manager, self._eager_to_zip(name))
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 
1373, in _extract_resource

self.egg_name, self._parts(zip_path)
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 
962, in get_cache_path

self.extraction_error()
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 
928, in extraction_error

raise err
pkg_resources.ExtractionError: Can't extract file(s) to egg cache

The following error occurred while trying to extract file(s) to the 
Python egg

cache:

  [Errno 17] File exists: 
'/home/galaxy/.python-eggs/pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg-tmp'


The Python egg cache directory is currently set to:

  /home/galaxy/.python-eggs

Perhaps your account does not have write access to this directory?  
You can

change the cache directory by setting the PYTHON_EGG_CACHE environment
variable to point to an accessible directory.
/mnt/galaxyData/tmp/job_working_directory/000/75/task_5:
Traceback (most recent call last):
  File "./scripts/extract_dataset_part.py", line 22, in 
import simplejson
  File 
"/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/__init__.py", 
line 111, in 
  File 
"/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/decoder.py", 
line 7, in 
  File 
"/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/scanner.py", 
line 10, in 
  File 
"/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/scanner.py", 
line 6, in _import_c_make_scanner
  File 
"/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/_speedups.py", 
line 7, in 
  File 
"/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/_speedups.py", 
line 4, in __bootstrap__
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 
882, in resource_filename

self, resource_name
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 
1351, in get_resource_filename

self._extract_resource(manager, self._eager_to_zip(name))
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 
1373, in _extract_resource

self.egg_name, self._parts(zip_path)
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 
962, in get_cache_path

self.extraction_error()
  File &q

Re: [galaxy-dev] python egg cache exists error

2012-09-19 Thread Jorrit Boekel
For completeness, here are two tracebacks (there were more similar ones) 
from the same job:


/mnt/galaxyData/tmp/job_working_directory/000/75/task_4:
Traceback (most recent call last):
  File "./scripts/extract_dataset_part.py", line 25, in 
import galaxy.model.mapping #need to load this before we unpickle, in order 
to setup properties assigned by the mappers
  File "/mnt/galaxyTools/galaxy-central/lib/galaxy/model/__init__.py", line 13, in 

import galaxy.datatypes.registry
  File "/mnt/galaxyTools/galaxy-central/lib/galaxy/datatypes/registry.py", line 8, in 

from display_applications.application import DisplayApplication
  File 
"/mnt/galaxyTools/galaxy-central/lib/galaxy/datatypes/display_applications/application.py",
 line 9, in 
from util import encode_dataset_user
  File 
"/mnt/galaxyTools/galaxy-central/lib/galaxy/datatypes/display_applications/util.py", 
line 3, in 
from Crypto.Cipher import Blowfish
  File 
"/mnt/galaxyTools/galaxy-central/eggs/pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg/Crypto/Cipher/Blowfish.py",
 line 7, in 
  File 
"/mnt/galaxyTools/galaxy-central/eggs/pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg/Crypto/Cipher/Blowfish.py",
 line 4, in __bootstrap__
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 882, in 
resource_filename
self, resource_name
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 1351, in 
get_resource_filename
self._extract_resource(manager, self._eager_to_zip(name))
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 1373, in 
_extract_resource
self.egg_name, self._parts(zip_path)
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 962, in 
get_cache_path
self.extraction_error()
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 928, in 
extraction_error
raise err
pkg_resources.ExtractionError: Can't extract file(s) to egg cache

The following error occurred while trying to extract file(s) to the Python egg
cache:

  [Errno 17] File exists: 
'/home/galaxy/.python-eggs/pycrypto-2.5-py2.7-linux-x86_64-ucs4.egg-tmp'

The Python egg cache directory is currently set to:

  /home/galaxy/.python-eggs

Perhaps your account does not have write access to this directory?  You can
change the cache directory by setting the PYTHON_EGG_CACHE environment
variable to point to an accessible directory.
/mnt/galaxyData/tmp/job_working_directory/000/75/task_5:
Traceback (most recent call last):
  File "./scripts/extract_dataset_part.py", line 22, in 
import simplejson
  File 
"/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/__init__.py",
 line 111, in 
  File 
"/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/decoder.py",
 line 7, in 
  File 
"/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/scanner.py",
 line 10, in 
  File 
"/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/scanner.py",
 line 6, in _import_c_make_scanner
  File 
"/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/_speedups.py",
 line 7, in 
  File 
"/mnt/galaxyTools/galaxy-central/eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg/simplejson/_speedups.py",
 line 4, in __bootstrap__
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 882, in 
resource_filename
self, resource_name
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 1351, in 
get_resource_filename
self._extract_resource(manager, self._eager_to_zip(name))
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 1373, in 
_extract_resource
self.egg_name, self._parts(zip_path)
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 962, in 
get_cache_path
self.extraction_error()
  File "/mnt/galaxyTools/galaxy-central/lib/pkg_resources.py", line 928, in 
extraction_error
raise err
pkg_resources.ExtractionError: Can't extract file(s) to egg cache

The following error occurred while trying to extract file(s) to the Python egg
cache:

  [Errno 17] File exists: '/home/galaxy/.python-eggs'

The Python egg cache directory is currently set to:

  /home/galaxy/.python-eggs

Perhaps your account does not have write access to this directory?  You can
change the cache directory by setting the PYTHON_EGG_CACHE environment
variable to point to an accessible directory.




On 09/18/2012 05:24 PM, James Taylor wrote:

Interesting. If I'm reading this correctly the problem is happening
inside pkg_resources? (galaxy.eggs unzips eggs, but I think it does so
on install [fetch_eggs] ti

Re: [galaxy-dev] python egg cache exists error

2012-09-18 Thread Jorrit Boekel

Hi again,

I have looked into this matter a little bit more, and it looks like this 
is happening:


- tasked job is split
- task commands are sent to workers (I am running 8-core high-cpu extra 
large workers on EC2)
- per task, the worker runs env.sh for the respective tool
- per task, the worker runs scripts/extract_dataset_part.py
- this script issues import statements (the ones for simplejson and 
galaxy.model.mapping have caused me problems)
- which lead to unzipping .so libraries from the python eggs into the node's 
/home/galaxy/.python-eggs
- this runs into lib/pkg_resources.py and its _bypass_ensure_directory 
method, which creates the temporary dir for the egg unzip
- since there are 8 processes on the node, sometimes this method tries 
to mkdir a directory that was just made by the previous process after 
the isdir check.


That last point is my guess. I don't really know how to solve this in 
a non-hackish way, so until someone figures it out, I may read from an 
'eggs_extracted.txt' file to determine whether the eggs have been extracted. 
And lock the file when writing to it, of course.
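
For reference, the race boils down to the classic check-then-create pattern; a generic illustration (plain Python, not the pkg_resources code) of both the race and the usual way to tolerate it:

# Generic illustration of the race described above (not the pkg_resources
# code): two tasks can both pass the isdir() check before either has created
# the directory, so the loser's mkdir fails with EEXIST (errno 17). Treating
# EEXIST as success is the usual way to make this safe.
import errno
import os

def ensure_directory( path ):
    if not os.path.isdir( path ):        # both processes can get past this...
        try:
            os.makedirs( path )          # ...but only one mkdir can win
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise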


cheers,
jorrit

On 09/14/2012 10:57 AM, Jorrit Boekel wrote:

Dear list,

I am running galaxy-dist on Amazon EC2 through Cloudman, and am using 
the enable_tasked_jobs setting to run jobs in parallel. Yes, I know it's not 
recommended in production. My jobs usually get split into 72 parts, and 
sometimes (but not always, maybe in 30-50% of cases) errors are 
returned concerning the python egg cache, usually:


[Errno 17] File exists: '/home/galaxy/.python-eggs'

or something like

[Errno 17] File exists: 
'/home/galaxy/.python-eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg-tmp'


The errors arise, AFAIK, when scripts/extract_dataset_part.py is run. I
am guessing that the tmp python egg dir is created for each of the 72
tasks, that these creations sometimes coincide, and that this leads to an
error.


I would like to solve this problem, but before doing so, I'd like to 
know if someone else has already fixed it in a galaxy-central changeset.


cheers,
jorrit



Re: [galaxy-dev] DRMAA: TypeError: check_tool_output() takes exactly 5 arguments (4 given)

2012-09-18 Thread Jorrit Boekel
Is it possible that you are looking at different classes? TaskWrapper's 
finish method does not use the job variable in my recently merged code 
either (line ~1045), while JobWrapper's does around line 315.


cheers,
jorrit




On 09/18/2012 03:55 PM, Scott McManus wrote:

I have to admit that I'm a little confused as to why you would
be getting this error at all - the "job" variable is introduced
at line 298 in the same file, and it's used as the last variable
to check_tool_output in the changeset you pointed to.
(Also, thanks for pointing to it - that made investigating easier.)

Is it possible that there was a merge problem when you pulled the
latest set of code? For my own sanity, would you mind downloading
a fresh copy of galaxy-central or galaxy-dist into a separate
directory and seeing if the problem is still there? (I fully admit
that there could be a bug that I left in, but all job runners
should have stumbled across the same problem - the "finish" method
should be called by all job runners.)

Thanks again!

-Scott

- Original Message -

I'll check it out. Thanks.

- Original Message -

Hi all (and in particular, Scott),

I've just updated my development server and found the following
error when running jobs on our SGE cluster via DRMMA:

galaxy.jobs.runners.drmaa ERROR 2012-09-18 09:43:20,698 Job wrapper finish method failed
Traceback (most recent call last):
  File "/mnt/galaxy/galaxy-central/lib/galaxy/jobs/runners/drmaa.py", line 371, in finish_job
    drm_job_state.job_wrapper.finish( stdout, stderr, exit_code )
  File "/mnt/galaxy/galaxy-central/lib/galaxy/jobs/__init__.py", line 1048, in finish
    if ( self.check_tool_output( stdout, stderr, tool_exit_code ) ):
TypeError: check_tool_output() takes exactly 5 arguments (4 given)
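
For anyone puzzled by the counts in that message: Python includes the
implicit self when reporting a bound method's arity, so a call that
passes four arguments to a method defined with five parameters is
reported as "takes exactly 5 arguments (4 given)". A tiny standalone
illustration, nothing to do with Galaxy's actual classes:

    class Wrapper(object):
        def check_tool_output(self, stdout, stderr, exit_code, job):
            return job is None

    w = Wrapper()
    try:
        w.check_tool_output('out', 'err', 0)  # one positional argument short
    except TypeError as e:
        print(e)  # on Python 2: check_tool_output() takes exactly 5 arguments (4 given)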

This looks to have been introduced in this commit:
https://bitbucket.org/galaxy/galaxy-central/changeset/f557b7b05fdd701cbf99ee04f311bcadb1ae29c4#chg-lib/galaxy/jobs/__init__.py

There should be an additional job argument; proposed fix:

$ hg diff lib/galaxy/jobs/__init__.py
diff -r 4007494e37e1 lib/galaxy/jobs/__init__.py
--- a/lib/galaxy/jobs/__init__.py   Tue Sep 18 09:40:19 2012 +0100
+++ b/lib/galaxy/jobs/__init__.py   Tue Sep 18 10:06:44 2012 +0100
@@ -1045,7 +1045,8 @@
         # Check what the tool returned. If the stdout or stderr matched
         # regular expressions that indicate errors, then set an error.
         # The same goes if the tool's exit code was in a given range.
-        if ( self.check_tool_output( stdout, stderr, tool_exit_code ) ):
+        job = self.get_job()
+        if ( self.check_tool_output( stdout, stderr, tool_exit_code, job ) ):
             task.state = task.states.OK
         else:
             task.state = task.states.ERROR


(Let me know if you want this as a pull request - it seems a lot of
effort for a tiny change.)

Regards,

Peter




[galaxy-dev] python egg cache exists error

2012-09-14 Thread Jorrit Boekel

Dear list,

I am running galaxy-dist on Amazon EC2 through Cloudman, and am using 
the enable_tasked_jobs to run jobs in parallel. Yes, I know it's not 
recommended in production. My jobs usually get split in 72 parts, and 
sometimes (but not always, maybe in 30-50% of cases), errors are 
returned concerning the python egg cache, usually:


[Errno 17] File exists: '/home/galaxy/.python-eggs'

or something like

[Errno 17] File exists: 
'/home/galaxy/.python-eggs/simplejson-2.1.1-py2.7-linux-x86_64-ucs4.egg-tmp'

The errors arise, AFAIK, when scripts/extract_dataset_part.py is run. I
am guessing that the tmp python egg dir is created for each of the 72
tasks, that these creations sometimes coincide, and that this leads to an
error.


I would like to solve this problem, but before doing so, I'd like to 
know if someone else has already fixed it in a galaxy-central changeset.


cheers,
jorrit


[galaxy-dev] generating composite files

2012-08-31 Thread Jorrit Boekel

Dear list,

I'm on galaxy-dist. I have a tool that outputs an unknown number of files
in a directory, all of which I need as input to the next tool. I used to
tar them, but after a tip from John Chilton, I tried to fit this into
Galaxy's composite datatypes. The tool now works, but I wonder if there
is a better way to do it than how I have implemented it. I ran into some
trouble because my new datatype's generate_primary_file was never called.
(Is this only used after an upload?)


My tool now outputs to the output.extra_files_path given to it on the 
xml command line, and writes an HTML file to the output file it is 
given. The next tool reads input.extra_files_path, and uses the files 
there. In other words, no use has been made of add_composite_file, 
generate_primary_file, etc. I can even set composite_file=None in the 
datatype, and it will still work.
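
For concreteness, a minimal sketch of what such a tool wrapper can do
with the two paths Galaxy hands it; the file names and the shape of the
HTML index are invented for illustration and nothing here is prescribed
by Galaxy:

    import os
    import sys

    def write_outputs(primary_path, extra_files_path, fractions):
        """Write each produced file into extra_files_path and index them in HTML."""
        if not os.path.isdir(extra_files_path):
            os.makedirs(extra_files_path)
        names = []
        for i, data in enumerate(fractions):
            name = 'fraction_%03d.tsv' % i  # made-up naming scheme
            with open(os.path.join(extra_files_path, name), 'w') as out:
                out.write(data)
            names.append(name)
        # The primary dataset becomes a simple HTML index of the companion files.
        with open(primary_path, 'w') as html:
            html.write('<html><body><ul>\n')
            for name in names:
                html.write('<li><a href="%s">%s</a></li>\n' % (name, name))
            html.write('</ul></body></html>\n')

    if __name__ == '__main__':
        # e.g. invoked from the tool XML as: python this_tool.py $output $output.extra_files_path
        write_outputs(sys.argv[1], sys.argv[2], ['a\t1\n', 'b\t2\n'])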


The whole thing feels a bit like a hack, so I wanted to ask if there is 
a way for tools to generate composite files the same way uploads can.


cheers,
jorrit boekel




[galaxy-dev] Deleted jobs on the cloud keep running

2012-08-29 Thread Jorrit Boekel

Dear list,

When running Galaxy/Cloudman on Amazon (on Ubuntu 12.04), I run into the 
following:


- I start a job with a <parallelism> tag in the tool.xml file.
- Since I have use_tasked_jobs=True in universe_wsgi.ini, the job is 
split and divided over the instances on EC2
- I change my mind, and click on the delete button in the Galaxy UI. 
This removes the job from the UI.
- I log in to an instance running my job, and see that it is still 
running (both its wrapper and the tool program). It keeps running, 
keeping my instances busy.
- When tasks of this job finish (I assume), it is reported to galaxy, 
and my logfile says things like:


galaxy.jobs DEBUG 2012-08-29 12:42:08,239 task 319 for job 25 ended
galaxy.jobs DEBUG 2012-08-29 12:42:08,265 The tool did not define exit 
code or stdio handling; checking stderr for success

galaxy.jobs DEBUG 2012-08-29 12:42:08,282 task 319 ended

So I wonder whether the default behaviour when deleting jobs in the UI is
that they keep running but hidden, or whether there is something wrong in
my config. Is there a way to change this with a setting?


cheers,
jorrit boekel


Re: [galaxy-dev] uploading multiple files into one dataset

2012-06-01 Thread Jorrit Boekel

Dear list,

Our lab has been outputting data in multiple files that we currently 
merge (in galaxy) by tarring them. This works fine with the parallel 
processing that Galaxy offers.


The problem, see also below, was to create a user-friendly way to avoid
having to create 50-200 datasets in Galaxy, and instead get a single
dataset containing all the merged files. I do not know if this
functionality is something that people want to use, or if it goes against
Galaxy design principles, but I have implemented it for our lab.


I have enabled a multiple-file input element (may not work in IE though)
in a separate FileField subclass, and the list of files that is
subsequently uploaded is persisted and passed to the upload tool, where
the files are merged according to a merge method on the datatype
(specified in the upload tool). File type detection is done with the
sniffers, as the file type is set to auto.
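
A minimal sketch of the kind of per-datatype merge method meant here,
which simply tars the uploaded files into one dataset; the class name and
the (files, output) signature are assumptions for illustration, not
Galaxy's actual API:

    import os
    import tarfile

    class TarMergingDatatype(object):
        @staticmethod
        def merge(split_files, output_file):
            """Combine many uploaded files into a single dataset by tarring them."""
            with tarfile.open(output_file, 'w') as archive:
                for path in split_files:
                    archive.add(path, arcname=os.path.basename(path))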


I don't have a very good view of the demand for this sort of function, 
but if anyone else would like to use/modify it, I can fork and issue a 
pull request.


cheers,
jorrit



On 03/06/2012 06:56 PM, Nate Coraor wrote:

On Feb 29, 2012, at 11:34 AM, Jorrit Boekel wrote:


Dear list,

Our lab's proteomics data is frequently outputted into >50 files containing 
different fractions of proteins. The files are locally stored and not present on 
the Galaxy server. We've planned to somehow (inside galaxy) merge these files and 
split them into tasks so they can be run on the cluster. We would either 
merge/split the files by concatenation, or untar/tar files at every job, depending 
on filetype and tool. No problems so far.

However, I have been looking around for a way to upload >50 files simultaneously to galaxy and 
convert to one dataset, and this does not seem to be supported. Before starting to create a hack to 
make this work, which doesn't seem especially trivial to me, I'd like to know if I should instead use 
libraries. From what I've seen, libraries are not treated as datasets in Galaxy but rather contain 
datasets. If there was a "tar all sets in library and import to history" I'd be using that, 
but I've only encountered "tar/zip sets and download locally" which would be a bit of a 
workaround.

Hi Jorrit,

It's not possible to do this all in one step, but you can definitely upload 
them all simultaneously and then concatenate them using the concatenate tool 
(or write a simple tool to tar them).

--nate


I haven't found much on this subject in the mailing list, has this 
functionality been requested before?

cheers,
jorrit boekel


Re: [galaxy-dev] Uploading problem

2012-05-22 Thread Jorrit Boekel

Hi,

Hanging uploads have been a problem before with small and large files:
http://comments.gmane.org/gmane.science.biology.galaxy.devel/4583

I don't know if it's been fixed, but it seemed to depend on browser 
choice and on what server was running. I was recommended to use the 
nginx server IIRC. I still have the occasional problem, though (I hope)
less often.


cheers,
jorrit


On 05/22/2012 03:50 PM, J. Greenbaum wrote:

Hi Julie, All,

I have a similar problem on a test environment with a postgres
backend.  The file that I'm uploading is a 50MB GTF file and I'm
getting the same behavior.  Any help would be appreciated.


Thanks,

Jason


--
Jason Greenbaum, Ph.D.
Manager, Bioinformatics Core | jgb...@liai.org
La Jolla Institute for Allergy and Immunology





*From: *"julie dubois" 
*To: *galaxy-dev@lists.bx.psu.edu
*Sent: *Tuesday, May 22, 2012 12:12:06 AM
*Subject: *[galaxy-dev] Uploading problem

Hi,
I am testing a production environment of Galaxy with an apache proxy and
a mysql database.
It works, but when I upload a file (it's not very big: 460 MB), Galaxy
says that the dataset is uploading; it has been running since yesterday
and it has not finished!
Where can the problem be? Is it an installation problem or a hardware
limitation? It runs on a machine with 8 GB RAM and a 4-core processor.

Thank you

Julie


Re: [galaxy-dev] cloudman on ubuntu 12.04

2012-05-11 Thread Jorrit Boekel
I now see that there is already support for this in the current repo, 
but I was using the cm.tar.gz in the default cloudman bucket, which is 
older and doesn't have the ubuntu 12.04 support.


My mistake, pretend you never saw it.

cheers,
jorrit

On 05/11/2012 12:27 PM, Jorrit Boekel wrote:

Dear list,

I have tried to use cloudman on ubuntu 12.04 (with mi-deployment
scripts). It didn't run very smoothly. One of the issues seems to be
that SGE depends on libc.so.6, which is not in the location where
cloudman expects it.


If not fixed yet, I'll create a pull request for the fix (two lines or 
so in sge.py).


cheers,
jorrit boekel




[galaxy-dev] cloudman on ubuntu 12.04

2012-05-11 Thread Jorrit Boekel

Dear list,

I have tried to use cloudman on ubuntu 12.04 (with mi-deployment
scripts). It didn't run very smoothly. One of the issues seems to be that
SGE depends on libc.so.6, which is not in the location where cloudman
expects it.


If not fixed yet, I'll create a pull request for the fix (two lines or 
so in sge.py).


cheers,
jorrit boekel


Re: [galaxy-dev] uploads stuck in history

2012-03-07 Thread Jorrit Boekel

Hi Nate,

I wasn't before, but I switched to nginx now. Non-uploading is still 
happening in firefox (10.0.2 on ubuntu 11.10).


I can't be 100% sure that my nginx install was correct, but it's 
definitely serving galaxy and not complaining. Is there any way to 
verify uploads are processed through nginx?


jorrit


On 03/05/2012 08:37 PM, Nate Coraor wrote:

By any chance, do you all happen to be using the nginx upload module?  I am 
guessing not.

--nate

On Feb 23, 2012, at 3:16 AM, Bossers, Alex wrote:


I can confirm the same strange behaviour since some of the last updates on the
central version. We are at the latest now.
It also happens with medium (10 MB+ tarballs) AND large files! Furthermore, it
seems to be in Firefox only; they upload fine using IE8 or 9. Didn't test other
browsers though.

Annoying it is indeed.

Alex


-----Original message-----
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On behalf of Jorrit Boekel
Sent: Wednesday, 22 February 2012 10:15
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] uploads stuck in history

Dear list,

I have stumbled on some strange behaviour when uploading files to galaxy via 
the upload.py tool. At times, the upload seems to be stalled in history and is 
never actually performed, followed by a seemingly infinite history update (see 
log below). My system is Ubuntu 11.10 and runs Python 2.7.2. I find the 
behaviour in both my own modified galaxy install (based on galaxy-dist), and in 
a fresh clone from galaxy-central.

I have tried to upload different files, and all seem to sometimes trigger the 
behaviour, but not all the time. A restart of galaxy sometimes sorts things out. 
What the debug messages have in common is that a job id never seems to be generated, 
i.e. there is no line like "galaxy.tools.actions.upload_common INFO 2012-02-22 10:06:36,186 tool
upload1 created job id 2".

Has anyone seen similar things or can it be a problem with my system?

cheers,
jorrit






--Debug messages:

galaxy.web.framework DEBUG 2012-02-22 09:47:43,730 Error: this request returned 
None from get_history(): http://localhost:8080/
127.0.0.1 - - [22/Feb/2012:09:47:43 +0200] "GET / HTTP/1.1" 200 - "-"
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:10.0.2) Gecko/20100101 
Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:47:44 +0200] "GET /root/tool_menu HTTP/1.1" 200 - 
"http://localhost:8080/"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:10.0.2) Gecko/20100101 
Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:47:44 +0200] "GET /history HTTP/1.1" 200 - 
"http://localhost:8080/"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64;
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:47:44 +0200] "POST /root/user_get_usage HTTP/1.1" 200 - 
"http://localhost:8080/history"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:10.0.2) 
Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:47:50 +0200] "GET
/tool_runner?tool_id=upload1 HTTP/1.1" 200 - "http://localhost:8080/root/tool_menu"; 
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:11 +0200] "POST /tool_runner/upload_async_create HTTP/1.1" 
200 - "http://localhost:8080/"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64;
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:11 +0200] "GET /tool_runner/upload_async_message HTTP/1.1" 
200 - "http://localhost:8080/"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64;
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:11 +0200] "GET /history HTTP/1.1" 200 - 
"http://localhost:8080/tool_runner/upload_async_message"; "Mozilla/5.0 (X11; Ubuntu; Linux 
x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:11 +0200] "POST /root/user_get_usage HTTP/1.1" 200 - 
"http://localhost:8080/history"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:10.0.2) 
Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:13 +0200] "GET
/tool_runner?tool_id=upload1 HTTP/1.1" 200 - "http://localhost:8080/root/tool_menu"; 
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:15 +0200] "POST /root/history_item_updates HTTP/1.1" 200 - 
"http://localhost:8080/history"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64;
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:19 +0200] "POST /root/history_item_updates HTTP/1.1" 200 - 
"http://localhost:8080/history"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64;
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:23 +

[galaxy-dev] uploading multiple files into one dataset

2012-02-29 Thread Jorrit Boekel

Dear list,

Our lab's proteomics data is frequently outputted into >50 files 
containing different fractions of proteins. The files are locally stored 
and not present on the Galaxy server. We've planned to somehow (inside 
galaxy) merge these files and split them into tasks so they can be run 
on the cluster. We would either merge/split the files by concatenation, 
or untar/tar files at every job, depending on filetype and tool. No 
problems so far.


However, I have been looking around for a way to upload >50 files 
simultaneously to galaxy and convert to one dataset, and this does not 
seem to be supported. Before starting to create a hack to make this 
work, which doesn't seem especially trivial to me, I'd like to know if I 
should instead use libraries. From what I've seen, libraries are not 
treated as datasets in Galaxy but rather contain datasets. If there was 
a "tar all sets in library and import to history" I'd be using that, but 
I've only encountered "tar/zip sets and download locally" which would be 
a bit of a workaround.


I haven't found much on this subject in the mailing list, has this 
functionality been requested before?


cheers,
jorrit boekel


[galaxy-dev] babel egg: epiphany locale setting fix

2012-02-24 Thread Jorrit Boekel

Dear list,

I encountered problems when using galaxy with ubuntu's epiphany browser. 
It seems that when using this browser, HTTP_ENVIRON's locale setting is 
sometimes 'en-us', but sometimes ' en-us'. That's a whitespace before 
the locale. I solved the problem by adding a simple .strip() in 
eggs/babel/core.py on line 763:


lang = parts.pop(0).lower().strip()
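
One plausible source of the stray space, shown as a quick standalone
illustration (this is not Babel's actual parsing code): splitting an
Accept-Language header on commas keeps the space that follows each comma,
so every entry after the first comes out in the ' en-us' style:

    header = 'en-gb, en-us'  # hypothetical value sent by the browser
    for part in header.split(','):
        raw = part.lower()
        print('%r -> %r' % (raw, raw.strip()))
    # the second entry only matches a known locale after .strip()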

I assume the egg is not maintained by the galaxy team, so maybe I should 
forward this to Babel developers, but if anyone runs into the same 
problem, here's a solution.


cheers,
jorrit



Re: [galaxy-dev] uploads stuck in history

2012-02-22 Thread Jorrit Boekel

Hi,

I haven't been patient enough to wait really long, but I can give that a 
try.


As Leandro mentioned, I've also seen it in kilobyte to megabyte sized 
files. My Galaxy is forked from the latest galaxy-dist.


cheers,
jorrit

On 02/22/2012 04:47 PM, Leandro Hermida wrote:

Hi all,

We get the behavior mentioned sometimes too; it's not reproducible,
just like you mentioned. Sometimes it happens with small files, sometimes
with large files, and again, as you said, it doesn't happen all the
time.

I haven't seen this yet on our Galaxy dev server, which is the latest
galaxy-dist; we only see it on our Galaxy production server, which is an
older version of galaxy-dist from last winter.

regards,
Leandro

On Wed, Feb 22, 2012 at 3:59 PM, Hans-Rudolf Hotz  wrote:

Hi

I guess we see similar things: a new history item is created, it turns
purple, stays like this apparently forever while no job id is created (ie I
don't see the job in any of the report tools).

To be honest we ignore them, because:

a) it only (as far as I can tell) happens with big data files (we try to
avoid this anyway by using 'Data libraries') while there is heavy load on
the storage system

b) ...eventually, unless the user has deleted the history item, it does turn
green and the job is listed as 'ok'.


Do you experience this problem with all files, even small ones?


Regards, Hans



On 02/22/2012 10:14 AM, Jorrit Boekel wrote:

Dear list,

I have stumbled on some strange behaviour when uploading files to galaxy
via the upload.py tool. At times, the upload seems to be stalled in
history and is never actually performed, followed by a seemingly
infinite history update (see log below). My system is Ubuntu 11.10 and
runs Python 2.7.2. I find the behaviour in both my own modified galaxy
install (based on galaxy-dist), and in a fresh clone from galaxy-central.

I have tried to upload different files, and all seem to sometimes
trigger the behaviour, but not all the time. A restart of galaxy
sometimes sorts things out. What the debug messages have in common is
that a job id never seems to be generated, i.e. there is no line like
"galaxy.tools.actions.upload_common INFO 2012-02-22 10:06:36,186 tool
upload1 created job id 2".

Has anyone seen similar things or can it be a problem with my system?

cheers,
jorrit






--Debug messages:

galaxy.web.framework DEBUG 2012-02-22 09:47:43,730 Error: this request
returned None from get_history(): http://localhost:8080/
127.0.0.1 - - [22/Feb/2012:09:47:43 +0200] "GET / HTTP/1.1" 200 - "-"
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:10.0.2) Gecko/20100101
Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:47:44 +0200] "GET /root/tool_menu
HTTP/1.1" 200 - "http://localhost:8080/"; "Mozilla/5.0 (X11; Ubuntu;
Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:47:44 +0200] "GET /history HTTP/1.1" 200 -
"http://localhost:8080/"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64;
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:47:44 +0200] "POST /root/user_get_usage
HTTP/1.1" 200 - "http://localhost:8080/history"; "Mozilla/5.0 (X11;
Ubuntu; Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:47:50 +0200] "GET
/tool_runner?tool_id=upload1 HTTP/1.1" 200 -
"http://localhost:8080/root/tool_menu"; "Mozilla/5.0 (X11; Ubuntu; Linux
x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:11 +0200] "POST
/tool_runner/upload_async_create HTTP/1.1" 200 -
"http://localhost:8080/"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64;
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:11 +0200] "GET
/tool_runner/upload_async_message HTTP/1.1" 200 -
"http://localhost:8080/"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64;
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:11 +0200] "GET /history HTTP/1.1" 200 -
"http://localhost:8080/tool_runner/upload_async_message"; "Mozilla/5.0
(X11; Ubuntu; Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:11 +0200] "POST /root/user_get_usage
HTTP/1.1" 200 - "http://localhost:8080/history"; "Mozilla/5.0 (X11;
Ubuntu; Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:13 +0200] "GET
/tool_runner?tool_id=upload1 HTTP/1.1" 200 -
"http://localhost:8080/root/tool_menu"; "Mozilla/5.0 (X11; Ubuntu; Linux
x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:15 +0200] "POST
/root/history_item_updates HTTP/1.1" 200 -
"http://localhost:8080/history"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64;
rv:10.0.2) Gecko/201

[galaxy-dev] uploads stuck in history

2012-02-22 Thread Jorrit Boekel

Dear list,

I have stumbled on some strange behaviour when uploading files to galaxy 
via the upload.py tool. At times, the upload seems to be stalled in 
history and is never actually performed, followed by a seemingly 
infinite history update (see log below). My system is Ubuntu 11.10 and 
runs Python 2.7.2. I find the behaviour in both my own modified galaxy 
install (based on galaxy-dist), and in a fresh clone from galaxy-central.


I have tried to upload different files, and all seem to sometimes 
trigger the behaviour, but not all the time. A restart of galaxy 
sometimes sorts things out. What the debug messages have in common is
that a job id never seems to be generated, i.e. there is no line like
"galaxy.tools.actions.upload_common INFO 2012-02-22 10:06:36,186 tool
upload1 created job id 2".


Has anyone seen similar things or can it be a problem with my system?

cheers,
jorrit






--Debug messages:

galaxy.web.framework DEBUG 2012-02-22 09:47:43,730 Error: this request 
returned None from get_history(): http://localhost:8080/
127.0.0.1 - - [22/Feb/2012:09:47:43 +0200] "GET / HTTP/1.1" 200 - "-" 
"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:10.0.2) Gecko/20100101 
Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:47:44 +0200] "GET /root/tool_menu 
HTTP/1.1" 200 - "http://localhost:8080/"; "Mozilla/5.0 (X11; Ubuntu; 
Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:47:44 +0200] "GET /history HTTP/1.1" 200 - 
"http://localhost:8080/"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:47:44 +0200] "POST /root/user_get_usage 
HTTP/1.1" 200 - "http://localhost:8080/history"; "Mozilla/5.0 (X11; 
Ubuntu; Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:47:50 +0200] "GET 
/tool_runner?tool_id=upload1 HTTP/1.1" 200 - 
"http://localhost:8080/root/tool_menu"; "Mozilla/5.0 (X11; Ubuntu; Linux 
x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:11 +0200] "POST 
/tool_runner/upload_async_create HTTP/1.1" 200 - 
"http://localhost:8080/"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:11 +0200] "GET 
/tool_runner/upload_async_message HTTP/1.1" 200 - 
"http://localhost:8080/"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:11 +0200] "GET /history HTTP/1.1" 200 - 
"http://localhost:8080/tool_runner/upload_async_message"; "Mozilla/5.0 
(X11; Ubuntu; Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:11 +0200] "POST /root/user_get_usage 
HTTP/1.1" 200 - "http://localhost:8080/history"; "Mozilla/5.0 (X11; 
Ubuntu; Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:13 +0200] "GET 
/tool_runner?tool_id=upload1 HTTP/1.1" 200 - 
"http://localhost:8080/root/tool_menu"; "Mozilla/5.0 (X11; Ubuntu; Linux 
x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:15 +0200] "POST 
/root/history_item_updates HTTP/1.1" 200 - 
"http://localhost:8080/history"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:19 +0200] "POST 
/root/history_item_updates HTTP/1.1" 200 - 
"http://localhost:8080/history"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:23 +0200] "POST 
/root/history_item_updates HTTP/1.1" 200 - 
"http://localhost:8080/history"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:27 +0200] "POST 
/root/history_item_updates HTTP/1.1" 200 - 
"http://localhost:8080/history"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:31 +0200] "POST 
/root/history_item_updates HTTP/1.1" 200 - 
"http://localhost:8080/history"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:35 +0200] "POST 
/root/history_item_updates HTTP/1.1" 200 - 
"http://localhost:8080/history"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:39 +0200] "POST 
/root/history_item_updates HTTP/1.1" 200 - 
"http://localhost:8080/history"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:43 +0200] "POST 
/root/history_item_updates HTTP/1.1" 200 - 
"http://localhost:8080/history"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"
127.0.0.1 - - [22/Feb/2012:09:48:47 +0200] "POST 
/root/history_item_updates HTTP/1.1" 200 - 
"http://localhost:8080/history"; "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; 
rv:10.0.2) Gecko/20100101 Firefox/10.0.2"



[galaxy-dev] which galaxy version is on the AMI?

2012-02-15 Thread Jorrit Boekel

Dear list,

As a first attempt to put my tools in the cloudman/galaxy environment, I
have tried to just crudely copy them to their respective directories,
along with tool_conf.xml. This worked, but some tools raise errors upon
loading that I cannot reproduce in my local Galaxy instance (neither in a
galaxy-dist, nor in a galaxy-central fork). Something with required
"value" fields in some <param> tags (some of which are optional="true").


I would thus like to know which Galaxy fork/branch/version is running on 
the most recent AMI (March 2011 I believe), to be able to troubleshoot 
this locally.


In case the AMI contains a somewhat older Galaxy, is there a way to
update it (via hg), or a manual that shows how to check out a full
cloudman/galaxy/SGE environment on an otherwise empty ubuntu EC2
instance?


cheers,
jorrit




[galaxy-dev] cloudman/galaxy on EC2 - snapshot not found

2012-02-14 Thread Jorrit Boekel

Dear list,

I am trying to deploy Galaxy as an environment for proteomics tools in
amazon's EC2 cloud, probably/preferably via Cloudman. I have used both
the biocloudcentral portal and a manual approach to set up
cloudman/galaxy, but I run into the same error message, which seems to
start with a snapshot that does not exist:


 * 15:16:05 - PostgreSQL data directory '/mnt/galaxyData/pgsql/data'
   does not exist (yet?)
 * 15:16:26 - Error creating volume: EC2ResponseError: 400 Bad Request
   |InvalidSnapshot.NotFound|The snapshot 'snap-b28be9d5' does not
   exist.---some long identifier---
 * 15:16:26 - Error adding filesystem service 'Filesystem-galaxyTools':
   'NoneType' object has no attribute 'id'
 * 15:16:26 - STATUS CHECK: File system named 'galaxyTools' is not
   mounted. Error code 0
 * 15:16:37 - Error mounting file system '/mnt/galaxyData' from
   '/dev/xvdg2', running command '/bin/mount /dev/xvdg2
   /mnt/galaxyData' returned code '32' and following stderr: 'mount:
   you must specify the filesystem type '


My instance type is t1.micro (testing on AWS free tier), but I've tried 
m1.large, and that didn't help. I noted that the snapshot ID is always 
the same. Since the AMI is from March 2011 (at least the one I select 
manually), I assume that I have something wrong in my settings or that 
Amazon has changed something somewhere.


cheers,
Jorrit Boekel