Re: [galaxy-dev] How to remove a broken toolshed install

2012-10-16 Thread Clare Sloggett
Hi Greg,

Thanks for this!

On 17 October 2012 01:17, Greg Von Kuster  wrote:

>> I managed to break a toolshed-installed tool by fiddling with the
>> files under shed_tools.
>
> As you've discovered, this is not a good thing to try.  Always use the Galaxy 
> interface features to perform tasks like this.
>

Actually, the reason I did this was because I didn't know how to solve
a different problem, so maybe I should ask you about that one as well.

We had some occasions where we'd try to install a tool from the
toolshed, and it would hang - it appeared that the hg pull was timing
out. In these cases the config files wouldn't get set up, but a
partial repository was pulled / directories were created, and the
repository files would then get in the way of trying to install the
tool (it seemed to think it was already there). The only way to fix it
seemed to be to manually delete the partially-pulled repository under
shed_tools. This worked fine for fixing failed installs. But, this
time, I thought (wrongly) that this had happened again and I deleted a
repository - then realised that it was actually installed and
registered in the database, etc.

So, if the hg pull times out, is there a right way to clean up the
resulting files? I got in the habit of doing it manually, which of
course is dangerous, because I didn't know any way to do it via the
admin interface.


>
> Depending on the changes you've made, you should be able to do the following:
>
> 1. Manually remove the installed repository subdirectory hierarchy from disk.
> 2. If the repository included any tools, manually remove entries for each of 
> them from the shed_tool_conf.xml file (or the equivalent file you have 
> configured for handling installed repositories).
> 3. Manually update the database using the following command (assuming your 
> installed repository is named 'bcftools_view' and it is the only repository 
> you have installed with that name) - letter capitalization is required:
>
> The following assumes you're using postgres:
>
> update tool_shed_repository set deleted=True, uninstalled=True, 
> status='Uninstalled', error_message=Null  where name = 'bcftools_view';
>
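Step 2 of the quoted instructions can be scripted instead of edited by hand. The following is a minimal sketch, not Galaxy code. It assumes that every <tool> entry in the shed tool config has a <repository_name> child element, and the function name remove_repository_tools is made up for illustration. Back up the file first: ElementTree drops XML comments when it rewrites a file.

```python
import xml.etree.ElementTree as ET


def remove_repository_tools(conf_path, repository_name):
    """Drop every <tool> entry that belongs to the named repository.

    Assumption: each <tool> element has a <repository_name> child.
    Check this against your own config file before relying on it.
    Returns the number of entries removed.
    """
    tree = ET.parse(conf_path)
    root = tree.getroot()
    removed = 0
    # Copy both lists so elements can be removed while looping.
    for parent in list(root.iter()):
        for tool in list(parent.findall("tool")):
            if tool.findtext("repository_name") == repository_name:
                parent.remove(tool)
                removed += 1
    tree.write(conf_path)
    return removed
```

Calling remove_repository_tools("shed_tool_conf.xml", "bcftools_view") would rewrite the file in place and report how many entries were dropped. A result of 0 suggests the file layout differs from the assumption above.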

Thanks very much! Yes it's postgres. I'll let you know if I succeed.

Clare

-- 

Clare Sloggett
Research Fellow / Bioinformatician
Life Sciences Computation Centre
Victorian Life Sciences Computation Initiative
University of Melbourne, Parkville Campus
187 Grattan Street, Carlton, Melbourne
Victoria 3010, Australia
Ph: 03 903 53357  M: 0414 854 759
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] Number of outputs = number of inputs

2012-10-16 Thread Alex.Khassapov
I tried the galaxy-central-homogeneous-composite-datatypes fork, and it works 
great. I have a similar problem, where the number of output files varies, and 
it seems that your approach might work for output files as well (not only 
input). I'm currently trying to work out how to implement it; any help is 
appreciated.

Alex

-Original Message-
From: galaxy-dev-boun...@lists.bx.psu.edu 
[mailto:galaxy-dev-boun...@lists.bx.psu.edu] On Behalf Of John Chilton
Sent: Wednesday, 17 October 2012 12:49 AM
To: Sascha Kastens
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Number of outputs = number of inputs

I don't believe this is possible in Galaxy right now. Are the outputs 
independent or is information from all inputs used to produce all outputs? If 
they are independent, you can create a workflow containing just your tool with 
1 input and 1 output and use the batch workflow mode to run it on multiple 
files and get multiple outputs. This is not a beautiful solution but it gets 
the job done in some cases.

Another thing to look at might be the discussion we are having on the thread 
"pass more information on a dataset merge". We have a fork (it's all work from 
Jorrit Boekel) of Galaxy that creates composite datatypes for each explicitly 
defined type that can hold collections of a single type.

https://bitbucket.org/galaxyp/galaxy-central-homogeneous-composite-datatypes/compare

This would hopefully let you declare that you can accept a collection of 
whatever your input type is and produce a collection of whatever your output 
is. Lots of downsides to this approach - not fully implemented, and not 
included in Galaxy proper, your outputs would be wrapped up in a composite 
datatype so they wouldn't be easily processable by downstream tools. It would 
be good to have additional people hacking on it though :)

-John


John Chilton
Senior Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
Bitbucket: https://bitbucket.org/jmchilton
Github: https://github.com/jmchilton
Web: http://jmchilton.net

On Tue, Oct 16, 2012 at 7:13 AM, Sascha Kastens  
wrote:
> Hi all!
>
>
>
> I have a tool which takes one or more input files. For each input 
> file one output is created,
>
> i.e. 1 input file -> 1 output file, 2 input files -> 2 output files, etc.
>
>
>
> What is the best way to handle this? I used the directions for handling 
> multiple output files where
>
> the 'Number of Output datasets cannot be determined until tool run' 
> which in my opinion is a bit
>
> inappropriate. BTW: The input files are added via the -Tag, so 
> maybe there is a similar
>
> thing for outputs?
>
>
>
> Thanks in advance!
>
>
>
> Cheers,
>
> Sascha
>
>



[galaxy-dev] Jobs crash only in workflow context

2012-10-16 Thread Todd Oakley

Hello,
We just did a few tweaks to improve Galaxy performance, and a new 
issue popped up that I would like advice on troubleshooting.


When we run workflows, we see that tools later in the workflow run 
and crash before the results they depend on have completed running.


We can re-run the crashed jobs later and they work fine, suggesting 
that they are only failing in the context of running workflows.


I'd appreciate any advice on how to start troubleshooting this problem.

Thanks much!
Todd


--

***
Todd Oakley, Professor
Ecology Evolution and Marine Biology
University of California, Santa Barbara
Santa Barbara, CA 93106 USA
***



[galaxy-dev] Error when creating first data library in new Galaxy installation

2012-10-16 Thread Peter Briggs

Hello

I've encountered an error when creating the first data library within a 
Galaxy installation.


The data library appears to be created but an error message appears 
("Error attempting to display contents of library (Test Data): No module 
named controllers.library_common" - see attached screenshot) and 
although the library name appears, it's not possible to interact further 
with the library e.g. to add datasets.


The original error occurred on our nascent production setup, but I can 
reproduce it on a brand new vanilla local galaxy instance (changeset 
7828:b5bda7a5c345) with the only configuration being to register as a 
user and set the "admin_users" parameter in universe_wsgi.ini. Then, go 
to "admin" -> "manage data libraries" -> "create new data library".


This error doesn't appear when creating a new data library on an 
instance which already has existing libraries, and debug output doesn't 
give me any additional clues about where the "missing module" is coming 
from.


I don't think this problem has been reported previously; however, 
apologies if it's already been addressed on the mailing list.


Thanks for any help

Best wishes, Peter

--
Peter Briggs peter.bri...@manchester.ac.uk
Bioinformatics Core Facility University of Manchester
B.1083 Michael Smith Bldg Tel: (0161) 2751482


Re: [galaxy-dev] How to remove a broken toolshed install

2012-10-16 Thread Greg Von Kuster
Hello Clare,

On Oct 16, 2012, at 1:02 AM, Clare Sloggett wrote:

> Hi all,
> 
> I managed to break a toolshed-installed tool by fiddling with the
> files under shed_tools.


As you've discovered, this is not a good thing to try.  Always use the Galaxy 
interface features to perform tasks like this.


> This led to a situation in which the Galaxy
> admin interface claims the tool is still installed, but can't find any
> files for it. I manually put the repository files where I think they
> should go, but this didn't fix the situation, so what I really want to
> do is just get rid of it altogether and reinstall cleanly. I'm not
> certain that the tool was working properly before I fiddled with it,
> either.

Depending on the changes you've made, you should be able to do the following:

1. Manually remove the installed repository subdirectory hierarchy from disk.
2. If the repository included any tools, manually remove entries for each of 
them from the shed_tool_conf.xml file (or the equivalent file you have 
configured for handling installed repositories).
3. Manually update the database using the following command (assuming your 
installed repository is named 'bcftools_view' and it is the only repository you 
have installed with that name) - letter capitalization is required:

The following assumes you're using postgres:

update tool_shed_repository set deleted=True, uninstalled=True, 
status='Uninstalled', error_message=Null  where name = 'bcftools_view';
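The update above is only safe when exactly one installed repository has that name. The following is a minimal sketch of a guard for that condition. The table and column names come from the statement above, while the function name mark_uninstalled is made up for illustration. Python's built-in sqlite3 module is used here only as a stand-in so the example is self-contained; with a Postgres driver such as psycopg2 the placeholder would be %s instead of ?.

```python
import sqlite3


def mark_uninstalled(conn, name):
    """Flag one installed repository as deleted and uninstalled.

    Refuses to act unless exactly one row has this name, mirroring the
    "only repository you have installed with that name" caveat.
    Returns the number of rows updated.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT COUNT(*) FROM tool_shed_repository WHERE name = ?", (name,)
    )
    (count,) = cur.fetchone()
    if count != 1:
        raise ValueError(
            "expected exactly 1 repository named %r, found %d" % (name, count)
        )
    cur.execute(
        "UPDATE tool_shed_repository "
        "SET deleted = ?, uninstalled = ?, status = 'Uninstalled', "
        "error_message = NULL "
        "WHERE name = ?",
        (True, True, name),
    )
    updated = cur.rowcount
    conn.commit()
    return updated
```

If the count is not 1, nothing is changed and the caller can add a further condition (owner or changeset revision, for example) before retrying by hand.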

> 
> Galaxy won't let me uninstall, deactivate or update it (because it
> can't find it properly) and it won't let me install it (because it
> thinks it's installed). It also seems (judging by the last of the
> errors below) to be unable to find some config information that it
> expects, but I don't really understand what's going on there.
> 
> So my question is: given a messy, screwed up install, how can I
> completely remove it and start from scratch? What are the different
> components and config files I need to remove it from and are they all
> manually accessible?
> 
> Thanks in advance for any help!
> 
> 
> If it's relevant to my question, here are some of the behaviours I see
> currently:
> 
> The tool appears as "Installed" under Admin -> Manage installed tool
> shed repositories, but doesn't show up in the tools panel.
> 
> If I try Repository Actions -> Get repository updates , I get the error:
> The directory containing the installed repository named
> 'bcftools_view' cannot be found.
> 
> But if I try Repository Actions -> Reset repository metadata , it
> apparently works, I get
> Metadata has been reset on repository bcftools_view.
> 
> And, if I try to 'Deactivate or uninstall' the apparently-installed
> repository, I get:
> 
> URL: 
> http://galaxy-tut.genome.edu.au/admin_toolshed/deactivate_or_uninstall_repository?id=a25e134c184d6e4b
> Module paste.exceptions.errormiddleware:144 in __call__
>>> app_iter = self.application(environ, sr_checker)
> Module paste.debug.prints:106 in __call__
>>> environ, self.app)
> Module paste.wsgilib:543 in intercept_output
>>> app_iter = application(environ, replacement_start_response)
> Module paste.recursive:84 in __call__
>>> return self.application(environ, start_response)
> Module paste.httpexceptions:633 in __call__
>>> return self.application(environ, start_response)
> Module galaxy.web.framework.base:160 in __call__
>>> body = method( trans, **kwargs )
> Module galaxy.web.framework:205 in decorator
>>> return func( self, trans, *args, **kwargs )
> Module galaxy.webapps.galaxy.controllers.admin_toolshed:452 in
> deactivate_or_uninstall_repository
>>> remove_from_tool_panel( trans, tool_shed_repository, shed_tool_conf, 
>>> uninstall=remove_from_disk_checked )
> Module galaxy.util.shed_util:1781 in remove_from_tool_panel
>>> tool_panel_dict = generate_tool_panel_dict_from_shed_tool_conf_entries( 
>>> trans, repository )
> Module galaxy.util.shed_util:942 in
> generate_tool_panel_dict_from_shed_tool_conf_entries
>>> tree = util.parse_xml( shed_tool_conf )
> Module galaxy.util:135 in parse_xml
>>> tree = ElementTree.parse(fname)
> Module elementtree.ElementTree:859 in parse
> Module elementtree.ElementTree:576 in parse
> TypeError: coercing to Unicode: need string or buffer, NoneType found
> 
> 
> 
> Thanks,
> Clare
> 
> -- 
> 
> Clare Sloggett
> Research Fellow / Bioinformatician
> Life Sciences Computation Centre
> Victorian Life Sciences Computation Initiative
> University of Melbourne, Parkville Campus
> 187 Grattan Street, Carlton, Melbourne
> Victoria 3010, Australia
> Ph: 03 903 53357  M: 0414 854 759



Re: [galaxy-dev] Number of outputs = number of inputs

2012-10-16 Thread John Chilton
I don't believe this is possible in Galaxy right now. Are the outputs
independent or is information from all inputs used to produce all
outputs? If they are independent, you can create a workflow containing
just your tool with 1 input and 1 output and use the batch workflow
mode to run it on multiple files and get multiple outputs. This is not
a beautiful solution but it gets the job done in some cases.

Another thing to look at might be the discussion we are having on the
thread "pass more information on a dataset merge". We have a fork (it's
all work from Jorrit Boekel) of Galaxy that creates composite
datatypes for each explicitly defined type that can hold collections
of a single type.

https://bitbucket.org/galaxyp/galaxy-central-homogeneous-composite-datatypes/compare

This would hopefully let you declare that you can accept a collection
of whatever your input type is and produce a collection of whatever
your output is. Lots of downsides to this approach - not fully
implemented, and not included in Galaxy proper, your outputs would be
wrapped up in a composite datatype so they wouldn't be easily
processable by downstream tools. It would be good to have additional
people hacking on it though :)

-John


John Chilton
Senior Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
Bitbucket: https://bitbucket.org/jmchilton
Github: https://github.com/jmchilton
Web: http://jmchilton.net

On Tue, Oct 16, 2012 at 7:13 AM, Sascha Kastens
 wrote:
> Hi all!
>
>
>
> I have a tool which takes one or more input files. For each input file one
> output is created,
>
> i.e. 1 input file -> 1 output file, 2 input files -> 2 output files, etc.
>
>
>
> What is the best way to handle this? I used the directions for handling
> multiple output files where
>
> the ’Number of Output datasets cannot be determined until tool run’ which in
> my opinion is a bit
>
> inappropriate. BTW: The input files are added via the -Tag, so maybe
> there is a similar
>
> thing for outputs?
>
>
>
> Thanks in advance!
>
>
>
> Cheers,
>
> Sascha
>
>



Re: [galaxy-dev] Galaxy data upload

2012-10-16 Thread Nate Coraor
Hi Oliver,

This should be fixed.

--nate

On Oct 15, 2012, at 9:07 AM, Greg Von Kuster wrote:

> Hello Oliver,
> 
> I've forwarded this to the galaxy-dev mail list since it is the contact point 
> for issues like this.  Please send questions like this to the Galaxy mail 
> lists (galaxy-dev, galaxy-user) in the future as there is no guarantee you'll 
> get timely answers from individual contacts.
> 
> Thanks very much,
> 
> Greg Von Kuster
> 
> Begin forwarded message:
> 
>> From: "Oliver Berkowitz" 
>> Date: October 14, 2012 11:47:13 PM EDT
>> To: 
>> Subject: Galaxy data upload
>> 
>> Dear Greg,
>>  
>> not sure if you are the right person to contact…
>>  
>> I have tried several times to upload data to the galaxy server via ftp. 
>> Trying to connect leads to error 530, max number of clients already 
>> connected, like this:
>>  
>> Status: Resolving address of main.g2.bx.psu.edu
>> Status: Connecting to .
>> Status: Connection established, waiting for welcome message...
>> Response:220 ProFTPD 1.3.4b Server (Galaxy Main Server FTP) 
>> [:::128.118.250.4]
>> Command:  USER xx
>> Response:331 Password required for xxx
>> Command:  PASS ***
>> Response:530 Sorry, the maximum number of clients (3) for this user 
>> are already connected.
>> Error:   Critical error
>> Error:   Could not connect to server
>>  
>>  
>> Do you have an idea what I am doing wrong (or the server :) )?
>>  
>> Cheers
>> Oliver
>>  
>>  
>> 
>> Dr. Oliver Berkowitz
>> Murdoch University
>> Centre for Phytophthora Science and Management
>> School of Biological Sciences and Biotechnology
>> 90 South Street, Murdoch WA 6150, Australia
>> P: 61 - (0)8 9360 6335
>> E: o.berkow...@murdoch.edu.au
>> W: http://profiles.murdoch.edu.au/myprofile/oliver-berkowitz/
>>  
> 




[galaxy-dev] Number of outputs = number of inputs

2012-10-16 Thread Sascha Kastens
Hi all!

 

I have a tool which takes one or more input files. For each input file one 
output is created,

i.e. 1 input file -> 1 output file, 2 input files -> 2 output files, etc.

 

What is the best way to handle this? I used the directions for handling multiple 
output files where

the 'Number of Output datasets cannot be determined until tool run' which in my 
opinion is a bit

inappropriate. BTW: The input files are added via the -Tag, so maybe 
there is a similar

thing for outputs?

 

Thanks in advance!

 

Cheers,

Sascha


Re: [galaxy-dev] Group parameters

2012-10-16 Thread Joachim Jacob

+1 for the  tag.

Joachim Jacob, PhD

Rijvisschestraat 120, 9052 Zwijnaarde
Tel: +32 9 244.66.34
Bioinformatics Training and Services (BITS)
http://www.bits.vib.be
@bitsatvib

On 10/15/2012 06:50 PM, James Taylor wrote:

Be aware that this may not work in future versions of Galaxy, and
probably won't work in some places already (e.g. trackster).

If this is a common need, one option would be to create a new type of
grouping construct that would simply be a labeled group. In the config
this would be:


   ...
   ...


(collapsed/collapsable would configure whether the group can be hidden
and is hidden by default, this would be a more natural way to define
commonly used vs all parameters for example).

-- jt


On Mon, Oct 15, 2012 at 10:11 AM, Joachim Jacob  wrote:

You might have a look at the GMAP aligner wrapper in the toolshed
http://toolshed.g2.bx.psu.edu/. Apparently you have to use the html codes
for the symbols, instead of the symbols themselves
GMAP example:

label="

Input Sequences

Select an mRNA or EST dataset to map" />

Re: [galaxy-dev] pass more information on a dataset merge

2012-10-16 Thread Jorrit Boekel
No objections whatsoever, I'm really happy more people are interested! 
General fixes are definitely to be preferred over a 
whatever-field-specific solution if you ask me.


I am currently running a parallelism where I create symbolic links on 
split, and move result files (as opposed to copying) on a merge. Faster 
than copying back and forth, but it's limited to splitting into as many 
parts as there are files in a set.
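For readers who want to see the pattern concretely, here is a minimal sketch of splitting by symbolic link and merging by move. It is not the code from the fork: the function names, the task_N directory layout and the numbered output names are all assumptions made for illustration.

```python
import os
import shutil


def split_by_symlink(source_files, work_root):
    """Create one task directory per source file, each holding a symlink."""
    task_dirs = []
    for index, source in enumerate(source_files):
        task_dir = os.path.join(work_root, "task_%d" % index)
        os.makedirs(task_dir)
        link = os.path.join(task_dir, os.path.basename(source))
        # Link instead of copying, so the split costs almost nothing.
        os.symlink(os.path.abspath(source), link)
        task_dirs.append(task_dir)
    return task_dirs


def merge_by_move(task_dirs, result_name, dest_dir):
    """Move each task's result file into dest_dir, numbered to avoid clashes."""
    os.makedirs(dest_dir, exist_ok=True)
    merged = []
    for index, task_dir in enumerate(task_dirs):
        target = os.path.join(dest_dir, "%d_%s" % (index, result_name))
        # On the same filesystem a move is a rename, so no data is copied.
        shutil.move(os.path.join(task_dir, result_name), target)
        merged.append(target)
    return merged
```

The number of tasks equals the number of files in the set, which is the limitation mentioned above: a single large file cannot be divided further this way.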


cheers,
jorrit

On 10/15/2012 04:07 PM, John Chilton wrote:

Here is an implementation of the implicit multi-file composite
datatypes piece of that idea. I think the implicit parallelism may be
harder.

https://bitbucket.org/galaxyp/galaxy-central-homogeneous-composite-datatypes/compare

Jorrit do you have any objection to me trying to get this included in
galaxy-central (this is 95% code I stole from you)? I made the changes
against a clean galaxy-central fork and included nothing proteomics
specific in anticipation of trying to do that. I have talked with Jim
Johnson about the idea and he believes it would be useful for his mothur
metagenomics tools, so the idea is valuable outside of proteomics.

Galaxy team, would you be okay with including this and if so is there
anything you would like to see either at a high level or at the level
of the actual implementation.

-John


John Chilton
Senior Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
Bitbucket: https://bitbucket.org/jmchilton
Github: https://github.com/jmchilton
Web: http://jmchilton.net

On Mon, Oct 8, 2012 at 9:24 AM, John Chilton  wrote:

Jim Johnson and I have been discussing that approach to handling
fractionated proteomics samples as well (composite datatypes, not the
specifics of the interface for parallelizing).

My perspective has been that Galaxy should be augmented with better
native mechanisms for grouping objects in histories, operating over
those groups, building workflows that involve arbitrary numbers of
inputs, etc... Composite data types are kind of a kludge; I think they
are more useful for grouping HTML files together when you don't care
about operating on the constituent parts and you just want to view pages
as a report or something. With this proteomic data we are working
with, the individual pieces are really interesting right? You want to
operate on the individual pieces with the full array of tools (not
just these special tools that have the logic for dealing with the
composite datatypes), you want to visualize the files, etc... Putting
these component pieces in the composite data type extra_files path
really limits what you can do with the pieces in Galaxy.

I have a vague idea of something that I think could bridge some of the
gaps between the approaches (though I have no clue on the
feasibility). Looking through your implementation on bitbucket it
looks like you are defining your core datatypes (MS2, CruxSequest) as
subclasses of this composite data type (CompositeMultifile). My
recommendation would be to try to define plain datatypes for these
core datatypes (MS2, CruxSequest) and then have the separate composite
datatype sort of delegate to the plain datatypes.

You could then continue to explicitly declare subclasses of the
composite datatype (maybe MS2Set, CruxSequestSet), but also maybe
augment the tool xml so you can do implicit data type instances the
way you can with tabular data for instance (instead of defining
columns you would define the datatype to delegate to).
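The delegation idea can be sketched in a few lines of Python. This is only an illustration of the shape of the proposal, not Galaxy's datatype API: PlainDatatype, CompositeSet, sniff, sniff_all and the "_set" extension suffix are all hypothetical names.

```python
class PlainDatatype:
    """Hypothetical stand-in for a simple, single-file datatype."""

    extension = "data"

    def sniff(self, path):
        # Toy check: accept files whose name ends with this type's extension.
        return path.endswith("." + self.extension)


class MS2(PlainDatatype):
    extension = "ms2"


class CompositeSet:
    """Hypothetical composite that holds many files of one plain datatype."""

    def __init__(self, element_type):
        # The plain datatype that per-file behaviour is delegated to.
        self.element_type = element_type

    @property
    def extension(self):
        return self.element_type.extension + "_set"

    def sniff_all(self, paths):
        # Delegate the per-file check instead of reimplementing it.
        return all(self.element_type.sniff(p) for p in paths)


# An implicit "set of MS2" instance, without declaring an MS2Set subclass.
ms2_set = CompositeSet(MS2())
```

With this shape, an explicit subclass such as MS2Set remains possible, but any plain datatype can also be wrapped on demand.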

The next step would be to make the parallelism implicit (i.e., pull it
out of the tool wrapper). Your tool wrappers wouldn't reference the
composite datatypes, they would reference the simple datatypes, but
you could add a little icon next to any input that let you replace a
single input with a composite input for that type. It would be kind of
like the run workflow page where you can replace an input with
multiple inputs. If a composite input (or inputs) is selected the
tool would then produce composite outputs.
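A minimal sketch of what implicit parallelism could look like from the tool's point of view follows. It is an assumption-laden illustration: a Python list stands in for a composite dataset, and run_tool is a hypothetical dispatcher, not an existing Galaxy function.

```python
def run_tool(tool, value):
    """Run `tool` once for a simple input, or once per part for a composite.

    `tool` is any function that takes one simple input and returns one
    simple output; it never needs to know about composites.
    """
    if isinstance(value, list):
        # Composite in, composite out: one result per constituent part.
        return [tool(part) for part in value]
    return tool(value)
```

The tool itself only references the simple type; the dispatch around it decides whether the output is a single dataset or a composite.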

For the steps that actually combine multiple inputs, I think in your
case this is Percolator maybe (a tool like interprophet or Scaffold
that merges peptide probabilities across runs and groups proteins),
then you could have the same sort of implicit replacement but instead
of for single inputs it could do that for multi-inputs (assuming the
Galaxy powers that be accept my fixes for multi-input tool parameters:
https://bitbucket.org/galaxy/galaxy-central/pull-request/76/multi-input-data-tool-parameter-fixes).

The upshot of all of that would be that then even if these composite
datatypes aren't used widely, other people could still use your
proteomics tools (my users are definitely interested in Crux for
instance) and you could then use other developers' proteomic tools
with your composite datatypes even though they weren't designed with
that use case in mind (I have msconvert, myrimatch, idpicker,
proteinpilot, Ira Cooke has X! T