Re: [galaxy-dev] Inform tool interface with data specific to selected dataset

Dooley, Damion Fri, 09 May 2014 18:29:19 -0700

Sure.  I'll try to be concise; approach was sketched out about a month ago on 
the board.   I'll be uploading our generalized reporting tool which can be an 
example of this once it has tests, but for now the bare bones:


Background: we wanted to launch a BLAST search for a number of FASTA sequences 
and display the results in an HTML form, organized by query and hits.  A user 
can then select hits for particular queries and have them show up in their own 
datasets, each ready to feed a pipeline of phylogenetic tree visualization 
tools.  An HTML form was called for because it shows various columns of 
information for each hit, which lets you decide whether you want that hit in 
the next stage.

So first we have a dataset containing the choice information, say this 
combination of BLAST nucleotide query sequences and hit info (a search query 
row is indicated by "1" in the Query column):

Accession ID                  pident  length  sequence      Query  Row
Assembly_67_BCC1              -       -       AGGAC...TGCA  1      1
gi|158343637|gb|EU057648.1|   99.55   442     AGGAC...TGCA  0      2
gi|158343987|gb|EU057686.1|   99.10   442     AGGAC...TGCA  0      3
gi|158343677|gb|EU057652.1|   98.87   387     TGGAC...TGCA  0      4
...
Assembly_67_BCC8              -       -       ATGG...CCC    1      5
...

Tool A: "Selection Form": takes in above info, provides an HTML report in which 
an HTML form provides the necessary input to Tool B.

Tool B: "Selection Tool": takes in the same dataset as above, but generates an 
output file that includes only the selected rows of data (and only the desired 
columns).  (The nice thing about Tool B is that it can be set up to work 
directly on the above dataset without needing to be fed by Tool A; it's just 
that when called up directly, it only offers a selection list as provided by 
its own XML form spec.)
                                                                                
        
Tool A:
 
Starting in tool XML, we indicate a) input type of data to select in history, 
b) html output file where form is built, c) some useful ids related to the 
input data file (don't confuse id with hid or dataset_id!).  
"tool_input_dataset_file.id" is the one we need to pass to Tool B. 

<tool id="bccdcBLASTreporting" name="BLAST Reporting" version="1.0.4">
        ...
        <command interpreter="python">
my_python.py $tool_input_dataset_file $html_file 
$tool_input_dataset_file.hid:$tool_input_dataset_file.dataset_id:$tool_input_dataset_file.id
-f "
        ...
        </command>
        ...
        <inputs>
                <param name="tool_input_dataset_file" type="data" format="[e.g. tabular, or whatever type in history]" label="My insightful results"/>
        ...
        </inputs>       
        <outputs>
                ...
                <data format="html" name="html_file" label="HTML report for data $tool_input_dataset_file.hid" />
        </outputs>

Tool A builds the html form.  The only trick here is that you have to load the 
Tool B form in galaxy, and view its frame's source code to see the right values 
for tool_id and tool_state (an initial tool_state value seems to work fine).  I 
use a dictionary lookup to store these, and combine with string replacement in 
a multi-line string for simple html templating.  Below is code slightly adapted 
for this writeup. 
        
        in_file, out_html_file, selection_file_data = args
        sel_file_fields = selection_file_data.split(':')

        self.lookup = {
                'timestamp': time.strftime('%Y/%m/%d'),
                'tool_id': 'bccdcSelectSubset',
                'tool_state':'800.....................71002e',
                'select_row':0,
                'dataset_selection_id': sel_file_fields[2]
        }

        form_html = """

                <div style="float:right" id="buttonPrint" class="nonprintable">
                        <button onclick="window.print()">Print</button>
                </div>

                <form id="tool_form" name="tool_form" action="../../../tool_runner"
                        target="galaxy_main" method="post"
                        enctype="application/x-www-form-urlencoded">
                        <input type="hidden" name="refresh" value="refresh"/>
                        <input type="hidden" name="tool_id" value="%(tool_id)s"/>
                        <input type="hidden" name="tool_state" value="%(tool_state)s">
                        <input type="hidden" name="input" value="%(dataset_selection_id)s"/>
                        <input type="hidden" name="incl_excl" value="1"/>

                        <input type="submit" class="btn btn-primary nonprintable"
                                name="runtool_btn" value="Submit">

                        """ % self.lookup

        with open(out_html_file, 'w') as fp_out:
                fp_out.write(HTML_REPORT_HEADER_FILE)
                fp_out.write(form_html)
                ...
And now write out all the table stuff for each row in input file with a 
checkbox selector:
                with open(in_file) as f_in:
                        for line in f_in:
                                # Drop the newline so it doesn't end up in the last cell
                                rowdata = line.rstrip('\n').split('\t')
                                self.lookup['select_row'] += 1
                                tdTags = ''
                                for (col, field) in enumerate(self.display_columns):
                                        self.lookup['value'] = rowdata[col]

                                        if col == 0:
                                                # First column carries the row's checkbox
                                                tdTags += ('<td><input type="checkbox" name="select" '
                                                        'value="%(select_row)s" />%(value)s</td>') % self.lookup
                                        else:
                                                tdTags += '<td>%(value)s</td>' % self.lookup

                                fp_out.write("""\n\t\t\t<tr>%s</tr>""" % tdTags)
                ...

                fp_out.write(HTML_REPORT_FOOTER_FILE)
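
The row-building loop above puts field values into the markup as-is.  If a 
field could ever contain characters such as "<" or "&" (an assumption; 
accession IDs and sequences normally won't), escaping them keeps the report 
well-formed.  A minimal sketch of the same row logic with escaping added; 
build_row is a hypothetical helper, not part of the tool itself:

```python
import html  # Python 3; on Python 2, cgi.escape played this role


def build_row(rowdata, row_number, n_columns):
    """Return one <tr> whose first cell carries the selection checkbox.

    Hypothetical helper for illustration only.
    """
    cells = []
    for col in range(n_columns):
        value = html.escape(rowdata[col])
        if col == 0:
            # The checkbox value is the row number that Tool B selects on
            cells.append(
                '<td><input type="checkbox" name="select" value="%d" />%s</td>'
                % (row_number, value)
            )
        else:
            cells.append('<td>%s</td>' % value)
    return '<tr>%s</tr>' % ''.join(cells)
```

Each call would replace the inner loop for one line of the input file, e.g. 
fp_out.write(build_row(rowdata, row_number, 4)).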


Tool B:

To keep it simple this one just produces a single output dataset, but I can 
show a multiple-output-dataset version (one dataset for each set of query hits 
selected above) if you want.  ' force_history_refresh="True" ' is supposed to 
refresh the history list after this executes all of its file writing, but for 
some reason that doesn't seem to work on my Galaxy.

 <tool id="bccdcSelectSubset" name="Select subsets" force_history_refresh="True">
        <command interpreter="python">
        select_subset.py $input $output1 $incl_excl $select
        </command>
        <inputs>
                <param name="input" type="data" format="tabular" label="Numbered tabular input file"/>
                <param name="incl_excl" type="select" format="text" label="Include or exclude selection?">
                        <option value="1">Include selection</option>
                        <option value="0">Exclude selection</option>
                </param>
                <param name="select" type="select" multiple="true" display="checkboxes" label="Select lines below">
                        <options from_dataset="input">
                                <column name="name" index="0"/>
                                <column name="value" index="-1"/>
                        </options>
                </param>
        </inputs>
        <outputs>
                <data name="output1" format="tabular" metadata_source="input" label="$tool.name on data $input.hid"/>
        </outputs>
        <help>

.. class:: infomark

**What it does**

This tool produces a tabular file with a subset of the lines in its input 
tabular file.
        </help>
</tool>
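
Since the option values come from the column at index -1, the last column of 
the input has to hold a row number.  If a dataset isn't numbered yet, 
something along these lines could append that column.  This is only a sketch: 
number_rows is a hypothetical name, and Galaxy's stock tools may already cover 
this step.

```python
import sys


def number_rows(lines, start=1):
    """Yield each tab-separated line with a running row number appended."""
    for number, line in enumerate(lines, start):
        yield '%s\t%d\n' % (line.rstrip('\r\n'), number)


if __name__ == '__main__':
    # Usage sketch: python number_rows.py < unnumbered.tsv > numbered.tsv
    for numbered in number_rows(sys.stdin):
        sys.stdout.write(numbered)
```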

And the python:
'''
python select_subset.py $input $output $incl_excl $select
'''
import sys


def stop_err(msg):
    sys.stderr.write("%s\n" % msg)
    sys.exit(1)


try:
    input_path, output_path, incl_excl, select = sys.argv[1:]
except ValueError:
    stop_err('you must provide the arguments input, output, incl_excl and select.')

# Row numbers chosen in the form, passed as a comma-separated list
try:
    lines = set(int(num) for num in select.split(','))
except ValueError:
    stop_err('Did you remember to number the input dataset?')

include = bool(int(incl_excl))
if include:
    print('Including selected lines...')
else:
    print('Excluding selected lines...')

with open(input_path) as f_in, open(output_path, 'w') as f_out:
    for line in f_in:
        cols = line.rstrip('\n').split('\t')
        try:
            # The last column holds the row number
            num = int(cols[-1])
        except ValueError:
            stop_err('Did you remember to number the input dataset?')
        # Keep selected rows when including, unselected rows when excluding;
        # the row-number column is dropped from the output
        if (num in lines) == include:
            f_out.write('\t'.join(cols[:-1]) + '\n')

print('Done.')
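
For the multiple-output variant mentioned earlier (one dataset per query), the 
selected rows first have to be grouped under the query row that precedes them. 
Here is a sketch of just that grouping step.  It assumes the layout of the 
sample table with no header row: a Query flag in the second-to-last column and 
the row number in the last.  group_selected_by_query is a hypothetical name, 
and how each group then gets registered as a Galaxy dataset is left out.

```python
from collections import OrderedDict


def group_selected_by_query(lines, selected):
    """Map each query's ID to the selected hit rows that follow it.

    Assumes tab-separated rows ending in a Query flag ('1' marks a query
    row) and a row number, as in the sample table.
    """
    groups = OrderedDict()
    current = None
    for line in lines:
        cols = line.rstrip('\r\n').split('\t')
        if len(cols) < 3:
            continue  # skip blank or malformed lines
        flag, row = cols[-2], int(cols[-1])
        if flag == '1':
            # A query row starts a new group
            current = cols[0]
            groups[current] = []
        elif current is not None and row in selected:
            # Keep the hit, minus the Query and Row bookkeeping columns
            groups[current].append(cols[:-2])
    return groups
```

Each value in the returned mapping could then be written to its own tabular 
file, one per query.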
            

________________________________________
From: Igor Topcin [igortop...@gmail.com]
Sent: Friday, May 09, 2014 1:05 PM
To: Dooley, Damion
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Inform tool interface with data specific to selected 
dataset

Hi Damion,
Would you mind sharing your approach with us all?
Thanks!
Igor

On May 9, 2014 1:51 PM, "Dooley, Damion" <damion.doo...@bccdc.ca> wrote:
Hello, Eric,

If the dynamic filters approach doesn't work out I can send you an approach 
that worked for me.  It involves creating a tool-generated HTML report that 
contains a form which provides selection choices; the form is set to submit to 
a second tool of your choice (it contains the necessary fields to prime that 
tool).  Not sure if it works on every breed of Galaxy out there though.

d.
