Hi everyone,

I'm throwing this out there for some feedback and recommendations.


Objective: Make it easier for my clients to transfer large files (> 2 GB) from
an HPC cluster (and its associated fast-tier storage) to Galaxy.  I enabled the
FTP upload option in Galaxy, but it requires users to learn how to copy files
over FTP.

So, I created a galaxy folder in each user's home directory on the HPC cluster
that is a symbolic link to the Galaxy FTP upload folder.  Hence, users can
either upload files over FTP (drag and drop on Windows) or simply copy files
into this folder from an ssh session on the cluster.

The problem with that strategy was that the galaxy user had to own the files
(similar to the ProFTPD configuration that sets the UID and GID of uploaded
files to galaxy's UID/GID).  Otherwise, Galaxy threw errors when it tried to
delete the original file from the FTP upload folder.  I could have added the
galaxy user to the same group as all users, but then users would have to make
sure the correct permissions were set on their files so that Galaxy could read
them and delete them afterwards.  The alternative was to modify the upload.py
tool to chown/chmod files as they are uploaded.  upload.py now runs an external
script through sudo that sets ownership to the galaxy user and corrects the
permissions if required (see the attachment for the code modification).  The
galaxy user has sudo rights on this script, and the script restricts
chown/chmod to the FTP folder path for security reasons.
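The chown_script.pl helper itself is not included here.  As a rough sketch of
the same idea, written in Python rather than Perl and with hypothetical
function names, the restricted chown/chmod step could look like this: resolve
symlinks, refuse anything outside the FTP root, then change ownership and mode
through an open file descriptor so the file that was checked is the file that
gets changed.

```python
import os
import stat


def is_under_root(path, root):
    """True if path, with symlinks resolved, lies strictly inside root."""
    real_path = os.path.realpath(path)
    real_root = os.path.realpath(root)
    return real_path != real_root and os.path.commonpath([real_path, real_root]) == real_root


def fix_ownership(path, root, uid, gid):
    """Hypothetical helper: chown a regular file under root to uid:gid and
    make sure its owner can read and write it.  Returns the resolved path."""
    if not is_under_root(path, root):
        raise PermissionError("refusing to touch %s: not under %s" % (path, root))
    real_path = os.path.realpath(path)
    # O_NOFOLLOW: fail if the last component was swapped for a symlink after the check.
    # O_NONBLOCK: do not hang if a FIFO was planted in the upload folder.
    fd = os.open(real_path, os.O_RDONLY | os.O_NOFOLLOW | os.O_NONBLOCK)
    try:
        st = os.fstat(fd)
        if not stat.S_ISREG(st.st_mode):
            raise ValueError("%s is not a regular file" % real_path)
        os.fchown(fd, uid, gid)
        # Keep only the rwx bits (drop setuid/setgid/sticky) and add owner read/write.
        new_mode = (stat.S_IMODE(st.st_mode) & 0o777) | stat.S_IRUSR | stat.S_IWUSR
        os.fchmod(fd, new_mode)
    finally:
        os.close(fd)
    return real_path
```

Run as root through sudo, this would be called as
fix_ownership(path, ftp_root, galaxy_uid, galaxy_gid); the FTP root and the
galaxy UID/GID are site-specific values assumed here, not taken from the
script.  Only the final path component is protected against symlink swaps, not
every parent directory.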

I was planning to clean up the code and make it production-ready by adding an
option in universe_wsgi.ini for this "feature", but I thought I would check
with the Galaxy devs first.  Am I taking the wrong approach?  Is there a better
alternative?
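If this becomes an option in universe_wsgi.ini, reading it could look roughly
like the sketch below.  The option names ftp_upload_fix_ownership and
ftp_upload_chown_script are hypothetical (Galaxy defines neither); only the
[app:main] section and ftp_upload_dir are real Galaxy settings.  It uses
Python 3's configparser with interpolation disabled, so values such as
%(here)s are read literally rather than expanded.

```python
import configparser


def read_chown_settings(ini_text):
    """Return (enabled, script_path) from universe_wsgi.ini-style text.

    Both option names are hypothetical placeholders for this proposal.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = "app:main"
    # Default to off so existing installations are unaffected.
    enabled = parser.getboolean(section, "ftp_upload_fix_ownership", fallback=False)
    script = parser.get(section, "ftp_upload_chown_script", fallback=None)
    if enabled and not script:
        raise ValueError("ftp_upload_fix_ownership is on but ftp_upload_chown_script is unset")
    return enabled, script
```

How the value would then reach upload.py (which runs as a separate tool
process) is left open here.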

As an alternative, I thought about locating the code that handles dataset.type
== 'file' and possibly making it honour the setuid/setgid bits on folders (not
the sticky bit, which only restricts who may delete files).  In that case, the
FTP upload folder would have the setuid bit set, and Galaxy could assume the
role of the user to upload that file.
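For reference on those bits: on Linux, the setgid bit on a directory makes
files created inside it inherit the directory's group; the setuid bit on a
directory is ignored (only a few systems, such as FreeBSD with the suiddir
mount option, honour it); and the sticky bit only limits who may delete or
rename entries.  So a setgid FTP folder would fix the group of new files but
not their owner.  A small sketch, with hypothetical helper names, for checking
and setting these bits:

```python
import os
import stat


def special_bits(path):
    """Report which of the setuid, setgid and sticky bits are set on path."""
    mode = os.stat(path).st_mode
    return {
        "setuid": bool(mode & stat.S_ISUID),
        "setgid": bool(mode & stat.S_ISGID),
        "sticky": bool(mode & stat.S_ISVTX),
    }


def enable_group_inheritance(directory):
    """Set the setgid bit so files created in directory inherit its group."""
    mode = stat.S_IMODE(os.stat(directory).st_mode)
    os.chmod(directory, mode | stat.S_ISGID)
```

Note that inheritance applies only to newly created files: a file moved into
the folder with mv on the same filesystem keeps its original group.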

Your input is much appreciated.

Iyad Kandalaft

Bioinformatics Application Developer
Agriculture and Agri-Food Canada | Agriculture et Agroalimentaire Canada
KW Neatby Bldg | éd. KW Neatby
960 Carling Ave | 960, avenue Carling
Ottawa, ON | Ottawa (ON) K1A 0C6
E-mail Address / Adresse courriel: iyad.kandal...@agr.gc.ca
Telephone | Téléphone 613-759-1228
Facsimile | Télécopieur 613-759-1701
Government of Canada | Gouvernement du Canada

diff -r 29ce93a13ac7 tools/data_source/upload.py
--- a/tools/data_source/upload.py       Mon Feb 10 13:22:47 2014 -0500
+++ b/tools/data_source/upload.py       Tue May 27 10:54:42 2014 -0400
@@ -4,7 +4,7 @@
 # WARNING: Changes in this tool (particularly as related to parsing) may need
 # to be reflected in galaxy.web.controllers.tool_runner and galaxy.tools
 
-import urllib, sys, os, gzip, tempfile, shutil, re, gzip, zipfile, codecs, binascii
+import urllib, sys, os, gzip, tempfile, shutil, re, gzip, zipfile, codecs, binascii, subprocess
 from galaxy import eggs
 # need to import model before sniff to resolve a circular import dependency
 import galaxy.model
@@ -352,6 +352,22 @@
                      stdout = 'uploaded %s file' % dataset.file_type )
         json_file.write( to_json_string( info ) + "\n" )
 
+def _chown( dataset ):
+    # Only 'file' datasets are re-owned; composite uploads are skipped
+    if dataset.type != 'file':
+        return
+
+    try:
+        cmd = [ '/usr/bin/sudo', '-E', '/home/galaxy/server-conf/chown_script.pl', dataset.path ]
+        sys.stdout.write( 'Changing ownership of %s with: %s\n' % ( dataset.path, ' '.join( cmd ) ) )
+        p = subprocess.Popen( cmd, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE )
+        # communicate() drains both pipes together; reading stdout then stderr can deadlock
+        out, err = p.communicate()
+        sys.stdout.write( out )
+        sys.stderr.write( err )
+    except Exception, e:
+        sys.stderr.write( 'Changing ownership of uploaded file %s failed: %s\n' % ( dataset.path, str( e ) ) )
+
 def __main__():
 
     if len( sys.argv ) < 4:
@@ -372,6 +388,9 @@
         except:
             print >>sys.stderr, 'Output path for dataset %s not found on command line' % dataset.dataset_id
             sys.exit( 1 )
+
+        _chown( dataset )
+
         if dataset.type == 'composite':
             files_path = output_paths[int( dataset.dataset_id )][1]
             add_composite_file( dataset, registry, json_file, output_path, files_path )