On May 15, 2012, at 10:15 AM, Jean-Christophe Ducom wrote:

> Thank you for your email, Peter.
> We have implemented Galaxy to interface with our HPC cluster via PBS/Torque. 
> Thanks to DRMAA (not PBS Python), all user CPU usage can be accounted for. 
> The motivation is indeed what you describe, in addition to managing cost and 
> disk performance on a per-user/per-project basis, since we have tiered storage.
> Our filesystem is GPFS, which, as you might know, has one nice feature 
> (amongst many) called filesets: a fileset is basically a data bucket that 
> reports usage regardless of Unix ownership. It works great for project-type 
> directories.
> The filename length concern is indeed legitimate given command-line limits 
> (GPFS has the same filename length limit as ext3/4). The current filename can 
> remain unmodified: the requested scheme would only add the user database ID 
> (usually 3-4 digits) to the path, e.g.:
> ~/galaxy-dist/database/files/000/dataset_0001.dat
> ~/galaxy-dist/database/files/001/dataset_0002.dat
> ~/galaxy-dist/database/files/002/dataset_0003.dat
> ~/galaxy-dist/database/files/000/dataset_0004.dat
> ~/galaxy-dist/database/files/000/dataset_0005.dat
> ~/galaxy-dist/database/files/000/dataset_0006.dat
> ~/galaxy-dist/database/files/002/dataset_0007.dat
> [...]
> Any thoughts?
> Thanks again
> JC

Hi JC,

As Peter mentions, there's no clear way to determine ownership when data is 
shared.  The best you could do is identify the user that originally created a 
dataset.  If you wanted to go this route, the best place to start would be an 
enhancement of the Object Store framework, at 
galaxy-dist/lib/galaxy/objectstore/__init__.py
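
To make that concrete, here is a minimal Python sketch of the path logic such 
an Object Store enhancement might use. It assumes the ID of the user who 
originally created the dataset is available when the path is built. The names 
per_user_dataset_path and ensure_user_dir are hypothetical and are not part of 
Galaxy's ObjectStore API; the zero-padding simply mirrors JC's example layout.

```python
import os
import posixpath


def per_user_dataset_path(files_root: str, user_id: int, dataset_id: int) -> str:
    """Hypothetical helper: place a dataset under a per-user subdirectory.

    Follows the layout proposed in this thread (files/<user>/dataset_<n>.dat).
    This is an illustration only, not Galaxy's ObjectStore API.
    """
    # Zero-padded ID of the creating user, e.g. 000, 001, 002.
    user_dir = f"{user_id:03d}"
    # Filename left in the same style as the example paths above.
    filename = f"dataset_{dataset_id:04d}.dat"
    return posixpath.join(files_root, user_dir, filename)


def ensure_user_dir(dataset_path: str) -> None:
    """Create the per-user directory for a dataset path if it is missing."""
    os.makedirs(posixpath.dirname(dataset_path), exist_ok=True)
```

For example, per_user_dataset_path("/galaxy-dist/database/files", 2, 7) gives 
"/galaxy-dist/database/files/002/dataset_0007.dat". For user IDs below 1000, 
each path grows by only four characters (three digits plus a separator), which 
bears on Peter's command-line length concern.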

In case you're not aware, Galaxy does have internal disk accounting and quota 
features:

    http://wiki.g2.bx.psu.edu/Admin/Disk%20Quotas

--nate

> 
> ________________________________________
> From: Peter Cock [p.j.a.c...@googlemail.com]
> Sent: Tuesday, May 15, 2012 1:38 AM
> To: Jean-Christophe Ducom
> Cc: galaxy-dev@lists.bx.psu.edu
> Subject: Re: [galaxy-dev] user data upload directory structure
> 
> On Mon, May 14, 2012 at 10:22 PM, Jean-Christophe Ducom
> <jcdu...@scripps.edu> wrote:
>> All-
>> Is there a way to change the default upload directory structure
>> (/database/files) to organize files per user_id instead?
>> Something along the following lines:
>> ~galaxy-dist/database/files/postgresql_user_id0
>> ~galaxy-dist/database/files/postgresql_user_id1
>> 
>> Thank you
>> JC
> 
> I can see that being useful as a quick way to look at per-user
> disk usage, although you'd have problems counting shared
> data. Is that your motivation?
> 
> Another concern would be overly long filenames, which have
> a direct impact on the length of the command lines used to call
> the tools; there are OS limits on this.
> 
> Peter
> 
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
> 
>  http://lists.bx.psu.edu/

