Re: [galaxy-dev] More meaningful dataset names/easier method of identifying?

Hans-Rudolf Hotz Wed, 25 Apr 2012 01:05:11 -0700

Hi Josh

Are you running the additional "reports web site"?
See 'run_reports.sh' and 'reports_wsgi.ini'.

We use this extra web site a lot for debugging. It helps to track what an individual user is doing, a kind of 'big brother is watching you'.



Regards, Hans

On 04/24/2012 10:51 PM, Dannon Baker wrote:

In changeset 7013:dae7eefe2f71 I added the full file path to the dataset "View 
Details" page.  Galaxy administrators will always see this, and if you set 
expose_dataset_path to True in your universe_wsgi.ini, users will see it as well.  
Hopefully that's what you're looking for, but let me know if I've misunderstood what 
you're after and I can take another look.
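
To check whether the setting is switched on for a given instance, a minimal sketch along the following lines could help. It assumes the option lives in an `[app:main]` section of universe_wsgi.ini (the section name is an assumption, so check it against your own file); `dataset_path_exposed` is a hypothetical helper, not part of Galaxy.

```python
import configparser


def dataset_path_exposed(ini_text, section="app:main"):
    """Return True if expose_dataset_path is enabled in the given INI text.

    The section name "app:main" is an assumption about the config layout.
    """
    # interpolation=None keeps literal '%' characters in other settings
    # from raising interpolation errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # Fall back to False when the section or the option is missing.
    return parser.getboolean(section, "expose_dataset_path", fallback=False)


example = """
[app:main]
expose_dataset_path = True
"""
print(dataset_path_exposed(example))
```

For a real instance, the text would come from reading universe_wsgi.ini rather than from the inline example string.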

-Dannon

On Apr 24, 2012, at 4:41 PM, Josh Nielsen wrote:

Hello,

For a while now, on the Galaxy mirror that we run, I have often needed to identify which dataset_*.dat
files on the file system (in the "[galaxy_dist]/database/files/000/" directory) belong to which user, and even
to distinguish between the various datasets of a single user. Files uploaded directly by the user have a Galaxy
job name and a dataset file name that match: a Galaxy job name of "data 18", for example, corresponds to
the file 'dataset_18.dat' on the file system. However, any later analysis on that file that produces
another dataset gives no clue to the corresponding file name. For example, a "Clip on data 18" run
some time later may be called 'dataset_44.dat' on the file system, and a "Map with Bowtie on data 18" that runs on
the clipped 'dataset_44.dat' may produce an output file named 'dataset_53.dat'.
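
As a small illustration of the naming pattern described above, the sketch below converts between a 'dataset_<n>.dat' file name and its number. Both function names are hypothetical helpers. The code only handles the file name pattern; it says nothing about which job produced the file.

```python
import re

# Matches a file name such as 'dataset_53.dat', optionally preceded by a path.
_DATASET_FILE = re.compile(r"(?:^|/)dataset_(\d+)\.dat$")


def dataset_number(path):
    """Return the number in a 'dataset_<n>.dat' path, or None if no match."""
    match = _DATASET_FILE.search(path)
    return int(match.group(1)) if match else None


def dataset_filename(number):
    """Return the file name that corresponds to a dataset number."""
    return "dataset_%d.dat" % number


print(dataset_number("database/files/000/dataset_53.dat"))
print(dataset_filename(44))
```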

When debugging failed jobs, after the user has rerun them for the umpteenth
time, there may be dozens of identical or near-identical files to weed through.
The generic naming scheme does not help even though it is sequential, and the
files are hard to match up unless you watch the file writes in the directory
live. The current implementation makes sense for internal usage and the code
that uses it, but it is difficult for a human to tell which files match which
jobs in Galaxy.

It would be useful to have more meaningful dataset file names, or an easier way to identify them (a record that matches
the "internal" and "external" names), for administrative maintenance: so that I can delete
files, or possibly even export those .dat files to a network share where our users can perform manual analysis on them.
Could anyone point me to where in the code I could look to make the dataset names more meaningful? Or perhaps I should
request from the Galaxy developers, as a feature, a way for users themselves to see, under the "metadata
name" of their job (like "Map with Bowtie on data 18") in the right side pane, the *actual* corresponding
file and its path on the file system (dataset_53.dat, for example). Or, if not for users, at least something
for administrators. Even a database that has four columns for the internal/filesystem dataset name, the job metadata
name, the Galaxy job number (that the user sees), and the user that the dataset belongs to,
would be helpful. A lot of our users are heavy into informatics though
and would probably prefer that the user be able to see that information. Does
anyone have any suggestions or thoughts about this?
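
The four-column record described above could be sketched as a join over the tables that link datasets, histories and users. The example below runs against a throwaway in-memory SQLite database. The table and column names are assumptions modelled loosely on a Galaxy-style schema, and the rows are made-up illustrations, so everything should be checked against the real database before use.

```python
import sqlite3

# Hypothetical, simplified schema and demo rows (assumptions, not Galaxy's
# actual schema): one user, one history, three datasets.
SCHEMA_AND_ROWS = """
CREATE TABLE galaxy_user (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE history (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE history_dataset_association (
    id INTEGER PRIMARY KEY,
    history_id INTEGER,
    dataset_id INTEGER,
    hid INTEGER,
    name TEXT
);
INSERT INTO galaxy_user VALUES (1, 'user@example.org');
INSERT INTO history VALUES (1, 1);
INSERT INTO history_dataset_association VALUES
    (1, 1, 18, 18, 'data 18'),
    (2, 1, 44, 19, 'Clip on data 18'),
    (3, 1, 53, 20, 'Map with Bowtie on data 18');
"""

# One row per dataset: file name, job name, number shown to the user, owner.
REPORT_QUERY = """
SELECT 'dataset_' || hda.dataset_id || '.dat' AS file_name,
       hda.name AS job_name,
       hda.hid AS history_number,
       u.email AS user_email
FROM history_dataset_association AS hda
JOIN history AS h ON h.id = hda.history_id
JOIN galaxy_user AS u ON u.id = h.user_id
ORDER BY hda.dataset_id
"""


def dataset_report(conn):
    """Return (file_name, job_name, history_number, user_email) tuples."""
    return conn.execute(REPORT_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_ROWS)
for row in dataset_report(conn):
    print(row)
```

Against a production database the same query would need the real table and column names, and read-only access would be the safer choice.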


Thanks,
Josh Nielsen
___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/



