Re: [galaxy-dev] More meaningful dataset names/easier method of identifying?

Dannon Baker Tue, 24 Apr 2012 13:52:25 -0700

In changeset 7013:dae7eefe2f71 I added the full file path to the dataset "View 
Details" page.  Galaxy administrators will always see this, and if you set 
expose_dataset_path to True in your universe_wsgi.ini, users will see it as 
well.  Hopefully that's what you're looking for, but let me know if I've 
misunderstood what you're after and I can take another look.


-Dannon

On Apr 24, 2012, at 4:41 PM, Josh Nielsen wrote:

> Hello,
> 
> With the Galaxy mirror we run, I have on many occasions needed to identify
> which dataset_*.dat files on the file system (in the
> "[galaxy_dist]/database/files/000/" directory) belong to which user and,
> even for a single user, to distinguish between their various datasets.
> Files uploaded directly by the user have a Galaxy job name and a dataset
> file name that match: a Galaxy job named "data 18", for example, corresponds
> to the file 'dataset_18.dat' on the file system. However, any analysis run
> on that file afterwards produces a dataset whose name gives no clue to the
> corresponding file name. For example, a "Clip on data 18" run some time
> later may be called 'dataset_44.dat' on the filesystem, and a "Map with
> Bowtie on data 18" that runs on the clipped 'dataset_44.dat' may produce an
> output file of 'dataset_53.dat'.
> 
> When debugging failed jobs, after the user has rerun them for the umpteenth
> time, there may be dozens of identical or near-identical files to weed
> through, and the generic naming scheme does not help even though it is
> sequential; the files are also hard to keep track of and match up unless you
> watch them being written to the directory live. The current scheme makes
> sense for internal use by the code, but it is difficult for a human to tell
> which files belong to which jobs in Galaxy.
> 
> It would be useful to have more meaningful dataset file names, or an easier
> way to identify them (a record that matches the "internal" and "external"
> names), for administrative maintenance, so that I can delete files or
> possibly even export those .dat files to a network share where our users can
> analyze them manually. Could anyone point me to where in the code I could
> look to make the dataset names more meaningful? Or perhaps I should ask the
> Galaxy developers for a feature that lets users see, under the "metadata
> name" of their job (like "Map with Bowtie on data 18") in the right-hand
> pane, the *actual* corresponding file and its location on the file system
> (dataset_53.dat, for example). If not for users, then at least something for
> administrators. Even a database table with four columns (the internal/
> filesystem dataset name, the job metadata name, the Galaxy job number that
> the user sees, and the user the dataset belongs to) would be helpful. A lot
> of our users are heavily into informatics, though, and would probably prefer
> to be able to see that information themselves. Does anyone have any
> suggestions or thoughts about this?
> 
> Thanks,
> Josh Nielsen
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
> 
>  http://lists.bx.psu.edu/
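
The four-column record requested in the quoted message can be approximated with a single SQL join. Below is a minimal Python/sqlite3 sketch. The table and column names (galaxy_user, history, dataset, and history_dataset_association with hid and name) are a simplified assumption about Galaxy's schema and should be checked against your own database before you query it. The demo rows are invented to echo the example in the message, and the function names are hypothetical.

```python
import sqlite3

# Hypothetical, heavily simplified subset of Galaxy's tables; the real
# schema has many more columns and should be checked before querying it.
SCHEMA = """
CREATE TABLE galaxy_user (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE history (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE dataset (id INTEGER PRIMARY KEY);
CREATE TABLE history_dataset_association (
    id INTEGER PRIMARY KEY,
    history_id INTEGER,
    dataset_id INTEGER,
    hid INTEGER,
    name TEXT
);
"""

# One row per history item: file on disk, name shown in the history,
# number shown in the history, and the owning user's email.
REPORT_QUERY = """
SELECT 'dataset_' || d.id || '.dat' AS file_name,
       hda.name AS display_name,
       hda.hid AS history_number,
       u.email AS owner
FROM history_dataset_association AS hda
JOIN dataset AS d ON d.id = hda.dataset_id
JOIN history AS h ON h.id = hda.history_id
LEFT JOIN galaxy_user AS u ON u.id = h.user_id
ORDER BY d.id
"""


def dataset_report(conn: sqlite3.Connection) -> list:
    """Return (file_name, display_name, history_number, owner) tuples."""
    return conn.execute(REPORT_QUERY).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """In-memory database filled with made-up illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO galaxy_user VALUES (1, 'user@example.org')")
    conn.execute("INSERT INTO history VALUES (1, 1)")
    conn.executemany("INSERT INTO dataset VALUES (?)", [(18,), (44,), (53,)])
    conn.executemany(
        "INSERT INTO history_dataset_association VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 18, 18, "reads.fastq"),
            (2, 1, 44, 19, "Clip on data 18"),
            (3, 1, 53, 20, "Map with Bowtie on data 18"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in dataset_report(build_demo_db()):
        print(row)
```

The LEFT JOIN keeps datasets whose history has no owning user (for example, anonymous histories) instead of silently dropping them.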

