Re: [galaxy-dev] [CONTENT] Re: Re: Re: Re: Unable to remove old datasets

2014-04-04 Thread Nate Coraor
Hi Ravi,

I believe admin_cleanup_datasets.py only works on database times, not on
the timestamps of the files on disk. The rest of your assumptions are likely
correct, although without looking at more details of the database I can't
confirm.

--nate


On Fri, Mar 28, 2014 at 5:12 PM, Sanka, Ravi rsa...@jcvi.org wrote:

 Hi Nate,

 I checked and there are 3 rows of dataset 301 in the
 history_dataset_association table (none in
 library_dataset_dataset_association):

 dataset_id  create_time    update_time    deleted
 301         2/14/14 18:49  3/25/14 20:27  TRUE
 301         3/6/14 15:48   3/25/14 18:41  TRUE
 301         3/6/14 20:11   3/6/14 20:11   FALSE

 The one with the most recent create_time has its deleted status set to
 false. The other 2, older ones are true.

 I would have guessed that the most recent create_time instance is still
 false due to being created within 30 days, but the second most recent is only
 5 hours older and is set to true. Perhaps that instance was deleted by its
 user. That would cause its deleted status to become true, correct?

 I assume that if I were to wait until all 3 instances' create_times are
 past 30 days, my process will work, as admin_cleanup_datasets.py will set
 all 3 instances' deleted status to true.

 Perchance, is there any setting on admin_cleanup_datasets.py that would
 cause it to judge datasets by their physical file's timestamp instead?

 --
 Ravi Sanka
 ICS - Sr. Bioinformatics Engineer
 J. Craig Venter Institute
 301-795-7743
 --

 From: Nate Coraor n...@bx.psu.edu
 Date: Friday, March 28, 2014 1:56 PM

 To: Ravi Sanka rsa...@jcvi.org
 Cc: Carl Eberhard carlfeberh...@gmail.com, Peter Cock 
 p.j.a.c...@googlemail.com, galaxy-dev@lists.bx.psu.edu 
 galaxy-dev@lists.bx.psu.edu
 Subject: [CONTENT] Re: Re: [galaxy-dev] Re: Re: Unable to remove old
 datasets

 Hi Ravi,

 Can you check whether any other history_dataset_association or
 library_dataset_dataset_association rows exist which reference the
 dataset_id that you are attempting to remove?

 When you run admin_cleanup_datasets.py, it'll set
 history_dataset_association.deleted = true. After that is done, you need to
 run cleanup_datasets.py with the `-6 -d 0` option to mark dataset.deleted =
 true, followed by `-3 -d 0 -r ` to remove the dataset file from disk and
 set dataset.purged = true. Note that the latter two operations will not do
 anything until *all* associated history_dataset_association and
 library_dataset_dataset_association rows are set to deleted = true.

 --nate


 On Fri, Mar 28, 2014 at 1:52 PM, Sanka, Ravi rsa...@jcvi.org wrote:

 Hi Nate,

 I checked the dataset's entry in history_dataset_association, and the
 value in field deleted is true.

 But if this does not enable the cleanup scripts to remove the dataset
 from disk, then how can I accomplish that? As an admin, my intention is to
 completely remove datasets that are past a certain age from Galaxy,
 including all instances of the dataset that may exist, regardless of
 whether or not the various users who own said instances have deleted them
 from their histories.

 Can this be done with admin_cleanup_datasets.py? If so, how?

 --
 Ravi Sanka
 ICS - Sr. Bioinformatics Engineer
 J. Craig Venter Institute
 301-795-7743
 --

 From: Nate Coraor n...@bx.psu.edu
 Date: Friday, March 28, 2014 9:59 AM
 To: Ravi Sanka rsa...@jcvi.org
 Cc: Carl Eberhard carlfeberh...@gmail.com, Peter Cock 
 p.j.a.c...@googlemail.com, galaxy-dev@lists.bx.psu.edu 
 galaxy-dev@lists.bx.psu.edu
 Subject: [CONTENT] Re: [galaxy-dev] Re: Re: Unable to remove old datasets

 Hi Ravi,

 If you take a look at the dataset's entry in the
 history_dataset_association table, is that marked deleted?
 admin_cleanup_datasets.py only marks history_dataset_association rows
 deleted, not datasets.

 Running the cleanup_datasets.py flow with -d 0 should have then caused
 the dataset to be deleted and purged, but this may not be the case if there
 is more than one instance of the dataset you are trying to purge (either
 another copy in a history somewhere, or in a library).

 --nate


 On Tue, Mar 25, 2014 at 5:12 PM, Sanka, Ravi rsa...@jcvi.org wrote:

 I have now been able to successfully remove datasets from disk. After
 deleting the dataset or history from the front-end interface (as the user),
 I then run the cleanup scripts as admin:

 python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -1 $@ >> ./scripts/cleanup_datasets/delete_userless_histories.log
 python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -2 -r $@ >> ./scripts/cleanup_datasets/purge_histories.log
 python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -3 -r $@ >> ./scripts/cleanup_datasets/purge_datasets.log
 python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -5 -r $@ >> ./scripts/cleanup_datasets/purge_folders.log

Re: [galaxy-dev] [CONTENT] Re: Re: Re: Re: Unable to remove old datasets

2014-03-28 Thread Sanka, Ravi
Hi Nate,

I checked and there are 3 rows of dataset 301 in the 
history_dataset_association table (none in library_dataset_dataset_association):

dataset_id  create_time    update_time    deleted
301         2/14/14 18:49  3/25/14 20:27  TRUE
301         3/6/14 15:48   3/25/14 18:41  TRUE
301         3/6/14 20:11   3/6/14 20:11   FALSE

The one with the most recent create_time has its deleted status set to false. 
The other 2, older ones are true.

I would have guessed that the most recent create_time instance is still false 
due to being created within 30 days, but the second most recent is only 5 hours 
older and is set to true. Perhaps that instance was deleted by its user. That 
would cause its deleted status to become true, correct?
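
As a rough check of the 30-day reasoning above, here is a minimal Python sketch. It assumes the 30-day window is measured back from the time of this message (2014-03-28, 5:12 PM) and that the three rows are exactly as listed in the table. Which database column admin_cleanup_datasets.py actually compares against its cutoff is not confirmed in this thread, so both create_time and update_time are checked; the names `rows`, `cutoff`, and `older_than_cutoff` are illustrative only.

```python
from datetime import datetime, timedelta

# Rows for dataset 301 as reported above: (create_time, update_time, deleted)
rows = [
    (datetime(2014, 2, 14, 18, 49), datetime(2014, 3, 25, 20, 27), True),
    (datetime(2014, 3, 6, 15, 48), datetime(2014, 3, 25, 18, 41), True),
    (datetime(2014, 3, 6, 20, 11), datetime(2014, 3, 6, 20, 11), False),
]

# Assumption: the window is 30 days, counted back from the time of this message.
now = datetime(2014, 3, 28, 17, 12)
cutoff = now - timedelta(days=30)


def older_than_cutoff(timestamp, cutoff=cutoff):
    """Return True if the timestamp falls before the cutoff."""
    return timestamp < cutoff


for create_time, update_time, deleted in rows:
    print(
        create_time,
        "create older:", older_than_cutoff(create_time),
        "update older:", older_than_cutoff(update_time),
        "deleted:", deleted,
    )
```

Under these assumptions only the 2/14 row is older than the cutoff by create_time, and none of the rows is older by update_time. That would be consistent with the 3/6 15:48 row having been deleted by its user rather than by the script, but it does not prove it.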

I assume that if I were to wait until all 3 instances' create_times are past 30 
days, my process will work, as admin_cleanup_datasets.py will set all 3 
instances' deleted status to true.

Perchance, is there any setting on admin_cleanup_datasets.py that would cause 
it to judge datasets by their physical file's timestamp instead?

--
Ravi Sanka
ICS – Sr. Bioinformatics Engineer
J. Craig Venter Institute
301-795-7743
--

From: Nate Coraor n...@bx.psu.edu
Date: Friday, March 28, 2014 1:56 PM
To: Ravi Sanka rsa...@jcvi.org
Cc: Carl Eberhard carlfeberh...@gmail.com, 
Peter Cock p.j.a.c...@googlemail.com, 
galaxy-dev@lists.bx.psu.edu
Subject: [CONTENT] Re: Re: [galaxy-dev] Re: Re: Unable to remove old datasets

Hi Ravi,

Can you check whether any other history_dataset_association or 
library_dataset_dataset_association rows exist which reference the dataset_id 
that you are attempting to remove?

When you run admin_cleanup_datasets.py, it'll set 
history_dataset_association.deleted = true. After that is done, you need to run 
cleanup_datasets.py with the `-6 -d 0` option to mark dataset.deleted = true, 
followed by `-3 -d 0 -r ` to remove the dataset file from disk and set 
dataset.purged = true. Note that the latter two operations will not do anything 
until *all* associated history_dataset_association and 
library_dataset_dataset_association rows are set to deleted = true.
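
The "all associated rows" condition described above can be checked before running the cleanup flow. The following is a minimal sketch using an in-memory SQLite database with a deliberately simplified, hypothetical schema: only the `dataset_id` and `deleted` columns mentioned in this thread are modeled, and the helper name `active_association_count` is made up for illustration. A real Galaxy database has many more columns and may store `deleted` as a boolean, so the query would need adapting.

```python
import sqlite3

# Toy stand-in for the two association tables named in this thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history_dataset_association (dataset_id INTEGER, deleted INTEGER);
CREATE TABLE library_dataset_dataset_association (dataset_id INTEGER, deleted INTEGER);
""")
# The three rows reported for dataset 301: two deleted, one not.
conn.executemany(
    "INSERT INTO history_dataset_association VALUES (?, ?)",
    [(301, 1), (301, 1), (301, 0)],
)


def active_association_count(conn, dataset_id):
    """Count association rows for dataset_id that are not yet marked deleted."""
    query = """
        SELECT
          (SELECT COUNT(*) FROM history_dataset_association
            WHERE dataset_id = ? AND deleted = 0)
        + (SELECT COUNT(*) FROM library_dataset_dataset_association
            WHERE dataset_id = ? AND deleted = 0)
    """
    return conn.execute(query, (dataset_id, dataset_id)).fetchone()[0]


# A non-zero count means the dataset is still referenced by a live association.
print(active_association_count(conn, 301))
```

For the toy data this reports one remaining association. Going by the description above, the `-6` and `-3` steps would be expected to skip the dataset until that count reaches zero.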

--nate


On Fri, Mar 28, 2014 at 1:52 PM, Sanka, Ravi rsa...@jcvi.org wrote:
Hi Nate,

I checked the dataset's entry in history_dataset_association, and the value in 
field deleted is true.

But if this does not enable the cleanup scripts to remove the dataset from 
disk, then how can I accomplish that? As an admin, my intention is to 
completely remove datasets that are past a certain age from Galaxy, including 
all instances of the dataset that may exist, regardless of whether or not the 
various users who own said instances have deleted them from their histories.

Can this be done with admin_cleanup_datasets.py? If so, how?

--
Ravi Sanka
ICS – Sr. Bioinformatics Engineer
J. Craig Venter Institute
301-795-7743
--

From: Nate Coraor n...@bx.psu.edu
Date: Friday, March 28, 2014 9:59 AM
To: Ravi Sanka rsa...@jcvi.org
Cc: Carl Eberhard carlfeberh...@gmail.com, 
Peter Cock p.j.a.c...@googlemail.com, 
galaxy-dev@lists.bx.psu.edu
Subject: [CONTENT] Re: [galaxy-dev] Re: Re: Unable to remove old datasets

Hi Ravi,

If you take a look at the dataset's entry in the history_dataset_association 
table, is that marked deleted? admin_cleanup_datasets.py only marks 
history_dataset_association rows deleted, not datasets.

Running the cleanup_datasets.py flow with -d 0 should have then caused the 
dataset to be deleted and purged, but this may not be the case if there is more 
than one instance of the dataset you are trying to purge (either another copy 
in a history somewhere, or in a library).

--nate


On Tue, Mar 25, 2014 at 5:12 PM, Sanka, Ravi rsa...@jcvi.org wrote:
I have now been able to successfully remove datasets from disk. After deleting 
the dataset or history from the front-end interface (as the user), I then run 
the cleanup scripts as admin:

python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -1 $@ >> ./scripts/cleanup_datasets/delete_userless_histories.log
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -2 -r $@ >> ./scripts/cleanup_datasets/purge_histories.log
python ./scripts/cleanup_datasets/cleanup_datasets.py ./universe_wsgi.ini -d 0 -3 -r $@ >> ./scripts/cleanup_datasets/purge_datasets.log
python ./scripts/cleanup_datasets