Hi Geert,

In addition, do you think he should run a manual merge after deleting the 20M documents, to free up the space and improve performance?

Regards,
Indy

On Mon, Sep 19, 2016 at 8:58 PM, Geert Josten <[email protected]> wrote:

> Hi Qambar,
>
> I think it makes sense to discuss this in more detail here first, and then
> see if we can summarize conclusions on SO.
>
> In general there are several ways to get rid of a large group of files. It
> generally comes down to either:
>
> 1. xdmp:collection-delete and xdmp:directory-delete
> 2. or a batch delete approach.
>
> This roughly matches the two answers on SO.
>
> The ‘benefit’ of approach 1 is that it happens in one transaction, which
> could be important to you. But you are right that a collection-delete can
> take time. I would not necessarily say it will flood servers, but deleting
> 20 million docs could take up to several minutes. How much exactly depends
> a lot on factors like how many forests, how fast your disks are, how many
> MarkLogic instances you have in your cluster, how the docs are spread
> across those, etc. Deleting 20 million docs could just as well take 10
> seconds, provided the configuration and the circumstances are right. The
> right circumstances also include things like not having triggers, not
> having auditing enabled, etc.
>
> The second approach is more or less the opposite. You won’t have the
> deletion happening in one transaction (unless you care to handle
> transactions yourself), but you have more control to manage load, and can
> take as long as needed. There are several tools that can help with
> spawning deletion tasks. Corb/Corb2 is one, Taskbot is another.
>
> Which answer fits your case best depends firstly on whether or not it is
> important to do the collection-delete in one transaction, and secondly on
> the volume of the average deletion and how often you need to perform it.
> It might be good to run a test on a similar environment that allows
> estimating whether you can run the delete in an acceptable timeframe.
>
> We could go into more detail about xdmp:collection-delete, but I don’t
> think that will be of much help to you.
>
> Instead I’d prefer returning to your description on SO: you are talking
> about ‘expired’ collection items. Have you considered giving documents an
> expiry date, and running a schedule that will periodically remove expired
> documents? If the schedule runs for instance every hour, and deletes a
> reasonably sized batch of files on average, that could help spread the
> load of keeping your system clean.
>
> Cheers,
> Geert
>
> From: <[email protected]> on behalf of Qambar Raza <[email protected]>
> Reply-To: MarkLogic Developer Discussion <[email protected]>
> Date: Monday, September 19, 2016 at 1:00 PM
> To: "[email protected]" <[email protected]>
> Subject: [MarkLogic Dev General] How does "xdmp:collection-delete" work?
>
> Hello,
>
> Can anyone answer my question on Stack Overflow? I couldn't find any
> documentation about how https://docs.marklogic.com/xdmp:collection-delete
> works.
>
> For more details, see:
> http://stackoverflow.com/questions/39571215/how-does-marklogics-xdmpcollection-delete-work
>
> Thanks,
>
> Qambar.
>
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
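
For reference, the batch delete approach discussed in the quoted message (approach 2) could be sketched roughly as follows. This is only a Python sketch of the control flow, not MarkLogic code: `batches`, `delete_in_batches` and the `delete_batch` callback are hypothetical names, and the callback stands in for whatever actually issues the delete for one batch, each in its own transaction.

```python
from typing import Callable, Iterable, Iterator, List


def batches(uris: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` document URIs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for uri in uris:
        batch.append(uri)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def delete_in_batches(
    uris: Iterable[str],
    delete_batch: Callable[[List[str]], None],  # hypothetical: deletes one batch
    size: int = 1000,
) -> int:
    """Hand the URIs to `delete_batch` one batch at a time.

    Each call is assumed to run as a separate transaction, so the overall
    delete is NOT atomic. Returns the number of URIs handed over.
    """
    total = 0
    for batch in batches(uris, size):
        delete_batch(batch)
        total += len(batch)
    return total
```

Because each batch is a separate call, a pause or a load check could be added between calls, which is the kind of control over load that a single collection-delete does not offer.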
