Hi Steven, collection2item contains 2 values, collection_id and item_id -- it maps all items to any collections they are in (a many to many relationship between item and collection.
The owning collection is in the item record. So a mapped item is one that appears twice or more in collection2item, and in all but one entry in collection2item, the corresponding collection_id != the item.owning_collection. Another way to put it is c2i has every item in it and every collection in it. Owning collection (in the item record) only has the original collection it was uploaded to. Mapped collections appear in c2i but so do unmapped items. It may shed some light on it to run the two queries to at least narrow it down. You could maybe look at them via the UI after that? Joseph -----Original Message----- From: Hayles, Steven [mailto:[email protected]] Sent: 30 January 2015 16:16 To: '[email protected]' Subject: RE: Best approach to bulk delete empty collections Hi Joseph I was thinking that I'd need to look in the following tables to fully understand what was going on with a collection collection collection2item workflowitem workspaceitem collection_item_count item Is collection2item what handles mappings, while item.owning_collection identifies to 'real' parent collection of an item? Thanks Steven Steven Hayles Systems Analyst IT Services, University of Leicester, Propsect House, 94 Regent Rd, Leicester, LE1 7DA, UK T: +44 (0)116 229 7950 E: [email protected] The Queen's Anniversary Prizes 1994, 2002 & 2013 THE Awards Winners 2007-2013 Elite without being elitist Follow us on Twitter http://twitter.com/uniofleicester or visit our Facebook page https://facebook.com/UniofLeicester -----Original Message----- From: Joseph Greene [mailto:[email protected]] Sent: 30 January 2015 11:40 To: [email protected] Cc: Hayles, Steven Subject: RE: Best approach to bulk delete empty collections Actually that query was a little restrictive. This may be better: select collection_id, name from collection where collection_id not in (select owning_collection from item where in_archive = true and withdrawn = false); So any collection that shows 0 items live, but may have withdrawn items and may also have items in submitters' workflows (ie cancelled uploads). Problem is 'virtual' collections with only mapped items, they will appear as having no items. Combining the results of the two queries may help...I'm sure there's a way to output the count of mapped items into the results. Joseph -----Original Message----- From: Joseph Greene [mailto:[email protected]] Sent: 30 January 2015 11:28 To: [email protected] Cc: [email protected] Subject: RE: Best approach to bulk delete empty collections Hello Steven, As a start here is a query that will identify empty collections: select collection_id, name from collection where collection_id not in (select distinct collection_id from collection2item); To my knowledge that should work on 1.4.2's database (was on 1.4.2 recently enough). Best regards, Joseph Joseph Greene Research Repository Librarian James Joyce Library University College Dublin (353 0)1 716 7398 [email protected] http://researchrepository.ucd.ie Message: 2 Date: Fri, 30 Jan 2015 09:26:41 +0000 (GMT) From: Steven Hayles <[email protected]> Subject: [Dspace-tech] Best approach to bulk delete empty collections (DSpace 4.2) To: [email protected] Message-ID: <[email protected]> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Hi We have a DSpace 4.2 instance with many empty collections. What's the best approach to finding and deleting them? Thanks Steven Hayles Systems Analyst IT Services, University of Leicester, Propsect House, 94 Regent Rd, Leicester, LE1 7DA, UK T: +44 (0)116 229 7950 E: [email protected] The Queen's Anniversary Prizes 1994, 2002 & 2013 THE Awards Winners 2007-2013 Elite without being elitist Follow us on Twitter http://twitter.com/uniofleicester or visit our Facebook page https://facebook.com/UniofLeicester ------------------------------------------------------------------------------ Dive into the World of Parallel Programming. The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now. http://goparallel.sourceforge.net/ _______________________________________________ DSpace-tech mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/dspace-tech List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

