Hi Steven,

collection2item contains 2 values, collection_id and item_id -- it maps all
items to any collections they are in (a many to many relationship between
item and collection.

The owning collection is in the item record. So a mapped item is one that
appears twice or more in collection2item, and in all but one entry in
collection2item, the corresponding collection_id != the
item.owning_collection.

Another way to put it is c2i has every item in it and every collection in
it. Owning collection (in the item record) only has the original collection
it was uploaded to. Mapped collections appear in c2i but so do unmapped
items.

It may shed some light on it to run the two queries to at least narrow it
down. You could maybe look at them via the UI after that?

Joseph



-----Original Message-----
From: Hayles, Steven [mailto:[email protected]] 
Sent: 30 January 2015 16:16
To: '[email protected]'
Subject: RE: Best approach to bulk delete empty collections

Hi Joseph

I was thinking that I'd need to look in the following tables to fully
understand what was going on with a collection

collection
collection2item
workflowitem
workspaceitem
collection_item_count
item

Is collection2item what handles mappings, while item.owning_collection
identifies to 'real' parent collection of an item?

Thanks

Steven

Steven Hayles
Systems Analyst

IT Services, University of Leicester,
Propsect House, 94 Regent Rd, Leicester, LE1 7DA, UK

T: +44 (0)116 229 7950
E: [email protected]

The Queen's Anniversary Prizes 1994, 2002 & 2013 THE Awards Winners
2007-2013

Elite without being elitist

Follow us on Twitter http://twitter.com/uniofleicester or visit our Facebook
page https://facebook.com/UniofLeicester 




-----Original Message-----
From: Joseph Greene [mailto:[email protected]] 
Sent: 30 January 2015 11:40
To: [email protected]
Cc: Hayles, Steven
Subject: RE: Best approach to bulk delete empty collections

Actually that query was a little restrictive. This may be better:

select collection_id, name from collection where collection_id not in
    (select owning_collection from item where in_archive = true and
withdrawn = false);

So any collection that shows 0 items live, but may have withdrawn items and
may also have items in submitters' workflows (ie cancelled uploads). Problem
is 'virtual' collections with only mapped items, they will appear as having
no items. Combining the results of the two queries may help...I'm sure
there's a way to output the count of mapped items into the results.

Joseph

-----Original Message-----
From: Joseph Greene [mailto:[email protected]] 
Sent: 30 January 2015 11:28
To: [email protected]
Cc: [email protected]
Subject: RE: Best approach to bulk delete empty collections

Hello Steven,

As a start here is a query that will identify empty collections:

select collection_id, name from collection where collection_id not in 
    (select distinct collection_id from collection2item);

To my knowledge that should work on 1.4.2's database (was on 1.4.2 recently
enough).

Best regards,
Joseph

Joseph Greene
Research Repository Librarian
James Joyce Library
University College Dublin
(353 0)1 716 7398
[email protected]
http://researchrepository.ucd.ie




Message: 2
Date: Fri, 30 Jan 2015 09:26:41 +0000 (GMT)
From: Steven Hayles <[email protected]>
Subject: [Dspace-tech] Best approach to bulk delete empty collections
        (DSpace 4.2)
To: [email protected]
Message-ID: <[email protected]>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed


Hi

We have a DSpace 4.2 instance with many empty collections. What's the best
approach to finding and deleting them?

Thanks

Steven Hayles
Systems Analyst

IT Services, University of Leicester,
Propsect House, 94 Regent Rd, Leicester, LE1 7DA, UK

T: +44 (0)116 229 7950
E: [email protected]

The Queen's Anniversary Prizes 1994, 2002 & 2013 THE Awards Winners
2007-2013

Elite without being elitist

Follow us on Twitter http://twitter.com/uniofleicester or visit our Facebook
page https://facebook.com/UniofLeicester







------------------------------------------------------------------------------
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette

Reply via email to