We are in the process of "moving" a Collection of documents out of our main
DSpace repository and into a separate DSpace instance. Our reasons for
doing this are as follows:
1. We currently have 139,000 documents in this Collection and will eventually
have approximately 900,000. These 139,000 documents currently make up 56% of
our entire repository.
2. This particular Collection is being added to DSpace as part of a mass
digitization effort; however, it does not contain any metadata and is not nearly
as useful to our researchers as the remainder of the repository.
3. The processing times for our filter-media runs, browse/search index builds
and updates, etc. are significantly longer because of the large number of
documents in this Collection, and online performance is negatively affected.
4. The document ingest process is slowly degrading in direct proportion to
the number of documents in our repository.
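For a sense of scale, here is a rough projection of how much of the repository
this Collection would occupy at its final size. It is only a sketch: it assumes
the rest of the repository stays at its current size, which will not be exactly
true, and the function name projected_share is just illustrative.

```python
def projected_share(current_docs, current_share, future_docs):
    """Estimate the Collection's future share of the whole repository.

    Assumes (for illustration only) that everything outside this
    Collection stays at its present size.
    """
    total_now = current_docs / current_share   # about 248,000 documents today
    other_docs = total_now - current_docs      # about 109,000 outside the Collection
    return future_docs / (future_docs + other_docs)


share = projected_share(139_000, 0.56, 900_000)
print(f"{share:.1%}")  # prints 89.2%
```

In other words, under that assumption roughly nine out of every ten documents
in the main repository would eventually belong to this one Collection.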
I just came across the documentation on "Registration" in DSpace 1.5.1, as an
alternative to a physical export/import from one database/instance to a
separate one, and I think it would be a much easier and quicker way to migrate
these documents from our main repository to the new one we've created just to
house this one Collection. My understanding is that you essentially export the
metadata from one instance to the other and leave the documents in their
original location in /<dspace>/assetstore(n), and the new Item in the new
repository contains a pointer (in the bitstream table...?) to the original
location of the document.
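To make that understanding concrete, here is a minimal sketch of how I picture
the per-item "contents" file used for registration. It assumes, from my reading
of the Registration documentation, that each line has the form
"-r -s n -f filepath", where n is the assetstore number and filepath is relative
to that assetstore. The helper names and the example paths are hypothetical,
and the line format should be checked against the documentation for the
installed version.

```python
def registration_line(assetstore_number, relative_path):
    """Build one contents-file line that registers a file in place.

    Assumed format: -r -s <assetstore number> -f <path relative to that store>
    """
    return f"-r -s {assetstore_number} -f {relative_path}"


def contents_file_text(assetstore_number, relative_paths):
    """Join one registration line per bitstream into the contents-file text."""
    lines = [registration_line(assetstore_number, p) for p in relative_paths]
    return "\n".join(lines) + "\n"


# Hypothetical paths, for illustration only.
example = contents_file_text(0, ["scans/doc-0001.pdf", "scans/doc-0002.pdf"])
print(example, end="")
```

Nothing here copies or moves a file; the sketch only writes out where the
existing files already live, which is the point of registering instead of
importing.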
Is my understanding correct? If so, given that we are leaving the original
assetstore directories as they are, but moving the metadata to a separate
database (in the same PostgreSQL environment on the same server) and deploying
our web application in a separate directory on the same machine, are we really
going to see a significant improvement in processing time and performance in
the original repository? I think we will, but I would like to hear from anyone
who has been in a similar situation or has an idea of what we can expect.
Also, I'm trying to figure out exactly how filter-media, browse/search index
builds and updates, etc. are going to work if we end up using the
"Registration" technique. Will we end up with a single /<dspace>/search
directory with combined indexes, or will we have two separate ones? How will
our searching and browsing be impacted?
Any thoughts or recommendations are welcome!
Thanks in advance,
Sue
Sue Walker-Thornton
ConITS Contract
NASA Langley Research Center
Integrated Library Systems Application & Database Administrator
130 Research Drive
Hampton, VA 23666
Office: (757) 224-4074
Fax: (757) 224-4001
Pager: (757) 988-2547
Email: susan.m.thorn...@nasa.gov
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech