On Fri, May 24, 2019 at 10:22:28AM -0700, Gabriel Galson wrote:
> Group-
>
> We're populating an XMLUI DSpace instance with test data in preparation for
> its launch. I'm exporting metadata from a collection containing a mix of
> items. Certain fields' contents are, when viewed in the sheet exported,
> followed by a range of bracketed qualifiers.
>
> We have...
>
> dc.description.abstract
> and
> dc.description.abstract[]
>
> also...
>
> dc.publisher[]
> and
> dc.publisher[en_US]
>
> We also have...
>
> dc.description.provenance[en]

Oh, you have it easy then. I recently had to normalize three or four
variations on "en_US", some almost standard-conforming and some rather
fanciful, as well as cleaning out quite a few "[]"s.

> My questions are,
> 1) On this documentation page
> <https://wiki.duraspace.org/display/DSDOC5x/Localization+L10n> on
> localization it states "Metadata localization: DSpace associates each
> metadata field value with a language code (though it may be left empty,
> e.g. for numeric values)." Is this the process governing the creation of
> these bracketed values? Is there a more robust description of how this
> works?

It is, and I don't think so. It seems that, if you don't assign a
language, from the GUI the language is set to NULL (which is IMHO
correct, and will result in no brackets in the export), but some of the
batch tools store "no language given" as the zero-length string, which
is not the same and gives rise to the empty brackets.

> 2) How are they assigned per-field? Through a configuration (if so,
> which)? Based on the field's contents?

The language is specified by the submitter. You might be collecting
items with titles in various languages, for example. You might even
have an item titled in more than one language.

I see that there are no language controls for individual fields on the
submission form, but you can change the language of a metadata value by
editing a completed item (Context | Edit this item) and selecting the
Item Metadata tab.

> 3) The essential question: why are we getting different versions of the
> same field? Note that some items were ingested through SWORD, while others
> were manually uploaded, or Bulk uploaded in Simple Archive Format.

That is probably why.

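To see which fields in an exported CSV carry divergent language
qualifiers, the column headers can be split into a field name and a
language. The following is a minimal sketch in Python, assuming headers
of the form shown in the question (a field name optionally followed by
one bracketed language). The names `split_header` and `group_variants`
are hypothetical helpers, not part of DSpace.

```python
import re
from collections import defaultdict

# Hypothetical helpers for inspecting an exported metadata CSV header row.
# Assumption: a header is a field name optionally followed by "[language]".
HEADER_RE = re.compile(r"^(?P<field>[^\[\]]+?)(?:\[(?P<lang>[^\[\]]*)\])?$")


def split_header(header):
    """Split 'dc.publisher[en_US]' into ('dc.publisher', 'en_US').

    No brackets yields language None (language stored as NULL);
    empty brackets yield '' (language stored as a zero-length string).
    """
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognized column header: {header!r}")
    return match.group("field"), match.group("lang")


def group_variants(headers):
    """Return the fields that appear under more than one language variant."""
    variants = defaultdict(list)
    for header in headers:
        field, lang = split_header(header)
        variants[field].append(lang)
    return {field: langs for field, langs in variants.items() if len(langs) > 1}


if __name__ == "__main__":
    header_row = [
        "id",
        "collection",
        "dc.description.abstract",
        "dc.description.abstract[]",
        "dc.publisher[]",
        "dc.publisher[en_US]",
        "dc.description.provenance[en]",
    ]
    for field, langs in group_variants(header_row).items():
        print(field, langs)
```

Keeping None and the empty string distinct in the result mirrors the
NULL versus zero-length-string difference described above, so the
columns that need cleanup are easy to pick out.
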
> 4) In terms of bulk editing
> <https://wiki.duraspace.org/display/DSDOC6x/Batch+Metadata+Editing#BatchMetadataEditing-EditingtheCSV>
> through the UI via CSV, can values in, say, dc.publisher[] and
> dc.publisher[en_US], be left in separate columns for reupload, or must
> they be consolidated?

That would be a local policy decision. I don't think that DSpace cares.

> 5) is there a way to prevent these bracketed values from being
> assigned/created? Having divergent versions of the same field in the edit
> metadata CSV complicates our metadata administration workflows.

The zero-length string values are caused by a bug (or perhaps several).
We need to track down all the ways that can happen, and fix it. I found
these issues which seem to be related to the problem:

https://jira.duraspace.org/browse/DS-2174
https://jira.duraspace.org/browse/DS-4169
https://jira.duraspace.org/browse/DS-3479
https://jira.duraspace.org/browse/DS-2548

Both "en" and "en_US" are legitimate values. Which one is correct for a
given item would be a local judgment. DSpace could (IMHO should) be
more helpful by rejecting values which are not standard-conforming, but
first we need to specify which standard we use for these values.

--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu

--
All messages to this mailing list should adhere to the DuraSpace Code of Conduct: https://duraspace.org/about/policies/code-of-conduct/
---
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dspace-tech/20190528133249.GB26983%40IUPUI.Edu.