On Fri, May 24, 2019 at 10:22:28AM -0700, Gabriel Galson wrote:
> Group-  
> 
> We're populating an XMLUI DSpace instance with test data in preparation for 
> its launch.  I'm exporting metadata from a collection containing a mix of 
> items.  Certain fields' contents are, when viewed in the sheet exported, 
> followed by a range of bracketed qualifiers. 
> 
> We have...
> 
> dc.description.abstract
> and 
> dc.description.abstract[]
> 
> also...
> 
> dc.publisher[]
> and 
> dc.publisher[en_US]
> 
> We also have...
> 
> dc.description.provenance[en]

Oh, you have it easy then.  I recently had to normalize three or four
variations on "en_US", some almost standard-conforming and some rather
fanciful, as well as cleaning out quite a few "[]"s.

> My questions are, 
> 1)  On this documentation page on localization
> <https://wiki.duraspace.org/display/DSDOC5x/Localization+L10n>
> it states "Metadata localization: DSpace associates each 
> metadata field value with a language code (though it may be left empty, 
> e.g. for numeric values)."  Is this the process governing the creation of 
> these bracketed values?  Is there a more robust description of how this 
> works?  

It is, and I don't think there is a more robust description.  It seems
that, if you don't assign a language in the GUI, the language is set
to NULL (which is IMHO correct, and results in no brackets in the
export), but some of the batch tools store "no language given" as the
zero-length string, which is not the same thing and gives rise to the
empty brackets.
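
To make that distinction concrete when looking at an exported sheet,
here is a minimal Python sketch that splits a column header into field
and language.  The helper name split_header is made up for this
example and is not part of DSpace; it only assumes the header shapes
quoted above.

```python
import re

# Hypothetical helper, not part of DSpace.  It assumes the exported
# column headers look like the ones quoted in this thread.
_HEADER = re.compile(r"^(?P<field>[^\[\]]+?)(?:\[(?P<lang>[^\[\]]*)\])?$")

def split_header(header):
    """Return (field, language) for an exported CSV column header.

    language is None when there are no brackets (language is NULL),
    "" when the header ends in "[]" (zero-length string), and the
    bracket contents otherwise.
    """
    match = _HEADER.match(header.strip())
    if match is None:
        raise ValueError("unrecognized column header: %r" % header)
    return match.group("field"), match.group("lang")

for name in ("dc.description.abstract",
             "dc.description.abstract[]",
             "dc.publisher[en_US]"):
    print(name, "->", split_header(name))
```

Keeping None and "" apart in the result mirrors the NULL versus
zero-length-string difference, so the two "no language" cases stay
visible rather than being folded together.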

> 2)  How are they assigned per-field?  Through a configuration (if so, 
> which)?   Based on the field's contents? 

The language is specified by the submitter.  You might be collecting
items with titles in various languages, for example.  You might even
have an item titled in more than one language.

I see that there are no language controls for individual fields on the
submission form, but you can change the language of a metadata value
by editing a completed item (Context | Edit this item) and selecting
the Item Metadata tab.

> 3)  The essential question: why are we getting different versions of the 
> same field?  Note that some items were ingested through SWORD, while others 
> were manually uploaded, or Bulk uploaded in Simple Archive Format.  

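That is probably why.  Each ingestion path may record "no language
given" differently (NULL versus the zero-length string), or supply its
own language value.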

> 4)  In terms of bulk editing
> <https://wiki.duraspace.org/display/DSDOC6x/Batch+Metadata+Editing#BatchMetadataEditing-EditingtheCSV>
> through the UI via CSV, can values in, say, dc.publisher[] and
> dc.publisher[en_US] be left in separate columns for reupload, or must
> they be consolidated?

That would be a local policy decision.  I don't think that DSpace cares.
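
If your local policy is to consolidate, the following is a rough
Python sketch of merging every language variant of one field into a
single column before reupload.  It is not a DSpace tool.  The function
name consolidate, the sample rows and the sample handle are invented
for illustration, and I am assuming that "||" separates multiple
values in a cell and that an emptied cell (rather than a dropped
column) is what tells the importer to remove the old value.  Check
both against the Batch Metadata Editing documentation, and try it on a
test collection first.

```python
import csv
import io

SEPARATOR = "||"  # assumption: multi-value separator used in the CSV

def consolidate(rows, field, target_lang):
    """Move all values of `field`, whatever the language variant,
    into the single column `field[target_lang]`.

    The other variant columns are kept but blanked, on the assumption
    that an empty cell is how the importer learns to drop a value.
    """
    target = "%s[%s]" % (field, target_lang)
    merged = []
    for row in rows:
        values = []
        out = {}
        for column, cell in row.items():
            if column.split("[", 1)[0] != field:
                out[column] = cell      # unrelated column: copy as-is
                continue
            out[column] = ""            # variant column: blank it
            for value in (cell or "").split(SEPARATOR):
                if value and value not in values:
                    values.append(value)
        out[target] = SEPARATOR.join(values)
        merged.append(out)
    return merged

# Made-up sample data in the shape of an exported sheet.
sample = io.StringIO(
    "id,collection,dc.publisher[],dc.publisher[en_US]\n"
    "1,123456789/2,Acme Press,\n"
    "2,123456789/2,,Example University\n"
)
rows = list(csv.DictReader(sample))
for row in consolidate(rows, "dc.publisher", "en_US"):
    print(row)
```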

> 5) is there a way to prevent these bracketed values from being 
> assigned/created?  Having divergent versions of the same field in the edit 
> metadata CSV complicates our metadata administration workflows.  

The zero-length string values are caused by a bug (or perhaps
several).  We need to track down all the ways that can happen, and fix
them.  I found these issues, which seem to be related to the problem:

https://jira.duraspace.org/browse/DS-2174
https://jira.duraspace.org/browse/DS-4169
https://jira.duraspace.org/browse/DS-3479
https://jira.duraspace.org/browse/DS-2548

Both "en" and "en_US" are legitimate values.  Which one is correct for
a given item would be a local judgment.  DSpace could (IMHO should) be
more helpful by rejecting values which are not standard-conforming,
but first we need to specify which standard we use for these values.
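
Until DSpace does that itself, a local cleanup script can do the
checking.  Here is a minimal Python sketch of the kind of
normalization I mean.  Since we have not settled on a standard, the
accepted shape (a two- or three-letter language, optionally followed
by a two-letter region, written like "en_US") is purely an assumption
for this example, and normalize_lang is a made-up name.

```python
import re

# Assumed shape only: "ll" or "ll_CC", with "-" tolerated on input.
_TAG = re.compile(r"^([A-Za-z]{2,3})(?:[-_]([A-Za-z]{2}))?$")

def normalize_lang(value):
    """Return a normalized language value, or None for "no language".

    None and empty or blank strings both map to None, so the two
    "no language given" spellings end up the same.  Anything that
    does not fit the assumed shape raises ValueError instead of being
    guessed at.
    """
    if value is None or value.strip() == "":
        return None
    match = _TAG.match(value.strip())
    if match is None:
        raise ValueError("not a recognized language value: %r" % value)
    language, region = match.groups()
    language = language.lower()
    if region is None:
        return language
    return "%s_%s" % (language, region.upper())

for raw in ("en_US", "en-us", "EN_us", "en", ""):
    print(repr(raw), "->", repr(normalize_lang(raw)))
```

Rejecting the fanciful variants rather than mapping them silently
leaves the judgment about what they should become with a person.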

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
