Re: [Dspace-general] Uploading 600+ CSV Records With "local" Definitions

stuart yeates Tue, 20 Aug 2013 13:25:08 -0700

On 21/08/13 07:43, Thomas Ronayne wrote:
> Hi,
>
> I have one question about the Metadata Schema Registry, specifically
> about "local" Namespace Elements. I have a bunch of date data types that
> are not found in the DC codes. What I'm wondering is: is it possible to
> create a local element and qualifier? Something like
>
>      local:date:datepurchased or local:date:purchased
>
> without causing trouble (there's a bunch of date fields to be loaded)?


I suggest that you not use 'local' but something that is likely to be 
unique. We use 'vuwschema', but if I were doing it again I would use 
'vuw', the initials of my institution:

http://researcharchive.vuw.ac.nz/handle/10063/2896?show=full

> I'm going to be using AWK to "rewrite" the CSV data I have (exported
> from an old FoxBase database) in proper form; i.e., strings in double
> quotes, author names in first and last name fields and the like.
> That's not a big deal, it's actually pretty easy but I'd really like to
> know what the gotchas are beforehand (don't want to do this another 6 or
> 60 times). For example, all the publication dates are the year only --
> like 1375, 1749, 1810, etc. I'm going to append 06-30 (e.g., 1375-06-30)
> to every year-only date (so there's no problem with ISO dates or the
> Gregorian calendar switch at various times); just thinking ahead here.

We use year-only values (1375, 1749, 1810) for manually entered dates 
without a problem.

Linefeeds embedded in records can be tricky. dc.description.* fields 
often contain miscellaneous whitespace. Compare:

http://researcharchive.vuw.ac.nz/handle/10063/2896
http://researcharchive.vuw.ac.nz/handle/10063/2896?show=full

We had to change the CSS to make the whitespace appear in the first:

.simple-item-view-description div {white-space: pre-wrap;}
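On the loading side, embedded linefeeds are legal in CSV as long as the 
field is enclosed in double quotes. A minimal sketch, assuming Python's 
standard csv module rather than awk; the column names and sample text 
are made up for illustration:

```python
import csv
import io

def roundtrip(rows):
    """Write rows as quoted CSV text, then parse that text back into rows."""
    buf = io.StringIO(newline="")
    # QUOTE_ALL wraps every field in double quotes, so an embedded
    # linefeed stays inside its field instead of ending the record.
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(rows)
    text = buf.getvalue()
    parsed = list(csv.reader(io.StringIO(text, newline="")))
    return text, parsed

# Hypothetical record whose description contains a blank line.
rows = [
    ["dc.title", "dc.description.abstract"],
    ["A hypothetical title", "First paragraph.\n\nSecond paragraph."],
]
csv_text, parsed_rows = roundtrip(rows)
```

If the parsed rows match what went in, the linefeeds survived the trip; 
a line-oriented tool that splits on every newline would see four records 
here instead of two.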

My understanding is that awk is not 8-bit clean; you may need to switch 
to gawk or perl if your data contains non-ASCII characters.

> As an aside, I have preferred to use vertical bars (|) as separators in
> CSV files for bulk loading data; e.g., a 10,000 row file of geographic
> names (nothing to do with DSpace). Vertical bars do not appear in any
> known language and there's no need to enclose strings in double quotes
> (with any DBMS I've ever used, including PostgreSQL). I'm wondering if
> there is some way to define the field separator with the CSV loading
> utility just to make my life a little, teeny bit easier?

The standard for CSV is at http://tools.ietf.org/html/rfc4180 and you 
can do anything permitted by that standard. Alternatively, use pipes and 
convert the data to CSV using libreoffice / openoffice / whatever.
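The pipe-to-CSV conversion can also be scripted. A rough sketch, assuming 
Python's standard csv module and pipe-delimited input that carries no 
quoting of its own; the function name and sample row are hypothetical:

```python
import csv
import io

def pipes_to_csv(pipe_text):
    """Convert pipe-delimited text to comma-separated, fully quoted CSV."""
    # QUOTE_NONE on input: any double quote in the pipe data is treated
    # as an ordinary character (an assumption about the source file).
    reader = csv.reader(
        io.StringIO(pipe_text, newline=""),
        delimiter="|",
        quoting=csv.QUOTE_NONE,
    )
    out = io.StringIO(newline="")
    # QUOTE_ALL on output: commas inside fields are protected, and
    # literal double quotes are doubled as RFC 4180 describes.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL)
    writer.writerows(reader)
    return out.getvalue()

sample = 'Smith, John|A "quoted" title|1810\n'
converted = pipes_to_csv(sample)
```

The writer uses CRLF line endings by default, which is also what 
RFC 4180 specifies.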

If you're using Excel, be aware that Windows defaults to a non-ASCII 
file encoding. It can be beaten into submission; consult Google or your 
local Windows expert.
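If the file has already been saved in a legacy encoding, it can be 
re-encoded afterwards. A minimal sketch, assuming the source is 
Windows-1252 (cp1252) and that the loader wants UTF-8; both are 
assumptions to check against your own data, and the function name is 
hypothetical:

```python
def reencode_to_utf8(raw, source_encoding="cp1252"):
    """Return UTF-8 bytes for raw, decoding from source_encoding if needed."""
    try:
        # Bytes that already decode as UTF-8 are passed through untouched.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Otherwise assume the legacy encoding (cp1252 is a guess here).
        return raw.decode(source_encoding).encode("utf-8")

legacy = b"caf\xe9"  # "café" as saved in cp1252
fixed = reencode_to_utf8(legacy)
```

Guessing the source encoding wrong will silently produce the wrong 
characters, so spot-check a few records with accented names afterwards.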

cheers
stuart
-- 
Stuart Yeates
Library Technology Services http://www.victoria.ac.nz/library/

