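Ah, macOS! Yes, I also spent two days troubleshooting this, chasing
issues with Spanish accented characters while working with SAF bundles.
I had been generating the SAF bundle on my Mac for testing and then
rsync'ing the bundle to our Linux server for import. It turns out that
the macOS filesystem (HFS+) does store file names as UTF-8, but in a
decomposed Unicode form (a variant of NFD): "ä" is saved as "a" followed
by a combining diaeresis. The importer, on the other hand, looks the file
up by the name in the contents file, which normally uses the single
precomposed character (the \303\244 in the strace output further down).
The two spellings are canonically equivalent in Unicode but are different
bytes, and Linux filesystems such as XFS or ext4 compare file names byte
for byte. So you need to rsync the bitstreams to the Linux server and
generate the SAF bundle on the Linux filesystem.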
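If a bundle has already been copied, renaming the affected files on the Linux side is a rough alternative to regenerating it. A minimal sketch, assuming bash; `to_nfc` and `normalize_saf_names` are hypothetical names, and only the six Finnish/Swedish letters ä ö å Ä Ö Å are handled (a general tool such as convmv, with its `--nfc` option, covers every character):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DSpace): rename decomposed (NFD) Finnish and
# Swedish letters in file names to their precomposed (NFC) forms, so that names
# copied from an HFS+ volume match the names written in the SAF "contents" files.
# Only ä ö å Ä Ö Å are handled; any other accented letter is left as it is.
set -euo pipefail
export LC_ALL=C  # match raw bytes, whatever the caller's locale is

# Parallel arrays: decomposed spelling (base letter + combining mark) -> precomposed.
NFD_FORMS=($'a\314\210' $'o\314\210' $'a\314\212' $'A\314\210' $'O\314\210' $'A\314\212')
NFC_FORMS=($'\303\244'  $'\303\266'  $'\303\245'  $'\303\204'  $'\303\226'  $'\303\205')

# Print NAME with every known decomposed sequence replaced by its precomposed form.
to_nfc() {
  local s=$1 i
  for i in "${!NFD_FORMS[@]}"; do
    s=${s//"${NFD_FORMS[i]}"/${NFC_FORMS[i]}}
  done
  printf '%s' "$s"
}

# Rename every file and directory below ROOT whose name changes under to_nfc.
normalize_saf_names() {
  local root=$1 path dir base fixed
  # -depth visits a directory's entries before the directory itself,
  # so renaming a directory never invalidates paths still to be visited.
  find "$root" -mindepth 1 -depth -print0 |
    while IFS= read -r -d '' path; do
      dir=${path%/*}
      base=${path##*/}
      fixed=$(to_nfc "$base")
      if [ "$fixed" = "$base" ]; then
        continue
      elif [ -e "$dir/$fixed" ]; then
        # Never overwrite: a precomposed twin already exists.
        printf 'skipped, target exists: %s\n' "$dir/$fixed" >&2
      else
        mv -- "$path" "$dir/$fixed"
        printf 'renamed: %s\n' "$dir/$fixed"
      fi
    done
}
```

Call `normalize_saf_names` with the top-level SAF directory (the one containing the `item_...` folders) before running the import. The lookup table is deliberately explicit; extend both arrays in step if other letters (for example Spanish á or ñ) are needed.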

Here's a good blog post describing UTF-8 issues on macOS HFS+:

https://blog.vrypan.net/2012/11/13/hfsplus-unicode-and-accented-chars/
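The mismatch the post describes can be checked directly on the Linux side. A minimal sketch, assuming bash and a filesystem such as XFS, ext4 or tmpfs that stores names as raw bytes; `show_name_bytes` and the demo file name are made up for illustration. It creates a file with the decomposed name, dumps the name's bytes in the same octal notation strace uses, and then tries to find the file under the precomposed name:

```shell
#!/usr/bin/env bash
# Sketch: show that the decomposed (HFS+-style) and precomposed spellings of
# "ä" are different bytes, and that a byte-comparing Linux filesystem treats
# them as different file names.
set -euo pipefail

NFC_A=$'\303\244'    # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
NFD_A=$'a\314\210'   # U+0061 followed by U+0308 COMBINING DIAERESIS

# Print each name in DIR followed by its bytes; od -c shows non-ASCII
# bytes in octal, the same notation strace uses.
show_name_bytes() {
  local path
  for path in "$1"/*; do
    printf '%s\n' "${path##*/}"
    printf '%s' "${path##*/}" | od -An -c
  done
}

demo_dir=$(mktemp -d)
# Create the file the way it arrives from the Mac (decomposed).
touch "$demo_dir/edell${NFD_A}k${NFD_A}vij${NFD_A}.pdf"

show_name_bytes "$demo_dir"

# Look it up by the precomposed name, as the importer did in the strace output.
if [ -e "$demo_dir/edell${NFC_A}k${NFC_A}vij${NFC_A}.pdf" ]; then
  echo "found by precomposed name"
else
  echo "not found by precomposed name"
fi
```

On such filesystems the dump shows `a 314 210` where the importer asked for `303 244`, and the final check reports "not found by precomposed name", the same contradiction as the `ls` output in this thread.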

Good luck!

On Mon, Dec 12, 2016 at 10:28 AM, Sidoroff, Ilja
<ilja.sidor...@helsinki.fi> wrote:
> It seems that there is something fishy going on with macOS's UTF-8 handling.
> I am not entirely sure what the details of the underlying incompatibility
> are, but I got the transfer working when I copied the files from my Mac to
> the RH Linux server with
>
> $ rsync --iconv=UTF-8-MAC,UTF-8
>
> Ilja
>
>> On 12 Dec 2016, at 09:37, Sidoroff, Ilja <ilja.sidor...@helsinki.fi> wrote:
>>
>> Hmm... something strange is going on. I replaced spaces with underscores in
>> the filenames, and now I get the following:
>>
>> when importing:
>>
>> java.io.FileNotFoundException: ark/item_003/Helsingin_yliopisto_on_tietoaineistojen_hallinnan_edelläkävijä.pdf (No such file or directory)
>>
>> copy-pasting the path to terminal:
>>
>> $ ls ark/item_003/Helsingin_yliopisto_on_tietoaineistojen_hallinnan_edelläkävijä.pdf
>> ls: cannot access ark/item_003/Helsingin_yliopisto_on_tietoaineistojen_hallinnan_edelläkävijä.pdf: No such file or directory
>>
>> but listing the directory contents:
>>
>> $ ls ark/item_003/
>> contents  dublin_core.xml  Helsingin_yliopisto_on_tietoaineistojen_hallinnan_edelläkävijä.pdf
>>
>> or here:
>>
>> $ ls -l ark/item_003/
>> total 16
>> -rw-r--r-- 1 sidoroff sidoroff   70 Dec 12 09:16 contents
>> -rw-r--r-- 1 sidoroff sidoroff  807 Dec 12 09:16 dublin_core.xml
>> -rw-r--r-- 1 sidoroff sidoroff 6109 Dec 12 09:16 Helsingin_yliopisto_on_tietoaineistojen_hallinnan_edelläkävijä.pdf
>>
>> So the problem seems to be somewhere in the Unicode/UTF-8 handling of RHEL 7 
>> or my Mac, where I prepared the import package.
>>
>> Ilja
>>
>>> On 12 Dec 2016, at 08:49, Sidoroff, Ilja <ilja.sidor...@helsinki.fi> wrote:
>>>
>>> Hi Tom,
>>>
>>> my locale is
>>>
>>> LANG=en_US.UTF-8
>>> LC_CTYPE="en_US.UTF-8"
>>> LC_NUMERIC="en_US.UTF-8"
>>> LC_TIME="en_US.UTF-8"
>>> LC_COLLATE="en_US.UTF-8"
>>> LC_MONETARY="en_US.UTF-8"
>>> LC_MESSAGES="en_US.UTF-8"
>>> LC_PAPER="en_US.UTF-8"
>>> LC_NAME="en_US.UTF-8"
>>> LC_ADDRESS="en_US.UTF-8"
>>> LC_TELEPHONE="en_US.UTF-8"
>>> LC_MEASUREMENT="en_US.UTF-8"
>>> LC_IDENTIFICATION="en_US.UTF-8"
>>> LC_ALL=
>>>
>>> and I get the same errors with LC_ALL="" or "en_US.UTF-8". Next, I'll try to
>>> find out whether this is happening at the OS level or the Java level.
>>>
>>>
>>> Ilja
>>>> On 07 Dec 2016, at 14:57, Tom Desair <tom.des...@atmire.com> wrote:
>>>>
>>>> Hi Ilja,
>>>>
>>>> One of our clients had a similar problem. Can you give me the output of 
>>>> the "locale" command on your DSpace server?
>>>>
>>>> Can you also try setting the "LC_ALL" environment variable to an empty 
>>>> string or "en_US.UTF-8" before running the import:
>>>> $ export LC_ALL=""
>>>> $ bin/dspace import ...
>>>> or
>>>> $ export LC_ALL="en_US.UTF-8"
>>>> $ bin/dspace import ...
>>>>
>>>> More information on this can be found here: 
>>>> http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4733494
>>>>
>>>> Best regards,
>>>> Tom
>>>>
>>>>
>>>>
>>>>     Tom Desair
>>>> 250-B Suite 3A, Lucius Gordon Drive, West Henrietta, NY 14586
>>>> Esperantolaan 4, Heverlee 3001, Belgium
>>>> www.atmire.com
>>>>
>>>> 2016-12-07 13:43 GMT+01:00 Sidoroff, Ilja <ilja.sidor...@helsinki.fi>:
>>>> Hello,
>>>>
>>>> I noticed some weird behaviour when trying to import items into DSpace
>>>> from the command line using the Simple Archive Format. If a bitstream's
>>>> name contains both SPACEs and Scandinavian special characters, the import
>>>> fails because the OS cannot find the bitstream in question.
>>>>
>>>> For instance, a bitstream name with a space is OK:
>>>>
>>>> Adding item from directory item_002
>>>>       Loading dublin core from ark/item_002/dublin_core.xml
>>>>       ...
>>>>       Processing contents file: ark/item_002/contents
>>>>       Bitstream: Digigraduille uusi prosessi.pdf
>>>>
>>>> A bitstream name with 'ä' (a-umlaut) is OK:
>>>>
>>>> Adding item from directory item_005
>>>>       Loading dublin core from ark/item_005/dublin_core.xml
>>>>       ...
>>>>       Bitstream: Käisä1.pdf
>>>>
>>>> But this is not ok:
>>>>
>>>> Adding item from directory item_006
>>>>       Loading dublin core from ark/item_006/dublin_core.xml
>>>>       ...
>>>> java.io.FileNotFoundException: ark/item_006/Kirjastoelämää Bolognassa.pdf (No such file or directory)
>>>> ...
>>>> java.io.FileNotFoundException: ark/item_006/Kirjastoelämää Bolognassa.pdf (No such file or directory)
>>>>
>>>>
>>>> Running the import under strace shows the underlying error:
>>>>
>>>> 21515 open("ark/item_006/Kirjastoel\303\244m\303\244\303\244 Bolognassa.pdf", O_RDONLY) = -1 ENOENT (No such file or directory)
>>>>
>>>> I'm using RHEL 7.2 with LANG=en_US.UTF-8. I'm not sure whether this is
>>>> operating-system-specific (or even filesystem-specific, with XFS)
>>>> behaviour, whether Java is the culprit, or whether this could be worked
>>>> around with some Java I/O magic (and is thus worth opening a ticket). I
>>>> tested this with DSpace 6.0, but I think it would happen with other
>>>> versions as well.
>>>>
>>>>
>>>> Ilja Sidoroff
>>>> Information Systems Specialist
>>>> Helsinki University Library
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups 
>>>> "DSpace Technical Support" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>> email to dspace-tech+unsubscr...@googlegroups.com.
>>>> To post to this group, send email to dspace-tech@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/dspace-tech.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>



-- 
Alan Orth
alan.o...@gmail.com
https://englishbulgaria.net
https://alaninkenya.org
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich Nietzsche
GPG public key ID: 0x8cb0d0acb5cd81ec209c6cdfbd1a0e09c2f836c0
