Thanks Richard. It looks like this is what I had better do!

Lucy
> On 30 Jan 2016, at 20:21, Richard Jennings <[email protected]> wrote:
>
> Hi Lucy,
>
> I just wanted to say that I agree with Koen: I use OpenOffice to
> manipulate my authority documents and resource graphs and to prepare my
> .arches files, and I find that it works very well for handling issues
> such as yours. I can't recommend it highly enough after having similar
> problems with Excel.
>
> Best wishes,
>
> Richard
>
>> On Monday, January 25, 2016 at 9:39:35 AM UTC, Koen Van Daele wrote:
>> Hi Lucy,
>>
>> Character encodings are one of those nasty issues in computing that nobody
>> likes tackling. If you want a detailed, yet fairly easy to follow analysis
>> of why that is, see http://www.joelonsoftware.com/articles/Unicode.html
>> (Cthulhu is waiting for you there though...)
>>
>> Basically, what Arches does (requiring UTF-8) is the best thing possible.
>> That way most human languages can be integrated in Arches, and all you need
>> to do is make sure your data is UTF-8. Unfortunately Excel makes that bloody
>> impossible. I think Excel saves that file in the ISO-8859-1 encoding. That
>> encoding just doesn't know the characters you're trying to save (ISO-8859-1
>> only contains 191 characters). So, it's not just Arches. I can't read them
>> either. Excel should be telling you when saving as CSV that you will lose
>> information, but even then it still wouldn't work, since your CSV file
>> already contains illegal ISO-8859-1 characters.
>>
>> And it's not just Excel, the whole Windows ecosystem is fundamentally flawed
>> in that regard. I myself run Linux, where character encoding is handled
>> correctly and UTF-8 is the default. No idea how they do it on a Mac.
>>
>> So, I think using OpenOffice is your best bet. Or just open the CSV file you
>> have in Notepad++ (or a similar text editor), save the file as UTF-8 and fix
>> the problems manually. But then you'd have to do that every time you want to
>> change something.
>>
>> Cheers,
>>
>> Koen
>>
>> From: [email protected] <[email protected]> on behalf of Lucy FJ
>> <[email protected]>
>> Sent: Sunday, 24 January 2016 12:28
>> To: Arches Project
>> Subject: Re: [Arches] Diacriticals in authority and .Arches files problems
>>
>> Hi Koen,
>>
>> Thank you for this information. I did try out some of the suggestions on
>> Google for using Excel to create UTF-8 files, because I like using Excel and
>> know it well, but the ones I tried are overcomplicated and produce a CSV
>> file in UTF-8 with BOM format, which I believe will not work in Arches. It
>> looks like I will need to download OpenOffice as you suggest. Must all files
>> loaded into Arches be UTF-8 only?
>>
>> Lucy
>>
>>> On Friday, January 22, 2016 at 4:24:42 PM UTC+2, Koen Van Daele wrote:
>>> Hi Lucy,
>>>
>>> As far as I know, Excel (all versions) is notoriously bad at handling
>>> things like character encodings. This rather old Stack Overflow question
>>> seems to confirm that:
>>>
>>> http://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding
>>>
>>> It does offer some workarounds, but none of them are very nice.
>>>
>>> I would suggest writing your CSV files with LibreOffice/OpenOffice. You
>>> should be able to install it and it's free. While it's not always an exact
>>> replacement for Excel, when it comes to character encodings, it just works.
>>> By default it will save things as UTF-8 (at least under Linux it does) and
>>> it will ask you if you want to save in a different encoding.
>>>
>>> Cheers,
>>>
>>> Koen
>>>
>>> On Friday, 22 January 2016 at 15:05:52 UTC+1, Lucy FJ wrote:
>>>>
>>>> Hi Adam and Alexei,
>>>>
>>>> I forgot to add that the diacriticals are in the altnames at rows 132 to
>>>> 136 when editing in Excel.
>>>>
>>>> Lucy
>>>> ----- Original Message -----
>>>> From: Adam Cox
>>>> To: Lucy Fletcher-Jones
>>>> Cc: Alexei Peters ; Arches Project
>>>> Sent: Thursday, January 21, 2016 5:36 PM
>>>> Subject: Re: [Arches] Diacriticals in authority and .Arches files problems
>>>>
>>>> Hi Lucy, you can check the encoding in Notepad++. Open your authority
>>>> document with that program, and click the Encoding menu. Your file should
>>>> be in "UTF-8" or "UTF-8 without BOM" (depends on the version of Notepad++
>>>> you have). The î character should work as far as I know...
>>>>
>>>>> On Thu, Jan 21, 2016 at 7:18 AM, 'Lucy Fletcher-Jones' via Arches Project
>>>>> <[email protected]> wrote:
>>>>> Hi Alexei,
>>>>>
>>>>> Thank you for looking into this. I am glad to hear that Arches should
>>>>> support diacriticals.
>>>>>
>>>>> Here is the error message on loading the 'Ruler' Authority document:
>>>>>
>>>>> RULER_AUTHORITY_DOCUMENT.csv
>>>>>
>>>>> ERRORS IN FILE: RULER_AUTHORITY_DOCUMENT.values.csv
>>>>>
>>>>> ERRORS IN FILE: RULER_AUTHORITY_DOCUMENT.csv
>>>>>
>>>>> ERROR: Make sure the file is saved with UTF-8 encoding
>>>>> 'utf8' codec can't decode byte 0xea in position 30: invalid continuation byte
>>>>> Traceback (most recent call last):
>>>>>   File "/opt/projects/ENV/lib/python2.7/site-packages/arches/management/commands/package_utils/authority_files.py", line 112, in load_authority_file
>>>>>     for row in rows:
>>>>>   File "/opt/projects/ENV/lib/python2.7/site-packages/unicodecsv/py2.py", line 217, in next
>>>>>     row = csv.DictReader.next(self)
>>>>>   File "/usr/local/lib/python2.7/csv.py", line 104, in next
>>>>>     row = self.reader.next()
>>>>>   File "/opt/projects/ENV/lib/python2.7/site-packages/unicodecsv/py2.py", line 128, in next
>>>>>     for value in row]
>>>>>   File "/opt/projects/ENV/lib/python2.7/encodings/utf_8_sig.py", line 22, in decode
>>>>>     (output, consumed) = codecs.utf_8_decode(input, errors, True)
>>>>> UnicodeDecodeError: 'utf8' codec can't decode byte 0xea in position 30: invalid continuation byte
>>>>>
>>>>> ERROR in row 31 (Legacyoid (RULER_UID:30) not found. Make sure your
>>>>> ParentConceptid in the
>>>>>
>>>>> This caused further errors in the Ruler Values files, as can be seen from
>>>>> above.
>>>>> I do not have a copy of the authority file that caused the error, as I
>>>>> have since corrected it and changed it in a few places. But the
>>>>> alternative name was
>>>>>
>>>>> Ptolemaîos Philadelphos
>>>>>
>>>>> and I believe it was the circumflex above the 'i' that caused the
>>>>> problem. Certainly when I removed the circumflex, the file loaded OK.
>>>>>
>>>>> Thank you,
>>>>> Lucy
>>>>>
>>>>> ----- Original Message -----
>>>>> From: Alexei Peters
>>>>> To: Lucy FJ
>>>>> Cc: Arches Project
>>>>> Sent: Wednesday, January 20, 2016 8:24 PM
>>>>> Subject: Re: [Arches] Diacriticals in authority and .Arches files problems
>>>>>
>>>>> Hi Lucy,
>>>>> The .arches file should support diacritics. I'm actually surprised that
>>>>> the authority files don't. I just tested a local file and I was able to
>>>>> add these records:
>>>>>
>>>>> conceptid,PrefLabel,AltLabels,ParentConceptid,ConceptType,Provider
>>>>> 20000001-0000-0000-0000-000000000000,Portland,,CITY_AUTHORITY_DOCUMENT.csv,Index,GCI
>>>>> 20000002-0000-0000-0000-000000000000,San Francisco,The Bay Area,CITY_AUTHORITY_DOCUMENT.csv,Index,GCI
>>>>> 20000003-0000-0000-0000-000000000000,San Jose,San José,CITY_AUTHORITY_DOCUMENT.csv,Index,GCI
>>>>>
>>>>> Notice that the alt label for San Jose is San José.
>>>>>
>>>>> Can you share the authority file that you're having trouble with?
>>>>> Cheers,
>>>>> Alexei
>>>>>
>>>>> Director of Web Development - Farallon Geographics, Inc.
>>>>> - 971.227.3173
>>>>>
>>>>>> On Wed, Jan 20, 2016 at 12:32 AM, Lucy FJ <[email protected]> wrote:
>>>>>> Hi all,
>>>>>> We have been loading customised authority files and have noticed that
>>>>>> Arches rejects words with diacriticals (accents etc.). This is not a
>>>>>> problem for us, as we were happy to remove them, and if we really want
>>>>>> them we can enter them through the RDM. But will this problem occur when
>>>>>> loading resource data through .arches? We need to input place names as
>>>>>> alternative names using diacriticals, and it would be much easier if we
>>>>>> can do this via .arches files. We know we can input them using the
>>>>>> resource data manager, but obviously when dealing with about 3000
>>>>>> entries, this is time consuming.
>>>>>> Any ideas?
>>>>>> Lucy
>>>>>>
>>>>>> --
>>>>>> -- To post, send email to [email protected]. To unsubscribe,
>>>>>> send email to [email protected]. For more information,
>>>>>> visit https://groups.google.com/d/forum/archesproject?hl=en
>>>>>> ---
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Arches Project" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>>> an email to [email protected].
>>>>>> For more options, visit https://groups.google.com/d/optout.
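
For anyone who cannot switch away from Excel, the "save the file as UTF-8" step suggested in this thread can also be scripted. What follows is only a minimal sketch, not anything shipped with Arches: the function name convert_to_utf8 is made up for illustration, and the fallback encoding cp1252 is an assumption about what Excel wrote on a Western European Windows machine, so check it against your own files. The traceback in the thread passes through Python's utf_8_sig codec, which accepts a leading BOM, so a BOM may matter less than feared; the sketch drops it anyway.

```python
import codecs


def convert_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Return `raw` re-encoded as UTF-8 without a BOM.

    Bytes that are already valid UTF-8 are returned unchanged (apart from
    dropping a leading BOM). Anything else is decoded with `source_encoding`,
    which is only a guess at the encoding the spreadsheet program used.
    """
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        raw.decode("utf-8")  # validation only; the result is discarded
        return raw
    except UnicodeDecodeError:
        return raw.decode(source_encoding).encode("utf-8")


# Reproduce the kind of failure reported in the thread: in Latin-1 the "î"
# is a single byte, which is not a valid UTF-8 sequence on its own.
legacy = "Ptolemaîos Philadelphos".encode("latin-1")
try:
    legacy.decode("utf-8")
except UnicodeDecodeError as err:
    print("not UTF-8:", err.reason)

fixed = convert_to_utf8(legacy)
print(fixed.decode("utf-8"))
```

To apply it to a file, read the bytes, convert them, and write them back, for example with pathlib's read_bytes() and write_bytes(). Keep a copy of the original first, because a wrong guess for source_encoding silently produces the wrong characters instead of raising an error.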
