Re: [LUGOS-SLO] Code for Bohorič alphabet?

Tomaz Erjavec Sun, 15 Apr 2012 12:08:47 -0700

Dear Deborah,

thanks a lot for your informative mail, and of course thanks to Toma, who
steered me in the right direction; we've had some exchange off the list and
settled on sl-x-Boho.

But even now, reading RFC 5646, I still think it is a script rather than a
variant:

2.2.3.  Script subtags are used to indicate the script or _writing system
variations_ that distinguish the written forms of a language

2.2.5.  Variant subtags are used to indicate additional, well-recognized
variations that define _a language or its dialects_

I'd say Bohorič is clearly a writing-system variation rather than a different
language: the language didn't suddenly change with the switch to Gajica
(~1850), which is what we use today (a-z plus č, š, ž).
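
As a rough illustration of why the form of a tag matters here: RFC 5646 assigns a subtag to a slot largely by its shape (section 2.1). A four-letter subtag after the language is read as a script, a subtag of five to eight characters (or a digit plus three characters) as a variant, and everything after the singleton "x" as private use. The Python sketch below is a simplified, form-only check under those assumptions. `classify` is a hypothetical helper, not part of any library, and it does not consult the IANA registry, so it cannot tell whether a well-shaped subtag such as "Boho" is actually registered.

```python
import re

# Simplified shapes from the RFC 5646 ABNF (section 2.1). This sketch ignores
# grandfathered tags, multi-part extlangs and extensions other than "x".
EXTLANG = re.compile(r"[A-Za-z]{3}")
SCRIPT = re.compile(r"[A-Za-z]{4}")
REGION = re.compile(r"[A-Za-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


def classify(tag: str) -> list[tuple[str, str]]:
    """Label each subtag of a simple tag by the slot its *shape* fits.

    Form only: the IANA registry is not consulted, so a well-shaped but
    unregistered subtag (e.g. "Boho") is not rejected.
    """
    parts = tag.split("-")
    labels = [("language", parts[0])]
    i = 1
    # A bare 3-letter subtag right after a 2-3 letter language is an extlang.
    if i < len(parts) and len(parts[0]) in (2, 3) and EXTLANG.fullmatch(parts[i]):
        labels.append(("extlang", parts[i]))
        i += 1
    if i < len(parts) and SCRIPT.fullmatch(parts[i]):
        labels.append(("script", parts[i]))
        i += 1
    if i < len(parts) and REGION.fullmatch(parts[i]):
        labels.append(("region", parts[i]))
        i += 1
    # Variants follow until the private-use singleton "x".
    while i < len(parts) and parts[i].lower() != "x":
        kind = "variant" if VARIANT.fullmatch(parts[i]) else "unrecognised"
        labels.append((kind, parts[i]))
        i += 1
    # Everything after "x" is private use: meaningful only by private agreement.
    labels.extend(("privateuse", sub) for sub in parts[i + 1:])
    return labels


for t in ["sl-boh", "sl-Boho", "sl-bohoric", "sl-x-Boho"]:
    print(t, classify(t))
# sl-boh [('language', 'sl'), ('extlang', 'boh')]
# sl-Boho [('language', 'sl'), ('script', 'Boho')]
# sl-bohoric [('language', 'sl'), ('variant', 'bohoric')]
# sl-x-Boho [('language', 'sl'), ('privateuse', 'Boho')]
```

By shape alone, the three-letter "boh" in "sl-boh" falls into the extended-language slot rather than a script or variant slot, while "Boho" is well-formed as a script subtag but is not an ISO 15924 code.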

If scripts, and not only variants, can be registered with IANA, I'd certainly
like to do it, except that, while I'm at it, I'd also propose two others,
which were briefly in vogue in Slovenia in the mid-19th century.

I agree that back-changing sl-x-Boho to sl-Boho would be a pain, and time is
actually tight, as I'm presenting the corpus in about a month. Is IANA
registration that fast?

My question was also CCed to the Linux localisation user group of Slovenia,
where I got links to ISO 15924, in particular:

Notices of changes to that standard from ISO:
<http://unicode.org/iso15924/codechanges.html>

Rules about adding new scripts (see article A.3.3):  
<http://www.unicode.org/iso15924/standard/index.html#annex> 

This does look more complicated though.

All the best,

Tomaž

From: Deborah W. Anderson [mailto:[email protected]] 
Sent: Sunday, April 15, 2012 8:14 PM
To: [email protected]; [email protected]
Cc: [email protected]
Subject: RE: Code for Bohorič alphabet?

Dear Tomaž (and Toma),

To add a bit to what Toma has written…

From my reading of your message, what you want to identify is the Bohoričica
*orthography*, since the script itself is Latin (as is clear from the Wikipedia
page, which lists a, b, d, e, f, g, h, etc.), here being used for Slovenian
(the ISO language name).

The way to proceed is to propose a variant subtag (via
[email protected]), particularly if you want your data to be
available for general use.

See RFC 5646, especially sections 2.2.5, 3.5, and 3.6: 
http://www.inter-locale.com/ID/rfc5646.html.

(Useful background reading: 
http://www.w3.org/International/articles/language-tags/Overview.en.php and 
http://www.w3.org/International/questions/qa-choosing-language-tags )

I wrote to Doug Ewell, who is directly involved with the IANA subtag registry,
to verify this. He recommended that you propose a variant subtag, but noted:

> Until something is registered, a private-use tag like "sl-x-bohoric" (or
> "sl-x-boh" if brevity [is preferred] over readability) should work.
>
> Remember that if a variant is later registered, he will want to use the
> "official" tag instead of the private one, and changing tags on existing
> data can be a headache.

I hope this helps.  (Doug can assist you in making a request, if you decide to 
go that route.)
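
Doug's warning about retagging can be made concrete. Assuming the corpus is TEI XML, with the tag carried in @xml:lang and declared in <language ident="...">, switching from a private-use tag to a registered one is a mechanical rewrite. The Python sketch below (standard-library ElementTree) shows one way to do it. `retag` is a hypothetical helper, and "sl-bohoric" is only a placeholder for whatever tag is eventually registered, not a claim about the registry.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def retag(root: ET.Element, mapping: dict[str, str]) -> int:
    """Replace language tags per `mapping` in xml:lang and <language ident>.

    Matching is case-insensitive, since RFC 5646 tags are. Returns the
    number of attribute values changed.
    """
    lookup = {old.lower(): new for old, new in mapping.items()}
    changed = 0
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip comments/processing instructions if the parser kept them
        attrs = [XML_LANG]
        # Only TEI <language> carries a language tag in @ident; other
        # elements use @ident for unrelated identifiers.
        if elem.tag.rsplit("}", 1)[-1] == "language":
            attrs.append("ident")
        for attr in attrs:
            value = elem.get(attr)
            if value is not None and value.lower() in lookup:
                elem.set(attr, lookup[value.lower()])
                changed += 1
    return changed


doc = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><profileDesc>'
    '<langUsage><language ident="sl-x-Boho">Slovenian, Bohorič alphabet</language>'
    '</langUsage></profileDesc></teiHeader>'
    '<text><body><p xml:lang="sl-x-Boho">...</p></body></text></TEI>'
)
# "sl-bohoric" is a placeholder for whatever tag is eventually registered.
print(retag(doc, {"sl-x-Boho": "sl-bohoric"}))  # 2
```

ElementTree may rename namespace prefixes when the tree is written back out (see `ET.register_namespace`), so for a real corpus a namespace-preserving tool or a careful text-level replacement may be preferable.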

With best wishes,

Deborah Anderson
Researcher, Dept. of Linguistics
UC Berkeley

From: TEI (Text Encoding Initiative) public discussion list 
[mailto:[email protected]] On Behalf Of Toma Tasovac
Sent: Friday, April 13, 2012 5:21 PM
To: [email protected]
Subject: Re: Code for Bohorič alphabet?

Dear Tomaž,

> So far I've just been using @xml:lang="sl-boh", but I know this is sinful;
> I'm just not sure how it should be encoded.

Wouldn't this actually be a good candidate for a private-use ("x") subtag?
Since ISO doesn't really recognize Bohoričica, using xml:lang="sl-x-boh" would
stress that fact without sacrificing the readability of the attribute value.
And with private-use subtags you are pretty much free to do whatever you want
("Private use subtags are used to indicate distinctions in language that are
important in a given context by private agreement.").

Then, to be perfectly safe, you could declare it with <langUsage> and
<language> in the TEI header:

<langUsage>
  <language ident="sl-x-boh">Slovenian written using the Bohorič alphabet</language>
</langUsage>
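
A private-use tag only means something by private agreement, and a <langUsage> declaration like this is where that agreement gets written down. A simple consistency check, sketched below in Python under the assumption that the corpus uses the TEI P5 namespace, lists any @xml:lang value that is used in a document but never declared in a <language ident="..."> element. `undeclared_langs` is a hypothetical helper, not part of any TEI tooling.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def undeclared_langs(root: ET.Element) -> set[str]:
    """Return xml:lang values used in the document but not declared
    as <language ident="..."> anywhere in it (compared case-insensitively)."""
    declared = {
        lang.get("ident").lower()
        for lang in root.iter(TEI + "language")
        if lang.get("ident")
    }
    used = {
        elem.get(XML_LANG).lower()
        for elem in root.iter()
        if elem.get(XML_LANG)
    }
    return used - declared


doc = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><profileDesc><langUsage>'
    '<language ident="sl-x-boh">Slovenian written using the Bohorič alphabet</language>'
    '</langUsage></profileDesc></teiHeader>'
    '<text xml:lang="sl-x-boh"><body><p xml:lang="sl">...</p></body></text></TEI>'
)
print(undeclared_langs(doc))  # {'sl'}
```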

All best,

Toma

—————————————————————
Toma Tasovac
Center for Digital Humanities (Belgrade, Serbia) 
http://humanistika.org • http://transpoetika.org

On 13.04.2012, at 22:42, Tomaz Erjavec wrote:

Dear all,
in the context of a historical corpus of Slovene, I'd like to mark texts that
are written in the Bohorič alphabet
(http://en.wikipedia.org/wiki/Bohori%C4%8D_alphabet).
So far I've just been using @xml:lang="sl-boh", but I know this is sinful;
I'm just not sure how it should be encoded.
First, I'm not sure it even qualifies as a "script": for example, I can't find
a script code for old English, which used the long s, but maybe that is
because the long s only substitutes one character for another; with Bohorič
it's more complicated.
Even if I take it as a script (so that I could write sl-Boho),
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CH.html#CHSH does say that
script codes should be taken from ISO 15924,
http://unicode.org/iso15924/iso15924-codes.html, and there is no Boho there; I
also can't find an extension mechanism like the one there is for languages.
Any tips gratefully received.
Best,
Tomaž
-- 
Tomaž Erjavec, http://nl.ijs.si/et/
Dept. of Knowledge Technologies, Jožef Stefan Institute, Ljubljana

_______________________________________________
lugos-slo mailing list
[email protected]
http://liste2.lugos.si/cgi-bin/mailman/listinfo/lugos-slo