Re: Comparing Raw Values of the Age Property

2017-05-23 Thread Janusz S. Bien via Unicode
Quote/Cytat - Manuel Strehl via Unicode  (Tue 23  
May 2017 11:33:24 AM CEST):



The rising standard in the world of web development (and elsewhere) is called
»Semantic Versioning« [1], which many projects adhere to, or else must
actively explain why they don't.

The structure of a »semantic version« string is a set of three integers,
MAJOR.MINOR.PATCH, where the »semantics« part lies in a kind of contract
between author and user about when to increment which part.



Perhaps I am missing something, but I don't understand this thread. Cf.

http://unicode.org/versions/

Version numbers for the Unicode Standard consist of three fields,  
denoting the major version, the minor version, and the update version,  
respectively.


The differences between major, minor, and update versions are as follows:

[...]
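
Since the thread is about comparing version values, here is a minimal sketch of comparing dotted version numbers field by field as integers. The function names are illustrative only and do not come from any Unicode tool, and treating a missing field as zero is an assumption of this sketch.

```python
def parse_version(text):
    """Turn a dotted version string such as '6.1.0' into a tuple of integers."""
    return tuple(int(field) for field in text.split("."))


def pad(fields, width=3):
    """Pad with zeros so that '6.1' and '6.1.0' compare as equal (an assumption)."""
    return fields + (0,) * max(0, width - len(fields))


def compare_versions(a, b):
    """Return -1, 0 or 1, comparing the fields as numbers, left to right."""
    left, right = pad(parse_version(a)), pad(parse_version(b))
    return (left > right) - (left < right)


# Comparing the raw strings goes wrong once a field reaches two digits.
print("10.0" < "9.0")                    # lexicographic comparison of text
print(compare_versions("10.0", "9.0"))   # numeric comparison of fields
```

The point of the sketch is only that the raw strings must not be compared as text; any scheme that compares the fields numerically would do.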

Best regards

Janusz

--
Prof. dr hab. Janusz S. Bień -  Uniwersytet Warszawski (Katedra  
Lingwistyki Formalnej)

Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
jsb...@uw.edu.pl, jsb...@mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/



Re: New tool unidump

2017-03-17 Thread Janusz S. Bien
Quote/Cytat - Manuel Strehl  (Fri 17 Mar 2017  
09:44:15 PM CET):



Hi,

for my work on codepoints.net and Emojipedia I found myself repeatedly
in a place, where I needed some tool like hexdump to inspect the content
of a string. However, instead of raw bytes I am more interested in the
code points that the string is composed of. So I wrote this tool.


Is somebody maintaining a list of such utilities?

There is a page

http://www.unicode.org/resources/online-tools.html

but I remember that a page on the site used to have links to the  
programs mentioned in the 2012 thread "Tool to convert characters to  
character names", in particular to Bill Poser's uniutils  
(http://billposer.org/Software/unidesc.html) and the orphaned unihist  
by a student of mine (https://bitbucket.org/jsbien/unihistext). I'm  
unable to find them now.
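
For readers unfamiliar with this kind of utility, the following is a minimal sketch of what a code point dump does. It is not the implementation of unidump, uniutils or unihistext; the function name and the output format are assumptions made for illustration.

```python
import unicodedata


def dump_code_points(text):
    """Return one 'U+XXXX NAME' line per code point in text."""
    lines = []
    for ch in text:
        # Code points without a name (e.g. controls) get a placeholder.
        name = unicodedata.name(ch, "<no name>")
        lines.append(f"U+{ord(ch):04X} {name}")
    return lines


for line in dump_code_points("Bie\u0144"):
    print(line)
```

Unlike hexdump, the output is independent of the encoding in which the string was stored.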


Best regards

Janusz




Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Janusz S. Bien

Quote/Cytat - J Decker  (Mon 13 Mar 2017 06:55:18 PM CET):


texel looks to be defined as a graphic element already.  TEXture ELement.


I'm aware of it, but homonymy/polysemy is something we have to live  
with. I think there is no risk of confusing texture elements with text  
elements, despite the fact that 'texture' and 'text' have similar  
origin.


Best regards

Janusz




Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Janusz S. Bien
Quote/Cytat - Asmus Freytag  (Mon 13 Mar 2017  
06:00:08 PM CET):


[...]

This (or similar) scenarios indicate the impossibility to come to a
single, universal definition of a "textel" -- the main reason why this
term is of lower utility than "pixel".

I agree that it is impossible to come to a single, universal  
definition of text elements, but it seems possible to reach a  
consensus on a kind of least common denominator of them and call  
it "textel" or something else.


Best regards

Janusz




Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Janusz S. Bien
Quote/Cytat - William_J_G Overington  (Mon  
13 Mar 2017 12:24:13 PM CET):



Prof. Janusz S. Bień wrote:


Just yet another reason for introducing the notion of textel?


I opine that it would be a good idea to introduce several new words,  
of which textel would be one, with each such new word having a  
precisely-defined meaning so that in precise discussions of  
programming techniques people could discuss the situation without  
needing to use any of the words character, code point, grapheme  
cluster.


How many such new words would be needed?


In my paper (in Polish)

http://bc.klf.uw.edu.pl/480/

I propose also the term "texton" meaning a code point from a specific  
subset, not yet fully defined, but including at least the components  
of composite characters.


Best regards

Janusz




Re: "A Programmer's Introduction to Unicode"

2017-03-13 Thread Janusz S. Bien
Quote/Cytat - Richard Wordingham <richard.wording...@ntlworld.com>  
(Sun 12 Mar 2017 09:10:22 PM CET):



On Sun, 12 Mar 2017 20:02:28 +0100
"Janusz S. Bien" <jsb...@mimuw.edu.pl> wrote:


If the basic notion has to be referred to in a cumbersome way as
"extended grapheme cluster" then it is easier to talk about "Unicode
characters" despite the fact that they have a rather loose relation
to real-life/user-perceived characters.


The notion that extended grapheme clusters correspond to
user-perceived characters is also rather dodgy.


The idea is not mine, but it appears from time to time on the list in  
a more or less explicit way.



Whereas it may work
for French, it is getting very dubious by the time one adds Hebrew
cantillation marks or Vedic accentuation.  The Thais revolted when
their preposed vowels were joined with the following consonant in the
same extended grapheme cluster, and Unicode had to revoke that union.


Just yet another reason for introducing the notion of textel?

Best regards

Janusz





Re: "A Programmer's Introduction to Unicode"

2017-03-12 Thread Janusz S. Bien
Quote/Cytat - Manish Goregaokar  (Sun 12 Mar 2017  
07:43:22 PM CET):



This is just another confirmation that the present Unicode terminology
is confusing.

I find this to be a symptom of our pedagogy around "characters" in
programming; most folks get taught that characters are bytes are code
points, especially because many languages try to make this the case.
The name "grapheme cluster" could be improved upon, but it's not the
primary source of this confusion.


I agree that it's not the primary source. However the pedagogy depends  
on the terminology used.


If the basic notion has to be referred to in a cumbersome way as  
"extended grapheme cluster" then it is easier to talk about "Unicode  
characters" despite the fact that they have a rather loose relation to  
real-life/user-perceived characters.
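
To make the difference between bytes, code points and user-perceived characters concrete, here is a small sketch. The helper rough_clusters is hypothetical and only attaches combining marks to the preceding character; it is a crude approximation, not the extended grapheme cluster algorithm of UAX #29.

```python
import unicodedata


def rough_clusters(text):
    """Group each combining mark (category M*) with the preceding character.

    Only a crude approximation of user-perceived characters.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


word = "e\u0301te\u0301"   # the French word for "summer", with COMBINING ACUTE ACCENT
print(len(word.encode("utf-8")))   # number of UTF-8 bytes
print(len(word))                   # number of code points
print(len(rough_clusters(word)))   # approximate number of user-perceived characters
```

The three counts differ for the same word, which is why teaching that "characters are bytes are code points" leads to confusion.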


Best regards

Janusz




Re: Invisible letter (was Re: a character for an unknown character)

2016-12-21 Thread Janusz S. Bien
Quote/Cytat - David Corbett  (Wed 21 Dec  
2016 05:56:27 PM CET):



Couldn’t you use U+1D52 MODIFIER LETTER SMALL O?


In our corpus COMBINING LATIN SMALL LETTER O sometimes occurs in its  
combining function, so it seemed more elegant to use a uniform encoding.  
But you are right, in the example quoted MODIFIER LETTER SMALL O could  
also be used.


Regards

Janusz


(I changed the subject line because the invisible letter proposal is not
relevant to the question about a lacuna character.)


I strongly support this. In our historical corpus of Polish

http://korpusy.klf.uw.edu.pl/en/IMPACT_GT_2/

we have in particular words ending with 'COMBINING LATIN SMALL LETTER
O' (U+0366).

We had to precede the character with NBSP as the base, but to preserve
the correct segmentation into words we had to treat NBSP as a letter.









Re: a character for an unknown character

2016-12-21 Thread Janusz S. Bien
Quote/Cytat - Michael Everson  (Wed 21 Dec 2016  
05:25:30 PM CET):


I still believe that we need INVISIBLE LETTER  
http://unicode.org/review/pr-41-invisible.pdf


I think that, for the display of combining characters without a base  
character, the recommended NBSP makes no sense. NBSP is supposed  
to glue the characters on either side of it to itself. It makes  
sense that the following character, say COMBINING ACUTE ACCENT,  
would be glued to it. But why should the two of those be glued to  
whatever precedes?


I strongly support this. In our historical corpus of Polish

http://korpusy.klf.uw.edu.pl/en/IMPACT_GT_2/

we have in particular words ending with 'COMBINING LATIN SMALL LETTER  
O' (U+0366).


We had to precede the character with NBSP as the base, but to preserve  
the correct segmentation into words we had to treat NBSP as a letter.
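
The segmentation problem can be sketched as follows. The tokenizer below is a hypothetical simplification (runs of letters and marks count as words), not the one used for the corpus, and the sample string is invented for illustration.

```python
import unicodedata


def split_words(text, extra_letters=""):
    """Split text into runs of letters and marks (general categories L* and M*).

    Characters listed in extra_letters are treated as word constituents too.
    """
    words, current = [], ""
    for ch in text:
        is_word_char = unicodedata.category(ch)[0] in "LM" or ch in extra_letters
        if is_word_char:
            current += ch
        elif current:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


NBSP = "\u00a0"
sample = "tego" + NBSP + "\u0366 roku"   # invented example: NBSP carries U+0366

print(split_words(sample))                      # NBSP (category Zs) splits the word
print(split_words(sample, extra_letters=NBSP))  # NBSP treated as a letter
```

With the default settings the combining letter is cut off from its word, because NBSP is a space separator; declaring NBSP a letter keeps the word in one piece.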


Best regards

Janusz




Re: "textels"

2016-09-18 Thread Janusz S. Bien
Quote/Cytat - Christoph Päper  (Fri, 16 Sep 2016, 23:51:38):



Janusz S. Bień :


1. Graphemes, if I understand correctly, are language dependent, …


That’s true in linguistic terminology – well, at least within the  
more popular schools of thought –, but not in technical (i.e.  
Unicode) jargon.


From the Unicode glossary:

Grapheme. (1) A minimally distinctive unit of writing in the context  
of a particular writing system.[...] (2) What a user thinks of as a  
character.


As for (2), cf.

User-Perceived Character. What everyone thinks of as a character in  
their script.


So we have "a user" versus "everyone...in their script" - is the  
difference intentional? Probably not. Anyway the definitions are  
language/locale dependent.


Regards

Janusz




Re: "textels"

2016-09-16 Thread Janusz S. Bien
Quote/Cytat - Eric Muller <eric.mul...@efele.net> (Fri, 16 Sep 2016,  
17:47:27):



On 9/16/2016 8:30 AM, Janusz S. Bien wrote:
Quote/Cytat - Eric Muller <eric.mul...@efele.net> (Fri, 16 Sep  
2016, 17:03:54):



On 9/16/2016 6:52 AM, Janusz S. Bień wrote:

(when working on a corpus of historical Polish we
noticed some cases where standard Unicode equivalence was not
convenient).


I'm very interested to know more about those cases.


For our search engine we were unable to use compatibility  
equivalence "out of the box" for splitting the ligature because it  
also converted long s to short s while we wanted to preserve the  
distinction.


I am interested in the problems with *canonical* equivalence. I  
thought that you were talking about those before.


I apologize for the confusion, that was my fault. I tend to answer too  
quickly and not precisely enough :-(


On the other hand I'm not sure canonical equivalence is always what I  
want and expect, but I don't have specific examples at hand.


Regards

Janusz




Re: "textels"

2016-09-16 Thread Janusz S. Bien
Quote/Cytat - Eric Muller  (Fri, 16 Sep 2016,  
17:03:54):



On 9/16/2016 6:52 AM, Janusz S. Bień wrote:

(when working on a corpus of historical Polish we
noticed some cases where standard Unicode equivalence was not
convenient).


I'm very interested to know more about those cases.


For our search engine we were unable to use compatibility equivalence  
"out of the box" for splitting the ligature because it also converted  
long s to short s while we wanted to preserve the distinction.
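
The effect described above can be reproduced with a short sketch. It assumes that "the ligature" means one of the Latin ligatures U+FB00..U+FB06, and split_ligatures is a hypothetical helper, not the code of the search engine.

```python
import unicodedata

# Assumption: only the Latin ligatures U+FB00..U+FB06 are to be split.
LIGATURES = {chr(cp) for cp in range(0xFB00, 0xFB07)}


def split_ligatures(text):
    """Expand Latin ligatures by one level, leaving all other characters alone."""
    out = []
    for ch in text:
        if ch in LIGATURES:
            # decomposition() returns e.g. '<compat> 017F 0074' for U+FB05.
            fields = unicodedata.decomposition(ch).split()[1:]
            out.append("".join(chr(int(cp, 16)) for cp in fields))
        else:
            out.append(ch)
    return "".join(out)


long_s_t = "\ufb05"   # LATIN SMALL LIGATURE LONG S T
print(unicodedata.normalize("NFKC", long_s_t) == "st")   # long s is folded to s
print(split_ligatures(long_s_t) == "\u017ft")            # long s is preserved
```

Full compatibility normalization applies every compatibility mapping at once, so it cannot split the ligature without also replacing U+017F LATIN SMALL LETTER LONG S; a selective mapping avoids that.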


Regards

Janusz




Re: NamesList.txt as data source

2016-03-30 Thread Janusz S. Bien
Quote/Cytat - Andrew West  (Tue 29 Mar 2016  
06:15:15 PM CEST):



On 29 March 2016 at 16:19, Janusz S. Bień  wrote:


> All documents submitted to WG2 and to L2 by individuals are copyright
> of the author(s) of the document.  Documents do not need to carry a
> copyright notice to have copyright, and submitting the documents to
> Unicode Consortium and/or ISO does not affect the copyright status of
> documents.
>
>> http://www.unicode.org/policies/ipr_policy.html

Do you happen to know an analogous link for the ISO submissions? I was
unable to find one quickly.


ISO/IEC Directives Part 1 (6th ed., 2015)

Section 2.13:


In ISO and IEC, there is an understanding that original material
contributed to become a part of an ISO,
IEC or ISO/IEC publication can be copied and distributed within the
ISO and/or IEC systems (as relevant)
as part of the consensus building process, this being without
prejudice to the rights of the original
copyright owner to exploit the original text elsewhere. Where material
is already subject to copyright,
the right should be granted to ISO and/or IEC to reproduce and
circulate the material. This is frequently
done without recourse to a written agreement, or at most to a simple
written statement of acceptance.
Where contributors wish a formal signed agreement concerning copyright
of any submissions they
make to ISO and/or IEC, such requests must be addressed to ISO Central
Secretariat or the IEC Central
Office, respectively.


Andrew


Thanks again!

Janusz




Re: Joined "ti" coded as "Ɵ" in PDF

2016-03-20 Thread Janusz S. Bien
Quote/Cytat - Andrew Cunningham  (Sun 20 Mar  
2016 12:06:29 AM CET):



Hi Don,

Latin is fine if you keep to simple well made fonts and avoid using more
sophisticated typographic features available in some fonts.

Dumb it down typographically and it works fine. PDF, despite all the
current rhetoric coming from PDF software developers, is a preprint format.
Not an archival format.


What about PDF/A, ISO 19005-1:2005 Document Management – Electronic  
document file format for long term preservation?


Best regards

Janusz




RE: annotations

2016-03-14 Thread Janusz S. Bien

Quote/Cytat - Doug Ewell  (Mon, 14 Mar 2016, 19:22:14):


Ken Whistler wrote:


The trick is this: the status of annotational data in NamesList.txt
is different than that of normative data like the code points, names,
formal name aliases, decomposition mappings, and standardized
variation sequences.


I get that. I am FAR more comfortable with that type of guideline:

• the data isn't normative (at least not all of it)
• the format isn't set in stone
• don't ask for additions or changes


What about reporting possible mistakes?

Regards

Janusz




Re: Character folding in text editors

2016-02-20 Thread Janusz S. Bien
Quote/Cytat - Philippe Verdy  (Sat 20 Feb 2016  
06:27:41 PM CET):



Unless we have case folding tailored by language, you cannot do that based
on the Unicode database alone.

However CLDR provides tailored data about collation.

From my point of view, it is just a matter of selecting the collation
strength to use for searches using collation.


Exactly. The POSIX equivalence classes are defined by the locale collation.

Janusz




Re: Character folding in text editors

2016-02-20 Thread Janusz S. Bien
Quote/Cytat - Elias Mårtenson  (Sat 20 Feb 2016  
11:23:13 AM CET):



Hello Unicode,

I have been involved in a rather long discussion on the Emacs-devel mailing
list[1] concerning the right way to do character folding and we've reached
a point where input from Unicode experts would be welcome.

The problem is the implementation of equivalence when searching for
characters. For example, if I have a buffer containing the following
characters (both using the precomposed and canonical forms):

o ö ø ó n ñ

The character folding feature in Emacs allows a search for "o" to match some
or even all of these characters. The discussion on the mailing list has
circulated around both the fact that the correct behaviour here is
locale-dependent, and also on the correct way to implement this matching
absent any locale-specific exceptions.


What about just using the POSIX equivalence classes in regular expressions?

From

http://www.regular-expressions.info/posixbrackets.html

A POSIX locale can define character equivalents that indicate that  
certain characters should be considered as identical for sorting. In  
French, for example, accents are ignored when ordering words. élève  
comes before être which comes before événement. é and ê are all the  
same as e, but l comes before t which comes before v. With the locale  
set to French, a POSIX-compliant regular expression engine matches e,  
é, è and ê when you use the collating sequence [=e=] in the bracket  
expression [[=e=]].
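
Python's re module does not support POSIX equivalence classes such as [[=e=]], so the sketch below only approximates the idea without any locale: two characters are treated as equivalent when they are identical after canonical decomposition and removal of nonspacing marks. The function names are hypothetical, and a real POSIX locale may define its classes differently.

```python
import unicodedata


def strip_marks(text):
    """Decompose canonically (NFD) and drop nonspacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def equivalent_to(base, candidates):
    """Return the candidates that reduce to base once marks are stripped."""
    return [ch for ch in candidates if strip_marks(ch) == base]


# e, e-acute, e-grave, e-circumflex, l
print(equivalent_to("e", "e\u00e9\u00e8\u00eal"))
# o, o-diaeresis, o-stroke, o-acute: U+00F8 has no decomposition
print(equivalent_to("o", "o\u00f6\u00f8\u00f3"))
```

In this locale-independent approximation ø is never folded to o, while ö always is; whether either result is wanted depends on the language, which is the locale dependence discussed in the quoted message.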


Regards

Janusz
(an Emacs user)





Re: Unicode in the Curriculum?

2015-12-30 Thread Janusz S. Bien
Quote/Cytat - Andre Schappo  (Wed, 30 Dec 2015,  
17:16:09):



Does anyone on this list teach Unicode at an Educational  
Establishment, School, or College or University?


In a sense:

https://usosweb.uw.edu.pl/kontroler.php?_action=katalog2/przedmioty/pokazPrzedmiot=3322-TUS-OG


Regards

Janusz





Re: Acquiring DIS 10646

2015-10-03 Thread Janusz S. Bien
Quote/Cytat - Doug Ewell  (Sat 03 Oct 2015 08:00:12  
PM CEST):



Sean Leonard wrote:


What I understand is that Draft 1 got shot down because it was at
variance with the nascent Unicode effort;


If I remember correctly, Draft 1 looked a lot like an updated and  
expanded version of ISO 2022, much more than it did like today's  
Unicode/10646.


Rob Pike, Ken Thompson
Hello World

http://plan9.bell-labs.com/sys/doc/utf.html

The draft of ISO 10646 was not very attractive to us. It defined a  
sparse set of 32-bit characters, which would be hard to implement and  
have punitive storage requirements. Also, the draft attempted to  
mollify national interests by allocating 16-bit subspaces to national  
committees to partition individually. The suggested mode of use was to  
‘‘flip’’ between separate national standards to implement the  
international standard.
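
The remark about storage can be illustrated with a small sketch. It uses today's UTF-32 as a stand-in for a fixed 32-bit representation; this is an assumption made for illustration and not the encoding defined in the 1990 draft.

```python
def storage_bytes(text):
    """Return the size of text in bytes under UTF-8 and under a fixed 32-bit form."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-32": len(text.encode("utf-32-le")),  # -le: no byte order mark is added
    }


print(storage_bytes("Hello World"))
print(storage_bytes("Bie\u0144"))
```

For plain ASCII text the fixed 32-bit form takes four times the space of UTF-8, which is the kind of cost the quoted passage calls punitive.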


Regards

Janusz




Re: Concise term for non-ASCII Unicode characters

2015-09-21 Thread Janusz S. Bien
Quote/Cytat - Sean Leonard  (Mon 21 Sep  
2015 10:51:42 PM CEST):



Related question as I am researching this:

How can I acquire (cheaply or free) the latest and most official  
copy of US-ASCII, namely, the version that Unicode references?


[...]

I've never seen the ASCII standard, but I think it is (almost?)  
identical to ISO/IEC 646, which in turn is identical to the freely  
available ECMA-6:


http://www.ecma-international.org/publications/standards/Ecma-006.htm

Regards

Janusz





Re: Input methods at the age of Unicode

2015-07-18 Thread Janusz S. Bien
Quote/Cytat - Marcel Schneider charupd...@orange.fr (Sat 18 Jul 2015  
04:33:23 PM CEST):



On 16 Jul 2015, at 23:59:24 +0100, Eli Zaretskii  wrote:


FWIW, I do that a lot, because the number of convenient input methods
in Emacs far outnumbers what I have on MS-Windows. For example, if I
have to type Russian with no Russian keyboard available, the
cyrillic-translit input method is a life savior.


You might also wish to use the Windows on-screen keyboard, which  
lets you see exactly what is on each key while typing on whatever  
physical keyboard, without any need to have the keycap labels match  
the layout. This on-screen keyboard is built in; it just does not  
support Kana shift states.
Likewise Windows came to me along with all that is needed to type Ἐν  
ἀρχῇ ἦν ὁ λόγος, so I canʼt really believe that users need Emacs as  
a savior.


cyrillic-translit and most other Emacs input methods are more  
convenient than on-screen keyboard, especially if you don't like to  
use mouse and your goal is to get the text into Emacs :-)




When process garbage is an environmental issue, one might consider  
that our real savior is Notepad++, thanks to its energy saving  
algorithms. Indeed I do not think that we should get supplemental  
input facilities at any price. This is why, too, the goal should be  
to pack a reasonably large subset of Unicode into the very core of  
the keyboard driver of every locale, and make it accessible right  
there with a Compose tree.


I don't think it would be practical.

Every time we open charmap dialogs or even go on the internet to  
pick a character, weʼre consuming some energy,


Agreed.


and if itʼs a routine task that could be done with a memorized


Memorizing also requires some effort and energy.

Compose sequence, that energy is wasted. I donʼt know if itʼs a real  
issue, but Iʼm likely to believe it is.


Of course we need some software as a savior, but this software is  
consequently called Zotero and helps us save and manage our research  
results (“Search, not re-search!” https://www.zotero.org).


I have nothing against Zotero, but its mention here seems completely  
irrelevant.


Best regards

Janusz





Re: Input methods at the age of Unicode

2015-07-16 Thread Janusz S. Bien
Quote/Cytat - Richard Wordingham richard.wording...@ntlworld.com  
(Fri 17 Jul 2015 12:59:24 AM CEST):



On Thu, 16 Jul 2015 19:33:34 +0300
Eli Zaretskii e...@gnu.org wrote:


 One needs a good UTF-8 text editor as well.



Emacs is one possibility, of course.


If you're prepared to cut and paste,


Why is it relevant?


it's easy to extend it own
keyboards.  (Creating the first one was a bit stressful


It is not clear to me what you mean by own keyboards

- the ones

that come with Emacs were almost all set up using ISO-2022, before
Emacs adopted Unicode.)


In my opinion creating a new Emacs input method is extremely easy, and I  
solve my problems by modifying 'polish-slash'.


You can associate an input method with a file by using an appropriate  
Emacs local variable.


Best regards

Janusz





ISO (was Re: Accessing the WG2 document register)

2015-06-15 Thread Janusz S. Bien
Quote/Cytat - Marcel Schneider charupd...@orange.fr (Mon 15 Jun 2015  
10:29:41 AM CEST):



What are we going to do? What are you going to do? I repeat, I'm  
shocked, and I hate ISO again.


Please remember that your government supports ISO through your  
national standard body. So contact AFNOR and persuade them to take an  
appropriate action.


Good luck!

Janusz




Re: Accessing the WG2 document register

2015-06-10 Thread Janusz S. Bien
Quote/Cytat - William_J_G Overington wjgo_10...@btinternet.com (Wed  
10 Jun 2015 10:25:19 AM CEST):


Remind me why Unicode is still taking ISO to the dance? Sometimes  
going stag has its benefits...



As I understand it, Unicode Inc. is a recognised guest of ISO in  
participating in ISO producing an International Standard.


Cf. http://www.unicode.org/L2/L2014/14286-wg2-liaison.pdf

Regards

Janusz




Re: the usage of LATIN SMALL LETTER A WITH STROKE

2015-05-31 Thread Janusz S. Bien
Quote/Cytat - Gerrit Ansmann gansm...@uni-bonn.de (Sun 31 May 2015  
05:01:36 PM CEST):


On Sun, 31 May 2015 16:32:36 +0200, Janusz S. Bień  
jsb...@mimuw.edu.pl wrote:



I'm curious what was the motivation for adding the character to Unicode.


According to the Code Chart for Latin Extended B  
(http://www.unicode.org/charts/PDF/U0180.pdf), it’s used for  
Sencoten. It was also used in some old Norwegian texts (for a start,  
see here: http://en.wikipedia.org/wiki/Christian_Kølle).


Thank you very much for the link to old Norwegian (I was aware of Sencoten).

Best regards

JSB




Re: the usage of LATIN SMALL LETTER A WITH STROKE

2015-05-31 Thread Janusz S. Bien
Quote/Cytat - Frédéric Grosshans frederic.grossh...@gmail.com (Sun  
31 May 2015 06:20:31 PM CEST):



Le 31/05/2015 17:03, Janusz S. Bien a écrit :
Quote/Cytat - Andrew West andrewcw...@gmail.com (Sun 31 May 2015  
04:56:32 PM CEST):



On 31 May 2015 at 15:32, Janusz S. Bień jsb...@mimuw.edu.pl wrote:


I'm curious what was the motivation for adding the character to
Unicode. I understand the proposal is somewhere in the archives, perhaps
it is available on the Internet?


Please see http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2942.doc.


Thank you very much for your quick answer!

Would you be so kind as to point me to the proposal for the upper case of  
A WITH STROKE, or advise me how to look for it in the archive?


The upper case was introduced for Sencoten, and the proposal is here
http://www.unicode.org/L2/L2004/04170-sencoten.pdf

(found by googling sencoten site:unicode.org)


Thank you very much for both pieces of information.

The proposal makes me curious about past and present Unicode policy,  
e.g. whether it would be accepted if submitted now. But this is a completely  
different question, to which I may return at some point.


Thanks again to all who responded.

Best regards

Janusz






Re: the usage of LATIN SMALL LETTER A WITH STROKE

2015-05-31 Thread Janusz S. Bien
Quote/Cytat - Andrew West andrewcw...@gmail.com (Sun 31 May 2015  
04:56:32 PM CEST):



On 31 May 2015 at 15:32, Janusz S. Bień jsb...@mimuw.edu.pl wrote:


I'm curious what was the motivation for adding the character to
Unicode. I understand the proposal is somewhere in the archives, perhaps
it is available on the Internet?


Please see http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2942.doc.


Thank you very much for your quick answer!

Would you be so kind as to point me to the proposal for the upper case of A  
WITH STROKE, or advise me how to look for it in the archive?


Best regards

Janusz




precomposed characters (was: Unicode organization is still anti-Serbian and anti-Macedonian)

2014-02-15 Thread Janusz S. Bien
Quote/Cytat - Richard Wordingham richard.wording...@ntlworld.com  
(Sat 15 Feb 2014 07:25:51 PM CET):

Each precomposed character adds a small processing
overhead to an extremely large number of computers, not just to the
computers that actually use it.


This is a very strong claim. Would you be so kind as to elaborate?

Best regards

Janusz



Re: Origin of Ellipsis

2013-09-14 Thread Janusz S. Bien
Quote/Cytat - Michael Everson ever...@evertype.com (Sat 14 Sep 2013  
12:42:50 PM CEST):



On 14 Sep 2013, at 02:30, Stephan Stiller stephan.stil...@gmail.com wrote:

This means that this dot will then need to be followed by two  
spaces when it is used as a sentence-ending period.


This tradition is no longer current in the US. Though it's obvious  
there are still plenty of middle and high school–level teachers and  
college-level writing instructors teaching this in the US, not  
knowing that books and periodicals in the US haven't been using two  
spaces after a sentence-final period for a long time.


Books never used it. The tradition in typing was developed to assist  
typesetters to navigate the typewritten text they were setting. The  
typesetters never put two spaces after a full stop.


Typesetters used variable-width spaces, and some of them put a longer  
space after a full stop. Cf. Knuth's TeXbook and the discussion of  
the \frenchspacing macro.


Regards

Janusz
