Whence UniData.txt? (was Re: unidata is big)

2002-04-24 Thread Bob_Hallissy


Theo's comment leads me to a question I've pondered recently:

Assumptions:

   Many apps, from independent sources, need to access the Unicode
   character data.

   A lot of these apps aren't overly concerned with the slight overhead of
   parsing the data as needed from Unicode-supplied data files directly.

   Similarly, such apps benefit from being able to easily upgrade to new
   Unicode releases by simply replacing the data files.

   It isn't very user-friendly for every such app to store its own
   private copy of the character data files when a single shared copy would
   take up less space and be easier to maintain.

It would seem to me that there is some value in establishing either (1) a
standard location where programs can expect to find (or install) a local
copy of the Unicode data files, or (2) a standard way to discover where
such a local copy of these files exists. My preference would be (2), which
would make it easy to configure a network of machines to share a single
copy of the data files. Something as simple as an environment variable
could work if developers were to agree on its name and semantics.
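
To make option (2) concrete, here is a minimal Python sketch of environment-variable discovery followed by on-demand parsing of the raw file. The variable name UNICODE_DATA_DIR and the fallback directory are hypothetical placeholders, not an existing convention; the only thing taken from Unicode itself is the semicolon-separated layout of UnicodeData.txt (code point, name, general category, ...).

```python
import os
from pathlib import Path

# Hypothetical names: no such convention exists; these are placeholders.
ENV_VAR = "UNICODE_DATA_DIR"
FALLBACK_DIR = Path("/usr/share/unicode")  # assumed default, not a standard


def find_ucd_dir(environ=None):
    """Return the directory expected to hold the shared UCD files."""
    environ = os.environ if environ is None else environ
    value = environ.get(ENV_VAR)
    return Path(value) if value else FALLBACK_DIR


def parse_unicode_data(lines):
    """Map code point -> (name, general category) from UnicodeData.txt lines.

    Range entries (names like "<CJK Ideograph, First>") are stored as-is;
    this sketch does not expand them.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        table[int(fields[0], 16)] = (fields[1], fields[2])
    return table


def load_unicode_data(environ=None):
    """Locate UnicodeData.txt via the environment and parse it."""
    path = find_ucd_dir(environ) / "UnicodeData.txt"
    with open(path, encoding="utf-8") as fh:
        return parse_unicode_data(fh)
```

Under these assumptions, pointing the variable at a network share would let every machine read the same copy, and upgrading to a new Unicode release would mean replacing the files in that one directory.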

(I understand there may be different mechanisms for different platforms,
but it would be even better if a standard mechanism were cross-platform).

So, are there any conventions for this evolving?  Or would anyone like to
propose one?

Bob



On 24/04/2002 09:26:55 Theo Veenker wrote:

andreas palsson wrote:

I wouldn't bother too much about memory efficiency; it's irrelevant
these days. Even your mobile phone has enough memory to store all
Unicode data 10..20 times. Same thing for lookup speed. All you have
to do to get it fast is to wait (a few seasons).

Theo






Re: Whence UniData.txt? (was Re: unidata is big)

2002-04-24 Thread Theo Veenker

[EMAIL PROTECTED] wrote:
 
 Theo's comment leads me to a question I've pondered recently:
 
 Assumptions:
 
Many apps, from independent sources, need to access the Unicode
character data.
 
A lot of these apps aren't overly concerned with the slight overhead of
parsing the data as needed from Unicode-supplied data files directly.
 
Similarly, such apps benefit from being able to easily upgrade to new
Unicode releases by simply replacing the data files.
 
It isn't very user-friendly for every such app to store its own
private copy of the character data files when a single shared copy would
take up less space and be easier to maintain.
 
 It would seem to me that there is some value in establishing either (1) a
 standard location where programs can expect to find (or install) a local
 copy of the Unicode data files, or (2) a standard way to discover where
 such a local copy of these files exists. My preference would be (2), which
 would make it easy to configure a network of machines to share a single
 copy of the data files. Something as simple as an environment variable
 could work if developers were to agree on its name and semantics.

For applications that eat raw UCD files, this shouldn't be too difficult
to achieve. Any well-designed app should have some parameter or environment
variable that you can set (no?). But for apps/libraries that like their UCD
files cooked, it is a different story, because there is no recommended binary
format for representing (compact) Unicode character data. Personally, I
would appreciate seeing such a recommendation, including your point (2).
However, apps/libs that enrich the character data with custom properties
would still need their own copy of the data.

The subject reminds me of the TZ database. Here you have a large text-based
database containing information on time zones and daylight saving times.
You can compile the data into a binary format by running a utility (zic)
included with the tz sources. Well, they don't give any recommendation on
where to store the (text and/or binary) data, but at least there is a
'standard' format, which allows for sharing data. It would be nice to have
something like this for the UCD.
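
To illustrate what "cooking" the UCD might look like, here is a rough Python sketch that compiles UnicodeData.txt lines into fixed-width binary records and looks a code point up by binary search. The record layout is invented purely for illustration; as noted above, no recommended binary format exists, and this is not the format used by the tz tools.

```python
import struct

# Hypothetical record layout (not a recommended or existing format):
# 4-byte big-endian code point followed by a 2-byte ASCII general category.
RECORD = struct.Struct(">I2s")


def compile_ucd(lines):
    """Compile UnicodeData.txt lines into sorted fixed-width binary records."""
    records = []
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        records.append((int(fields[0], 16), fields[2].encode("ascii")))
    records.sort()
    return b"".join(RECORD.pack(cp, cat) for cp, cat in records)


def lookup_category(blob, code_point):
    """Binary-search the compiled blob; return the category or None."""
    lo, hi = 0, len(blob) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        cp, cat = RECORD.unpack_from(blob, mid * RECORD.size)
        if cp == code_point:
            return cat.decode("ascii")
        if cp < code_point:
            lo = mid + 1
        else:
            hi = mid
    return None
```

A shared file in some agreed layout like this could be read by several apps without reparsing the text, though apps that add custom properties would still need their own data alongside it.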

 (I understand there may be different mechanisms for different platforms,
 but it would be even better if a standard mechanism were cross-platform).
 
 So, are there any conventions for this evolving?  Or would anyone like to
 propose one?

Please, go ahead :o)

Theo