[OPEN-ILS-GENERAL] Accession numbers, copy notes and diacritics in call numbers

Re: [OPEN-ILS-GENERAL] Accession numbers, copy notes and diacritics in call numbers

2011-08-02 Thread Linda Jansova


Dear Dan and Thomas,

Thank you very much for your recommendations!

As to the sorting issues - I believe we now have a number of leads
to investigate, so we can move forward with our efforts :-)... I really
appreciate that you took the trouble to test things for us, which makes
testing in our own installation much easier. Thanks a lot for that :-)!


(And, yes, you are right - I apologize for not mentioning it in my 
message - we are on EG 2.0.)


Linda

On 28.7.2011 16:19, Dan Scott wrote:


Re: [OPEN-ILS-GENERAL] Accession numbers, copy notes and diacritics in call numbers

2011-07-28 Thread Dan Scott

Hi Linda:

2011/7/18 Linda Jansova:

> 3) If we use diacritics in call numbers, is there a way to make the
> Shelf Browser show letters without diacritics first, followed by those
> with diacritics? E.g. to have the letter "Č" after the letter
> "C" instead of somewhere at the beginning of the virtual shelf? (We are
> especially interested in the Czech alphabet - the correct order of letters is
> available at http://en.wikipedia.org/wiki/Czech_alphabet.) Or is it perhaps
> better to do without and use only call numbers with decent ;-) letters?

You would think that the answer to this would be simple, but it looks
like we will need to check a number of points.

First, as of Evergreen 2.0 we generate call number "sort keys" that
are used for sorting purposes. (I can't remember, but I think you're on
2.0 by now?) I'm guessing that you still have your library set to the
default call number classification scheme, so the first thing we'll
need to do is check that the sort keys we generate aren't mangling
diacritic characters.
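To illustrate what a sort key can do here (a hypothetical sketch only, not the way Evergreen 2.0 builds its keys): if each upper-case label is rewritten before sorting so that Č, Ř, Š and Ž become their base letter followed by "~", and the digraph CH becomes "H~", then a plain byte-order sort, which is what the C locale does, comes out in approximately Czech alphabetical order. This works because "~" (0x7E) is greater in byte order than every ASCII letter and digit. The helper name czech_key and the sample labels are made up for illustration; the sketch assumes UTF-8, upper-case input (oils_text_as_bytea() applies UPPER() anyway) and treats every CH as the Czech digraph.

```shell
#!/bin/sh
# Hypothetical sketch: approximate Czech alphabetical order on top of a
# plain byte-order (C locale) sort by sorting on a derived key.
#   - Č, Ř, Š, Ž are separate letters right after C, R, S, Z:
#     rewrite them as the base letter followed by "~" (0x7e).
#   - The digraph CH is a separate letter after H: rewrite it as "H~".
#   - The other accented capitals listed sort with their base letter;
#     ties are then broken by the original label in the second column.
# Assumes UTF-8, upper-case input. czech_key is an illustrative name.
czech_key() {
  sed -e 's/CH/H~/g' \
      -e 's/Č/C~/g' -e 's/Ř/R~/g' -e 's/Š/S~/g' -e 's/Ž/Z~/g' \
      -e 's/Á/A/g' -e 's/Ď/D/g' -e 's/É/E/g' -e 's/Ě/E/g' \
      -e 's/Í/I/g' -e 's/Ň/N/g' -e 's/Ó/O/g' -e 's/Ť/T/g' \
      -e 's/Ú/U/g' -e 's/Ů/U/g' -e 's/Ý/Y/g'
}

# Sample shelf labels (made up for illustration).
printf 'ČAPEK\nCYRIL\nDVOŘÁK\nCHODA\nHUS\nCAPEK\n' > shelf

# Pair each key with its original label (tab-separated), sort the pairs
# by raw bytes, then keep only the original label.
czech_key < shelf | paste - shelf | LC_ALL=C sort | cut -f2
```

With these sample labels the output should be CAPEK, CYRIL, ČAPEK, DVOŘÁK, HUS, CHODA: Č lands between C and D, and CH after H. Sorting the raw labels by bytes would instead put ČAPEK last. How such a key would be wired into Evergreen's call number sort keys is not covered by this sketch.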

Second, sorting depends on the glibc locale environment on your
database server: the collating sequence of the locale your database
was created in should match the behaviour of the command-line "sort"
command on the database server when run under that same locale.

For example, if I create the text file "sortme" that contains the
following lines:

C
Č
C
Č

I can then test the collation behaviour of different locales. On my
laptop running Fedora 15, for example, using the 'C' locale that we
expect the database to be using:

$ LANG=C sort sortme
C
C
Č
Č

And then if I switch to the cs_CZ UTF8 locale:

$ LANG=cs_CZ.utf8 sort sortme
C
C
Č
Č

But if I switch to the cs_CZ ISO locale:

$ LANG=cs_CZ.iso88592 sort sortme
Č
Č
C
C

So it looks like, in my environment, I would expect to get the
appropriate sorting sequence for call numbers with the recommended
'C' locale for the database. It's possible, of course, that your database
server's locale environment does not match this behaviour.
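To compare several locales in one go on the database server, a small script along these lines could help (a sketch: the locale names are the ones used above, and `locale -a` may list them under slightly different spellings on other systems). It sets LC_ALL rather than LANG, because LC_ALL overrides both LC_COLLATE and LANG, while a LANG setting is ignored for collation whenever LC_COLLATE or LC_ALL is already set in the environment.

```shell
#!/bin/sh
# Sketch: repeat the "sortme" test under several locales,
# skipping locales that are not installed on this machine.
printf 'C\nČ\nC\nČ\n' > sortme

for loc in C cs_CZ.utf8 cs_CZ.iso88592; do
  # The C locale is always available; check the others with `locale -a`.
  if [ "$loc" != C ] && ! locale -a | grep -qx "$loc"; then
    echo "$loc: not installed"
    continue
  fi
  # LC_ALL takes precedence over LC_COLLATE and LANG, so stray settings
  # in the environment cannot change which collation sort uses.
  printf '%s: ' "$loc"
  LC_ALL="$loc" sort sortme | tr '\n' ' '
  echo
done
```

Each line of output shows one locale followed by the sorted test data on a single line, which makes differences between locales easy to spot.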

Finally, the actual call number sort key is then wrapped in an
oils_text_as_bytea(label_sortkey) function when we sort the results of
a call number browse, which converts the sortkey at run time to a
bytea data type. Checking out the definition of bytea strings, I had
the sinking feeling that this was the reason for your problems. To
paste from the PostgreSQL documentation:

"operations on binary strings process the actual bytes, whereas the
processing of character strings depends on locale settings. In short,
binary strings are appropriate for storing data that the programmer
thinks of as "raw bytes", whereas character strings are appropriate
for storing text."


But when I tested this theory on a PostgreSQL 9.0 database
created with the C locale, the theory of doom didn't seem to hold
up:

-- Show that we are using the right database locale
SHOW LC_COLLATE;

 lc_collate
------------
 C
(1 row)

-- Create a test table to try out our theory
CREATE TABLE test_bytea(input TEXT, output BYTEA);

-- Insert our sample data
INSERT INTO test_bytea(input) VALUES ('C'), ('Č'), ('C'), ('Č');

-- Create the BYTEA version of the text in the output column
UPDATE test_bytea SET output = oils_text_as_bytea(input);

-- Get the table as sorted by the untouched text
SELECT input, output FROM test_bytea ORDER BY input ASC;

 input | output
-------+--------
 C     | \x43
 C     | \x43
 Č     | \xc48c
 Č     | \xc48c
(4 rows)

-- Now get the table as sorted by the BYTEA version of the strings
SELECT input, output FROM test_bytea ORDER BY output ASC;

 input | output
-------+--------
 C     | \x43
 C     | \x43
 Č     | \xc48c
 Č     | \xc48c
(4 rows)

And... as you can see, I'm still getting the expected sort order that
you want - even though the string has been converted to a BYTEA column
(and the oils_text_as_bytea() function throws in an UPPER() call that
also normally raises a warning flag that non-ASCII data may be
destroyed).
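That result is what one would expect from how the comparisons work: in a database created with the C locale, PostgreSQL compares text byte by byte, and BYTEA values are always compared byte by byte, so converting an (already upper-cased) sort key to BYTEA should not change its position. The shell sketch below mirrors the SQL test outside the database under that assumption: it sorts the sortme lines once as C-locale text and once by the hex dump of their UTF-8 bytes, and checks that the two orders agree. The file names by_text and by_bytes are arbitrary.

```shell
#!/bin/sh
# Sketch mirroring the TEXT vs BYTEA test in shell.
printf 'C\nČ\nC\nČ\n' > sortme

# 1) Sort as text in the C locale.
LC_ALL=C sort sortme > by_text

# 2) Sort by bytes: prefix each line with its UTF-8 bytes in hex
#    (two lowercase hex digits per byte preserves byte order),
#    sort on that prefix, then drop it.
while IFS= read -r line; do
  hex=$(printf '%s' "$line" | od -An -tx1 | tr -d ' \n')
  printf '%s\t%s\n' "$hex" "$line"
done < sortme | LC_ALL=C sort | cut -f2 > by_bytes

# The two orders should be identical (C, C, Č, Č).
if cmp -s by_text by_bytes; then
  echo "same order"
else
  echo "orders differ"
fi
```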

So - I'm having trouble reproducing the problem that you're seeing
with these simple tests. Maybe you can have your systems people try
out these tests on your database server to see if they get the same
results?
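When repeating these tests, it may be worth adding a few letters that come after C. With only C and Č in the input, the tests cannot tell "Č sorts right after C" apart from "Č sorts after every unaccented letter". In the C locale, and in BYTEA comparisons, strings are ordered by raw bytes; UTF-8 encodes Č as 0xC4 0x8C, which is greater than every ASCII byte, so Č should come after Z as well, not between C and D as the Czech alphabet requires. In the sketch below, the D and Z lines are added to the original test data purely for contrast.

```shell
#!/bin/sh
# Sketch: the C locale compares raw bytes, so UTF-8 text sorts by code point.
# D and Z are added to the original test lines purely for contrast.
printf 'C\nČ\nD\nZ\nC\nČ\n' > sortme_wide

# Show the UTF-8 bytes of each letter (od prints them in hex).
for letter in C D Z Č; do
  printf '%s:' "$letter"
  printf '%s' "$letter" | od -An -tx1
done

# Byte-order sort: all ASCII letters (bytes below 0x80) come before Č (0xc4 0x8c).
LC_ALL=C sort sortme_wide
```

On a system behaving as described, the byte dump should show C as 43, D as 44, Z as 5a and Č as c4 8c, and the sorted lines should read C, C, D, Z, Č, Č.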

Re: [OPEN-ILS-GENERAL] Accession numbers, copy notes and diacritics in call numbers

2011-07-28 Thread Thomas Berezansky

On #2, you can use an Asset Statistical Category to store some of that easily.

Thomas Berezansky
Merrimack Valley Library Consortium

Quoting Linda Jansova:


Re: [OPEN-ILS-GENERAL] Accession numbers, copy notes and diacritics in call numbers

2011-07-28 Thread Linda Jansova


Hi,

I'm just posting my questions (see below) once again. So far there has 
been no reply but - as you can see - I haven't given up yet :-)...


Thank you very much in advance :-)!

Linda

On 18.7.2011 17:06, Linda Jansova wrote:


[OPEN-ILS-GENERAL] Accession numbers, copy notes and diacritics in call numbers

2011-07-18 Thread Linda Jansova


Hi all,

Three more questions to ask:

1) In the Czech Republic, libraries usually use accession numbers for 
individual copies. These are not accession numbers related to bib 
records but solely to individual copies. Is it okay to insert these in 
the "copy number" field in Holdings Maintenance? And if so, how can it 
be inserted during import? Can the staging table 
(http://open-ils.org/dokuwiki/doku.php?id=importing:holdings:import_via_staging_table) 
be used for this purpose?


2) If we wish to record how a particular copy has been acquired
(bought, received as a gift etc.), should we use "copy notes" (say with
a locally standardized copy note title distinguishable from other types
of copy notes)?


3) If we use diacritics in call numbers, is there a way to make the
Shelf Browser show letters without diacritics first, followed by those
with diacritics? E.g. to have the letter "Č" after the letter "C"
instead of somewhere at the beginning of the virtual shelf? (We are
especially interested in the Czech alphabet - the correct order of
letters is available at http://en.wikipedia.org/wiki/Czech_alphabet.)
Or is it perhaps better to do without and use only call numbers with
decent ;-) letters?


Thank you in advance for sharing any thoughts and ideas :-)!

Linda
