Re: [OPEN-ILS-GENERAL] Accession numbers, copy notes and diacritics in call numbers
Dear Dan and Thomas,

Thank you very much for your recommendations! As to the sorting issues - I believe that we now have a number of hints to investigate, so we can proceed further in our efforts :-)... I do appreciate that you have taken the trouble to test things for us and thus facilitated testing in our own installation. Thanks a lot for that :-)! (And, yes, you are right - I apologize for not mentioning it in my message - we are on EG 2.0.)

Linda

On 28.7.2011 16:19, Dan Scott wrote:

Hi Linda:

2011/7/18 Linda Jansova:

3) In case we use diacritics in call numbers, is there a way to make the Shelf Browser show first the particular letters without diacritics and then those with diacritics? E.g. to have the letter "Č" after the letter "C" instead of somewhere at the beginning of the virtual shelf? (We are especially interested in the Czech alphabet - the correct order of letters is available at http://en.wikipedia.org/wiki/Czech_alphabet.) Or maybe is it better to do without and use only call numbers with decent ;-) letters?

You would think that the answer to this would be simple, but it looks like the answer will require us to check a number of points.

First, as of Evergreen 2.0 we generate call number "sort keys" that are used for sorting purposes. (I can't remember but I think you're on 2.0 by now?). I'm guessing that you still have your library set to the default call number classification scheme, so the first thing we'll need to do is check to ensure that the sort keys that we generate aren't doing something bad to diacritic characters.

Second, sorting depends on the glibc locale environment on your database server - the collating sequence for the locale your database has been created in should match the behaviour of the "sort" command from the command line on the database server for each given locale. For example, if I create the text file "sortme" that contains the following lines:

    C
    Č
    C
    Č

I can then test the collation behaviour of different locales.
On my laptop running Fedora 15, for example, using the 'C' locale that we expect the database to be using:

    $ LANG=C sort sortme
    C
    C
    Č
    Č

And then if I switch to the cs_CZ UTF8 locale:

    $ LANG=cs_CZ.utf8 sort sortme
    C
    C
    Č
    Č

But if I switch to the cs_CZ ISO locale:

    $ LANG=cs_CZ.iso88592 sort sortme
    Č
    Č
    C
    C

So - it looks like on my environment I would expect to get the appropriate sorting sequence for call numbers using the recommended 'C' locale for the database. It's possible that your database server's locale environment does not match this behaviour, of course.

Finally, the actual call number sort key is then wrapped in an oils_text_as_bytea(label_sortkey) function when we sort the results of a call number browse, which converts the sortkey at run time to a bytea data type. Checking out the definition of bytea strings, I had the sinking feeling that this was the reason for your problems. To paste from the PostgreSQL documentation:

"operations on binary strings process the actual bytes, whereas the processing of character strings depends on locale settings. In short, binary strings are appropriate for storing data that the programmer thinks of as "raw bytes", whereas character strings are appropriate for storing text."
But then, when I tested this theory on a PostgreSQL 9.0 database created with the C locale, that theory of doom doesn't seem to hold up:

    -- Show that we are using the right database locale
    SHOW LC_COLLATE;
     lc_collate
    ------------
     C
    (1 row)

    -- Create a test table to try out our theory
    CREATE TABLE test_bytea(input TEXT, output BYTEA);

    -- Insert our sample data
    INSERT INTO test_bytea(input) VALUES ('C'), ('Č'), ('C'), ('Č');

    -- Create the BYTEA version of the text in the output column
    UPDATE test_bytea SET output = oils_text_as_bytea(input);

    -- Get the table as sorted by the untouched text
    SELECT input, output FROM test_bytea ORDER BY input ASC;
     input | output
    -------+--------
     C     | \x43
     C     | \x43
     Č     | \xc48c
     Č     | \xc48c
    (4 rows)

    -- Now get the table as sorted by the BYTEA version of the strings
    SELECT input, output FROM test_bytea ORDER BY output ASC;
     input | output
    -------+--------
     C     | \x43
     C     | \x43
     Č     | \xc48c
     Č     | \xc48c
    (4 rows)

And... as you can see, I'm still getting the expected sort order that you want - even though the string has been converted to a BYTEA column (and the oils_text_as_bytea() function throws in an UPPER() call that also normally raises a warning flag that non-ASCII data may be destroyed).

So - I'm having trouble reproducing the problem that you're seeing with these simple tests. Maybe you can have your systems people try out these tests on your database server to see if they get the same results?
Re: [OPEN-ILS-GENERAL] Accession numbers, copy notes and diacritics in call numbers
Hi Linda:

2011/7/18 Linda Jansova:
> 3) In case we use diacritics in call numbers, is there a way to make the
> Shelf Browser show first the particular letters without diacritics
> and then those with diacritics? E.g. to have the letter "Č" after the letter
> "C" instead of somewhere at the beginning of the virtual shelf? (We are
> especially interested in the Czech alphabet - the correct order of letters is
> available at http://en.wikipedia.org/wiki/Czech_alphabet.) Or maybe is it
> better to do without and use only call numbers with decent ;-) letters?

You would think that the answer to this would be simple, but it looks like the answer will require us to check a number of points.

First, as of Evergreen 2.0 we generate call number "sort keys" that are used for sorting purposes. (I can't remember but I think you're on 2.0 by now?). I'm guessing that you still have your library set to the default call number classification scheme, so the first thing we'll need to do is check to ensure that the sort keys that we generate aren't doing something bad to diacritic characters.

Second, sorting depends on the glibc locale environment on your database server - the collating sequence for the locale your database has been created in should match the behaviour of the "sort" command from the command line on the database server for each given locale. For example, if I create the text file "sortme" that contains the following lines:

    C
    Č
    C
    Č

I can then test the collation behaviour of different locales. On my laptop running Fedora 15, for example, using the 'C' locale that we expect the database to be using:

    $ LANG=C sort sortme
    C
    C
    Č
    Č

And then if I switch to the cs_CZ UTF8 locale:

    $ LANG=cs_CZ.utf8 sort sortme
    C
    C
    Č
    Č

But if I switch to the cs_CZ ISO locale:

    $ LANG=cs_CZ.iso88592 sort sortme
    Č
    Č
    C
    C

So - it looks like on my environment I would expect to get the appropriate sorting sequence for call numbers using the recommended 'C' locale for the database.
It's possible that your database server's locale environment does not match this behaviour, of course.

Finally, the actual call number sort key is then wrapped in an oils_text_as_bytea(label_sortkey) function when we sort the results of a call number browse, which converts the sortkey at run time to a bytea data type. Checking out the definition of bytea strings, I had the sinking feeling that this was the reason for your problems. To paste from the PostgreSQL documentation:

"operations on binary strings process the actual bytes, whereas the processing of character strings depends on locale settings. In short, binary strings are appropriate for storing data that the programmer thinks of as "raw bytes", whereas character strings are appropriate for storing text."

But then, when I tested this theory on a PostgreSQL 9.0 database created with the C locale, that theory of doom doesn't seem to hold up:

    -- Show that we are using the right database locale
    SHOW LC_COLLATE;
     lc_collate
    ------------
     C
    (1 row)

    -- Create a test table to try out our theory
    CREATE TABLE test_bytea(input TEXT, output BYTEA);

    -- Insert our sample data
    INSERT INTO test_bytea(input) VALUES ('C'), ('Č'), ('C'), ('Č');

    -- Create the BYTEA version of the text in the output column
    UPDATE test_bytea SET output = oils_text_as_bytea(input);

    -- Get the table as sorted by the untouched text
    SELECT input, output FROM test_bytea ORDER BY input ASC;
     input | output
    -------+--------
     C     | \x43
     C     | \x43
     Č     | \xc48c
     Č     | \xc48c
    (4 rows)

    -- Now get the table as sorted by the BYTEA version of the strings
    SELECT input, output FROM test_bytea ORDER BY output ASC;
     input | output
    -------+--------
     C     | \x43
     C     | \x43
     Č     | \xc48c
     Č     | \xc48c
    (4 rows)

And... as you can see, I'm still getting the expected sort order that you want - even though the string has been converted to a BYTEA column (and the oils_text_as_bytea() function throws in an UPPER() call that also normally raises a warning flag that non-ASCII data may be destroyed).
So - I'm having trouble reproducing the problem that you're seeing with these simple tests. Maybe you can have your systems people try out these tests on your database server to see if they get the same results?
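As a rough cross-check of the BYTEA result above, the following Python sketch sorts labels by their UTF-8 bytes, which is, as far as I understand it, what comparing BYTEA values (or text under the C locale) comes down to. It is not Evergreen code: `byte_key` is a hypothetical helper, and the labels "D" and "Z" are additions that were not in the original test. The sketch suggests one thing the four-row test cannot show: in plain byte order "Č" lands after "C", but it also lands after every unaccented ASCII letter such as "Z".

```python
# Sample labels: the four from the test above plus two extra ASCII letters
# (assumption: "D" and "Z" are added here only for illustration).
labels = ["C", "Č", "C", "Č", "D", "Z"]


def byte_key(label: str) -> bytes:
    """Return the UTF-8 bytes of a label; bytes objects compare byte by byte."""
    return label.encode("utf-8")


# Sort by raw bytes rather than by any locale-aware collation.
by_bytes = sorted(labels, key=byte_key)

print(by_bytes)
print(byte_key("Č").hex())  # same two bytes that psql displays as \xc48c
```

If the shelf is expected to run C, Č, D, it may therefore be worth repeating the SQL test with a few more letters before drawing conclusions from it.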
Re: [OPEN-ILS-GENERAL] Accession numbers, copy notes and diacritics in call numbers
On #2, you can use an Asset Statistical Category to store some of that easily.

Thomas Berezansky
Merrimack Valley Library Consortium

Quoting Linda Jansova:

Hi,

I'm just posting my questions (see below) once again. So far there has been no reply but - as you can see - I haven't given up yet :-)... Thank you very much in advance :-)!

Linda

On 18.7.2011 17:06, Linda Jansova wrote:

Hi all,

Three more questions to ask:

1) In the Czech Republic, libraries usually use accession numbers for individual copies. These are not accession numbers related to bib records but solely to individual copies. Is it okay to insert these in the "copy number" field in Holdings Maintenance? And if so, how can it be inserted during import? Can the staging table (http://open-ils.org/dokuwiki/doku.php?id=importing:holdings:import_via_staging_table) be used for this purpose?

2) In case we wish to record how a particular copy has been acquired (bought, received as a gift etc.), should we use "copy notes" (say with a locally standardized copy note title distinguishable from other types of copy notes)?

3) In case we use diacritics in call numbers, is there a way to make the Shelf Browser show first the particular letters without diacritics and then those with diacritics? E.g. to have the letter "Č" after the letter "C" instead of somewhere at the beginning of the virtual shelf? (We are especially interested in the Czech alphabet - the correct order of letters is available at http://en.wikipedia.org/wiki/Czech_alphabet.) Or maybe is it better to do without and use only call numbers with decent ;-) letters?

Thank you in advance for sharing any thoughts and ideas :-)!

Linda
Re: [OPEN-ILS-GENERAL] Accession numbers, copy notes and diacritics in call numbers
Hi,

I'm just posting my questions (see below) once again. So far there has been no reply but - as you can see - I haven't given up yet :-)... Thank you very much in advance :-)!

Linda

On 18.7.2011 17:06, Linda Jansova wrote:

Hi all,

Three more questions to ask:

1) In the Czech Republic, libraries usually use accession numbers for individual copies. These are not accession numbers related to bib records but solely to individual copies. Is it okay to insert these in the "copy number" field in Holdings Maintenance? And if so, how can it be inserted during import? Can the staging table (http://open-ils.org/dokuwiki/doku.php?id=importing:holdings:import_via_staging_table) be used for this purpose?

2) In case we wish to record how a particular copy has been acquired (bought, received as a gift etc.), should we use "copy notes" (say with a locally standardized copy note title distinguishable from other types of copy notes)?

3) In case we use diacritics in call numbers, is there a way to make the Shelf Browser show first the particular letters without diacritics and then those with diacritics? E.g. to have the letter "Č" after the letter "C" instead of somewhere at the beginning of the virtual shelf? (We are especially interested in the Czech alphabet - the correct order of letters is available at http://en.wikipedia.org/wiki/Czech_alphabet.) Or maybe is it better to do without and use only call numbers with decent ;-) letters?

Thank you in advance for sharing any thoughts and ideas :-)!

Linda
[OPEN-ILS-GENERAL] Accession numbers, copy notes and diacritics in call numbers
Hi all,

Three more questions to ask:

1) In the Czech Republic, libraries usually use accession numbers for individual copies. These are not accession numbers related to bib records but solely to individual copies. Is it okay to insert these in the "copy number" field in Holdings Maintenance? And if so, how can it be inserted during import? Can the staging table (http://open-ils.org/dokuwiki/doku.php?id=importing:holdings:import_via_staging_table) be used for this purpose?

2) In case we wish to record how a particular copy has been acquired (bought, received as a gift etc.), should we use "copy notes" (say with a locally standardized copy note title distinguishable from other types of copy notes)?

3) In case we use diacritics in call numbers, is there a way to make the Shelf Browser show first the particular letters without diacritics and then those with diacritics? E.g. to have the letter "Č" after the letter "C" instead of somewhere at the beginning of the virtual shelf? (We are especially interested in the Czech alphabet - the correct order of letters is available at http://en.wikipedia.org/wiki/Czech_alphabet.) Or maybe is it better to do without and use only call numbers with decent ;-) letters?

Thank you in advance for sharing any thoughts and ideas :-)!

Linda
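For illustration only, here is a minimal Python sketch of the ordering asked about in question 3. It is not how Evergreen builds its call number sort keys; `CZECH_ORDER`, `RANK` and `czech_sort_key` are hypothetical names, and the sample call numbers are made up. The sketch also simplifies real Czech collation: it ignores the digraph "ch" and treats every accented letter as a separate letter.

```python
# Single letters of the Czech alphabet in order (assumption: the digraph "ch"
# is left out, and accented vowels are ranked as letters in their own right).
CZECH_ORDER = "aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž"
RANK = {letter: position for position, letter in enumerate(CZECH_ORDER)}


def czech_sort_key(label: str) -> list:
    """Build a per-character key: known letters sort by alphabet rank,
    anything else (digits, spaces, punctuation) sorts first by code point."""
    key = []
    for ch in label.lower():
        if ch in RANK:
            key.append((1, RANK[ch]))
        else:
            key.append((0, ord(ch)))
    return key


# Made-up call numbers, only to show where "Č" ends up.
call_numbers = ["Č 12", "D 3", "C 40", "Z 1", "C 7"]
print(sorted(call_numbers, key=czech_sort_key))
```

With a key like this "Č" sorts directly after "C" and before "D". Note that the numeric parts are still compared character by character, so "C 40" precedes "C 7".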