Re: [OPEN-ILS-GENERAL] Z39.50 client query encoding issues
Hi,

I have also tried the SRU search (using yaz-client) with some CQL queries. It works fine (even for letters with diacritics) for dc.title, dc.subject, dc.contributor and dc.publisher. However, dc.creator returns no results, both with and without diacritics. When dc.creator is replaced by eg.author, it works fine. Searching without specifying an index (say "find matousek" or "find matoušek") also works.

find:

$ yaz-client http://mojzis.jabok.cuni.cz/opac/extras/sru
Connecting...OK.
Z> sru GET 1.1
Z> find matousek
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.708890
Z> find matoušek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.675457

find dc.creator:

Z> find dc.creator=matousek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 0
Elapsed: 0.192852
Z> find dc.creator=matoušek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 0
Elapsed: 0.238054

find eg.author:

Z> find eg.author=matousek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.663780
Z> find eg.author=matoušek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.861588

find dc.title:

Z> find dc.title=buh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 69
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.953206
Z> find dc.title=bůh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 69
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.509182

find dc.subject:

Z> find dc.subject=buh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 243
SRU server returns extra records. Skipping 10 records.
Elapsed: 2.415498
Z> find dc.subject=bůh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 243
SRU server returns extra records. Skipping 10 records.
Elapsed: 1.595494

find dc.contributor:

Z> find dc.contributor=dvorak
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 44
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.621795
Z> find dc.contributor=dvořák
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 44
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.843226

find dc.publisher:

Z> find dc.publisher=portal
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 40
SRU server returns extra records. Skipping 10 records.
Elapsed: 1.012238
Z> find dc.publisher=portál
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 40
SRU server returns extra records. Skipping 10 records.

I have also found out there is a charset command, which gives the following results:

$ yaz-client http://mojzis.jabok.cuni.cz/opac/extras/sru
Connecting...OK.
Z> sru GET 1.1
Z> charset
Negotiation character set `none'
Display character set is `UTF-8'
MARC character set is `none'
Query character set is `none'

Do you think this could be related to our Z39.50 client query encoding issues? Should the charset be specifically set to utf-8 somewhere?

I have also found a bug reported by Jason Stephenson about a year ago (https://bugs.launchpad.net/evergreen/+bug/1346518) which seems to describe the same problem, but it has probably not been looked into since then.

Thank you in advance for any clues!

Linda
Re: [OPEN-ILS-GENERAL] Z39.50 client query encoding issues
Hi again,

We have installed 2.8.3 and tested the Z39.50 server output, yet the problem with diacritics remains the same. Again, when submitting a generic query with diacritics (using yaz-client), we get the expected results, but a search specifically for author results in no hits.

I have had another look at https://coffeecode.net/archives/217-More-granular-identifier-indexes-for-your-Evergreen-SRU-Z39.50-servers.html and it seems to me that the problem may be caused by the fact that, with the exception of keyword, each of the other indexes mentioned on Dan's blog (author, title, series, subject) is actually composed of more granular indexes. Maybe I am on the wrong track, but it would be a rather strange coincidence that the keyword index (which I suppose the Z39.50 client uses when no fields are specified) works fine while those other "composed" indexes do not...

Do you think that this could be the reason why we are experiencing the encoding problems? And if so, do you have any idea where to look for the appropriate encoding settings and change them to utf-8?

Thank you in advance for any clues to the puzzle!

Linda
Re: [OPEN-ILS-GENERAL] Z39.50 client query encoding issues
Ahh -- sorry I misunderstood -- I don't read computerese and didn't follow all the stuff about servers. For a variety of reasons, we don't allow access to the PINES database via Z39.50.

Elaine

J. Elaine Hardy
PINES & Collaborative Projects Manager
Georgia Public Library Service
1800 Century Place, Ste 150
Atlanta, Ga. 30345-4304
404.235.7128
404.235.7201, fax
eha...@georgialibraries.org
www.georgialibraries.org
www.georgialibraries.org/pines
Re: [OPEN-ILS-GENERAL] Z39.50 client query encoding issues
Oh, I see! In that case we shall try the upgrade and see what happens (we shall keep you posted :-)...

Thank you for your help!

Linda
Re: [OPEN-ILS-GENERAL] Z39.50 client query encoding issues
Quoting Linda Jansova:

> Thank you, Jason! I have actually come across this bug as well but it
> seems that it has already been fixed (or at least this is my
> understanding of information from Launchpad) - we are currently using
> Evergreen 2.8.2...

The fix only went in last night. It will be in today's release of 2.8.3, so it might be worth the upgrade to see if it helps.

> And you hit the nail on the head - we also usually search other sources
> and so it took quite some time to discover the problem...
>
> Linda

--
Jason Stephenson
Assistant Director for Technology Services
Merrimack Valley Library Consortium
4 High ST, Suite 175
North Andover, MA 01845
Phone: 978-557-5891
Email: jstephen...@mvlc.org
Re: [OPEN-ILS-GENERAL] Z39.50 client query encoding issues
Thank you, Jason!

I have actually come across this bug as well, but it seems that it has already been fixed (or at least this is my understanding of the information from Launchpad) - we are currently using Evergreen 2.8.2...

And you hit the nail on the head - we also usually search other sources and so it took quite some time to discover the problem...

Linda
Re: [OPEN-ILS-GENERAL] Z39.50 client query encoding issues
Hello, Linda!

I cannot speak to your exact problem, but I have noticed similar issues when searching Z39.50 in general, particularly when searching other Evergreen systems. (We don't search our own via Z39.50 except for the occasional test.)

I can tell you that Evergreen's Z39.50 ends up going through SRU (Search/Retrieve via URL). There has been a recent bug fix for Evergreen's SRU and double encoding of characters:

https://bugs.launchpad.net/bugs/1431541

I just thought that I would share that information, in case you were not aware of it.

Hope that helps,
Jason

--
Jason Stephenson
Assistant Director for Technology Services
Merrimack Valley Library Consortium
4 High ST, Suite 175
North Andover, MA 01845
Phone: 978-557-5891
Email: jstephen...@mvlc.org
Re: [OPEN-ILS-GENERAL] Z39.50 client query encoding issues
Thank you, Elaine!

Our problem, however, is not related to searching external databases (such as Library of Congress or OCLC); that works fine. (I apologize for writing such a long message, which has probably ended up less understandable than desired.)

The problem occurs when someone else, acting as a Z39.50 client, wishes to query our database (playing the role of Z39.50 server). In our case we would like the library gateway for the disabled (especially for blind people) to be able to query our database, which contains quite a lot of documents that may be of interest to those using the gateway.

As I could not find your Z39.50 server info (especially host, port, database), I could not verify whether those wishing to query your database would experience the same difficulties. But since I have encountered them at Laurentian University, I believe it may be the same for other Evergreen installations...

Linda
Re: [OPEN-ILS-GENERAL] Z39.50 client query encoding issues
I was able to retrieve 3539 hits in a search of OCLC for author matoušek and for author matousek through our Z39.50 gateway. I am afraid I can't help with anything else other than that it does work in our Z39.50 instance with OCLC.

We have had occasional problems with some diacritics and with some language scripts. It is a minor issue for us, however, and I have been able to use Vandelay to bring in the individual record that didn't retrieve via the Z39.50 connection. I believe it was a record with a parallel title in Turkish. Occasionally, an OCLC record will have a non-UTF-8 character which will also block retrieval, but that is a simple matter of correcting the record in OCLC.

Elaine

J. Elaine Hardy
PINES & Collaborative Projects Manager
Georgia Public Library Service
1800 Century Place, Ste 150
Atlanta, Ga. 30345-4304
404.235.7128
404.235.7201, fax
eha...@georgialibraries.org
www.georgialibraries.org
www.georgialibraries.org/pines
[OPEN-ILS-GENERAL] Z39.50 client query encoding issues
Hi all,

Jabok Library currently uses Evergreen 2.8.2 and we have successfully changed both charsets (in the configuration files mentioned at http://docs.evergreen-ils.org/2.1/html/Z3950serversupport.html) to utf-8, so Z39.50 clients can now receive data (records) with the correct diacritics.

However, one related problem still persists - the Z39.50 queries only work when no diacritics are used. E.g., search results are returned when we submit a query "matousek" (author's surname) but no results are reported when the correct version "matoušek" is used.

We have tried the following, but to no avail:

1) add the element client_query_charset to gfs (according to http://www.indexdata.com/yaz/doc/server.vhosts.html), but it was an unknown element;

2) delete the second mention of encoding="utf-8" from /xsl/MARC21slim2SRWDC.xsl and restart the open-ils.supercat service, hoping that this would have results similar to those we got when the MODS stylesheets were treated in the same way to resolve our Zotero encoding problems (see https://bugs.launchpad.net/evergreen/+bug/1442276).

We have also tried further query testing in yaz-client. In this case, some interesting things happened.

When yaz-client was used for a generic query "find matoušek" (i.e., with diacritics), the answer was 34 hits:

Z> find matoušek
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 34, setno 1
records returned: 0
Elapsed: 0.681894

However, when searching specifically for author (with diacritics again), the answer was zero hits:

Z> find @attr 1=1003 @attr 2=3 "matoušek"
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 0, setno 12
records returned: 0
Elapsed: 0.117265

When diacritics were omitted, we got 34 hits again:

Z> find @attr 1=1003 @attr 2=3 "matousek"
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 34, setno 13
records returned: 0
Elapsed: 0.637897

Our Z39.50 server runs at mojzis.jabok.cuni.cz (port , database Jabok) and it now uses the utf-8 encoding.

We have also tried Laurentian (laurentian.concat.ca, port 210, database OSUL) with the words "francais" and "français" (searching for a person in Tellico). For "francais" we got results, but for "français" no results were found. So it is probably not just our case...

Do you have any ideas what we could do to make the queries with diacritics work correctly?

Thank you in advance for any hints!

Linda