Hi,

I have also tried SRU searching (using yaz-client) with some CQL queries.

It seems to work fine (even for letters with diacritics) for dc.title, dc.subject, dc.contributor and dc.publisher. However, dc.creator returns no results for queries either with or without diacritics. When dc.creator is replaced by eg.author, it works fine. Searching without specifying an index (say "find matousek" or "find matoušek") also works:

*find:*
$ yaz-client http://mojzis.jabok.cuni.cz/opac/extras/sru
Connecting...OK.
Z> sru GET 1.1
Z> find matousek
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.708890

Z> find matoušek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.675457

*find dc.creator:*
Z> find dc.creator=matousek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 0
Elapsed: 0.192852

Z> find dc.creator=matoušek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 0
Elapsed: 0.238054

*find eg.author:*
Z> find eg.author=matousek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.663780

Z> find eg.author=matoušek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.861588

*find dc.title:*
Z> find dc.title=buh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 69
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.953206

Z> find dc.title=bůh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 69
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.509182
*find dc.subject:*
Z> find dc.subject=buh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 243
SRU server returns extra records. Skipping 10 records.
Elapsed: 2.415498

Z> find dc.subject=bůh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 243
SRU server returns extra records. Skipping 10 records.
Elapsed: 1.595494

*find dc.contributor:*
Z> find dc.contributor=dvorak
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 44
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.621795

Z> find dc.contributor=dvořák
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 44
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.843226

*find dc.publisher:*
Z> find dc.publisher=portal
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 40
SRU server returns extra records. Skipping 10 records.
Elapsed: 1.012238

Z> find dc.publisher=portál
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 40
SRU server returns extra records. Skipping 10 records.

I have also found that yaz-client has a charset command, which gives the following output:

$ yaz-client http://mojzis.jabok.cuni.cz/opac/extras/sru
Connecting...OK.
Z> sru GET 1.1
Z> charset
Negotiation character set `none'
Display character set is `UTF-8'
MARC character set is `none'
Query character set is `none'

Do you think this could be related to our Z39.50 client query encoding issues? Should the character set be explicitly set to UTF-8 somewhere?
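
The `charset` output alone does not show which bytes actually reach the server. As a rough illustration (not a diagnosis), the Python sketch below builds the SRU 1.1 searchRetrieve GET URL by hand for the same dc.creator query, once with the term percent-encoded as UTF-8 and once as ISO-8859-2 (a legacy Czech charset, picked here only as an example). `operation`, `version` and `query` are the standard SRU request parameters; the helper name `sru_search_url` is made up for this sketch:

```python
from urllib.parse import unquote, urlencode

# SRU endpoint from the yaz-client session above.
BASE_URL = "http://mojzis.jabok.cuni.cz/opac/extras/sru"


def sru_search_url(cql_query: str, encoding: str = "utf-8") -> str:
    """Build an SRU 1.1 searchRetrieve GET URL for a CQL query.

    `encoding` is passed to urlencode(), which uses it when
    percent-encoding non-ASCII characters in the query.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
    }
    return BASE_URL + "?" + urlencode(params, encoding=encoding)


if __name__ == "__main__":
    query = "dc.creator=matoušek"
    print(sru_search_url(query))                # "š" sent as %C5%A1
    print(sru_search_url(query, "iso-8859-2"))  # "š" sent as %B9
    # A server decoding the query as UTF-8 cannot recover "š" from the
    # single byte 0xB9; unquote() substitutes U+FFFD for it instead.
    print(unquote("matou%B9ek"))
```

Pasting the UTF-8 URL into a browser or curl is one way to check dc.creator with a query that is known to be UTF-8. Note also that the unaccented dc.creator=matousek query returned 0 hits, and a pure-ASCII term is encoded identically in both charsets, so the query charset alone would not seem to explain the dc.creator result.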

I have also found a bug reported by Jason Stephenson about a year ago (https://bugs.launchpad.net/evergreen/+bug/1346518) which seems to describe the same problem, but it has probably not been looked into since then.

Thank you in advance for any clues!

Linda

On 08/25/2015 02:51 PM, Linda Jansova wrote:
Hi again,

We have installed 2.8.3 and tested the Z39.50 server output, yet the problem with diacritics remains. Again, when we submit a generic query with diacritics (using yaz-client), we get the expected results, but a search restricted to the author index returns no hits.

I have had another look at https://coffeecode.net/archives/217-More-granular-identifier-indexes-for-your-Evergreen-SRU-Z39.50-servers.html and it occurred to me that the problem might be caused by the fact that, with the exception of keyword, each of the indexes mentioned on Dan's blog (author, title, series, subject) is actually composed of more granular indexes.

Maybe I am on the wrong track, but it would be rather a strange coincidence that the keyword index (which I suppose the Z39.50 client uses when no fields are specified) works fine, unlike those other "composed" indexes...
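
One way to test the composite-index idea would be to query each granular index directly, with and without diacritics, and see which one loses the accented form. The Python sketch below builds CQL queries that OR a term across a list of sub-indexes. The names in `HYPOTHETICAL_AUTHOR_INDEXES` are placeholders, not Evergreen's actual index names; they would need to be replaced with the ones listed on Dan's blog:

```python
# Placeholder sub-index names, NOT Evergreen's real ones: substitute the
# granular author indexes listed on Dan's blog.
HYPOTHETICAL_AUTHOR_INDEXES = ["eg.author_a", "eg.author_b"]


def cql_quote(term: str) -> str:
    """Quote a CQL search term, escaping backslashes and double quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def or_across_indexes(indexes: list[str], term: str) -> str:
    """Build `idx1="term" or idx2="term" ...` for the given indexes."""
    if not indexes:
        raise ValueError("need at least one index")
    return " or ".join(f"{index}={cql_quote(term)}" for index in indexes)


if __name__ == "__main__":
    for term in ("matousek", "matoušek"):
        # Each sub-index on its own first, then all of them ORed together.
        for index in HYPOTHETICAL_AUTHOR_INDEXES:
            print(or_across_indexes([index], term))
        print(or_across_indexes(HYPOTHETICAL_AUTHOR_INDEXES, term))
```

The printed strings are plain CQL and could be sent as the `query` parameter of an SRU searchRetrieve request; comparing hit counts per sub-index would show whether one particular granular index mishandles diacritics.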

Do you think this could be the reason for the encoding problems we are experiencing? And if so, do you have any idea where to look for the relevant encoding settings so we can change them to UTF-8?

Thank you in advance for any clues to the puzzle!

Linda

On 08/19/2015 03:40 PM, Linda Jansova wrote:
Oh, I see! In that case we shall try the upgrade and see what happens (we shall keep you posted :-)...

Thank you for your help!

Linda

On 08/19/2015 03:34 PM, Jason Stephenson wrote:
Quoting Linda Jansova <skolk...@chello.cz>:

Thank you, Jason!

I have actually come across this bug as well, but it seems to have been fixed already (at least that is how I read the information on Launchpad); we are currently using Evergreen 2.8.2...

The fix only went in last night. It will be in today's release of 2.8.3,
so it might be worth the upgrade to see if it helps.


And you hit the nail on the head - we also usually search other sources and so it took quite some time to discover the problem...

Linda





