Hi,
I have also tried the SRU search (using yaz-client) and tried some CQL
queries.
It seems to work fine (even for letters with diacritics) for dc.title,
dc.subject, dc.contributor and dc.publisher. However, when searching
dc.creator, no results are returned for queries either with or without
diacritics. When dc.creator is replaced by eg.author, it works fine. It
also works when searching without specifying an index (e.g. find matousek
or find matoušek):
*find:*
$ yaz-client http://mojzis.jabok.cuni.cz/opac/extras/sru
Connecting...OK.
Z> sru GET 1.1
Z> find matousek
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.708890
Z> find matoušek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.675457
*find dc.creator:*
Z> find dc.creator=matousek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 0
Elapsed: 0.192852
Z> find dc.creator=matoušek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 0
Elapsed: 0.238054
*find eg.author:*
Z> find eg.author=matousek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.663780
Z> find eg.author=matoušek
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 34
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.861588
*find dc.title:*
Z> find dc.title=buh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 69
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.953206
Z> find dc.title=bůh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 69
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.509182
*find dc.subject:*
Z> find dc.subject=buh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 243
SRU server returns extra records. Skipping 10 records.
Elapsed: 2.415498
Z> find dc.subject=bůh
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 243
SRU server returns extra records. Skipping 10 records.
Elapsed: 1.595494
*find dc.contributor:*
Z> find dc.contributor=dvorak
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 44
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.621795
Z> find dc.contributor=dvořák
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 44
SRU server returns extra records. Skipping 10 records.
Elapsed: 0.843226
*find dc.publisher:*
Z> find dc.publisher=portal
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 40
SRU server returns extra records. Skipping 10 records.
Elapsed: 1.012238
Z> find dc.publisher=portál
Connecting...OK.
Received SRW SearchRetrieve Response
Number of hits: 40
SRU server returns extra records. Skipping 10 records.
I have also found that there is a charset command, which gives the
following results:
$ yaz-client http://mojzis.jabok.cuni.cz/opac/extras/sru
Connecting...OK.
Z> sru GET 1.1
Z> charset
Negotiation character set `none'
Display character set is `UTF-8'
MARC character set is `none'
Query character set is `none'
Do you think this could be related to our Z39.50 client query encoding
issues? Should the charset be explicitly set to UTF-8 somewhere?
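To see why a charset mismatch could matter, the sketch below shows the bytes of "matoušek" in UTF-8 and in ISO-8859-2 (a common legacy Central European encoding), and what a server would see if it decoded the bytes with a different charset than the client used. It is illustrative only: it does not model how yaz-client or Evergreen actually handle a query charset of `none`, and the choice of encodings is an assumption for the example.

```python
TERM = "matoušek"


def encoded_forms(term: str) -> dict:
    """Return the raw bytes of `term` in two encodings relevant to Czech text."""
    return {
        "utf-8": term.encode("utf-8"),
        "iso-8859-2": term.encode("iso-8859-2"),
    }


def misdecode(term: str, sent_as: str, read_as: str) -> str:
    """Encode with one charset and decode with another, as would happen if
    client and server disagreed about the query encoding.
    Undecodable bytes become U+FFFD instead of raising an error."""
    return term.encode(sent_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    for name, raw in encoded_forms(TERM).items():
        print(f"{name:>10}: {raw!r}")
    print(misdecode(TERM, "utf-8", "latin-1"))     # UTF-8 bytes read as Latin-1
    print(misdecode(TERM, "iso-8859-2", "utf-8"))  # ISO-8859-2 bytes read as UTF-8
```

In both mismatched cases the decoded term no longer equals anything stored in an index, so a lookup would simply return zero hits rather than an error.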
I have also found a bug reported by Jason Stephenson about a year ago
(https://bugs.launchpad.net/evergreen/+bug/1346518), which seems to
describe the same problem, but it has probably not been looked into since
then.
Thank you in advance for any clues!
Linda
On 08/25/2015 02:51 PM, Linda Jansova wrote:
Hi again,
We have installed 2.8.3 and tested the Z39.50 server output, yet the
problem with diacritics remains. Again, when submitting a generic query
with diacritics (using yaz-client), we get the expected results, but
searching specifically by author returns no hits.
I have had a look at
https://coffeecode.net/archives/217-More-granular-identifier-indexes-for-your-Evergreen-SRU-Z39.50-servers.html
again, and it seems to me that the problem may be caused by the fact
that, with the exception of keyword, each of the indexes mentioned on
Dan's blog (author, title, series, subject) is actually composed of more
granular indexes.
Maybe I am on the wrong track, but it would be a rather strange
coincidence that the keyword index (which I suppose the Z39.50 client
uses when no fields are specified) works fine, unlike those other
"composed" indexes...
Do you think that this could be the reason why we are experiencing the
encoding problems? And if so, do you have any idea where to look for
the appropriate encoding settings and change them to UTF-8?
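One way to reason about the "composed index" idea: the identical hit counts for matousek and matoušek in the sessions above suggest that the working indexes fold diacritics on both the stored values and the query term. The toy model below is purely hypothetical (the sub-index names, records, and folding rule are assumptions, not Evergreen's actual configuration): a composite author index is the union of granular sub-indexes, and a lookup only succeeds if the query is normalized the same way as the stored values.

```python
import unicodedata


def fold(text: str) -> str:
    """Lowercase and strip combining marks after NFD decomposition,
    e.g. 'Matoušek' -> 'matousek'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical granular indexes: sub-index name -> {record id: indexed value}.
GRANULAR = {
    "author.personal": {1: "Matoušek, J.", 2: "Dvořák, A."},
    "author.corporate": {3: "Univerzita Karlova"},
}


def build_composite(subindexes: dict) -> dict:
    """Composite index: union of all sub-indexes, words stored in folded form."""
    composite: dict = {}
    for entries in subindexes.values():
        for rec_id, value in entries.items():
            for word in fold(value).replace(",", " ").split():
                composite.setdefault(word, set()).add(rec_id)
    return composite


def search(index: dict, term: str, normalize_query: bool = True) -> set:
    """Look up one term. If the query is not folded like the stored values,
    any diacritic in it makes the lookup miss."""
    key = fold(term) if normalize_query else term
    return index.get(key, set())


if __name__ == "__main__":
    idx = build_composite(GRANULAR)
    print(search(idx, "matoušek"))                         # folded query matches
    print(search(idx, "matoušek", normalize_query=False))  # unfolded query misses
```

Note that this model does not account for dc.creator returning no hits even for the unaccented matousek in the session above.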
Thank you in advance for any clues to the puzzle!
Linda
On 08/19/2015 03:40 PM, Linda Jansova wrote:
Oh, I see! In that case we shall try the upgrade and see what happens
(we shall keep you posted :-)...
Thank you for your help!
Linda
On 08/19/2015 03:34 PM, Jason Stephenson wrote:
Quoting Linda Jansova <skolk...@chello.cz>:
Thank you, Jason!
I have actually come across this bug as well, but it seems that it
has already been fixed (or at least this is how I read the
information on Launchpad); we are currently using Evergreen
2.8.2...
The fix only went in last night. It will be in today's release of
2.8.3, so it might be worth the upgrade to see if it helps.
And you hit the nail on the head - we also usually search other
sources and so it took quite some time to discover the problem...
Linda