Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
From: Simon Spero
Sent: 22 May 2012 19:47

Arash, you might not want to use a straight dump of WorldCat catalog records, at least not without the associated holdings information.* There are a lot of quasi-duplicate records that are sufficiently broken that the WorldCat de-duplication algorithm refuses to merge them. These records will usually only be used by a handful of institutions; the better records will tend to have more associated holdings. The holdings count should be used to weight the strength of association between class numbers and features.

Also, since classification/categorization is usually considered to be a property of works rather than manifestations, one might get better results by using Work sets for training. I would suggest, er, contacting Thom Hickey.

Simon

* Well, not precisely holdings: you just need the number of distinct institutions with at least one copy. I call them 'hasings'.
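Simon's suggestion to weight class-number associations by holdings can be pictured as a weighted vote over the records retrieved for a document's keyphrases. What follows is a minimal sketch, assuming each record has already been reduced to a DDC number and a "hasings" count (how to obtain that count is not covered in this thread). Rec, score, best and the three-digit truncation are hypothetical illustrative choices, not Simon's or OCLC's method.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of holdings-weighted voting: each retrieved record votes
// for its DDC class with a weight equal to its "hasings" (distinct holders).
final class HoldingsWeightedVote {

    /** Hypothetical per-record summary: a DDC number (may be null) and hasings. */
    static final class Rec {
        final String ddc;
        final int hasings;

        Rec(String ddc, int hasings) {
            this.ddc = ddc;
            this.hasings = hasings;
        }
    }

    /** Sums hasings per DDC prefix of length {@code digits} (3 = DDC section),
     *  so widely held records outweigh thinly held quasi-duplicates. */
    static Map<String, Integer> score(List<Rec> recs, int digits) {
        Map<String, Integer> scores = new HashMap<>();
        for (Rec r : recs) {
            if (r.ddc == null || r.ddc.length() < digits) {
                continue; // no usable DDC number on this record
            }
            scores.merge(r.ddc.substring(0, digits), r.hasings, Integer::sum);
        }
        return scores;
    }

    /** Highest-scoring class, or null when no record carried a DDC number. */
    static String best(Map<String, Integer> scores) {
        return scores.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        List<Rec> recs = List.of(
                new Rec("005.133", 1200),
                new Rec("005.1", 3),      // thinly held quasi-duplicate
                new Rec("020.285", 40),
                new Rec(null, 500));      // no DDC number: ignored
        Map<String, Integer> s = score(recs, 3);
        System.out.println(s + " -> " + best(s));
    }
}
```

Summing raw hasings is the simplest possible weighting. A log or other damped weight would stop a few very widely held records from dominating, and that choice is worth testing rather than assuming.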
Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
Thank you Roy and Simon for the info. As for your second point, I suppose one advantage of using the WorldCat API at this experimental stage is that the returned bib records are already FRBR-ized.

Ross, thanks for the link to the Open Library data dump. The WorldCat collection is two orders of magnitude larger than Open Library. That makes a significant difference, given how skewed and sparse the bib records classified according to library taxonomies (e.g., DDC, LCC) are. For more info, see: http://cdm15003.contentdm.oclc.org/cdm/singleitem/collection/p267701coll27/id/277/rec/28

Thanks,
Arash
Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
From: Roy Tennant
Sent: Sat, May 19, 2012 at 8:42 PM

Arash,

Yes, we have made WorldCat available to researchers under a special license agreement. I suggest contacting Thom Hickey (hic...@oclc.org) about such an arrangement.

Thanks,
Roy
Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
From: Arash.Joorabchi
Sent: Fri, May 18, 2012 at 3:46 AM

Dear Karen,

I am conducting a research experiment on automatic text classification, and I am trying to retrieve the top-matching bib records (which include DDC fields) for a set of keyphrases extracted from a given document. So I suppose this is a rather exceptional use case. In fact, the right approach for this experiment would be to process a full dump of the WorldCat database directly rather than sending a limited number of queries via the API.

I read here: http://dltj.org/article/worldcat-lld-may-become-available-under-odc-by/ that WorldCat might become available as open linked data in the future, which would solve my problem and help similar text-mining projects. However, I wonder whether it is currently available to researchers under a research/non-commercial use license agreement.

Regards,
Arash
Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
On May 18, 2012, at 6:46 AM, Arash.Joorabchi wrote:
However, I wonder if it is currently available to researchers under a research/non-commercial use license agreement.

Why not use Open Library's dataset (which is freely available with no restrictions)? http://openlibrary.org/developers/dumps

-Ross.
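To gauge how many Open Library records would actually be usable for DDC training, a first pass over the dump could count the editions that carry a class number. This is a crude sketch under two assumptions that should be checked against the current dump documentation: each line is type, key, revision, last_modified and JSON separated by tabs, and editions store DDC numbers in a "dewey_decimal_class" string array. OpenLibraryDdcFilter and hasDdc are hypothetical names, and the regex test stands in for a real JSON parse.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;

// Crude pre-filter for an Open Library editions dump. ASSUMPTIONS: each line is
// type<TAB>key<TAB>revision<TAB>last_modified<TAB>JSON, and editions keep DDC
// numbers in a "dewey_decimal_class" array of strings.
final class OpenLibraryDdcFilter {

    // The key followed by an array whose first element is a string (non-empty).
    static final Pattern DDC =
            Pattern.compile("\"dewey_decimal_class\"\\s*:\\s*\\[\\s*\"");

    /** True if the line is an edition whose JSON lists at least one DDC number.
     *  A regex test on the raw JSON text, not a real JSON parse. */
    static boolean hasDdc(String line) {
        String[] cols = line.split("\t", 5);
        return cols.length == 5
                && cols[0].equals("/type/edition")
                && DDC.matcher(cols[4]).find();
    }

    public static void main(String[] args) throws IOException {
        Path dump = Path.of(args[0]); // path to a downloaded, gzipped editions dump
        long total = 0;
        long withDdc = 0;
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(dump)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                total++;
                if (hasDdc(line)) {
                    withDdc++;
                }
            }
        }
        System.out.println(withDdc + " of " + total + " lines carry a DDC number");
    }
}
```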
Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
From: Karen Coombs
Sent: 17 May 2012 08:37

I forwarded this thread to the Product Manager for the WorldCat Search API. She responded that, unfortunately, this query is not possible using the API at this time. FYI, the SRU interface to the WorldCat Search API doesn't currently support any scan-type searches either.

Is there a particular use case you're trying to support? Knowing that would help us document this as a possible enhancement.

Karen

Karen Coombs
Senior Product Analyst
Web Services
OCLC
coom...@oclc.org
Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
From: Mike Taylor
Sent: 16 May 2012 10:43

There is no standard way in CQL to express "field X is not empty". Depending on the implementation, NOT srw.dd="" might work (but evidently doesn't in this case). Another possibility is srw.dd=*, but again that may or may not work, and might be appallingly inefficient if it does. NOT srw.dd=null will definitely not work: null is not a special word in CQL.

-- Mike.

On 16 May 2012 10:32, Arash.Joorabchi arash.joorab...@ul.ie wrote:
Hi all,

I am sending SRU queries to WorldCat in the following form:

    String host = "http://worldcat.org/webservices/catalog/search/";
    String query = "sru?query=srw.kw=\"" + keyword + "\""
            + " AND srw.ln exact \"eng\""
            + " AND srw.mt all \"bks\""
            + " AND srw.nt=\"" + keyword + "\""
            + "&servicelevel=full"
            + "&maximumRecords=100"
            + "&sortKeys=relevance,,0"
            + "&wskey=[wskey]";

It is working fine. However, I'd like to limit the results to records that have a DDC number assigned to them, and I don't know the right way to specify this limit in the query. Neither of the following works:

    NOT srw.dd=""
    NOT srw.dd=null

Thanks,
Arash
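The query above pastes raw spaces and quotation marks straight into the URL. Here is a minimal sketch of assembling the same request with each parameter value URL-encoded. The host, index names and parameters are copied from Arash's message, and [wskey] remains a placeholder. WorldCatSruUrl, cql and url are hypothetical names. The sketch adds no DDC filter, because none of the variants tried in this thread is accepted by the service.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: build the searchRetrieve URL from Arash's message with every
// parameter value URL-encoded. Class and method names are hypothetical.
final class WorldCatSruUrl {

    static final String HOST = "http://worldcat.org/webservices/catalog/search/";

    /** The CQL from the message; quotes and backslashes inside the keyword are
     *  backslash-escaped so they cannot terminate the quoted term early. */
    static String cql(String keyword) {
        String k = keyword.replace("\\", "\\\\").replace("\"", "\\\"");
        return "srw.kw=\"" + k + "\""
                + " AND srw.ln exact \"eng\""
                + " AND srw.mt all \"bks\""
                + " AND srw.nt=\"" + k + "\"";
    }

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Java 10+
    }

    static String url(String keyword, String wskey) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order
        params.put("query", cql(keyword));
        params.put("servicelevel", "full");
        params.put("maximumRecords", "100");
        params.put("sortKeys", "relevance,,0");
        params.put("wskey", wskey);
        return HOST + "sru?" + params.entrySet().stream()
                .map(e -> e.getKey() + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(url("text classification", "[wskey]"));
    }
}
```

URLEncoder applies form encoding, so a space becomes "+". Query-string parsers normally decode that back to a space, but percent-encoding is the safer choice if a server turns out to be strict.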
Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
From: Arash.Joorabchi
Sent: May 16, 2012, 6:39

Hi Mark,

srw.dd=* does not work either:

    Identifier: info:srw/diagnostic/1/27
    Meaning:
    Details: srw.dd
    Message: The index [srw.dd] did not include a searchable value

I suppose the only option left is to retrieve everything and filter the results on the client side.

Thanks for your quick reply.
Arash
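The client-side fallback can be sketched as a filter over each searchRetrieve response. This sketch assumes the records arrive as MARCXML in the MARC21 slim namespace. It treats a record as classified when it has an 082 field (Dewey Decimal Classification Number) with a $a subfield. DdcClientFilter, ddcOf and ddcNumbers are hypothetical names, and the WorldCat response layout should be confirmed against real output.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Client-side filter: keep only records that carry a DDC number. ASSUMPTIONS:
// records arrive as MARCXML (MARC21 slim namespace) and DDC lives in 082 $a.
final class DdcClientFilter {

    static final String MARC_NS = "http://www.loc.gov/MARC21/slim";

    /** First 082 $a of a MARCXML record, or null if the record has none. */
    static String ddcOf(Element record) {
        NodeList fields = record.getElementsByTagNameNS(MARC_NS, "datafield");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            if (!"082".equals(field.getAttribute("tag"))) {
                continue;
            }
            NodeList subs = field.getElementsByTagNameNS(MARC_NS, "subfield");
            for (int j = 0; j < subs.getLength(); j++) {
                Element sub = (Element) subs.item(j);
                if ("a".equals(sub.getAttribute("code"))) {
                    return sub.getTextContent().trim();
                }
            }
        }
        return null;
    }

    /** DDC numbers of all records in one SRU response that have one. */
    static List<String> ddcNumbers(String sruResponseXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for getElementsByTagNameNS
            Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(
                    sruResponseXml.getBytes(StandardCharsets.UTF_8)));
            // Only MARC records match; SRU's own <record> wrappers use another namespace.
            NodeList records = doc.getElementsByTagNameNS(MARC_NS, "record");
            List<String> out = new ArrayList<>();
            for (int i = 0; i < records.getLength(); i++) {
                String ddc = ddcOf((Element) records.item(i));
                if (ddc != null && !ddc.isEmpty()) {
                    out.add(ddc);
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse SRU response", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Reads one saved searchRetrieve response from standard input.
        String xml = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        System.out.println(ddcNumbers(xml));
    }
}
```

Only the class numbers are returned here. To keep whole records for training, collect the matching Element objects instead.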
Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set
From: Arash.Joorabchi
Sent: Wed, May 16, 2012 at 9:49 PM

Hi Andy,

I am an SRU newbie myself, so I don't know how this could be achieved using scan operations, and I could not find much info on the SRU website (http://www.loc.gov/standards/sru/).

As for the wildcards, according to this guide: http://www.oclc.org/support/documentation/worldcat/searching/refcard/searchworldcatquickreference.pdf the wildcard symbols must be preceded by at least 3 characters. Therefore clauses like:

    ... AND srw.dd=*
    ... AND srw.dd=?.*
    ... AND srw/dd=###.*
    ... AND srw/dd=?3.*

do not work and result in the following error:

    Diagnostics
    Identifier: info:srw/diagnostic/1/9
    Meaning:
    Details:
    Message: Not enough chars in truncated term:Truncated words too short(9)

Thanks,
Arash

From: Houghton,Andrew [mailto:hough...@oclc.org]
Sent: 16 May 2012 11:58
To: Arash.Joorabchi
Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

I'm not an SRU guru, but is it possible to do a scan and look for a posting count of zero?

Andy.
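The three-character truncation rule can be checked before a query is sent. This minimal sketch assumes that "*", "?" and "#" are the wildcard symbols (the ones used in the clauses above). It also assumes that every other character, including ".", counts toward the minimum. The rule comes from the guide as quoted in this thread, not from the service itself. TruncationCheck and its methods are hypothetical names.

```java
// Pre-flight check for the truncation rule quoted from OCLC's reference card.
// ASSUMPTIONS: '*', '?' and '#' are the wildcard symbols, and every other
// character (including '.') counts toward the minimum.
final class TruncationCheck {

    static final String WILDCARDS = "*?#";

    /** Characters before the first wildcard, or -1 if the term has no wildcard. */
    static int charsBeforeWildcard(String term) {
        for (int i = 0; i < term.length(); i++) {
            if (WILDCARDS.indexOf(term.charAt(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    /** True if the term has no wildcard, or at least {@code min} characters before it. */
    static boolean acceptable(String term, int min) {
        int n = charsBeforeWildcard(term);
        return n < 0 || n >= min;
    }

    public static void main(String[] args) {
        String[] terms = {"*", "?.*", "###.*", "?3.*", "005.*"};
        for (String t : terms) {
            // prints "too short" for the first four and "ok" for "005.*"
            System.out.println(t + " -> " + (acceptable(t, 3) ? "ok" : "too short"));
        }
    }
}
```

Under this rule no single truncated term can stand for "any DDC number". The shortest prefix that passes is already a three-character section such as 005, and the thread does not confirm whether srw.dd accepts truncation even then.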