Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

2012-05-22 Thread Simon Spero
Arash - you might not want to use a straight dump of WorldCat catalog
records, at least not without the associated holdings information.*

There are a lot of quasi-duplicate records that are sufficiently broken
that the WorldCat de-duplication algorithm refuses to merge them. These
records will usually only be used by a handful of institutions; the better
records will tend to have more associated holdings. The holdings count
should be used to weight the strength of association between class numbers
and features.
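
A minimal sketch of that weighting idea, under stated assumptions: the
BibRecord class, its fields, and the "feature|class" key format are all
hypothetical and exist only for illustration. Each record contributes its
institution count, rather than a flat count of 1, to every (feature, DDC
class) pair it supports.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HoldingsWeighting {

    /** Hypothetical training record: one DDC class, some features, one institution count. */
    static final class BibRecord {
        final String ddc;
        final List<String> features;
        final int institutionCount;

        BibRecord(String ddc, List<String> features, int institutionCount) {
            this.ddc = ddc;
            this.features = features;
            this.institutionCount = institutionCount;
        }
    }

    /**
     * For every (feature, class) pair, sums the institution counts of the
     * records in which the pair co-occurs. Keys have the form "feature|ddc".
     */
    static Map<String, Double> weightedCounts(List<BibRecord> records) {
        Map<String, Double> weights = new HashMap<>();
        for (BibRecord record : records) {
            for (String feature : record.features) {
                String key = feature + "|" + record.ddc;
                weights.merge(key, (double) record.institutionCount, Double::sum);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        List<BibRecord> records = List.of(
                new BibRecord("006.3", List.of("neural networks"), 250),
                new BibRecord("006.3", List.of("neural networks"), 2),
                new BibRecord("612.8", List.of("neural networks"), 40));
        System.out.println(weightedCounts(records));
    }
}
```

With this scheme a quasi-duplicate held by two institutions barely moves
the totals, while a widely held record dominates them.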

Also, since classification/categorization is something that is usually
considered to be a property of works, rather than manifestations, one might
get better results by using Work sets for training.
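
One way to read the work-set suggestion, sketched under assumptions: the
Manifestation class and its workId field are hypothetical (the thread does
not say how work identifiers would be obtained). Records are grouped by
work, and each work gets the DDC class backed by the largest total
institution count, so training sees one label per work instead of one per
manifestation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WorkSetLabels {

    /** Hypothetical manifestation-level record carrying a work identifier. */
    static final class Manifestation {
        final String workId;
        final String ddc;
        final int institutionCount;

        Manifestation(String workId, String ddc, int institutionCount) {
            this.workId = workId;
            this.ddc = ddc;
            this.institutionCount = institutionCount;
        }
    }

    /** Maps each work id to the DDC class with the highest summed institution count. */
    static Map<String, String> dominantDdcPerWork(List<Manifestation> records) {
        // workId -> (ddc -> summed institution count)
        Map<String, Map<String, Integer>> votes = new HashMap<>();
        for (Manifestation m : records) {
            votes.computeIfAbsent(m.workId, k -> new HashMap<>())
                 .merge(m.ddc, m.institutionCount, Integer::sum);
        }

        Map<String, String> labels = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> work : votes.entrySet()) {
            String best = null;
            int bestVotes = -1;
            // Ties are broken arbitrarily (map iteration order).
            for (Map.Entry<String, Integer> candidate : work.getValue().entrySet()) {
                if (candidate.getValue() > bestVotes) {
                    best = candidate.getKey();
                    bestVotes = candidate.getValue();
                }
            }
            labels.put(work.getKey(), best);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Manifestation> records = List.of(
                new Manifestation("work-1", "004", 100),
                new Manifestation("work-1", "005", 3),
                new Manifestation("work-2", "510", 1));
        System.out.println(dominantDdcPerWork(records));
    }
}
```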

I would suggest, er, contacting Thom Hickey.

Simon

* Well, not precisely holdings - you just need the number of distinct
institutions with at least one copy.  I call them 'hasings'.


Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

2012-05-22 Thread Arash.Joorabchi
Thank you Roy and Simon for the info.

As for your second point, I suppose one advantage of using the WorldCat
API at this experimental stage is that the returned bib records are
already FRBR-ized.

Ross - Thanks for the link to the Open Library data dump. The WorldCat
collection is 2 orders of magnitude larger than Open Library, which makes
a significant difference considering the skewness and sparsity of bib
records classified according to library taxonomies, e.g., DDC, LCC (for
more info, see:
http://cdm15003.contentdm.oclc.org/cdm/singleitem/collection/p267701coll27/id/277/rec/28)


Thanks,
Arash



Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

2012-05-19 Thread Roy Tennant
Arash,
Yes, we have made WorldCat available to researchers under a special
license agreement. I suggest contacting Thom Hickey (hic...@oclc.org)
about such an arrangement. Thanks,
Roy


Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

2012-05-18 Thread Arash.Joorabchi
Dear Karen,

I am conducting a research experiment on automatic text classification and I am 
trying to retrieve top matching bib records (which include DDC fields) for a 
set of keyphrases extracted from a given document. So, I suppose this is a 
rather exceptional use case. In fact, the right approach for this experiment is 
to process the full dump of WorldCat database directly rather than sending a 
limited number of queries via the API.

I read here: 
http://dltj.org/article/worldcat-lld-may-become-available-under-odc-by/ 
that WorldCat might become available as open linked data in future, which would 
solve my problem and help similar text mining projects. However, I wonder if it 
is currently available to researchers under a research/non-commercial use 
license agreement.

Regards,
Arash


Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

2012-05-18 Thread Ross Singer
On May 18, 2012, at 6:46 AM, Arash.Joorabchi wrote:

 Dear Karen,
 
 I am conducting a research experiment on automatic text classification and I 
 am trying to retrieve top matching bib records (which include DDC fields) for 
 a set of keyphrases extracted from a given document. So, I suppose this is a 
 rather exceptional use case. In fact, the right approach for this experiment 
 is to process the full dump of WorldCat database directly rather than sending 
 a limited number of queries via the API.
 
 I read here: 
 http://dltj.org/article/worldcat-lld-may-become-available-under-odc-by/ 
 that WorldCat might become available as open linked data in future, which 
 would solve my problem and help similar text mining projects. However, I 
 wonder if it is currently available to researchers under a 
 research/non-commercial use license agreement.

Why not use Open Library's dataset (which is freely available with no 
restrictions)?

http://openlibrary.org/developers/dumps

-Ross.

 

Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

2012-05-17 Thread Karen Coombs
I forwarded this thread to the Product Manager for the WorldCat Search
API. She responded back that unfortunately this query is not possible
using the API at this time.

FYI, the SRU interface to WorldCat Search API doesn't currently
support any scan type searches either.

Is there a particular use case you're trying to support? Knowing that
would help us document this as a possible enhancement.

Karen

Karen Coombs
Senior Product Analyst
Web Services
OCLC
coom...@oclc.org

On Wed, May 16, 2012 at 9:49 PM, Arash.Joorabchi arash.joorab...@ul.ie wrote:
 Hi Andy,

 I am a SRU newbie myself, so I don't know how this could be achieved
 using scan operations and could not find much info on SRU website
 (http://www.loc.gov/standards/sru/).

 As for the wildcards, according to this guide:
 http://www.oclc.org/support/documentation/worldcat/searching/refcard/searchworldcatquickreference.pdf
 the symbols should be preceded by at least 3 characters, and therefore
 clauses like:

 ... AND srw.dd=*
 ... AND srw.dd=?.*
 ... AND srw/dd=###.*
 ... AND srw/dd=?3.*

 do not work and result in the following error:

 Diagnostics
 Identifier: info:srw/diagnostic/1/9
 Meaning:
 Details:
 Message: Not enough chars in truncated term:Truncated words too short(9)

 Thanks,
 Arash

 From: Houghton,Andrew [mailto:hough...@oclc.org]
 Sent: 16 May 2012 11:58
 To: Arash.Joorabchi
 Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records
 without a DDC no from the result set

 I'm not an SRU guru, but is it possible to do a scan and look for a
 postings of zero?

 Andy.



Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

2012-05-16 Thread Mike Taylor
There is no standard way in CQL to express "field X is not empty".
Depending on implementations, NOT srw.dd= might work (but evidently
doesn't in this case).  Another possibility is srw.dd=*, but again
that may or may not work, and might be appallingly inefficient if it
does.  NOT srw.dd=null will definitely not work: null is not a
special word in CQL.

-- Mike.


On 16 May 2012 10:32, Arash.Joorabchi arash.joorab...@ul.ie wrote:
 Hi all,

 I am sending SRU queries to WorldCat in the following form:

     String host = "http://worldcat.org/webservices/catalog/search/";
     // CQL query followed by the SRU request parameters
     String query = "sru?query=srw.kw=\"" + keyword + "\""
             + " AND srw.ln exact \"eng\""
             + " AND srw.mt all \"bks\""
             + " AND srw.nt=\"" + keyword + "\""
             + "&servicelevel=full"
             + "&maximumRecords=100"
             + "&sortKeys=relevance,,0"
             + "&wskey=[wskey]";

 It is working fine; however, I'd like to limit the results to those
 records that have a DDC number assigned to them, but I don't know the
 right way to specify this limit in the query.

  NOT srw.dd=
  NOT srw.dd=null

 Neither of the above works.

 Thanks,
 Arash
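
The quoted query string is concatenated into the URL as-is, so the spaces
and double quotes in the CQL travel unencoded. A hedged sketch of the same
request with each parameter value percent-encoded is below; the class and
method names are made up for illustration, the endpoint and parameters are
the ones from the message, and whether the service requires encoding in
this exact form is an assumption.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SruUrlBuilder {

    static final String HOST = "http://worldcat.org/webservices/catalog/search/";

    /** Builds the search URL from the thread, encoding every parameter value. */
    static String buildUrl(String keyword, String wskey) {
        // Escape backslashes and double quotes so the keyword stays inside its CQL string.
        String safe = keyword.replace("\\", "\\\\").replace("\"", "\\\"");
        String cql = "srw.kw=\"" + safe + "\""
                + " AND srw.ln exact \"eng\""
                + " AND srw.mt all \"bks\""
                + " AND srw.nt=\"" + safe + "\"";
        return HOST + "sru?query=" + encode(cql)
                + "&servicelevel=full"
                + "&maximumRecords=100"
                + "&sortKeys=" + encode("relevance,,0")
                + "&wskey=" + encode(wskey);
    }

    private static String encode(String value) {
        // Form-style encoding: spaces become '+', reserved characters become %XX.
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("machine learning", "YOUR-WSKEY"));
    }
}
```

Keeping the CQL in one string and encoding it once avoids having to
remember which characters are safe at each concatenation step.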




Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

2012-05-16 Thread Arash.Joorabchi
Hi Mike,

srw.dd=* does not work either:

Identifier: info:srw/diagnostic/1/27
Meaning:
Details: srw.dd
Message: The index [srw.dd] did not include a searchable value

I suppose the only option left is to retrieve everything and filter the results 
on the client side.

Thanks for your quick reply.
Arash 
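
A rough sketch of that client-side filter, assuming the response records
are MARCXML in the standard Library of Congress namespace (the thread does
not show a response, so that is an assumption). MARC field 082 is the
Dewey Decimal Classification number, so the filter keeps only records
that carry at least one 082 datafield. The class and method names are
illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class DdcFilter {

    private static final String MARC_NS = "http://www.loc.gov/MARC21/slim";

    /** Returns the MARCXML record elements that contain at least one 082 (DDC) field. */
    static List<Element> recordsWithDdc(String marcXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations, since the XML comes from the network.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(marcXml.getBytes(StandardCharsets.UTF_8)));

            List<Element> kept = new ArrayList<>();
            NodeList records = doc.getElementsByTagNameNS(MARC_NS, "record");
            for (int i = 0; i < records.getLength(); i++) {
                Element record = (Element) records.item(i);
                if (hasField(record, "082")) {
                    kept.add(record);
                }
            }
            return kept;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse MARCXML response", e);
        }
    }

    /** True if the record has a datafield with the given MARC tag. */
    private static boolean hasField(Element record, String tag) {
        NodeList fields = record.getElementsByTagNameNS(MARC_NS, "datafield");
        for (int i = 0; i < fields.getLength(); i++) {
            if (tag.equals(((Element) fields.item(i)).getAttribute("tag"))) {
                return true;
            }
        }
        return false;
    }
}
```

Since each request returns at most maximumRecords results, filtering
this way can leave fewer usable records per query than were requested.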


-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Mike Taylor
Sent: 16 May 2012 10:43
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

There is no standard way in CQL to express "field X is not empty".
Depending on implementations, NOT srw.dd="" might work (but evidently
doesn't in this case).  Another possibility is srw.dd=*, but again
that may or may not work, and might be appallingly inefficient if it
does.  NOT srw.dd=null will definitely not work: "null" is not a
special word in CQL.

-- Mike.


On 16 May 2012 10:32, Arash.Joorabchi arash.joorab...@ul.ie wrote:
  Hi all,

 I am sending SRU queries to WorldCat in the following form:


     String host = "http://worldcat.org/webservices/catalog/search/";
     String query = "sru?query=srw.kw=\"" + keyword + "\""
                     + " AND srw.ln exact \"eng\""
                     + " AND srw.mt all \"bks\""
                     + " AND srw.nt=\"" + keyword + "\""
                     + "&servicelevel=full"
                     + "&maximumRecords=100"
                     + "&sortKeys=relevance,,0"
                     + "&wskey=[wskey]";

 It is working fine; however, I'd like to limit the results to those
 records that have a DDC number assigned to them, and I don't know the
 right way to specify this limit in the query.

  NOT srw.dd=""
  NOT srw.dd=null

 Neither of the above works.
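
 As a side note, the query string above contains spaces and double quotes, so it may be safer to URL-encode the CQL before appending it to the host. Below is a minimal sketch under that assumption. The class and method names are hypothetical, the index names and parameters are the ones from the message, and a keyword that itself contains a double quote is not handled.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for assembling the searchRetrieve URL from the message.
class SruQuery {
    // Builds the CQL clause exactly as in the message, before any encoding.
    static String buildCql(String keyword) {
        return "srw.kw=\"" + keyword + "\""
                + " AND srw.ln exact \"eng\""
                + " AND srw.mt all \"bks\""
                + " AND srw.nt=\"" + keyword + "\"";
    }

    // Appends the URL-encoded CQL and the other parameters to the host.
    static String buildUrl(String host, String keyword, String wskey) {
        return host + "sru?query=" + encode(buildCql(keyword))
                + "&servicelevel=full"
                + "&maximumRecords=100"
                + "&sortKeys=relevance,,0"
                + "&wskey=" + encode(wskey);
    }

    // URL-encodes a value as UTF-8 (spaces become '+', quotes become %22).
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

 Keeping the CQL in its own method also makes it easy to print the unencoded query when a diagnostic comes back.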


 Thanks,
 Arash



Re: [CODE4LIB] WorldCat SRU queries - elimination of records without a DDC no from the result set

2012-05-16 Thread Arash.Joorabchi
Hi Andy,

I am an SRU newbie myself, so I don't know how this could be achieved
using scan operations, and I could not find much information on the SRU
website (http://www.loc.gov/standards/sru/).
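
For anyone who wants to try the scan route, here is a rough sketch of how a scan request URL might be assembled. The parameter names (operation, version, scanClause, maximumTerms) come from the SRU 1.1 specification; whether the WorldCat service accepts scan requests at all, and what a sensible starting term for srw.dd would be, are assumptions that this thread does not confirm. The class name is made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds an SRU 1.1 scan request URL.
class ScanUrl {
    // baseUrl is the SRU endpoint; scanClause is an "index=term" pair
    // naming the index to browse and the term to start from.
    static String build(String baseUrl, String scanClause, int maximumTerms, String wskey) {
        return baseUrl
                + "?operation=scan"
                + "&version=1.1"
                + "&scanClause=" + encode(scanClause)
                + "&maximumTerms=" + maximumTerms
                + "&wskey=" + encode(wskey);
    }

    // URL-encodes a value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

A scan response lists index terms together with a record count for each, so at best it shows which DDC values are indexed; it would not by itself return the records that lack one.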

As for the wildcards, according to this guide:
http://www.oclc.org/support/documentation/worldcat/searching/refcard/searchworldcatquickreference.pdf
the symbols should be preceded by at least 3 characters, and therefore
clauses like:

... AND srw.dd=*
... AND srw.dd=?.*
... AND srw/dd=###.*
... AND srw/dd=?3.*

do not work and result in the following error:

Diagnostics
Identifier: info:srw/diagnostic/1/9
Meaning:
Details:
Message: Not enough chars in truncated term:Truncated words too short(9)
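
To avoid sending terms that trigger this diagnostic, a small pre-check on the client might help. The sketch below only mirrors the rule as I read it in the guide (at least 3 characters before the first wildcard symbol) and treats `*`, `?` and `#` as the wildcard symbols, as in the clauses above. The class name, the choice to count only letters and digits, and the constant are my own assumptions, not OCLC's documented behaviour.

```java
// Hypothetical pre-check for truncated search terms.
class TruncationCheck {
    // Minimum number of characters before the first wildcard (assumed from the guide).
    static final int MIN_PREFIX = 3;
    // Symbols treated as wildcards in this sketch.
    static final String WILDCARDS = "*?#";

    // Returns true if the term has no wildcard, or if at least MIN_PREFIX
    // letters or digits appear before its first wildcard symbol.
    static boolean hasEnoughLeadingChars(String term) {
        int leading = 0;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (WILDCARDS.indexOf(c) >= 0) {
                return leading >= MIN_PREFIX;
            }
            if (Character.isLetterOrDigit(c)) {
                leading++;
            }
        }
        return true;
    }
}
```

With this check, all four clauses above are rejected before they reach the server, while a term such as comput* passes.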

 

 

Thanks,
Arash

 



From: Houghton,Andrew [mailto:hough...@oclc.org] 
Sent: 16 May 2012 11:58
To: Arash.Joorabchi
Subject: Re: [CODE4LIB] WorldCat SRU queries - elimination of records
without a DDC no from the result set

 

I'm not an SRU guru, but is it possible to do a scan and look for
postings of zero?

 

Andy.



