Re: [OPEN-ILS-GENERAL] Evergreen access via Google?
On 2015-04-09 10:05, Ben Shum wrote:
> That all said, I suppose one potential danger of having bots freely scan over your site is that if they get too busy with indexing your site's contents, they can overwhelm it and cause interruptions in your ability to use Evergreen. This happened to us at least once before, where some indexer in China scanned our whole catalog and tried to index every page, causing us to run out of system resources trying to serve up all the content it was requesting.

Disclaimer: I'm a lurker who is very interested in and hopeful for Evergreen's success. Our site currently uses a proprietary ILS.

I don't know whether an Evergreen site has yet done this, but FWIW, another approach is to sign up for OCLC's WorldCat Local (to be rebranded WorldCat Discovery) service. Among its (apparently less-known) features is web-scale discovery: OCLC makes its records discoverable via Google and other search services, and these in turn -- via WCL -- are linked in real time to availability information in subscribers' ILSes.

cheers,

- mt

--
Marc Truitt, University Librarian
Mount Allison University Libraries and Archives
49 York Street, Sackville, NB E4L 1C6
voice: 506-364-2567 | fax: 506-364-2617 | cell: 506-232-0503
e-mail: mtru...@mta.ca

"We wanted flying cars, instead we got 140 characters." -- Peter Thiel
Wearing the sensible shoes proudly since 1978!
Re: [OPEN-ILS-GENERAL] Evergreen access via Google?
Welcome to the list, Marc!

On 04/09/2015 09:15 AM, Marc Truitt wrote:
> I don't know whether an Evergreen site has yet done this, but FWIW, another approach is to sign up for OCLC's WorldCat Local (to be rebranded WorldCat Discovery) service. Among the (apparently less-known) features included is web-scale discovery. OCLC makes its records discoverable via Google and other search services. These in turn -- via WCL -- are linked in real-time to availability information in subscribers' ILSes.

That might be true, but I'm perplexed as to why a library would pay for this discoverability through WorldCat Local when they already get it out of the box with Evergreen. Of course, there might be other reasons for signing up for WorldCat Local, but I think the work Dan has done has put Evergreen, Koha, and VuFind ahead of the pack in this area.

Kathy
--
Kathy Lussier
Project Coordinator
Massachusetts Library Network Cooperative
(508) 343-0128
kluss...@masslnc.org
Twitter: http://www.twitter.com/kmlussier
Re: [OPEN-ILS-GENERAL] Evergreen access via Google?
I can't remember the year offhand, but Dan did a presentation on schema.org (and related stuff -- highly technical term) at one of the Evergreen conferences. Vancouver, I think? There might be video somewhere.

On Thu, Apr 9, 2015 at 8:52 AM, Ben Shum bs...@biblio.org wrote:
> Hi Don,
>
> Starting as recently as Evergreen 2.6, efforts were made by developers like Dan Scott to add structured data elements to Evergreen's catalog to make them more discoverable. [...]

--
Rogan Hamby, MLS, CCNP, MIA
Managers, Headquarters Library and Reference Services,
York County Library System

"You can never get a cup of tea large enough or a book long enough to suit me." -- C.S. Lewis
http://www.goodreads.com/author/show/1069006.C_S_Lewis
Re: [OPEN-ILS-GENERAL] Evergreen access via Google?
Hi Don,

Starting with Evergreen 2.6 (noted in the Evergreen 2.6 release notes under "structured data" - http://evergreen-ils.org/documentation/release/RELEASE_NOTES_2_6.html), developers like Dan Scott made an effort to add structured data elements to Evergreen's catalog to make it more discoverable. This work has continued through newer Evergreen releases, and through Dan's work and others', it has been essential to keeping Evergreen's catalog friendly to search engines like Google. Evergreen 2.8's release notes include many more discoverability enhancements added with that release: http://evergreen-ils.org/documentation/release/RELEASE_NOTES_2_8.html#_opac

Since your site does not include a manually configured robots.txt file, I'll point you at the example set at Dan's library, Laurentian University's catalog: https://laurentian.concat.ca/robots.txt (we based many of our changes on the example they set). That robots.txt file guides search engine bots that arrive at the catalog toward indexing the appropriate contents and skipping over certain undesirables. By default, if you do not have anything set, search engine bots will likely attempt to index everything in your catalog that they can publicly access.

Doing an example search like https://www.google.com/#q=asbury+catalog+Star+Trek (i.e., searching Google for the keywords "asbury catalog Star Trek"), I can already see a couple of results that come from your Evergreen catalog records. So at least Google's search engine bots are already working to grab your catalog's contents.

That all said, one potential danger of having bots freely scan over your site is that if they get too busy with indexing your site's contents, they can overwhelm it and cause interruptions in your ability to use Evergreen. This happened to us at least once before, where some indexer in China scanned our whole catalog and tried to index every page, causing us to run out of system resources trying to serve up all the content it was requesting.

For Bibliomation's catalog, I've been experimenting with modifying our robots.txt file and continually upgrading our Evergreen catalog to pick up the latest structured data enhancements, to make the most of what's possible in Evergreen. I've also done some small experiments with creating Google Custom Search Engines that search against our indexed online catalog (and requesting scheduled indexing from Google's bots) as an alternative means of discovering the content contained in our systems. Moving forward, I expect this to continue to be an exciting area for improving the discoverability of Evergreen's content.

-- Ben

On Thu, Apr 9, 2015 at 8:15 AM, Donald Butterworth don.butterwo...@asburyseminary.edu wrote:
> Hi everyone,
>
> I was asked to toss these questions out and get some perspectives. What would it take to make the Evergreen catalog holdings available to generic search engines like Google, Bing, Yahoo and DuckDuckGo? Even if it is doable, is it a good idea? The motivation behind these questions is a perception that the first attempt many students make to do research is through a general web search.
>
> Anybody have a comment?
>
> Don
>
> --
> Don Butterworth
> Faculty Associate / Librarian III
> B.L. Fisher Library
> Asbury Theological Seminary
> don.butterwo...@asburyseminary.edu
> (859) 858-2227

--
Benjamin Shum
Evergreen Systems Manager
Bibliomation, Inc.
24 Wooster Ave.
Waterbury, CT 06708
203-577-4070, ext. 113
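[Editor's note: a robots.txt policy of the kind described above -- let bots index individual record pages, keep them out of open-ended search-result URLs that can trigger an unbounded crawl -- can be exercised with Python's standard-library robotparser. The paths below are illustrative placeholders, not Evergreen's actual OPAC routes, and the specific rules are just one plausible sketch, not the Laurentian file itself.]

```python
from urllib import robotparser

# Illustrative robots.txt in the spirit of the Laurentian example:
# allow record pages, block search-result URLs, and ask crawlers
# to pace themselves. (Paths are placeholders, not Evergreen's
# actual OPAC routes.)
ROBOTS_TXT = """\
User-agent: *
Disallow: /eg/opac/results
Allow: /eg/opac/record/
Crawl-delay: 10
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A record page stays crawlable; a result page is blocked.
print(rp.can_fetch("*", "https://example.org/eg/opac/record/123"))
print(rp.can_fetch("*", "https://example.org/eg/opac/results?query=star+trek"))
```

Note that Crawl-delay is honored by some crawlers (Bing, for instance) but ignored by Google, which exposes crawl-rate settings through its webmaster tools instead; either mechanism addresses the resource-exhaustion problem described above.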
Re: [OPEN-ILS-GENERAL] Evergreen access via Google?
Wow Ben! Thanks for the great answer!

-- Don

On Thu, Apr 9, 2015 at 8:52 AM, Ben Shum bs...@biblio.org wrote:
> Hi Don,
>
> Starting as recently as Evergreen 2.6, efforts were made by developers like Dan Scott to add structured data elements to Evergreen's catalog to make them more discoverable. [...]

--
Don Butterworth
Faculty Associate / Librarian III
B.L. Fisher Library
Asbury Theological Seminary
don.butterwo...@asburyseminary.edu
(859) 858-2227
Re: [OPEN-ILS-GENERAL] Evergreen access via Google?
Oh, and Dan writes good stuff about his findings with structured data on his blog. See: https://coffeecode.net/categories/22-Structured-data

I've found that to be a helpful resource in learning about how these things work and where they might be headed someday.

-- Ben

Sent from my Nexus 6

On Apr 9, 2015 8:52 AM, Ben Shum bs...@biblio.org wrote:
> Hi Don,
>
> Starting as recently as Evergreen 2.6, efforts were made by developers like Dan Scott to add structured data elements to Evergreen's catalog to make them more discoverable. [...]
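[Editor's note: to make the structured data idea above concrete, schema.org vocabulary describes bibliographic records and holdings in terms search engines understand (Book, Offer, Library, and so on). The sketch below expresses a hypothetical record as JSON-LD; all field values are invented for illustration, and Evergreen itself embeds this vocabulary as markup within its catalog page templates rather than as a standalone JSON-LD document.]

```python
import json

# Hypothetical catalog record described with schema.org vocabulary
# as JSON-LD. All values are invented for illustration.
record = {
    "@context": "http://schema.org",
    "@type": "Book",
    "name": "Star Trek",
    "author": {"@type": "Person", "name": "Jane Example"},
    "offers": {
        # An Offer models the holding: who lends it and whether
        # it is currently available.
        "@type": "Offer",
        "availability": "http://schema.org/InStock",
        "seller": {"@type": "Library", "name": "Example Library"},
    },
}

print(json.dumps(record, indent=2))
```

Markup of this kind on a live catalog page can be checked with Google's structured data testing tools, which report the types and properties a crawler would extract.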