Re: [CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-23 Thread Ere Maijala

Hi Edward,

I haven't actually used it for searching apart from a quick test, but 
PKP Harvester2 (pkp.sfu.ca/harvester/) might fit your needs. It's LAMP, 
open source and workable. We're building an OAI-PMH aggregator on top of it.


--Ere

On 16.3.2011 17:00, Edward M. Corrado wrote:

Hi,

I [will soon] have a small set (< 1000 records) of Dublin Core
metadata published in OAI_DC format that I want to be searchable via a
Web browser. Normally we would use Ex Libris's Primo for this, but
this particular set of data may have some confidential information and
our repository only has minimal built-in search functions. While we
still may go with Primo for these records, I am looking at other
possibilities. The requirements as I see them are:

1) Can ingest records in OAI_DC format
2) Allow remote end-users who are familiar with the collection to search
these ingested records via a Web browser.
3) Search should be keyword anywhere or individual fields, although it
does not need to have every whizzbang feature out there. In other
words, basic search features are fine.
4) Should support the ability to link to the display copy in our
repository (probably goes without saying)
5) Should be simple to install and maintain (Thus, at least in my
mind, eliminating something like Blacklight)
6) Preferably a LAMP application although a Windows server based
solution is a possibility as well
7) Preferably Open Source, or at least no- or low-cost

I haven't been able to find anything searching the Web, but it seems
like something people may have done before. Before I re-invent the
wheel or shoe-horn something together, does anyone have any
suggestions?

Edward




--
Ere Maijala (Mr.)
The National Library of Finland


Re: [CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-23 Thread Ere Maijala

Oops, should have read the other replies first. :)

--Ere

On 23.3.2011 17:05, Ere Maijala wrote:

Hi Edward,

I haven't actually used it for searching apart from a quick test, but
PKP Harvester2 (pkp.sfu.ca/harvester/) might fit your needs. It's LAMP,
open source and workable. We're building an OAI-PMH aggregator on top of
it.

--Ere

--
Ere Maijala
Kansalliskirjasto


Re: [CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-17 Thread Lemann, Alexander Bernard
I did something similar to this recently.  I used Python with oaipmh + 
simplejson + couchdb (the python library) to write a simple oai2json script 
which converts OAI XML into JSON and inserts it into a CouchDB instance.  I 
then used the CouchDB river service to index the CouchDB JSON in ElasticSearch. 
Now that I'm writing this, there may be a way to remove the Couch intermediary 
step from the process if you're not intending to ever change the records.
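A minimal sketch of the oai2json idea described above, not Alex's actual script. It uses only the Python standard library instead of the oaipmh and simplejson packages, and skips the CouchDB insert. The function name `oai_dc_to_dict` and the sample record are made up for illustration; the only fixed parts are the standard Dublin Core and oai_dc namespace URIs.

```python
import json
import xml.etree.ElementTree as ET

# Standard Dublin Core element namespace used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"


def oai_dc_to_dict(xml_text):
    """Collect every non-empty dc:* element into a dict of lists.

    Dublin Core elements are repeatable, so each field maps to a list.
    """
    root = ET.fromstring(xml_text)
    record = {}
    prefix = "{" + DC_NS + "}"
    for element in root.iter():
        text = (element.text or "").strip()
        if element.tag.startswith(prefix) and text:
            field = element.tag[len(prefix):]
            record.setdefault(field, []).append(text)
    return record


# Hypothetical sample record, for illustration only.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Letter to the editor</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:subject>Correspondence</dc:subject>
  <dc:subject>Newspapers</dc:subject>
  <dc:identifier>http://repository.example.org/item/42</dc:identifier>
</oai_dc:dc>"""

if __name__ == "__main__":
    # The resulting JSON document is what would be handed to a document store.
    print(json.dumps(oai_dc_to_dict(SAMPLE), indent=2))
```

Keeping every field as a list means repeated elements such as dc:subject survive the conversion without special cases.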

Once it was indexed I just needed to write some simple JavaScript to query 
ElasticSearch and then display the records in a friendly way (e.g., create the 
link into your system). The whole thing seems to be only ~160 lines of code 
including HTML and the curl mojo to set up CouchDB. You could block access to 
the ES indexer by IP via firewall rules for your access control, if this is 
sufficient. Since the app is all in JS you don't even need a web server. There 
is a bit of work to get all of those other servers configured, though.
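A rough sketch of the query-and-display step, written in Python rather than the JavaScript Alex used, and without any network calls. It builds a request body for ElasticSearch's `query_string` query and turns a search response into (title, link) pairs. The helper names and the assumption that each record is stored with list-valued `title` and `identifier` fields are mine, not taken from the original code.

```python
def build_query(keywords, field=None, size=20):
    """Build a search request body using the query_string query.

    With field=None the search is "keyword anywhere"; otherwise it is
    restricted to one field through default_field.
    """
    query_string = {"query": keywords}
    if field is not None:
        query_string["default_field"] = field
    return {"size": size, "query": {"query_string": query_string}}


def hits_to_links(response, url_field="identifier", title_field="title"):
    """Turn a search response into (title, url) pairs for display.

    Assumes each hit's _source holds list-valued fields (hypothetical
    record layout); hits without a URL are skipped.
    """
    links = []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        urls = source.get(url_field) or []
        titles = source.get(title_field) or ["(untitled)"]
        if urls:
            links.append((titles[0], urls[0]))
    return links
```

The body returned by `build_query` would be sent as JSON to the index's `_search` endpoint; the links then point back at the display copy in the repository.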

If this sounds useful, drop me a line and I'll see about getting you the code.

Regards,
Alex Lemann



Re: [CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-17 Thread Edward M. Corrado
Thanks for the suggestion Patrick!

On Thu, Mar 17, 2011 at 11:24 AM, Patrick Murray-John wrote:
> Edward,
>
> One option might be Omeka (http://omeka.org) from the Center for
> History and New Media (full disclosure, I work for CHNM). It's designed for
> libraries, museums, archives, and like-minded folks to create online
> exhibits of their materials, and so has lots of Dublin Core and other
> metadata goodness (see the plugins page at
> http://omeka.org/add-ons/plugins). It's open source LAMP, and installation
> is pretty easy. There's an OAI-PMH harvester plugin and a CSV importer
> plugin. One of those would likely do the trick for your importing.
>
> Hope that helps,
> Patrick
>


Re: [CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-16 Thread Edward M. Corrado
Thanks for the information on the new version of VuFind. I think it might
be able to work, but I think I'm going to try PKP Harvester first.

Edward

On Wed, Mar 16, 2011 at 11:51 AM, Demian Katz  wrote:
> The new release of VuFind (1.1, due out this coming Monday) includes tools 
> for OAI-PMH harvesting and ingestion of arbitrary XML formats (some Dublin 
> Core examples are included).  With a little bit of XSLT tweaking (and 
> possibly implementation of a PHP class to customize record presentation), you 
> could probably get it to meet your needs fairly easily.  If you're interested 
> in trying this approach, I'm happy to offer more specific assistance -- just 
> let me know!  See also http://vufind.org.
>
> - Demian
>


Re: [CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-16 Thread Edward M. Corrado
Thanks for the information on PKP Harvester. I browsed through the
documentation and it looks like it should do what I need.

Edward

On Wed, Mar 16, 2011 at 1:10 PM, Roy Tennant  wrote:
> Yeah, duh, so have I and I completely forgot about it. My bad. This
> would likely be the easiest path if what you have is OAI_DC. As I
> recall, it was fairly simple to get going.
> Roy
>
> On Wed, Mar 16, 2011 at 10:01 AM, Mark Jordan  wrote:
>> Hi,
>>
>> I've done this by encoding the DC records in an OAI static repository, works 
>> great.
>>
>> Mark
>>
>> Mark Jordan
>> Head of Library Systems
>> W.A.C. Bennett Library, Simon Fraser University
>> Burnaby, British Columbia, V5A 1S6, Canada
>> Voice: 778.782.5753 / Fax: 778.782.3023 / Skype: mark.jordan50
>> mjor...@sfu.ca
>>
>> - Original Message -
>>> I wonder if you might be able to load the file in PKP Harvester.
>>>
>>> http://pkp.sfu.ca/?q=harvester
>>>
>>> It should already be able to parse and index OAI-DC, and would give
>>> you a nice, simple interface. It's based on a straight LAMP stack,
>>> which would make it easier to get up and running than some of the
>>> other suggestions so far.
>>>
>>> It's designed to harvest rather than load data, but that has got to be
>>> a fairly simple thing to work around. I've never done this myself, so I
>>> could be entirely wrong.
>>>
>>> --Dave
>>>
>>> ==
>>> David Walker
>>> Library Web Services Manager
>>> California State University
>>> http://xerxes.calstate.edu
>>> 


Re: [CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-16 Thread Roy Tennant
Yeah, duh, so have I and I completely forgot about it. My bad. This
would likely be the easiest path if what you have is OAI_DC. As I
recall, it was fairly simple to get going.
Roy

On Wed, Mar 16, 2011 at 10:01 AM, Mark Jordan  wrote:
> Hi,
>
> I've done this by encoding the DC records in an OAI static repository, works 
> great.
>
> Mark
>
> Mark Jordan
> Head of Library Systems
> W.A.C. Bennett Library, Simon Fraser University
> Burnaby, British Columbia, V5A 1S6, Canada
> Voice: 778.782.5753 / Fax: 778.782.3023 / Skype: mark.jordan50
> mjor...@sfu.ca
>
> - Original Message -
>> I wonder if you might be able to load the file in PKP Harvester.
>>
>> http://pkp.sfu.ca/?q=harvester
>>
>> It should already be able to parse and index OAI-DC, and would give
>> you a nice, simple interface. It's based on a straight LAMP stack,
>> which would make it easier to get up and running than some of the
>> other suggestions so far.
>>
>> It's designed to harvest rather than load data, but that has got to be
>> a fairly simple thing to work around. I've never done this myself, so I
>> could be entirely wrong.
>>
>> --Dave
>>
>> ==
>> David Walker
>> Library Web Services Manager
>> California State University
>> http://xerxes.calstate.edu
>> 


Re: [CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-16 Thread Mark Jordan
Hi,

I've done this by encoding the DC records in an OAI static repository, works 
great.

Mark

Mark Jordan
Head of Library Systems
W.A.C. Bennett Library, Simon Fraser University
Burnaby, British Columbia, V5A 1S6, Canada
Voice: 778.782.5753 / Fax: 778.782.3023 / Skype: mark.jordan50
mjor...@sfu.ca

- Original Message -
> I wonder if you might be able to load the file in PKP Harvester.
> 
> http://pkp.sfu.ca/?q=harvester
> 
> It should already be able to parse and index OAI-DC, and would give
> you a nice, simple interface. It's based on a straight LAMP stack,
> which would make it easier to get up and running than some of the
> other suggestions so far.
> 
> It's designed to harvest rather than load data, but that has got to be
> a fairly simple thing to work around. I've never done this myself, so I
> could be entirely wrong.
> 
> --Dave
> 
> ==
> David Walker
> Library Web Services Manager
> California State University
> http://xerxes.calstate.edu
> 


Re: [CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-16 Thread Walker, David
I wonder if you might be able to load the file in PKP Harvester.

  http://pkp.sfu.ca/?q=harvester

It should already be able to parse and index OAI-DC, and would give you a nice, 
simple interface.  It's based on a straight LAMP stack, which would make it 
easier to get up and running than some of the other suggestions so far.

It's designed to harvest rather than load data, but that has got to be a fairly 
simple thing to work around.  I've never done this myself, so I could be 
entirely wrong.

--Dave

==
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu



Re: [CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-16 Thread Demian Katz
The new release of VuFind (1.1, due out this coming Monday) includes tools for 
OAI-PMH harvesting and ingestion of arbitrary XML formats (some Dublin Core 
examples are included).  With a little bit of XSLT tweaking (and possibly 
implementation of a PHP class to customize record presentation), you could 
probably get it to meet your needs fairly easily.  If you're interested in 
trying this approach, I'm happy to offer more specific assistance -- just let 
me know!  See also http://vufind.org.

- Demian



Re: [CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-16 Thread Edward M. Corrado
Thanks Roy,

I will look into Swish-e.

Edward

On Wed, Mar 16, 2011 at 11:32 AM, Roy Tennant  wrote:
> These requirements fit Swish-e [1] to a "T". I've used it to index
> millions of XML records [2], and there are no particular requirements
> for the XML -- it just needs to be well-formed. You can have it
> automatically detect and index XML fields as well as index all words
> across all fields. This is all handled by a very simple text config
> file. The only downside is you will need to write the user interface
> (CGI) in your favorite language to interact with Swish-e.
>
> For example, here is my entire config file for Current Cites [3],
> where I store citations in my own XML format:
>
> DefaultContents XML*
> UndefinedMetaTags auto
> IndexDir /home/tennantr/public_html/currentcites/cites/
> ReplaceRules remove /home/tennantr/public_html/currentcites/cites/
> PropertyNames creator title description booktitle source
> IndexOnly .xml
>
> The first line tells Swish-e to expect XML, the line "UndefinedMetaTags
> auto" tells it to keep track of any XML tag it sees, and the next two lines
> tell it where the files are and remove the path from the index, so I
> only get returned each file title without the server path included.
> The "PropertyNames" line defines which elements are actually stored in
> the index, which I can then retrieve directly in the search results
> for display to the user. The "IndexOnly .xml" line tells Swish-e to
> ignore anything without that filename extension. Nothing could be
> easier.
> Roy
>
> [1] http://swish-e.org/
> [2] http://roytennant.com/proto/hathi/
> [3] http://lists.webjunction.org/currentcites/
>


Re: [CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-16 Thread Roy Tennant
These requirements fit Swish-e [1] to a "T". I've used it to index
millions of XML records [2], and there are no particular requirements
for the XML -- it just needs to be well-formed. You can have it
automatically detect and index XML fields as well as index all words
across all fields. This is all handled by a very simple text config
file. The only downside is you will need to write the user interface
(CGI) in your favorite language to interact with Swish-e.

For example, here is my entire config file for Current Cites [3],
where I store citations in my own XML format:

DefaultContents XML*
UndefinedMetaTags auto
IndexDir /home/tennantr/public_html/currentcites/cites/
ReplaceRules remove /home/tennantr/public_html/currentcites/cites/
PropertyNames creator title description booktitle source
IndexOnly .xml

The first line tells Swish-e to expect XML, the line "UndefinedMetaTags
auto" tells it to keep track of any XML tag it sees, and the next two lines
tell it where the files are and remove the path from the index, so I
only get returned each file title without the server path included.
The "PropertyNames" line defines which elements are actually stored in
the index, which I can then retrieve directly in the search results
for display to the user. The "IndexOnly .xml" line tells Swish-e to
ignore anything without that filename extension. Nothing could be
easier.
Roy

[1] http://swish-e.org/
[2] http://roytennant.com/proto/hathi/
[3] http://lists.webjunction.org/currentcites/
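
For the user-interface piece mentioned above, here is a rough Python sketch of the glue a CGI script might use. It is not Roy's code. It assumes the `swish-e` binary is on the PATH, uses the `-f` (index file), `-w` (search words) and `-m` (maximum results) options, and assumes the default result line layout of rank, path, quoted title and size. The helper names and the index file name are hypothetical, and the `field=(words)` metaname syntax should be checked against the Swish-e search documentation for your version.

```python
import re
import subprocess

# Assumed default result line: rank, path, "title", size.
RESULT_LINE = re.compile(r'^(\d+) (.+?) "(.*)" (\d+)$')


def build_command(query, index_file, field=None, max_results=20):
    """Build the argument list for one search.

    field=None searches all indexed words; otherwise the query is limited
    to one metaname (an XML element such as title or creator).
    """
    words = f"{field}=({query})" if field else query
    return ["swish-e", "-f", index_file, "-w", words, "-m", str(max_results)]


def parse_results(output):
    """Pick the result lines out of the output, skipping '#' headers and '.'."""
    results = []
    for line in output.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            rank, path, title, _size = match.groups()
            results.append({"rank": int(rank), "path": path, "title": title})
    return results


def search(query, index_file, field=None):
    """Run swish-e and return parsed hits (requires swish-e to be installed)."""
    completed = subprocess.run(
        build_command(query, index_file, field),
        capture_output=True, text=True, check=False,
    )
    return parse_results(completed.stdout)
```

Passing the arguments as a list, rather than building a shell string, keeps user-supplied search terms away from the shell.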



Re: [CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-16 Thread Edward M. Corrado
Hi Pascal,

Thanks for the Primo suggestions. One thing I should probably have
added: besides containing some confidential information, the collection
will only be useful to a small number of scholars working with this
specific collection, and is not something anyone else would probably be
all that interested in. Thus having the collection in something that
combines it with other collections is not a requirement and might even
be considered an impediment. That said, I'll check out your
suggestions.

Thanks,

Edward

On Wed, Mar 16, 2011 at 11:17 AM, Pascal Calarco  wrote:
> Hi Edward --
>
> I am not sure if you're allowed to tweak normalization and pipe rules for the 
> hosted Primo you have, but if the confidential information were in fairly 
> consistent fields, you could either 1) make this a collection that is only 
> searchable for authenticated Primo users or 2) define a regex normalization 
> rule that strips out the confidential information (although you may want to 
> retain that for staff to see), or 3) retain the information in the PNX 
> record, but not add it to the display section with a regex normalization rule 
> (so staff users could still see these with the view PNX option, but general 
> users would not see these fields).  Just some ideas from the Primo end of 
> things.
>
>  - pascal
>
> ---
> Pascal Calarco
> Head, Library Information Systems
> Hesburgh Libraries, University of Notre Dame /
> Michiana Academic Library Consortium
> Notre Dame, IN
> http://www.library.nd.edu/
> 
> From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Edward M. 
> Corrado [ecorr...@ecorrado.us]
> Sent: Wednesday, March 16, 2011 11:00 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: [CODE4LIB] Simple Web-based Dublin Core search engine?
>
> Hi,
>
> I [will soon] have a small set (< 1000 records) of Dublin Core
> metadata published in OAI_DC format that I want to be searchable via a
> Web browser.  Normally we would use Ex Libris's Primo for this, but
> this particular set of data may have some confidential information and
> our repository only has minimal built-in search functions. While we
> still may go with Primo for these records, I am looking at other
> possibilities. The requirements as I see them are:
>
> 1) Can ingest records in OAI_DC format
> 2) Allow remote end-users who are familiar with the collection to search
> these ingested records via a Web browser.
> 3) Search should be keyword anywhere or by individual fields, although it
> does not need to have every whizzbang feature out there. In other
> words, basic search features are fine.
> 4) Should support the ability to link to the display copy in our
> repository (probably goes without saying)
> 5) Should be simple to install and maintain (Thus, at least in my
> mind, eliminating something like Blacklight)
> 6) Preferably a LAMP application although a Windows server based
> solution is a possibility as well
> 7) Preferably Open Source, or at least no- or low-cost
>
> I haven't been able to find anything searching the Web, but it seems
> like something people may have done before. Before I re-invent the
> wheel or shoe-horn something together, does anyone have any
> suggestions?
>
> Edward
>


Re: [CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-16 Thread Pascal Calarco
Hi Edward --

I am not sure if you're allowed to tweak normalization and pipe rules for the 
hosted Primo you have, but if the confidential information were in fairly 
consistent fields, you could either 1) make this a collection that is only 
searchable for authenticated Primo users or 2) define a regex normalization 
rule that strips out the confidential information (although you may want to 
retain that for staff to see), or 3) retain the information in the PNX record, 
but not add it to the display section with a regex normalization rule (so staff 
users could still see these with the view PNX option, but general users would 
not see these fields).  Just some ideas from the Primo end of things.
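
Option 2 above can be illustrated outside Primo with a minimal Python sketch. It is not Primo normalization-rule syntax, only the general idea of a regex rule that strips confidential text from a public copy of a record while leaving the full record available to staff. The "[CONFIDENTIAL: ...]" marker, the field names, and the public_view helper are all assumptions made for the example.

```python
import re

# Hypothetical convention: confidential notes are wrapped as "[CONFIDENTIAL: ...]".
CONFIDENTIAL = re.compile(r"\s*\[CONFIDENTIAL:[^\]]*\]")


def public_view(record, fields=("description",)):
    """Return a copy of the record with confidential spans removed.

    `record` maps a Dublin Core element name to a list of string values.
    Only the elements named in `fields` are filtered; the original record
    is left untouched so that staff can still see the full text.
    """
    cleaned = {}
    for name, values in record.items():
        if name in fields:
            cleaned[name] = [CONFIDENTIAL.sub("", v).strip() for v in values]
        else:
            cleaned[name] = list(values)
    return cleaned


# Invented sample record, for illustration only.
staff_record = {
    "title": ["Interview notes"],
    "description": [
        "Transcript of interview. [CONFIDENTIAL: informant name] Recorded 1975."
    ],
}
public_record = public_view(staff_record)
print(public_record["description"][0])
```

This only works if the confidential information sits in fairly consistent fields and follows a recognizable pattern, which is the same condition noted above.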

  - pascal

---
Pascal Calarco
Head, Library Information Systems
Hesburgh Libraries, University of Notre Dame / 
Michiana Academic Library Consortium
Notre Dame, IN 
http://www.library.nd.edu/

From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Edward M. 
Corrado [ecorr...@ecorrado.us]
Sent: Wednesday, March 16, 2011 11:00 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] Simple Web-based Dublin Core search engine?

Hi,

I [will soon] have a small set (< 1000 records) of Dublin Core
metadata published in OAI_DC format that I want to be searchable via a
Web browser.  Normally we would use Ex Libris's Primo for this, but
this particular set of data may have some confidential information and
our repository only has minimal built-in search functions. While we
still may go with Primo for these records, I am looking at other
possibilities. The requirements as I see them are:

1) Can ingest records in OAI_DC format
2) Allow remote end-users who are familiar with the collection to search
these ingested records via a Web browser.
3) Search should be keyword anywhere or by individual fields, although it
does not need to have every whizzbang feature out there. In other
words, basic search features are fine.
4) Should support the ability to link to the display copy in our
repository (probably goes without saying)
5) Should be simple to install and maintain (Thus, at least in my
mind, eliminating something like Blacklight)
6) Preferably a LAMP application although a Windows server based
solution is a possibility as well
7) Preferably Open Source, or at least no- or low-cost

I haven't been able to find anything searching the Web, but it seems
like something people may have done before. Before I re-invent the
wheel or shoe-horn something together, does anyone have any
suggestions?

Edward


[CODE4LIB] Simple Web-based Dublin Core search engine?

2011-03-16 Thread Edward M. Corrado
Hi,

I [will soon] have a small set (< 1000 records) of Dublin Core
metadata published in OAI_DC format that I want to be searchable via a
Web browser.  Normally we would use Ex Libris's Primo for this, but
this particular set of data may have some confidential information and
our repository only has minimal built-in search functions. While we
still may go with Primo for these records, I am looking at other
possibilities. The requirements as I see them are:

1) Can ingest records in OAI_DC format
2) Allow remote end-users who are familiar with the collection to search
these ingested records via a Web browser.
3) Search should be keyword anywhere or by individual fields, although it
does not need to have every whizzbang feature out there. In other
words, basic search features are fine.
repository (probably goes without saying)
5) Should be simple to install and maintain (Thus, at least in my
mind, eliminating something like Blacklight)
6) Preferably a LAMP application although a Windows server based
solution is a possibility as well
7) Preferably Open Source, or at least no- or low-cost

I haven't been able to find anything searching the Web, but it seems
like something people may have done before. Before I re-invent the
wheel or shoe-horn something together, does anyone have any
suggestions?

Edward
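
For a set this small, requirements 1 to 4 can be roughed out with nothing more than the Python standard library. The sketch below is only an illustration under stated assumptions, not a recommendation of any particular product: it assumes the records arrive as an OAI-PMH ListRecords response in oai_dc format, that dc:identifier holds the link to the display copy, and that a simple substring match is good enough. The sample records, the example.org URLs, and the function names parse_oai_dc and search are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by OAI-PMH 2.0 (oai_dc) and Dublin Core.
OAI_DC_TAG = "{http://www.openarchives.org/OAI/2.0/oai_dc/}dc"
DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"

# Invented sample response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Letters from the field</dc:title>
        <dc:creator>Example, Alice</dc:creator>
        <dc:identifier>http://repository.example.org/item/1</dc:identifier>
      </oai_dc:dc>
    </metadata></record>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Expedition diary</dc:title>
        <dc:creator>Sample, Bob</dc:creator>
        <dc:identifier>http://repository.example.org/item/2</dc:identifier>
      </oai_dc:dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""


def parse_oai_dc(xml_text):
    """Return one dict per record: DC element name -> list of values."""
    root = ET.fromstring(xml_text)
    records = []
    for dc in root.iter(OAI_DC_TAG):
        record = {}
        for child in dc:
            if child.tag.startswith(DC_PREFIX) and child.text:
                name = child.tag[len(DC_PREFIX):]
                record.setdefault(name, []).append(child.text.strip())
        records.append(record)
    return records


def search(records, keyword, field=None):
    """Case-insensitive substring search, anywhere or in one DC element."""
    needle = keyword.lower()
    hits = []
    for record in records:
        if field:
            values = record.get(field, [])
        else:
            values = [v for vals in record.values() for v in vals]
        if any(needle in v.lower() for v in values):
            hits.append(record)
    return hits


records = parse_oai_dc(SAMPLE)
for hit in search(records, "alice", field="creator"):
    # dc:identifier is assumed to carry the link to the display copy.
    print(hit["title"][0], hit["identifier"][0])
```

A linear scan like this is adequate for fewer than 1000 records. Wrapping it in a Web form, and restricting access given the confidential material, is left out of the sketch.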