Re: pushing updates to solr from postgresql

2012-04-18 Thread Otis Gospodnetic
Hi Richard,

One thing to think about here is what you will do when Solr is unavailable to 
take a new document for whatever reason.  If you push docs to Solr from PG, 
each doc either gets indexed or it doesn't, so you'll need to catch errors and 
mark the failed documents in PG as not indexed.  You may want to keep track of 
the first and/or last indexing attempt and the total number of indexing attempts 
(as new DB columns), and you will probably want to use DIH to "pick up" 
unindexed documents from PG and get them indexed.
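A minimal sketch of that bookkeeping, assuming a hypothetical `docs` table with hypothetical columns `indexed`, `first_index_attempt`, `last_index_attempt` and `index_attempts`. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs anywhere, and takes the function that actually sends a document to Solr as a parameter. The query in `fetch_unindexed` is the kind of query a DIH entity could use to pick up the stragglers.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, List

# Hypothetical schema; sqlite3 stands in for PostgreSQL in this sketch.
SCHEMA = """
CREATE TABLE docs (
    id INTEGER PRIMARY KEY,
    body TEXT NOT NULL,
    indexed INTEGER NOT NULL DEFAULT 0,       -- 1 once Solr accepted the doc
    first_index_attempt TEXT,                  -- ISO timestamp of first try
    last_index_attempt TEXT,                   -- ISO timestamp of latest try
    index_attempts INTEGER NOT NULL DEFAULT 0  -- total number of tries
)
"""


def try_index(conn: sqlite3.Connection, doc_id: int,
              send: Callable[[int, str], None]) -> bool:
    """Push one document via `send` and record the attempt either way."""
    (body,) = conn.execute(
        "SELECT body FROM docs WHERE id = ?", (doc_id,)).fetchone()
    now = datetime.now(timezone.utc).isoformat()
    try:
        send(doc_id, body)  # hypothetical sender: raises if Solr is unavailable
        ok = True
    except Exception:
        ok = False
    conn.execute(
        """UPDATE docs
           SET indexed = ?,
               first_index_attempt = COALESCE(first_index_attempt, ?),
               last_index_attempt = ?,
               index_attempts = index_attempts + 1
           WHERE id = ?""",
        (1 if ok else 0, now, now, doc_id),
    )
    conn.commit()
    return ok


def fetch_unindexed(conn: sqlite3.Connection) -> List[int]:
    """Ids of documents that still need indexing (what DIH would pick up)."""
    rows = conn.execute("SELECT id FROM docs WHERE indexed = 0 ORDER BY id")
    return [row[0] for row in rows]
```

`COALESCE` keeps the first attempt's timestamp once it is set, while every call bumps the attempt counter, so documents that fail repeatedly are easy to spot.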

Also keep in mind that sending docs to Solr one by one will not be as efficient 
as sending them in batches, or as fetching them in batches via DIH.  If your 
data volume is low this likely won't be a problem, but if it is high or 
growing, you'll want to keep this in mind.
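To make the batching point concrete, here is a minimal sketch that groups documents into one Solr XML `<add>` message per batch instead of one message per document. The batch size of 100 and the field names are hypothetical; the right size depends on your documents and server.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Iterator, List

Doc = Dict[str, str]


def batched(docs: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield consecutive lists of at most `size` documents."""
    batch: List[Doc] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def add_message(batch: List[Doc]) -> str:
    """Build one <add> update message containing every doc in the batch."""
    add = ET.Element("add")
    for doc in batch:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            ET.SubElement(doc_el, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


def add_messages(docs: Iterable[Doc], size: int = 100) -> List[str]:
    """One message per batch: N docs need ceil(N / size) requests, not N."""
    return [add_message(batch) for batch in batched(docs, size)]
```

Each returned string would be the body of a single POST to the update handler, so the per-request overhead is paid once per batch rather than once per document.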

Otis




>
> From: "Welty, Richard" 
>To: solr-user@lucene.apache.org 
>Sent: Wednesday, April 18, 2012 10:48 AM
>Subject: pushing updates to solr from postgresql
> 
>I have a setup right this instant where the DataImportHandler is being used to
>pull data for an index from a PostgreSQL server.
>
>I'd like to switch over to push, and am looking for some validation of my
>approach.
>
>I have Perl installed as an untrusted language on my PostgreSQL server and am
>planning to set up triggers on the tables where insert/update/delete
>operations should cause an update of the relevant Solr indexes. The trigger
>functions will build XML in the format described in UpdateXmlMessages and
>notify Solr via HTTP requests.
>
>
>Is this sensible, or am I missing something easier?
>
>Also, does anyone have any thoughts about coordinating initial indexing/full
>reindexing via the DataImportHandler with the trigger-based push operations?
>
>Thanks,
>   Richard
>
>
>

pushing updates to solr from postgresql

2012-04-18 Thread Welty, Richard
I have a setup right this instant where the DataImportHandler is being used to
pull data for an index from a PostgreSQL server.

I'd like to switch over to push, and am looking for some validation of my
approach.

I have Perl installed as an untrusted language on my PostgreSQL server and am
planning to set up triggers on the tables where insert/update/delete operations
should cause an update of the relevant Solr indexes. The trigger functions will
build XML in the format described in UpdateXmlMessages and notify Solr via HTTP
requests.
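The plan above can be sketched in Python (the thread uses PL/Perl; PostgreSQL also offers PL/Python as an untrusted language, but what follows is plain Python, not a trigger function). It maps a trigger event to a Solr XML update message, an `<add>` for INSERT/UPDATE and a `<delete>` for DELETE, and prepares the HTTP POST. The endpoint URL, the assumption that the primary key is Solr's unique key `id`, and the function names are all hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical Solr update endpoint; adjust host, port and core to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def update_message(op: str, row: Dict[str, str]) -> str:
    """Translate a trigger event into a Solr XML update message.

    `op` mirrors PostgreSQL's TG_OP ("INSERT", "UPDATE" or "DELETE");
    `row` is the new row for INSERT/UPDATE and the old row for DELETE.
    Assumes the table's primary key is indexed as Solr's unique key "id".
    """
    if op in ("INSERT", "UPDATE"):
        # Adding a doc whose id already exists replaces it, so UPDATE is an add.
        root = ET.Element("add")
        doc = ET.SubElement(root, "doc")
        for name, value in row.items():
            ET.SubElement(doc, "field", name=name).text = value
    elif op == "DELETE":
        root = ET.Element("delete")
        ET.SubElement(root, "id").text = row["id"]
    else:
        raise ValueError(f"unsupported operation: {op}")
    return ET.tostring(root, encoding="unicode")


def build_request(op: str, row: Dict[str, str],
                  url: str = SOLR_UPDATE_URL) -> urllib.request.Request:
    """Prepare (but do not send) the POST carrying the update message."""
    body = update_message(op, row).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
```

Sending is then `urllib.request.urlopen(build_request("DELETE", {"id": "42"}), timeout=5)`. The sketch does not commit; whether documents become visible via the server's autoCommit settings or an explicit commit is an assumption about your setup.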


Is this sensible, or am I missing something easier?

Also, does anyone have any thoughts about coordinating initial indexing/full
reindexing via the DataImportHandler with the trigger-based push operations?

Thanks,
   Richard
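One possible approach to that coordination question, not one proposed in this thread, is to defer trigger pushes while a DIH full import is running and replay them once it finishes. Replaying is safe because adding a document whose unique key already exists simply replaces it. A minimal in-process sketch; the class and method names are hypothetical, and in a real setup the pending set would have to live in a database table, because trigger functions run in whichever session made the change and would not share in-process state.

```python
from typing import Callable, List, Set


class PushCoordinator:
    """Defers pushes to Solr while a full reindex is in progress (hypothetical)."""

    def __init__(self, push: Callable[[str], None]) -> None:
        # `push` is assumed to read the doc's *current* row and send an add,
        # or a delete if the row no longer exists.
        self.push = push
        self.reindexing = False
        self.pending: Set[str] = set()

    def begin_full_reindex(self) -> None:
        self.reindexing = True

    def on_change(self, doc_id: str) -> None:
        """Called from the trigger path for every insert/update/delete."""
        if self.reindexing:
            # The full import may or may not have read this row yet; remember it.
            self.pending.add(doc_id)
        else:
            self.push(doc_id)

    def end_full_reindex(self) -> List[str]:
        """Replay every id changed during the reindex, once each, in sorted order."""
        self.reindexing = False
        replayed = sorted(self.pending)
        self.pending.clear()
        for doc_id in replayed:
            self.push(doc_id)
        return replayed
```

Deduplicating with a set means a row changed many times during the import is pushed only once afterwards, with its latest state.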