Hi Richard,
One thing to think about here is what you will do when Solr is unavailable to
take a new document for whatever reason. If you push docs to Solr from PG,
each doc either gets indexed or it doesn't, so you will have to catch errors
and mark failed documents in PG as not indexed. You may also want to track the
first and/or last index attempt and the total number of indexing attempts
(new DB columns), and you'll probably want to use DIH to "pick up" unindexed
documents from PG and get them indexed.
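To make that concrete, here's a minimal sketch of the "catch errors and mark
the row" idea. Everything here is hypothetical: the column names (indexed,
first_index_attempt, last_index_attempt, index_attempts) and the solr_post
callable are placeholders for whatever your schema and HTTP client actually
look like, and the dict stands in for a PG row:

```python
import datetime

def index_document(doc, solr_post):
    """Attempt to send one document to Solr and record the outcome on
    the row, so a later DIH pass can re-fetch anything that failed.

    doc       -- dict standing in for a PG row with the suggested
                 tracking columns (all names are hypothetical)
    solr_post -- whatever function actually performs the HTTP update;
                 it should raise on failure
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    doc.setdefault("first_index_attempt", now)
    doc["last_index_attempt"] = now
    doc["index_attempts"] = doc.get("index_attempts", 0) + 1
    try:
        solr_post(doc)
        doc["indexed"] = True
    except Exception:
        # Solr was down or rejected the doc; leave it marked unindexed
        # so the pickup query can find it later.
        doc["indexed"] = False
    return doc
```

The DIH side would then just select the stragglers, e.g. a query along the
lines of "select id from docs where indexed = false" (again, names are
illustrative, not from your schema).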
Also keep in mind that sending docs to Solr one by one will not be as efficient
as sending batches of them or as efficient as getting a batch of them via DIH.
If your data volume is low this likely won't be a problem, but if it is high
or is growing, you'll want to keep this in mind.
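The batching part is just grouping docs before each update request instead of
posting one at a time. A small sketch (the batch size of 100 is an arbitrary
starting point, not a recommendation; tune it to your doc size and volume):

```python
def batches(docs, size=100):
    """Group documents into fixed-size batches so each Solr update
    request carries many docs instead of one.
    """
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly short, batch.
        yield batch
```

Each yielded batch would then become a single update request to Solr rather
than one request per document.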
Otis
Performance Monitoring SaaS for Solr -
http://sematext.com/spm/solr-performance-monitoring/index.html
>
> From: "Welty, Richard"
>To: solr-user@lucene.apache.org
>Sent: Wednesday, April 18, 2012 10:48 AM
>Subject: pushing updates to solr from postgresql
>
>i have a setup right this instant where the dataimporthandler is being used to
>pull data for an index from a postgresql server.
>
>i'd like to switch over to push, and am looking for some validation of my
>approach.
>
>i have perl installed as an untrusted language on my postgresql server and am
>planning to set up triggers on the tables where insert/update/delete
>operations should cause an update of the relevant solr indexes. the trigger
>functions will build xml in the format for UpdateXmlMessages and notify Solr
>via http requests.
>
>
>is this sensible, or am i missing something easier?
>
>also, does anyone have any thoughts about coordinating initial indexing/full
>reindexing via dataimporthandler with the trigger based push operations?
>
>thanks,
> richard