Re: Updating last_modified field when using DIH
Stefan, Ephraim,

Thanks for the answers! I am finding Solr to be a useful product, but it is definitely the community that makes it a great product. So far everyone has been very helpful.

Thanks! Cheers!
Juan M.

On Wed, Nov 3, 2010 at 9:13 AM, Ephraim Ofir ephra...@icq.com wrote:
> Also, your deltaImportQuery should be:
>
>   deltaImportQuery='SELECT * FROM Entities WHERE ent_id=${dataimporter.delta.id}'
>
> Otherwise you're just importing the ids and not the rest of the data.
>
> If performance is important to you, you might also want to check out
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3E
>
> Ephraim Ofir
Re: Updating last_modified field when using DIH
Juan,

that's correct: Solr will not touch your database; that's part of your application code. Solr uses its own updated timestamp (which is available through dataimporter.last_index_time).

So, imagine the following situation. The Solr import runs every 10 minutes, and the last run was at 11:00. Your entity gets updated at 11:03. The next Solr run at 11:10 will detect this as changed and import the entity. When the import runs again at 11:20, no entity will match the delta query, because Solr will ask for a modification date later than 11:10 (the time of the last Solr run).

You only need to update the last_modified field (in your application) when the entity is changed and you want Solr to (re-)index your data.

HTH,
Stefan

On Tue, Nov 2, 2010 at 7:35 PM, Juan Manuel Alvarez naici...@gmail.com wrote:
> Hello everyone!
>
> I would like to ask you a question about DIH and delta import.
>
> I am trying to sync Solr with a PostgreSQL database and I have a field
> ent_lastModified of type timestamp without time zone.
>
> Here is my xml file:
>
> <dataConfig>
>   <dataSource name="jdbc" driver="org.postgresql.Driver"
>               url="jdbc:postgresql://host" user="XXX" password="XXX"
>               readOnly="true" autoCommit="false"
>               transactionIsolation="TRANSACTION_READ_COMMITTED"
>               holdability="CLOSE_CURSORS_AT_COMMIT"/>
>   <document>
>     <entity name='myEntity' dataSource='jdbc' pk='id'
>             query='SELECT * FROM Entities'
>             deltaImportQuery='SELECT ent_id AS id FROM Entities WHERE ent_id=${dataimporter.delta.id}'
>             deltaQuery='SELECT ent_id AS id FROM Entities WHERE ent_lastModified &gt; &#39;${dataimporter.last_index_time}&#39;'>
>     </entity>
>   </document>
> </dataConfig>
>
> Full-import works fine, but when I run a delta-import, I get the
> corresponding records, but the ent_lastModified field stays the same,
> so if I run another delta-import, the same records are retrieved.
>
> I have read all the documentation at
> http://wiki.apache.org/solr/DataImportHandler but I could not find an
> update query for the last_modified field, and Solr does not seem to do
> this automatically.
>
> I have also tried to name the field last_modified as in the example,
> but its value remains unchanged after a delta-import.
>
> Can anyone point me in the right direction?
>
> Thanks in advance!
> Juan M.
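
The 11:00 / 11:03 / 11:10 / 11:20 timeline in the message above can be tried out with a small sketch. This is not DIH itself: it uses Python's built-in sqlite3 in place of PostgreSQL, the ent_name column and the helper names (make_db, update_entity, delta_ids) are made up for illustration, and the last index time is passed in by hand instead of being tracked by Solr. It only shows the division of labour: the application stamps ent_lastModified on each change, and the delta query compares that stamp against the time of the previous import.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for the Entities table (ent_name is hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Entities ("
        "ent_id INTEGER PRIMARY KEY, ent_name TEXT, ent_lastModified TEXT)"
    )
    conn.executemany(
        "INSERT INTO Entities VALUES (?, ?, ?)",
        [
            (1, "first", "2010-11-03 10:00:00"),
            (2, "second", "2010-11-03 10:30:00"),
        ],
    )
    return conn


def update_entity(conn, ent_id, new_name, now):
    """Application-side change: the application, not Solr, stamps ent_lastModified."""
    conn.execute(
        "UPDATE Entities SET ent_name = ?, ent_lastModified = ? WHERE ent_id = ?",
        (new_name, now, ent_id),
    )


def delta_ids(conn, last_index_time):
    """Same shape as the deltaQuery: ids of rows modified after the last import."""
    rows = conn.execute(
        "SELECT ent_id AS id FROM Entities WHERE ent_lastModified > ?",
        (last_index_time,),
    ).fetchall()
    return [row[0] for row in rows]


conn = make_db()

# 11:03: the application edits entity 1 and updates its timestamp.
update_entity(conn, 1, "first (edited)", "2010-11-03 11:03:00")

# 11:10 run: the previous import was at 11:00, so entity 1 is picked up.
run_1110 = delta_ids(conn, "2010-11-03 11:00:00")

# 11:20 run: the previous import was at 11:10, so nothing matches any more.
run_1120 = delta_ids(conn, "2010-11-03 11:10:00")

print(run_1110)  # [1]
print(run_1120)  # []
```

Note that ent_lastModified itself never changes during the import; what moves forward between runs is the last index time on the Solr side.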
RE: Updating last_modified field when using DIH
Also, your deltaImportQuery should be:

  deltaImportQuery='SELECT * FROM Entities WHERE ent_id=${dataimporter.delta.id}'

Otherwise you're just importing the ids and not the rest of the data.

If performance is important to you, you might also want to check out
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201008.mbox/%3c9f8b39cb3b7c6d4594293ea29ccf438b01702...@icq-mail.icq.il.office.aol.com%3E

Ephraim Ofir

-----Original Message-----
From: Stefan Matheis [mailto:matheis.ste...@googlemail.com]
Sent: Wednesday, November 03, 2010 12:58 PM
To: solr-user@lucene.apache.org
Subject: Re: Updating last_modified field when using DIH

Juan,

that's correct: Solr will not touch your database; that's part of your application code. Solr uses its own updated timestamp (which is available through dataimporter.last_index_time).

So, imagine the following situation. The Solr import runs every 10 minutes, and the last run was at 11:00. Your entity gets updated at 11:03. The next Solr run at 11:10 will detect this as changed and import the entity. When the import runs again at 11:20, no entity will match the delta query, because Solr will ask for a modification date later than 11:10 (the time of the last Solr run).

You only need to update the last_modified field (in your application) when the entity is changed and you want Solr to (re-)index your data.

HTH,
Stefan
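
To see why the deltaImportQuery correction at the top of this message matters, here is a rough sketch of the two-step delta import: one query finds the changed ids, and a second query is run per id to fetch what gets indexed. It is only an emulation in Python with the built-in sqlite3 module, not how DIH is implemented; the ent_name column, the constant names, and the delta_import helper are hypothetical, and the `?` placeholder stands in for `${dataimporter.delta.id}` and `${dataimporter.last_index_time}`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be read by column name
conn.execute(
    "CREATE TABLE Entities ("
    "ent_id INTEGER PRIMARY KEY, ent_name TEXT, ent_lastModified TEXT)"
)
conn.executemany(
    "INSERT INTO Entities VALUES (?, ?, ?)",
    [
        (1, "first", "2010-11-03 11:03:00"),
        (2, "second", "2010-11-03 10:30:00"),
    ],
)

# Step 1: find the ids of rows changed since the last import.
DELTA_QUERY = "SELECT ent_id AS id FROM Entities WHERE ent_lastModified > ?"

# Step 2, as in the original config: returns only the id again.
IDS_ONLY_IMPORT = "SELECT ent_id AS id FROM Entities WHERE ent_id = ?"

# Step 2, as suggested in this message: returns the whole row.
FULL_ROW_IMPORT = "SELECT * FROM Entities WHERE ent_id = ?"


def delta_import(conn, last_index_time, import_query):
    """Run the delta query, then the import query once per changed id."""
    docs = []
    changed = conn.execute(DELTA_QUERY, (last_index_time,)).fetchall()
    for changed_row in changed:
        for row in conn.execute(import_query, (changed_row["id"],)):
            docs.append(dict(row))
    return docs


ids_only = delta_import(conn, "2010-11-03 11:00:00", IDS_ONLY_IMPORT)
full_rows = delta_import(conn, "2010-11-03 11:00:00", FULL_ROW_IMPORT)

print(ids_only)   # only the id column comes back
print(full_rows)  # every column of the changed row comes back
```

With the ids-only query, each "document" carries nothing but the id, which matches the symptom of importing the ids and not the rest of the data.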