RE: dataImportHandler: delta query fetching data, not just ids?
You can also use $deleteDocById. If you also use $skipDoc, you can sometimes get the deletes on the same entity with a "command=full-import&clean=false" delta. This may or may not be more convenient than what you're doing already. See http://wiki.apache.org/solr/DataImportHandler#Special_Commands

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: janne mattila [mailto:jannepostilis...@gmail.com]
Sent: Thursday, March 29, 2012 12:45 AM
To: solr-user@lucene.apache.org
Subject: Re: dataImportHandler: delta query fetching data, not just ids?

> I'm not sure why deltas were implemented this way. Possibly it was designed
> to behave like some of our object-to-relational libraries? In any case,
> there are 2 ways to do deltas and you just have to take your pick based on
> what will work best for your situation. I wouldn't consider the
> "command=full-import&clean=false" method a workaround but just a different
> way to tackle the same problem.

Yeah, I find the delta-update strategy a little strange as well.

The problem with command=full-import&clean=false is that you can't handle removes nicely using that. If you use the actual delta-import and deletedPkQuery for that, you run into problems with last_index_time and miss either modifications or deletes. I'm handling that by creating a different entity config for updates (using command=full-import&clean=false) and deletes (using command=delta-import), but it ends up being much dirtier than it should be.
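To make the $deleteDocById/$skipDoc suggestion concrete, here is a minimal Python simulation of a single clean=false import pass. It is not DIH code. It assumes a hypothetical `item` table that soft-deletes rows with a `deleted` flag, because a hard-deleted row cannot be returned by any query. A plain dict stands in for the Solr index, and `full_import_no_clean` is an illustrative name. Deleted rows carry their id in a `$deleteDocById` column and set `$skipDoc` to 'true', so the same query both removes them and keeps them from being re-added.

```python
import sqlite3

def make_db():
    """Hypothetical source table; deleted = 1 marks a soft-deleted row."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allows row["column name"] access
    conn.executescript("""
        CREATE TABLE item (id TEXT PRIMARY KEY, name TEXT,
                           last_modified TEXT, deleted INTEGER);
        INSERT INTO item VALUES
            ('1', 'untouched', '2012-03-01 00:00:00', 0),
            ('2', 'renamed',   '2012-03-28 10:00:00', 0),
            ('3', 'removed',   '2012-03-28 11:00:00', 1);
    """)
    return conn

# One incremental query returns both updates and deletes. Deleted rows fill
# the "$deleteDocById" and "$skipDoc" columns (mirroring the DIH special
# commands); live rows leave both NULL.
SINGLE_QUERY = """
    SELECT id, name,
           CASE WHEN deleted = 1 THEN id END     AS "$deleteDocById",
           CASE WHEN deleted = 1 THEN 'true' END AS "$skipDoc"
    FROM item
    WHERE last_modified > ?
"""

def full_import_no_clean(conn, index, last_index_time):
    """Simulate command=full-import&clean=false: the index is never wiped."""
    for row in conn.execute(SINGLE_QUERY, (last_index_time,)):
        if row["$deleteDocById"] is not None:
            index.pop(row["$deleteDocById"], None)  # delete the doc by id
        if row["$skipDoc"] == "true":
            continue  # do not (re)add this row as a document
        index[row["id"]] = {"id": row["id"], "name": row["name"]}
    return index
```

Running it against an index that already holds ids 1 to 3 with `last_index_time` set to 2012-03-27 leaves id 1 untouched, updates id 2, and removes id 3, all from one query.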
Re: dataImportHandler: delta query fetching data, not just ids?
> I'm not sure why deltas were implemented this way. Possibly it was designed
> to behave like some of our object-to-relational libraries? In any case,
> there are 2 ways to do deltas and you just have to take your pick based on
> what will work best for your situation. I wouldn't consider the
> "command=full-import&clean=false" method a workaround but just a different
> way to tackle the same problem.

Yeah, I find the delta-update strategy a little strange as well.

The problem with command=full-import&clean=false is that you can't handle removes nicely using that. If you use the actual delta-import and deletedPkQuery for that, you run into problems with last_index_time and miss either modifications or deletes. I'm handling that by creating a different entity config for updates (using command=full-import&clean=false) and deletes (using command=delta-import), but it ends up being much dirtier than it should be.
RE: dataImportHandler: delta query fetching data, not just ids?
Janne,

You're correct about how the delta import works. You specify 3 queries:

- deletedPkQuery = query should return all "id"s (only) of items that were deleted since the last run.
- deltaQuery = query should return all "id"s (only) of items that were added/updated since the last run.
- deltaImportQuery = query should return full data for ONE row with "where id='${dih.delta.id}'".

When DIH runs, it executes the first 2 queries and puts all of the returned ids in memory in a Set or something. Then it does N selects on the deltaImportQuery, executing the query once per id. This may be a good way to do it if you're doing very frequent deltas and you only expect a small number of changed documents per run, but I personally use the alternate way (as you noted): http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport

I'm not sure why deltas were implemented this way. Possibly it was designed to behave like some of our object-to-relational libraries? In any case, there are 2 ways to do deltas and you just have to take your pick based on what will work best for your situation. I wouldn't consider the "command=full-import&clean=false" method a workaround but just a different way to tackle the same problem.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: janne mattila [mailto:jannepostilis...@gmail.com]
Sent: Tuesday, March 27, 2012 9:25 AM
To: solr-user@lucene.apache.org
Subject: dataImportHandler: delta query fetching data, not just ids?

It seems that delta import works in 2 steps: the first query fetches the ids of the modified entries, then a second query fetches the actual data. I am aware that there's a workaround:

http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport

But still, to clarify, and to make sure I have up-to-date info on how Solr works:

1. Is it possible to fetch the modified data with a single SQL query using deltaImportQuery, as in:

deltaImportQuery="select * from item where last_modified > '${dataimporter.last_index_time}'"?

2. If not - what's the reason delta import is implemented like it is? Why split it in two queries? I would think having a single delta query that fetches the data would be kind of an "obvious" design unless there's something that calls for 2 separate queries...?
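The two-phase flow James describes can be simulated in a few lines of Python with sqlite3. This is a sketch of the behaviour as described in the thread, not DIH's actual code. The table, its soft-delete `deleted` column, the query text, and the name `delta_import` are all hypothetical, and the `?` placeholders stand in for `${dataimporter.last_index_time}` and `${dih.delta.id}`. The function counts SELECTs so that the 2 + N cost is visible.

```python
import sqlite3

def make_db():
    """Hypothetical source table; deleted = 1 marks a soft-deleted row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (id TEXT PRIMARY KEY, name TEXT,
                           last_modified TEXT, deleted INTEGER);
        INSERT INTO item VALUES
            ('1', 'untouched', '2012-03-01 00:00:00', 0),
            ('2', 'renamed',   '2012-03-28 10:00:00', 0),
            ('3', 'removed',   '2012-03-28 11:00:00', 1),
            ('4', 'new',       '2012-03-28 12:00:00', 0);
    """)
    return conn

# Hypothetical queries shaped like the three DIH attributes.
DELETED_PK_QUERY = "SELECT id FROM item WHERE deleted = 1 AND last_modified > ?"
DELTA_QUERY = "SELECT id FROM item WHERE deleted = 0 AND last_modified > ?"
DELTA_IMPORT_QUERY = "SELECT id, name FROM item WHERE id = ?"

def delta_import(conn, index, last_index_time):
    """Simulate a delta-import against a dict 'index'; return the SELECT count."""
    # Phase 1: two id-only queries, with the ids held in memory as sets.
    deleted_ids = {r[0] for r in conn.execute(DELETED_PK_QUERY, (last_index_time,))}
    changed_ids = {r[0] for r in conn.execute(DELTA_QUERY, (last_index_time,))}
    selects = 2
    for doc_id in deleted_ids:
        index.pop(doc_id, None)
    # Phase 2: one deltaImportQuery per changed id, i.e. N more SELECTs.
    for doc_id in sorted(changed_ids):
        row = conn.execute(DELTA_IMPORT_QUERY, (doc_id,)).fetchone()
        selects += 1
        if row is not None:
            index[row[0]] = {"id": row[0], "name": row[1]}
    return selects
```

With two changed rows and one deleted row since 2012-03-27, this run issues 4 SELECTs. Because the count grows by one per changed row, the approach suits frequent deltas with few changes per run, as James notes.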
Re: dataImportHandler: delta query fetching data, not just ids?
How did it work before the SOLR-811 update? I don't understand. Did it fetch delta data with two queries (1. get ids, 2. get data for each id) or did it fetch all delta data with a single query?

On Tue, Mar 27, 2012 at 5:45 PM, Ahmet Arslan wrote:
>> 2. If not - what's the reason delta import is implemented like it is?
>> Why split it in two queries? I would think having a single delta query
>> that fetches the data would be kind of an "obvious" design unless
>> there's something that calls for 2 separate queries...?
>
> I think this is it? https://issues.apache.org/jira/browse/SOLR-811
Re: dataImportHandler: delta query fetching data, not just ids?
> 2. If not - what's the reason delta import is implemented like it is?
> Why split it in two queries? I would think having a single delta query
> that fetches the data would be kind of an "obvious" design unless
> there's something that calls for 2 separate queries...?

I think this is it? https://issues.apache.org/jira/browse/SOLR-811
dataImportHandler: delta query fetching data, not just ids?
It seems that delta import works in 2 steps: the first query fetches the ids of the modified entries, then a second query fetches the actual data. I am aware that there's a workaround:

http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport

But still, to clarify, and to make sure I have up-to-date info on how Solr works:

1. Is it possible to fetch the modified data with a single SQL query using deltaImportQuery, as in:

deltaImportQuery="select * from item where last_modified > '${dataimporter.last_index_time}'"?

2. If not - what's the reason delta import is implemented like it is? Why split it in two queries? I would think having a single delta query that fetches the data would be kind of an "obvious" design unless there's something that calls for 2 separate queries...?
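One way to get the single-query behaviour asked about here is the full-import route from the wiki link above. You run command=full-import&clean=false and write the entity query so that the clean request parameter decides whether all rows or only changed rows are selected. The Python sketch below simulates that with sqlite3.

The following details are assumptions, not confirmed DIH syntax:

- The query shape `'${dataimporter.request.clean}' != 'false' OR last_modified > '${dataimporter.last_index_time}'`.
- The hypothetical `item` table.
- The name `run_full_import`.

The two `?` placeholders stand in for the two DIH variables.

```python
import sqlite3

def make_db():
    """Hypothetical source table with a last_modified timestamp per row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (id TEXT PRIMARY KEY, name TEXT, last_modified TEXT);
        INSERT INTO item VALUES
            ('1', 'untouched', '2012-03-01 00:00:00'),
            ('2', 'renamed',   '2012-03-28 10:00:00');
    """)
    return conn

# First "?" stands in for ${dataimporter.request.clean}, the second for
# ${dataimporter.last_index_time}. Any clean value other than 'false'
# selects every row; clean=false selects only rows changed since last run.
ENTITY_QUERY = """
    SELECT id, name FROM item
    WHERE ? != 'false' OR last_modified > ?
    ORDER BY id
"""

def run_full_import(conn, clean, last_index_time):
    """Return the ids one full-import pass would (re)index for a clean value."""
    return [row[0] for row in conn.execute(ENTITY_QUERY, (clean, last_index_time))]
```

With clean=false only id 2 is fetched, in one query, and because the index is not wiped, id 1's existing document stays as it was. Deletes are not covered by this query, which is the limitation Janne raises elsewhere in the thread.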