Hi,
I use the DataImportHandler (DIH) with an RDBMS data source to index a large MySQL database with
about 7 million entries.
A full import works fine. In schema.xml I defined a uniqueKey field
(which is of type 'text').
I send queries through the dismax query handler and get the results back as
a PHP array.
Since the database entries change every second, I use the delta-import
queries to
a) delete documents from the index that have been deleted in the
database (there's a table for deleted items), and
b) update documents in the index that have changed since the last
import (there's a last_modified column in a table for that).
As I understand it, when I start a delta-import, the DIH runs
the deletedPkQuery first and deletes the documents that should be
deleted (identified by the uniqueKey field?).
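A minimal Python model of the delta-import behaviour described above, assuming the index can be treated as a dict keyed by the uniqueKey value: deletes from the deleted-items table are applied first, then every row whose last_modified is newer than the last import replaces the document with the same key. The function name delta_import, the field name "id" and the row layout are hypothetical; this only illustrates the expected outcome, not how DIH itself is implemented.

```python
from datetime import datetime


def delta_import(index, deleted_ids, rows, last_index_time: datetime, key="id"):
    """Apply a modeled delta-import to `index` (dict: uniqueKey value -> document).

    deleted_ids: keys read from the deleted-items table (the deletedPkQuery step)
    rows: database rows as dicts, each with a 'last_modified' datetime (the deltaQuery step)
    Hypothetical model only; not the DIH implementation.
    """
    # Step 1: remove documents whose key appears in the deleted-items table.
    deleted = 0
    for pk in deleted_ids:
        if index.pop(pk, None) is not None:
            deleted += 1

    # Step 2: re-index rows changed since the last import.
    changed = 0
    for row in rows:
        if row["last_modified"] > last_index_time:
            doc = {k: v for k, v in row.items() if k != "last_modified"}
            # Same uniqueKey value -> the old document is replaced, never duplicated.
            index[doc[key]] = doc
            changed += 1

    return {"deleted": deleted, "added_or_changed": changed}
```

In this model a changed row can never produce a second copy, because the key lookup replaces the existing entry; duplicates in the real index mean the real key match is not behaving like this.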
That seems to work: catalina.out shows, for example, "INFO: deleted from document to
Solr: 1851010".
Next comes the deltaQuery. That seems to work too: once it has
finished, a query returns the new database entries.
But here is the problem:
the dataimport status always reports Added / Changed several hundred
documents, but deleted 0 documents. Why no deletes?
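To compare those status counters with what is actually in the index, the import can be triggered and its status read over HTTP through DIH's command parameter (command=delta-import, command=status). A small Python sketch that only builds the request URLs; the base URL and the /dataimport handler path are assumptions that depend on how the handler is registered in solrconfig.xml.

```python
from urllib.parse import urlencode

# Assumption: hypothetical Solr base URL and a DIH handler registered at /dataimport.
SOLR_BASE = "http://localhost:8983/solr"


def dih_url(command, **params):
    """Build a DataImportHandler request URL, e.g. command='delta-import' or 'status'."""
    query = urlencode({"command": command, **params})
    return f"{SOLR_BASE}/dataimport?{query}"
```

For example, urllib.request.urlopen(dih_url("status")).read() would fetch the status response containing the added/changed and deleted counts.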
Every time I change an item in the database and run a delta-import
afterwards, my next query returns that item *twice*.
After the next change and the next delta-import, Solr returns *three*
result documents, and so on.
As I mentioned, I get my search results as an array of arrays
(one per Solr document) containing the fields defined in schema.xml.
After changing some documents and delta-importing them, I get lots of
identical arrays (even the uniqueKey field is exactly the same).
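To confirm that the repeated arrays really share a single uniqueKey value, the decoded result documents can be grouped by that field. A Python sketch, assuming each document is a dict and the uniqueKey field is called "id" (a hypothetical name):

```python
from collections import Counter


def duplicate_keys(docs, key="id"):
    """Return {uniqueKey value: count} for every key that occurs more than once."""
    counts = Counter(doc[key] for doc in docs)
    return {k: n for k, n in counts.items() if n > 1}
```

Running this after each delta-import should show the count for a changed item growing by one each time.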
I read somewhere in the wiki that an update is a delete of the
old document plus an add of the new one.
So I suspect something fails in the delete step,
but I have no idea why.
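The wiki's description of an update, together with the suspicion that the delete half fails, can be modeled on a list-based index: an update removes every stored document whose key matches the new one, then appends the new document. The update function and its match parameter are hypothetical; the sketch reproduces the symptom (one extra copy per delta-import) but does not show why the real delete fails to match.

```python
def update(index, doc, key="id", match=lambda stored, wanted: stored == wanted):
    """Model 'update = delete old document + add new document' on a list-based index.

    `match` decides whether a stored key value counts as the same key as the new one.
    """
    # Delete step: drop every stored document whose key matches.
    index[:] = [d for d in index if not match(d[key], doc[key])]
    # Add step: append the new version.
    index.append(doc)
```

With the default exact match, three updates of the same key leave one document; if the delete step never matches (match always returns False), three updates leave three copies, which is the pattern described above.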
Any ideas?
Thanks in advance
Chris