Re: Getting started with indexing a database
Hi Mike,

Can you try removing ' from the nested entities? Just keep it in the top level entity.

Regards,
Rakesh Varna

On Wed, Jan 11, 2012 at 7:26 AM, Gora Mohanty wrote:
> On Tue, Jan 10, 2012 at 7:09 AM, Mike O'Leary wrote:
> [...]
> > My data-config.xml file looks like this:
> >
> > url="jdbc:mysql://localhost:3306/bioscope" user="db_user" password=""/>
> >
> > deltaQuery="SELECT doc_id FROM bioscope.docs where last_modified > '${dataimporter.last_index_time}'">
>
> Your SELECT above does not include the field "type"
>
> ^^ This should be: WHERE id='${docs.doc_id}' as 'id' is what you are selecting in this entity.
>
> Same issue for the second nested entity, i.e., replace doc_id= with id=
>
> Regards,
> Gora
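For anyone reading along, here is a rough sketch of how a data-config.xml with one top-level entity and two nested entities is usually laid out. It is not Mike's actual file, which is not fully visible in this thread. The driver class, the field names DOC_TYPE and CODE_VALUE, and the mapping of the text column to NOTE_TEXT are assumptions made for illustration. It filters the nested entities on the doc_id foreign key described in the original post, and it covers full-import only, so the delta attributes are left out.

```xml
<!-- Sketch only. Hypothetical names: DOC_TYPE, CODE_VALUE. The driver class
     and the text to NOTE_TEXT mapping are assumptions, not from the thread. -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/bioscope"
              user="db_user"
              password=""/>
  <document>
    <!-- Top-level entity: one Solr document per row of docs.
         The SELECT lists every column that a field element refers to. -->
    <entity name="docs"
            query="SELECT doc_id, type FROM bioscope.docs">
      <field column="doc_id" name="DOC_ID"/>
      <field column="type" name="DOC_TYPE"/>

      <!-- Nested entity: runs once per parent row; ${docs.doc_id} is
           resolved from the current row of the "docs" entity. -->
      <entity name="codes"
              query="SELECT id, origin, type, code FROM bioscope.codes
                     WHERE doc_id = '${docs.doc_id}'">
        <field column="code" name="CODE_VALUE"/>
      </entity>

      <entity name="texts"
              query="SELECT id, origin, type, text FROM bioscope.texts
                     WHERE doc_id = '${docs.doc_id}'">
        <field column="text" name="NOTE_TEXT"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If a nested entity can return more than one row per document, the target field in schema.xml generally needs multiValued="true".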
Re: Getting started with indexing a database
On Tue, Jan 10, 2012 at 7:09 AM, Mike O'Leary wrote:
[...]
> My data-config.xml file looks like this:
>
> url="jdbc:mysql://localhost:3306/bioscope" user="db_user" password=""/>
>
> deltaQuery="SELECT doc_id FROM bioscope.docs where last_modified > '${dataimporter.last_index_time}'">

Your SELECT above does not include the field "type"
Re: Getting started with indexing a database
I'm not going to be much help here since DIH is a mystery to me, I usually go with a SolrJ program when DIH gets beyond simple cases.

But have you seen: http://wiki.apache.org/solr/DataImportHandler#interactive

It's a tool that helps you see what's going on with your query.

Best
Erick

On Mon, Jan 9, 2012 at 8:39 PM, Mike O'Leary wrote:
> I am trying to index the contents of a database for the first time, and I am only getting the primary key of the table represented by the top level entity in my data-config.xml file to be indexed. The database I am starting with has three tables:
>
> The table called docs has columns called doc_id, type and last_modified. The primary key is doc_id.
> The table called codes has columns called id, doc_id, origin, type, code and last_modified. The primary key is id. doc_id is a foreign key to the doc_id column in the docs table.
> The table called texts has columns called id, doc_id, origin, type, text and last_modified. The primary key is id. doc_id is a foreign key to the doc_id column in the docs table.
>
> My data-config.xml file looks like this:
>
> url="jdbc:mysql://localhost:3306/bioscope" user="db_user" password=""/>
>
> deltaQuery="SELECT doc_id FROM bioscope.docs where last_modified > '${dataimporter.last_index_time}'">
>
> deltaQuery="SELECT doc_id FROM bioscope.codes WHERE last_modified > '${dataimporter.last_index_time}'"
> parentDeltaQuery="SELECT doc_id from bioscope.docs WHERE doc_id='${codes.doc_id}'">
>
> deltaQuery="SELECT doc_id FROM bioscope.texts WHERE last_modified > '${dataimporter.last_index_time}'"
> parentDeltaQuery="SELECT doc_id from bioscope.docs WHERE doc_id='${texts.doc_id}'">
>
> I added these lines to the schema.xml file:
>
> stored="true"/>
> stored="true"/>
>
> stored="true"/>
> stored="true"/>
> stored="true"/>
> stored="true"/>
>
> stored="true"/>
> stored="true"/>
> stored="true"/>
> stored="true"/>
>
> ...
>
> DOC_ID
> NOTE_TEXT
>
> When I run the full-import operation, only the DOC_ID values are written to the index. When I run a program that dumps the index contents as an xml string, the output looks like this:
>
> ...
>
> Since this is new to me, I am sure that I have simply left something out or specified something the wrong way, but I haven't been able to spot what I have been doing wrong when I have gone over the configuration files that I am using. Can anyone help me figure out why the other database contents are not being indexed?
>
> Thanks,
> Mike
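As a point of comparison for the schema.xml lines quoted above, here is a rough sketch of what field declarations for an import like this could look like. Only DOC_ID and NOTE_TEXT appear in the thread, and their roles as unique key and default search field are inferred from where they sit in the quoted message. DOC_TYPE, CODE_VALUE and the field type names "string" and "text" are hypothetical and would have to match whatever the actual schema defines. This is a fragment meant to sit inside an existing schema.xml, not a complete file.

```xml
<!-- Fragment only. DOC_TYPE and CODE_VALUE are hypothetical names; the
     "string" and "text" types are assumed to be defined elsewhere in schema.xml. -->
<fields>
  <field name="DOC_ID"     type="string" indexed="true" stored="true"/>
  <field name="DOC_TYPE"   type="string" indexed="true" stored="true"/>
  <!-- multiValued because a document may have several codes and texts rows -->
  <field name="CODE_VALUE" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="NOTE_TEXT"  type="text"   indexed="true" stored="true" multiValued="true"/>
</fields>

<uniqueKey>DOC_ID</uniqueKey>
<defaultSearchField>NOTE_TEXT</defaultSearchField>
```

A value normally only reaches the index when the name on the field element in data-config.xml (or the column name, if no name is given) matches a field declared in the schema, so a mismatch there is one possible reason for ending up with only DOC_ID.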
Getting started with indexing a database
I am trying to index the contents of a database for the first time, and I am only getting the primary key of the table represented by the top level entity in my data-config.xml file to be indexed. The database I am starting with has three tables:

The table called docs has columns called doc_id, type and last_modified. The primary key is doc_id.
The table called codes has columns called id, doc_id, origin, type, code and last_modified. The primary key is id. doc_id is a foreign key to the doc_id column in the docs table.
The table called texts has columns called id, doc_id, origin, type, text and last_modified. The primary key is id. doc_id is a foreign key to the doc_id column in the docs table.

My data-config.xml file looks like this:

I added these lines to the schema.xml file:

...

DOC_ID
NOTE_TEXT

When I run the full-import operation, only the DOC_ID values are written to the index. When I run a program that dumps the index contents as an xml string, the output looks like this:

...

Since this is new to me, I am sure that I have simply left something out or specified something the wrong way, but I haven't been able to spot what I have been doing wrong when I have gone over the configuration files that I am using. Can anyone help me figure out why the other database contents are not being indexed?

Thanks,
Mike