Re: Getting started with indexing a database

2012-01-15 Thread Rakesh Varna
Hi Mike,
   Can you try removing '  from the
nested entities? Just keep it in the top level entity.

Regards,
Rakesh Varna

On Wed, Jan 11, 2012 at 7:26 AM, Gora Mohanty wrote:

> On Tue, Jan 10, 2012 at 7:09 AM, Mike O'Leary wrote:
> [...]
> > My data-config.xml file looks like this:
> >
> > [...] url="jdbc:mysql://localhost:3306/bioscope" user="db_user" password=""/>
> > [...] deltaQuery="SELECT doc_id FROM bioscope.docs where last_modified > '${dataimporter.last_index_time}'">
> > [...]
>
> Your SELECT above does not include the field "type"
>
> > [...]
> ^^ This should be: WHERE id='${docs.doc_id}' as 'id' is what
> you are selecting in this entity.
>
> Same issue for the second nested entity, i.e., replace doc_id= with id=
>
> Regards,
> Gora
>


Re: Getting started with indexing a database

2012-01-11 Thread Gora Mohanty
On Tue, Jan 10, 2012 at 7:09 AM, Mike O'Leary wrote:
[...]
> My data-config.xml file looks like this:
>
> [...] url="jdbc:mysql://localhost:3306/bioscope" user="db_user" password=""/>
> [...] deltaQuery="SELECT doc_id FROM bioscope.docs where last_modified > '${dataimporter.last_index_time}'">
> [...]

Your SELECT above does not include the field "type"
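
To illustrate the point about the missing column, a top-level entity that selects both columns of the docs table might look like the sketch below. The entity name, the table name, DOC_ID and the deltaQuery come from this thread; the query text, the pk attribute and the DOC_TYPE field name are assumptions made for illustration only.

```xml
<!-- Sketch only: query, pk and DOC_TYPE are assumed, not taken from the original config. -->
<entity name="docs" pk="doc_id"
        query="SELECT doc_id, type FROM bioscope.docs"
        deltaQuery="SELECT doc_id FROM bioscope.docs WHERE last_modified &gt; '${dataimporter.last_index_time}'">
  <field column="doc_id" name="DOC_ID"/>
  <!-- "type" can be mapped here only because the SELECT above returns it. -->
  <field column="type" name="DOC_TYPE"/>
</entity>
```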

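> [...]
^^ This should be: WHERE id='${docs.doc_id}' as 'id' is what
you are selecting in this entity.

Same issue for the second nested entity, i.e., replace doc_id= with id=

Regards,
Gora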

Re: Getting started with indexing a database

2012-01-11 Thread Erick Erickson
I'm not going to be much help here since DIH is a mystery to me; I usually go
with a SolrJ program when DIH gets beyond simple cases. But have you
seen:
http://wiki.apache.org/solr/DataImportHandler#interactive

It's a tool that helps you see what's going on with your query.
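
That page works against the DIH request handler, so the handler has to be registered in solrconfig.xml. A typical registration looks roughly like the sketch below; the handler path /dataimport is an assumption, and the config file name is the one used in this thread.

```xml
<!-- Sketch: the handler name "/dataimport" is assumed; adjust it to your setup. -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- File name as used in this thread, resolved relative to the core's conf directory. -->
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```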

Best
Erick

On Mon, Jan 9, 2012 at 8:39 PM, Mike O'Leary wrote:
> I am trying to index the contents of a database for the first time, and I am 
> only getting the primary key of the table represented by the top level entity 
> in my data-config.xml file to be indexed. The database I am starting with has 
> three tables:
>
> The table called docs has columns called doc_id, type and last_modified. The 
> primary key is doc_id.
> The table called codes has columns called id, doc_id, origin, type, code and 
> last_modified. The primary key is id. doc_id is a foreign key to the doc_id 
> column in the docs table.
> The table called texts has columns called id, doc_id, origin, type, text and 
> last_modified. The primary key is id. doc_id is a foreign key to the doc_id 
> column in the docs table.
>
> My data-config.xml file looks like this:
>
> [...] url="jdbc:mysql://localhost:3306/bioscope" user="db_user" password=""/>
> [...] deltaQuery="SELECT doc_id FROM bioscope.docs where last_modified > '${dataimporter.last_index_time}'">
>     [...]
>     [...] deltaQuery="SELECT doc_id FROM bioscope.codes WHERE last_modified > '${dataimporter.last_index_time}'"
>           parentDeltaQuery="SELECT doc_id from bioscope.docs WHERE doc_id='${codes.doc_id}'">
>       [...]
>     [...] deltaQuery="SELECT doc_id FROM bioscope.texts WHERE last_modified > '${dataimporter.last_index_time}'"
>           parentDeltaQuery="SELECT doc_id from bioscope.docs WHERE doc_id='${texts.doc_id}'">
>       [...]
> [...]
>
> I added these lines to the schema.xml file:
>
>  stored="true"/>
>  stored="true"/>
>
>  stored="true"/>
>  stored="true"/>
>  stored="true"/>
>  stored="true"/>
>
>  stored="true"/>
>  stored="true"/>
>  stored="true"/>
>  stored="true"/>
>
> ...
>
> DOC_ID
> NOTE_TEXT
>
> When I run the full-import operation, only the DOC_ID values are written to 
> the index. When I run a program that dumps the index contents as an xml 
> string, the output looks like this:
>
> 
> 
>  
>    
>    
>  
>  
>    
>    
>  
> ...
> 
>
> Since this is new to me, I am sure that I have simply left something out or 
> specified something the wrong way, but I haven't been able to spot what I 
> have been doing wrong when I have gone over the configuration files that I am 
> using. Can anyone help me figure out why the other database contents are not 
> being indexed?
> Thanks,
> Mike
>


Getting started with indexing a database

2012-01-09 Thread Mike O'Leary
I am trying to index the contents of a database for the first time, and I am 
only getting the primary key of the table represented by the top level entity 
in my data-config.xml file to be indexed. The database I am starting with has 
three tables:

The table called docs has columns called doc_id, type and last_modified. The 
primary key is doc_id.
The table called codes has columns called id, doc_id, origin, type, code and 
last_modified. The primary key is id. doc_id is a foreign key to the doc_id 
column in the docs table.
The table called texts has columns called id, doc_id, origin, type, text and 
last_modified. The primary key is id. doc_id is a foreign key to the doc_id 
column in the docs table.

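My data-config.xml file looks like this:

[...] url="jdbc:mysql://localhost:3306/bioscope" user="db_user" password=""/>
[...] deltaQuery="SELECT doc_id FROM bioscope.docs where last_modified > '${dataimporter.last_index_time}'">
    [...]
    [...] deltaQuery="SELECT doc_id FROM bioscope.codes WHERE last_modified > '${dataimporter.last_index_time}'"
          parentDeltaQuery="SELECT doc_id from bioscope.docs WHERE doc_id='${codes.doc_id}'">
      [...]
    [...] deltaQuery="SELECT doc_id FROM bioscope.texts WHERE last_modified > '${dataimporter.last_index_time}'"
          parentDeltaQuery="SELECT doc_id from bioscope.docs WHERE doc_id='${texts.doc_id}'">
      [...]
[...]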
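
For readers unfamiliar with DIH, a config for the three tables described above generally has the shape sketched below. This is an illustration, not the file from this message: the driver class, the query attributes, the pk value and the mapping of the text column to NOTE_TEXT are all assumptions. The child entity joins on the doc_id foreign key described above.

```xml
<dataConfig>
  <!-- driver class is assumed; url, user and password are as posted in this thread -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/bioscope" user="db_user" password=""/>
  <document>
    <!-- parent entity: one Solr document per row of docs -->
    <entity name="docs" pk="doc_id"
            query="SELECT doc_id, type FROM bioscope.docs">
      <field column="doc_id" name="DOC_ID"/>
      <!-- child entity: its query runs once per parent row, using ${docs.doc_id} -->
      <entity name="texts"
              query="SELECT id, origin, type, text FROM bioscope.texts WHERE doc_id='${docs.doc_id}'">
        <!-- mapping text to NOTE_TEXT is an assumption -->
        <field column="text" name="NOTE_TEXT"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

A second child entity for the codes table would follow the same pattern next to the texts entity.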

I added these lines to the schema.xml file:














...

DOC_ID
NOTE_TEXT
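
As an illustration of the kind of schema.xml entries involved, the sketch below declares a key field and a text field. Only the names DOC_ID and NOTE_TEXT and the stored="true" attribute come from this message. The type names, the indexed and multiValued settings, and the use of DOC_ID as the uniqueKey are assumptions, and the type names must match fieldType definitions that exist in your schema.

```xml
<!-- Sketch only: types, indexed and multiValued are assumed. -->
<field name="DOC_ID" type="string" indexed="true" stored="true"/>
<!-- multiValued because one row in docs can have several rows in texts -->
<field name="NOTE_TEXT" type="text_general" indexed="true" stored="true" multiValued="true"/>
<!-- other fields omitted -->

<uniqueKey>DOC_ID</uniqueKey>
```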

When I run the full-import operation, only the DOC_ID values are written to the 
index. When I run a program that dumps the index contents as an xml string, the 
output looks like this:



  


  
  


  
...


Since this is new to me, I am sure that I have simply left something out or 
specified something the wrong way, but I haven't been able to spot what I have 
been doing wrong when I have gone over the configuration files that I am using. 
Can anyone help me figure out why the other database contents are not being 
indexed?
Thanks,
Mike