Re: Hierarchical xml

2009-12-02 Thread Age Jan Kuperus

Pooja Verlani wrote:

Hi,
I want to index an xml like following:


John
1979-29-17T28:14:48Z


   ABC College
   1998
 
 
   PQRS College
   2001
 
  
   XYZ College
   2003
 





At Wageningen UR Library our data is completely XML based. We are in the process of replacing 
Oracle Text with Solr as our back-end engine.


This is how we would do it at Wageningen UR Library (actually I ran your example through a 
minimally modified version of the XSLT we use for the transformation).


_id_ is the unique id derived from the outer element (we actually use it combined with an 
attribute here).
_data_ is a stored-only field that reproduces the complete record in escaped form (or as CDATA, 
which is identical at the Solr level), because Solr doesn't accept XML as data.
All other field names not ending in _s are text fields, representing all full and partial paths 
to the data.
The _s fields are string fields, copying the same data for faceting, sorting 
and (facet) filtering.
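
The field scheme described above can be approximated with a short Python sketch. This is not the XSLT used at Wageningen UR Library, only an illustration of the "all full and partial paths" idea. The element names `name`, `colleges` and `college`, the `_` path separator, and the function name `path_fields` are assumptions made for the example (the original tag names did not survive in the archive); only `officer`, `collegename` and `year` appear in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record: element names other than officer,
# collegename and year are assumed for illustration.
SAMPLE = """<officer>
  <name>John</name>
  <colleges>
    <college><collegename>ABC College</collegename><year>1998</year></college>
    <college><collegename>PQRS College</collegename><year>2001</year></college>
  </colleges>
</officer>"""


def path_fields(xml_text, sep="_"):
    """Return (field_name, value) pairs for every full and partial path.

    For each element, one text field and one "_s" string field are emitted
    per suffix of its path, e.g. officer_colleges_college, colleges_college
    and college. The separator is an assumption of this sketch.
    """
    root = ET.fromstring(xml_text)
    fields = []

    def visit(elem, ancestors):
        path = ancestors + [elem.tag]
        # Concatenate all descendant text and normalise whitespace.
        text = " ".join(" ".join(elem.itertext()).split())
        if text:
            for start in range(len(path)):
                name = sep.join(path[start:])
                fields.append((name, text))         # text field
                fields.append((name + "_s", text))  # string copy for faceting
        for child in elem:
            visit(child, path)

    visit(root, [])
    return fields


fields = path_fields(SAMPLE)
for name, value in fields:
    print(name, "=", value)
```

With this layout a query can address a value by its short name (`collegename`) or by any longer path, and each `college` field keeps one college's name and year together in a single value.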




  
officer/

John
1979-29-17T28:14:48Z


   ABC College
   1998
 
 
   PQRS College
   2001
 
  
   XYZ College
   2003
 


John 1979-29-17T28:14:48Z ABC College 1998 PQRS College 2001 XYZ 
College 2003
John 

John 

John 

John 

1979-29-17T28:14:48Z 

1979-29-17T28:14:48Z 

1979-29-17T28:14:48Z 

1979-29-17T28:14:48Z 

ABC College 1998 PQRS College 2001 XYZ College 2003 

ABC College 1998 PQRS College 2001 XYZ College 2003 

ABC College 1998 PQRS College 2001 XYZ College 
2003
ABC College 1998 PQRS College 2001 XYZ College 
2003
ABC College 1998 

ABC College 1998 

ABC College 1998 

ABC College 1998 

ABC College 1998 

ABC College 1998 

ABC College 

ABC College 

ABC College 

ABC College 

ABC College 

ABC College 

ABC College 

ABC College 

1998 

1998 

1998 

1998 

1998 

1998 

1998 

1998 

PQRS College 2001 


PQRS College 2001
PQRS College 2001
PQRS College 2001
PQRS College 2001
PQRS College 2001
PQRS College
PQRS College
PQRS College
PQRS College
PQRS College
PQRS College
PQRS College
PQRS College
2001
2001
2001
2001
2001
2001
2001
2001
XYZ College 2003
XYZ College 2003
XYZ College 2003
XYZ College 2003
XYZ College 2003
XYZ College 2003
XYZ College
XYZ College
XYZ College
XYZ College
XYZ College
XYZ College
XYZ College
XYZ College
2003
2003
2003
2003
2003
2003
2003
2003
  



Age Jan Kuperus



Re: Hierarchical xml

2009-12-02 Thread Sascha Szott

Pooja,

have a look at Solr's DataImportHandler. XPathEntityProcessor [1] should 
suit your needs.


Best,
Sascha

[1] http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor

Pooja Verlani wrote:

Hi,
I want to index an xml like following:


John
1979-29-17T28:14:48Z


   ABC College
   1998
 
 
   PQRS College
   2001
 
  
   XYZ College
   2003
 



I am not able to judge what the schema should look like.
Also, if I flatten such an XML and make collegename & year multivalued,
like this:
ABC College, PQRS College, XYZ College
1998,2001,2003

In such a scenario I can't establish a correspondence between ABC College & year
1998.

In case someone has an efficient way out, do share.
Thanks in anticipation.

Regards,
Pooja





Hierarchical xml

2009-12-01 Thread Pooja Verlani
Hi,
I want to index an xml like following:


John
1979-29-17T28:14:48Z


   ABC College
   1998
 
 
   PQRS College
   2001
 
  
   XYZ College
   2003
 



I am not able to judge what the schema should look like.
Also, if I flatten such an XML and make collegename & year multivalued,
like this:
ABC College, PQRS College, XYZ College
1998,2001,2003

In such a scenario I can't establish a correspondence between ABC College & year
1998.

In case someone has an efficient way out, do share.
Thanks in anticipation.

Regards,
Pooja
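
The correspondence problem raised in this message can be illustrated with a small Python sketch. It compares two parallel multivalued fields with a single multivalued field that keeps each college's name and year together in one value (as the "ABC College 1998" values in the reply do). The `college` element name and the function names are assumptions for the example; `collegename` and `year` come from the message. The sketch only models the field values, not Solr's query behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the <college> wrapper element name is assumed.
SAMPLE = """<officer>
  <name>John</name>
  <college><collegename>ABC College</collegename><year>1998</year></college>
  <college><collegename>PQRS College</collegename><year>2001</year></college>
  <college><collegename>XYZ College</collegename><year>2003</year></college>
</officer>"""


def flatten_parallel(root):
    """Two independent multivalued fields: the name/year pairing is lost."""
    colleges = list(root.iter("college"))
    return {
        "collegename": [c.findtext("collegename") for c in colleges],
        "year": [c.findtext("year") for c in colleges],
    }


def flatten_paired(root):
    """One multivalued field whose values keep name and year together."""
    return {
        "college": [
            f'{c.findtext("collegename")} {c.findtext("year")}'
            for c in root.iter("college")
        ]
    }


root = ET.fromstring(SAMPLE)
parallel = flatten_parallel(root)
paired = flatten_paired(root)

# With parallel fields, "ABC College" and "2003" both occur in the record,
# so a test on the two fields matches although no such pair exists.
false_match = "ABC College" in parallel["collegename"] and "2003" in parallel["year"]

# With paired values, only combinations that really occur are present.
paired_match = "ABC College 2003" in paired["college"]
real_match = "ABC College 1998" in paired["college"]

print(false_match, paired_match, real_match)
```

An alternative with the same effect is to index one document per college and repeat the officer's fields on each; which layout fits better depends on how the data will be queried.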