Re: xpath processing

2010-11-02 Thread pghorpade


<?xml version="1.0" encoding="UTF-8"?>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-0.xsd">
    <mods:titleInfo>
        <mods:title>Any place I hang my hat is home</mods:title>
    </mods:titleInfo>
    <mods:titleInfo type="uniform">
        <mods:title>St. Louis woman</mods:title>
        <mods:partName>Any place I hang my hat is home</mods:partName>
    </mods:titleInfo>
    <mods:titleInfo type="alternative">
        <mods:title>Free an' easy that's my style</mods:title>
    </mods:titleInfo>
    <mods:name type="personal">
        <mods:namePart>Arlen, Harold</mods:namePart>
        <mods:namePart type="date">1905-1986</mods:namePart>
        <mods:role>
            <mods:roleTerm authority="marcrelator" type="text">creator</mods:roleTerm>
        </mods:role>
    </mods:name>
    <mods:name type="personal">
        <mods:namePart>Mercer, Johnny</mods:namePart>
        <mods:namePart type="date">1909-</mods:namePart>
    </mods:name>
    <mods:name type="personal">
        <mods:namePart>Davison, R.</mods:namePart>
    </mods:name>
    <mods:name type="personal">
        <mods:namePart>Bontemps, Arna Wendell</mods:namePart>
        <mods:namePart type="date">1902-1973</mods:namePart>
    </mods:name>
    <mods:name type="personal">
        <mods:namePart>Cullen, Countee</mods:namePart>
        <mods:namePart type="date">1903-1946</mods:namePart>
    </mods:name>
    <mods:typeOfResource>notated music</mods:typeOfResource>
    <mods:originInfo>
        <mods:place>
            <mods:placeTerm authority="marccountry" type="code">nyu</mods:placeTerm>
        </mods:place>
        <mods:place>
            <mods:placeTerm type="text">New York</mods:placeTerm>
        </mods:place>
        <mods:publisher>De Sylva, Brown &amp; Henderson, Inc.</mods:publisher>
        <mods:dateIssued>c1946</mods:dateIssued>
        <mods:dateIssued encoding="marc">1946</mods:dateIssued>
        <mods:issuance>monographic</mods:issuance>
        <mods:dateOther type="normalized">1946</mods:dateOther>
        <mods:dateOther type="normalized">1946</mods:dateOther>
    </mods:originInfo>
    <mods:language>
        <mods:languageTerm authority="iso639-2b" type="code">eng</mods:languageTerm>
    </mods:language>
    <mods:physicalDescription>
        <mods:form authority="marcform">print</mods:form>
        <mods:extent>1 vocal score (5 p.) : ill. ; 31 cm.</mods:extent>
    </mods:physicalDescription>
    <mods:note type="statement of responsibility">music by Harold Arlen ; lyrics by Johnny Mercer.</mods:note>
    <mods:note>For voice and piano.</mods:note>
    <mods:note>Includes chord symbols.</mods:note>
    <mods:note>Illustration by R. Davison.</mods:note>
    <mods:note>First line: Free an' easy that's my style.</mods:note>
    <mods:note>Edward Gross presents St. Louis Woman ... Book by Arna Bontemps &amp; Countee Cullen -- Cover.</mods:note>
    <mods:note>Publisher's advertising includes musical incipits.</mods:note>
    <mods:subject authority="lcsh">
        <mods:topic>Motion picture music</mods:topic>
        <mods:topic>Excerpts</mods:topic>
        <mods:topic>Vocal scores with piano</mods:topic>
    </mods:subject>
    <mods:classification authority="lcc">M1 .S8</mods:classification>
    <mods:identifier type="music plate">1403-4 De Sylva, Brown Henderson, Inc.</mods:identifier>
    <mods:location>
        <mods:physicalLocation>Lilly Library, Indiana University Bloomington</mods:physicalLocation>
    </mods:location>
    <mods:recordInfo>
        <mods:recordContentSource authority="marcorg">IUL</mods:recordContentSource>
        <mods:recordCreationDate encoding="marc">990316</mods:recordCreationDate>
        <mods:recordIdentifier>LL-SSM-ALC4888</mods:recordIdentifier>
    </mods:recordInfo>
</mods:mods>

Above is my sample xml

<dataConfig>
<dataSource name="myfilereader" type="FileDataSource"/>
<document>
    <entity name="f" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
            baseDir="C:\test_xml">
        <entity name="x" dataSource="myfilereader"
                processor="XPathEntityProcessor" url="${f.fileAbsolutePath}"
                stream="false" forEach="/mods"
                transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
            <field column="id" template="${f.file}"/>
            <field column="collectionKey" template="uw"/>
            <field column="collectionName" template="University of Washington Pacific Northwest Sheet Music Collection"/>
            <field column="fileAbsolutePath" template="${f.fileAbsolutePath}"/>
            <field column="fileName" template="${f.file}"/>
            <field column="fileSize" template="${f.fileSize}"/>
            <field column="fileLastModified" template="${f.fileLastModified}"/>
            <field column="nameNamePart_keyword" xpath="/mods/name/namepa...@type != 'date']"/>
        </entity>
    </entity>
</document>
</dataConfig>

Above is the data config file.
The namePart element in the above XML may or may not have a type attribute.
How can I get data from the namePart elements that have no type attribute?
xpath=/mods/name/namepa...@type != 'date'] is not working. I don't get
any errors, but there is no namePart_keyword in the index.
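
For reference, in standard XPath 1.0 a test such as @type != 'date' is false for an element that has no type attribute at all, because the comparison runs against an empty node-set; the usual way to say "no type attribute" is not(@type). Whether the DataImportHandler parser accepts either form is a separate question. The sketch below is plain Python outside Solr, using an abbreviated copy of the sample record, and only shows which values the question is trying to select. The function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map so the mods: prefix can be used in paths.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Abbreviated record modelled on the sample in this message.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:name type="personal">
    <mods:namePart>Arlen, Harold</mods:namePart>
    <mods:namePart type="date">1905-1986</mods:namePart>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Davison, R.</mods:namePart>
  </mods:name>
</mods:mods>"""


def name_parts_without_type(xml_text):
    """Return the text of every namePart that carries no type attribute."""
    root = ET.fromstring(xml_text)
    parts = root.findall("mods:name/mods:namePart", MODS_NS)
    # Filter in Python: keep only elements where the attribute is absent.
    return [part.text for part in parts if part.get("type") is None]


print(name_parts_without_type(SAMPLE))
```

The filter is done in Python rather than in the path expression, so it does not depend on how much XPath a given parser supports.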





Re: xpath processing

2010-11-02 Thread Lance Norskog
The XPathEntityProcessor has the option to run a real XSL transform at
some point in its processing chain. I guess you could write an XSL
stylesheet that pulls your fields out into a simpler XML document, in
the plain /a/b/c form that the XPath parser supports.
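
A rough illustration of that "simplify first" idea, written in Python as a stand-in for the XSL step (it is not XSL and not part of Solr). It copies every namePart that has no type attribute into a flat document. The element names record and personName are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Namespace in Clark notation, as ElementTree expects for qualified names.
MODS = "{http://www.loc.gov/mods/v3}"

SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:name type="personal">
    <mods:namePart>Mercer, Johnny</mods:namePart>
    <mods:namePart type="date">1909-</mods:namePart>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Davison, R.</mods:namePart>
  </mods:name>
</mods:mods>"""


def flatten_names(xml_text):
    """Build a flat <record> with one <personName> per untyped namePart."""
    source = ET.fromstring(xml_text)
    record = ET.Element("record")  # hypothetical simplified root element
    for name in source.findall(MODS + "name"):
        for part in name.findall(MODS + "namePart"):
            if part.get("type") is None:
                ET.SubElement(record, "personName").text = part.text
    return ET.tostring(record, encoding="unicode")


print(flatten_names(SAMPLE))
```

In the flattened output the names sit at a plain path (/record/personName here), so no attribute test is needed to pick them out.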




Re: xpath processing

2010-10-27 Thread Lance Norskog
The XPathEntityProcessor does not implement full XPath. It supports a very
limited subset that is intended to be very fast.
You can add code in any scripting language, but that does not really
perform well.
Is it possible to use the RegexTransformer to find your records with
regular expressions?
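
A small sketch of the regular-expression idea, in plain Python rather than as RegexTransformer configuration. It assumes the date values always look like "1905-1986" or "1909-", which is all the sample record shows; real data may need a looser pattern. The names DATE_LIKE and split_name_parts are made up here.

```python
import re

# Four digits, a hyphen, and an optional second four-digit year.
DATE_LIKE = re.compile(r"\d{4}-(\d{4})?")


def split_name_parts(values):
    """Separate personal names from date-like strings by value alone."""
    names = [v for v in values if not DATE_LIKE.fullmatch(v)]
    dates = [v for v in values if DATE_LIKE.fullmatch(v)]
    return names, dates


values = ["Arlen, Harold", "1905-1986", "Mercer, Johnny", "1909-", "Davison, R."]
names, dates = split_name_parts(values)
print(names)
print(dates)
```

This works on the element text only, so it does not need the type attribute to be visible to the parser.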




Re: xpath processing

2010-10-23 Thread Ben Boggess
> processor="FileListEntityProcessor" fileName=".*xml" recursive="true"

Shouldn't this be fileName="*.xml"?

Ben
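
On the question above: the fileName attribute of FileListEntityProcessor is documented as a regular expression rather than a shell glob, so ".*xml" is a legal value while "*.xml" is not a valid pattern. The sketch below uses Python's re module as a stand-in for Java's regex engine (both reject a leading "*"). The helper names and the file name record001.xml are invented for illustration.

```python
import re


def is_valid_regex(pattern):
    """Return True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def matches_whole_name(pattern, file_name):
    """Return True if the pattern matches the entire file name."""
    return re.fullmatch(pattern, file_name) is not None


print(is_valid_regex(".*xml"))   # regex form used in the config
print(is_valid_regex("*.xml"))   # glob form: '*' has nothing to repeat
print(matches_whole_name(".*xml", "record001.xml"))
```

Note that ".*xml" also matches names that merely end in the letters "xml"; escaping the dot, as in ".*\.xml", would be a tighter pattern.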


Re: xpath processing

2010-10-23 Thread Ken Stanley

The documentation says you don't need a dataSource for your
XPathEntityProcessor entity; in my configuration, I have mine set to the
name of the top-level FileListEntityProcessor. Everything else looks fine.
Can you provide one record from your data? Also, are you getting any errors
in your log?

- Ken


Re: xpath processing

2010-10-22 Thread pghorpade

Quoting pghorp...@ucla.edu:
Can someone help me please?


I am trying to import MODS XML data into Solr using the XML/HTTP data source.

This does not work with the XPathEntityProcessor of the DataImportHandler:
xpath=/mods/name/namepa...@type = 'date']

I actually have 143 records in which the namePart element has a type
attribute of 'date'.


Thank you
Parinita






Re: xpath processing

2010-10-22 Thread Ken Stanley
Parinita,

In its simplest form, what does your entity definition for DIH look like;
also, what does one record from your xml look like? We need more information
before we can really be of any help. :)

- Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
-- Douglas Adams, The Hitchhikers Guide to the Galaxy


Re: xpath processing

2010-10-22 Thread pghorpade



<dataConfig>
<dataSource name="myfilereader" type="FileDataSource"/>
<document>
    <entity name="f" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
            baseDir="C:\data\sample_records\mods\starr">
        <entity name="x" dataSource="myfilereader"
                processor="XPathEntityProcessor" url="${f.fileAbsolutePath}"
                stream="false" forEach="/mods"
                transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
            <field column="id" template="${f.file}"/>
            <field column="collectionKey" template="starr"/>
            <field column="collectionName" template="starr"/>
            <field column="fileAbsolutePath" template="${f.fileAbsolutePath}"/>
            <field column="fileName" template="${f.file}"/>
            <field column="fileSize" template="${f.fileSize}"/>
            <field column="fileLastModified" template="${f.fileLastModified}"/>
            <field column="classification_keyword" xpath="/mods/classification"/>
            <field column="accessCondition_keyword" xpath="/mods/accessCondition"/>
            <field column="nameNamePart_s" xpath="/mods/name/namepa...@type = 'date']"/>
        </entity>
    </entity>
</document>
</dataConfig>

RE: XPath Processing Applied to Clob

2010-03-18 Thread Craig Christman
You could also do the XPath processing on the Oracle end using the extract or
extractValue functions.  Here's a good reference:
http://www.psoug.org/reference/xml_functions.html


-Original Message-
From: Neil Chaudhuri [mailto:nchaudh...@potomacfusion.com]
Sent: Wednesday, March 17, 2010 3:24 PM
To: solr-user@lucene.apache.org
Subject: XPath Processing Applied to Clob

I am using the DataImportHandler to index 3 fields in a table: an id, a date, 
and the text of a document. This is an Oracle database, and the document is an 
XML document stored as Oracle's xmltype data type. Since this is nothing more 
than a fancy CLOB, I am using the ClobTransformer to extract the actual XML. 
However, I don't want to index/store all the XML but instead just the XML 
within a set of tags. The XPath itself is trivial, but it seems like the 
XPathEntityProcessor only works for XML file content rather than the output of 
a Transformer.

Here is what I currently have that fails:


<document>
    <entity name="doc" query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, d.XML.getClobVal() AS TEXT FROM DOC d" transformer="ClobTransformer">
        <field column="EFFECTIVE_DT" name="effectiveDate" />
        <field column="ARCHIVE_ID" name="id" />
        <field column="TEXT" name="text" clob="true">
        <entity name="text" processor="XPathEntityProcessor" forEach="/MESSAGE" url="${doc.text}">
            <field column="body" xpath="//BODY"/>
        </entity>
    </entity>
</document>


Is there an easy way to do this without writing my own custom transformer?

Thanks.


RE: XPath Processing Applied to Clob

2010-03-17 Thread Neil Chaudhuri
Incidentally, I tried adding this:

<datasource name="f" type="FieldReaderDataSource" />
<document>
    <entity dataSource="f" processor="XPathEntityProcessor" dataField="d.text" forEach="/MESSAGE">
        <field column="body" xpath="//BODY"/>
    </entity>
</document>

But this didn't seem to change anything.

Any insight is appreciated.

Thanks.





Re: XPath Processing Applied to Clob

2010-03-17 Thread Lance Norskog
The XPath parser in the DIH is a limited implementation. The unit test
program is the only enumeration (that I can find) of what it handles:

http://svn.apache.org/repos/asf/lucene/solr/trunk/contrib/dataimporthandler/src/test/java/org/apache/solr/handler/dataimport/TestXPathRecordReader.java

//BODY in fact is not allowed, and should throw an Exception. Or at
least some kind of error message. Perhaps there is one in the logs?






-- 
Lance Norskog
goks...@gmail.com


Re: XPath Processing Applied to Clob

2010-03-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
Keep in mind that the xpath is case-sensitive. Please paste a sample XML document.

What is dataField="d.text"? It does not seem to refer to anything.
Where is the enclosing entity?
Did you mean dataField="doc.text"?

xpath="//BODY" is supported syntax as long as you are using Solr 1.4 or higher.
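
The case-sensitivity point applies to XML element names in general, not only to the DataImportHandler. A minimal sketch in plain Python follows. The MESSAGE document is invented, since no sample XML was posted in the thread; only the MESSAGE and BODY names come from the config. The helper name find_text is also made up.

```python
import xml.etree.ElementTree as ET

# Invented document; the real structure was not posted in the thread.
SAMPLE = "<MESSAGE><HEADER>h</HEADER><BODY>Hello</BODY></MESSAGE>"


def find_text(xml_text, path):
    """Return the text of the first element matching path, or None."""
    node = ET.fromstring(xml_text).find(path)
    return None if node is None else node.text


print(find_text(SAMPLE, ".//BODY"))  # name matches the document's casing
print(find_text(SAMPLE, ".//body"))  # different casing, so nothing is found
```

If the stored XML uses Body or body rather than BODY, an expression written as //BODY would silently select nothing.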








-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com