Re: xpath processing
<?xml version="1.0" encoding="UTF-8"?>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-0.xsd">
  <mods:titleInfo>
    <mods:title>Any place I hang my hat is home</mods:title>
  </mods:titleInfo>
  <mods:titleInfo type="uniform">
    <mods:title>St. Louis woman</mods:title>
    <mods:partName>Any place I hang my hat is home</mods:partName>
  </mods:titleInfo>
  <mods:titleInfo type="alternative">
    <mods:title>Free an' easy that's my style</mods:title>
  </mods:titleInfo>
  <mods:name type="personal">
    <mods:namePart>Arlen, Harold</mods:namePart>
    <mods:namePart type="date">1905-1986</mods:namePart>
    <mods:role>
      <mods:roleTerm authority="marcrelator" type="text">creator</mods:roleTerm>
    </mods:role>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Mercer, Johnny</mods:namePart>
    <mods:namePart type="date">1909-</mods:namePart>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Davison, R.</mods:namePart>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Bontemps, Arna Wendell</mods:namePart>
    <mods:namePart type="date">1902-1973</mods:namePart>
  </mods:name>
  <mods:name type="personal">
    <mods:namePart>Cullen, Countee</mods:namePart>
    <mods:namePart type="date">1903-1946</mods:namePart>
  </mods:name>
  <mods:typeOfResource>notated music</mods:typeOfResource>
  <mods:originInfo>
    <mods:place>
      <mods:placeTerm authority="marccountry" type="code">nyu</mods:placeTerm>
    </mods:place>
    <mods:place>
      <mods:placeTerm type="text">New York</mods:placeTerm>
    </mods:place>
    <mods:publisher>De Sylva, Brown &amp; Henderson, Inc.</mods:publisher>
    <mods:dateIssued>c1946</mods:dateIssued>
    <mods:dateIssued encoding="marc">1946</mods:dateIssued>
    <mods:issuance>monographic</mods:issuance>
    <mods:dateOther type="normalized">1946</mods:dateOther>
    <mods:dateOther type="normalized">1946</mods:dateOther>
  </mods:originInfo>
  <mods:language>
    <mods:languageTerm authority="iso639-2b" type="code">eng</mods:languageTerm>
  </mods:language>
  <mods:physicalDescription>
    <mods:form authority="marcform">print</mods:form>
    <mods:extent>1 vocal score (5 p.) : ill. ; 31 cm.</mods:extent>
  </mods:physicalDescription>
  <mods:note type="statement of responsibility">music by Harold Arlen ; lyrics by Johnny Mercer.</mods:note>
  <mods:note>For voice and piano.</mods:note>
  <mods:note>Includes chord symbols.</mods:note>
  <mods:note>Illustration by R. Davison.</mods:note>
  <mods:note>First line: Free an' easy that's my style.</mods:note>
  <mods:note>Edward Gross presents St. Louis Woman ... Book by Arna Bontemps &amp; Countee Cullen -- Cover.</mods:note>
  <mods:note>Publisher's advertising includes musical incipits.</mods:note>
  <mods:subject authority="lcsh">
    <mods:topic>Motion picture music</mods:topic>
    <mods:topic>Excerpts</mods:topic>
    <mods:topic>Vocal scores with piano</mods:topic>
  </mods:subject>
  <mods:classification authority="lcc">M1 .S8</mods:classification>
  <mods:identifier type="music plate">1403-4 De Sylva, Brown Henderson, Inc.</mods:identifier>
  <mods:location>
    <mods:physicalLocation>Lilly Library, Indiana University Bloomington</mods:physicalLocation>
  </mods:location>
  <mods:recordInfo>
    <mods:recordContentSource authority="marcorg">IUL</mods:recordContentSource>
    <mods:recordCreationDate encoding="marc">990316</mods:recordCreationDate>
    <mods:recordIdentifier>LL-SSM-ALC4888</mods:recordIdentifier>
  </mods:recordInfo>
</mods:mods>

Above is my sample XML.

<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource"/>
  <document>
    <entity name="f" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor" fileName=".*xml"
            recursive="true" baseDir="C:\test_xml">
      <entity name="x" dataSource="myfilereader"
              processor="XPathEntityProcessor" url="${f.fileAbsolutePath}"
              stream="false" forEach="/mods"
              transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
        <field column="id" template="${f.file}"/>
        <field column="collectionKey" template="uw"/>
        <field column="collectionName" template="University of Washington Pacific Northwest Sheet Music Collection"/>
        <field column="fileAbsolutePath" template="${f.fileAbsolutePath}"/>
        <field column="fileName" template="${f.file}"/>
        <field column="fileSize" template="${f.fileSize}"/>
        <field column="fileLastModified" template="${f.fileLastModified}"/>
        <field column="nameNamePart_keyword" xpath="/mods/name/namepa...@type != 'date']"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Above is the data config file.

The namePart element in the above XML may or may not have a type attribute. How can I get data from the namePart elements that have no type attribute?

    xpath="/mods/name/namepa...@type != 'date']"

This is not working. I don't get any errors, but there is no namePart_keyword in the index.
Re: xpath processing
The XPathEntityProcessor has the option to run a real XSL script at one point in its processing chain. I guess you could write an XSL that pulls your fields out into a simpler XML, in the /a/b/c format that the DIH XPath parser supports.

On Tue, Nov 2, 2010 at 5:37 PM, pghorp...@ucla.edu wrote:
> The namePart element in the above xml may or may not have type attribute.
> How can i get data from the namePart element which has no type attribute?
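A rough sketch of the XSL idea from this reply, for readers who want something concrete. The stylesheet below is ordinary XSLT 1.0, where a predicate such as not(@type) is available; it flattens each MODS record into a small namespace-free document. The output element names (record, personName, personDate) are made up for illustration and are not part of MODS or of Solr.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: flattens a MODS record so a simple /a/b/c path can reach the names. -->
<!-- record, personName and personDate are hypothetical output element names. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mods="http://www.loc.gov/mods/v3"
    exclude-result-prefixes="mods">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/mods:mods">
    <record>
      <!-- namePart elements that carry no type attribute at all -->
      <xsl:for-each select="mods:name/mods:namePart[not(@type)]">
        <personName><xsl:value-of select="."/></personName>
      </xsl:for-each>
      <!-- namePart elements typed as dates, kept separately -->
      <xsl:for-each select="mods:name/mods:namePart[@type='date']">
        <personDate><xsl:value-of select="."/></personDate>
      </xsl:for-each>
    </record>
  </xsl:template>
</xsl:stylesheet>
```

The entity would then point at the stylesheet through the xsl attribute of XPathEntityProcessor and read the flattened output. The file path here is an assumption, and any other fields taken by xpath (classification, accessCondition) would also have to be emitted by the stylesheet:

```xml
<entity name="x" dataSource="myfilereader"
        processor="XPathEntityProcessor" url="${f.fileAbsolutePath}"
        xsl="C:\test_xml\flatten-mods.xsl"
        stream="false" forEach="/record">
  <field column="nameNamePart_keyword" xpath="/record/personName"/>
</entity>
```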
Re: xpath processing
The XPathEntityProcessor does not do full XPath. It supports a very limited subset that is intended to be very fast. You can add code in any scripting language, but that does not really perform well. Is it possible to use the RegexTransformer to find your records with regular expressions?

Ken Stanley wrote:
> The documentation says you don't need a dataSource for your
> XPathEntityProcessor entity; in my configuration, I have mine set to the
> name of the top-level FileListEntityProcessor. Everything else looks fine.
> Can you provide one record from your data? Also, are you getting any
> errors in your log?
>
> - Ken
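To make "a very limited set" concrete, here is a sketch of the path forms that, to my knowledge, the DataImportHandler documentation lists for XPathEntityProcessor, written with the element names from this thread. The column names are invented for illustration. How the parser treats the mods: prefix in the sample record is not confirmed here, so treat the paths as a shape, not a tested configuration.

```xml
<!-- Plain element path: every namePart value, typed or not -->
<field column="namePart_all"  xpath="/mods/name/namePart"/>

<!-- Attribute equality predicate; the documented examples have no spaces around "=" -->
<field column="namePart_date" xpath="/mods/name/namePart[@type='date']"/>

<!-- Attribute value instead of element text -->
<field column="namePart_type" xpath="/mods/name/namePart/@type"/>

<!-- Not among the documented forms, so probably not supported:
     [@type != 'date']   [not(@type)]   functions, axes, nested predicates -->
```

If inequality is really needed, the selection has to happen outside this parser, for example in an XSL step, a transformer, or the source system.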
Re: xpath processing
On Oct 22, 2010, at 10:52 PM, pghorp...@ucla.edu wrote:
> processor="FileListEntityProcessor" fileName=".*xml" recursive="true"

Shouldn't this be fileName="*.xml"?

Ben
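A note on this suggestion: as far as I know, the DIH documentation describes the fileName attribute of FileListEntityProcessor as a regular expression, not a shell glob. Under that reading ".*xml" is a valid pattern, while "*.xml" starts with a dangling quantifier and would not compile as a Java regex. A slightly tighter pattern, sketched with the attributes from the quoted config, could look like this:

```xml
<entity name="f" rootEntity="false" dataSource="null"
        processor="FileListEntityProcessor"
        fileName=".*\.xml$"
        recursive="true"
        baseDir="C:\data\sample_records\mods\starr">
  <!-- the nested XPathEntityProcessor entity stays as it was -->
</entity>
```

The escaped dot and the trailing $ restrict the match to names that actually end in ".xml"; the original ".*xml" is looser but should still pick up the same files.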
Re: xpath processing
On Fri, Oct 22, 2010 at 11:52 PM, pghorp...@ucla.edu wrote:
> <entity name="x" dataSource="myfilereader" processor="XPathEntityProcessor"
>         url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
>         transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">

The documentation says you don't need a dataSource for your XPathEntityProcessor entity; in my configuration, I have mine set to the name of the top-level FileListEntityProcessor. Everything else looks fine. Can you provide one record from your data? Also, are you getting any errors in your log?

- Ken
Re: xpath processing
Quoting pghorp...@ucla.edu:

Can someone help me please?

I am trying to import MODS XML data into Solr using the XML/HTTP data source. This does not work with the XPathEntityProcessor of the DataImportHandler:

    xpath="/mods/name/namepa...@type = 'date']"

I actually have 143 records with the type attribute set to 'date' for the namePart element.

Thank you,
Parinita
Re: xpath processing
Parinita,

In its simplest form, what does your entity definition for DIH look like? Also, what does one record from your XML look like? We need more information before we can really be of any help. :)

- Ken

It looked like something resembling white marble, which was probably what it was: something resembling white marble.
-- Douglas Adams, The Hitchhikers Guide to the Galaxy

On Fri, Oct 22, 2010 at 8:00 PM, pghorp...@ucla.edu wrote:
> This does not work with XPathEntityProcessor of the data import handler
> xpath=/mods/name/namepa...@type = 'date']
> I actually have 143 records with type attribute as 'date' for element
> namePart.
Re: xpath processing
<dataConfig>
  <dataSource name="myfilereader" type="FileDataSource"/>
  <document>
    <entity name="f" rootEntity="false" dataSource="null"
            processor="FileListEntityProcessor" fileName=".*xml"
            recursive="true" baseDir="C:\data\sample_records\mods\starr">
      <entity name="x" dataSource="myfilereader"
              processor="XPathEntityProcessor" url="${f.fileAbsolutePath}"
              stream="false" forEach="/mods"
              transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
        <field column="id" template="${f.file}"/>
        <field column="collectionKey" template="starr"/>
        <field column="collectionName" template="starr"/>
        <field column="fileAbsolutePath" template="${f.fileAbsolutePath}"/>
        <field column="fileName" template="${f.file}"/>
        <field column="fileSize" template="${f.fileSize}"/>
        <field column="fileLastModified" template="${f.fileLastModified}"/>
        <field column="classification_keyword" xpath="/mods/classification"/>
        <field column="accessCondition_keyword" xpath="/mods/accessCondition"/>
        <field column="nameNamePart_s" xpath="/mods/name/namepa...@type = 'date']" />
      </entity>
    </entity>
  </document>
</dataConfig>

Quoting Ken Stanley doh...@gmail.com:
> In its simplest form, what does your entity definition for DIH look like;
> also, what does one record from your xml look like?
RE: XPath Processing Applied to Clob
You could also do the XPath processing on the Oracle end, using the extract or extractValue functions. Here's a good reference:

http://www.psoug.org/reference/xml_functions.html

-----Original Message-----
From: Neil Chaudhuri [mailto:nchaudh...@potomacfusion.com]
Sent: Wednesday, March 17, 2010 3:24 PM
To: solr-user@lucene.apache.org
Subject: XPath Processing Applied to Clob

I am using the DataImportHandler to index 3 fields in a table: an id, a date, and the text of a document. This is an Oracle database, and the document is an XML document stored as Oracle's xmltype data type. Since this is nothing more than a fancy CLOB, I am using the ClobTransformer to extract the actual XML. However, I don't want to index/store all the XML but instead just the XML within a set of tags. The XPath itself is trivial, but it seems like the XPathEntityProcessor only works for XML file content rather than the output of a Transformer.

Here is what I currently have that fails:

<document>
  <entity name="doc"
          query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, d.XML.getClobVal() AS TEXT FROM DOC d"
          transformer="ClobTransformer">
    <field column="EFFECTIVE_DT" name="effectiveDate" />
    <field column="ARCHIVE_ID" name="id" />
    <field column="TEXT" name="text" clob="true">
    <entity name="text" processor="XPathEntityProcessor" forEach="/MESSAGE" url="${doc.text}">
      <field column="body" xpath="//BODY"/>
    </entity>
  </entity>
</document>

Is there an easy way to do this without writing my own custom transformer?

Thanks.
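A minimal sketch of the "do it on the Oracle end" suggestion, assuming the XML column is a genuine XMLType as described in the original message. Oracle's extract function takes an XMLType and an XPath string and returns an XMLType, so getClobVal() can be applied to the result. The BODY column alias is my own choice; the '//BODY/text()' path is an assumption about where the wanted text lives.

```xml
<entity name="doc"
        query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID,
                      extract(d.XML, '//BODY/text()').getClobVal() AS BODY
               FROM DOC d"
        transformer="ClobTransformer">
  <field column="EFFECTIVE_DT" name="effectiveDate"/>
  <field column="ARCHIVE_ID" name="id"/>
  <!-- Only the extracted text reaches Solr; the full document stays in Oracle -->
  <field column="BODY" name="body" clob="true"/>
</entity>
```

With this shape no XPathEntityProcessor is involved at all, which sidesteps its limited XPath support. Dropping the /text() step should return the BODY element with its markup instead of the bare text, if that is what needs to be stored.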
RE: XPath Processing Applied to Clob
Incidentally, I tried adding this:

<datasource name="f" type="FieldReaderDataSource" />
<document>
  <entity dataSource="f" processor="XPathEntityProcessor" dataField="d.text" forEach="/MESSAGE">
    <field column="body" xpath="//BODY"/>
  </entity>
</document>

But this didn't seem to change anything. Any insight is appreciated.

Thanks.

From: Neil Chaudhuri
Sent: Wednesday, March 17, 2010 3:24 PM
To: solr-user@lucene.apache.org
Subject: XPath Processing Applied to Clob

> The XPath itself is trivial, but it seems like the XPathEntityProcessor
> only works for XML file content rather than the output of a Transformer.
> Is there an easy way to do this without writing my own custom transformer?
Re: XPath Processing Applied to Clob
The XPath parser in the DIH is a limited implementation. The unit test program is the only enumeration (that I can find) of what it handles:

http://svn.apache.org/repos/asf/lucene/solr/trunk/contrib/dataimporthandler/src/test/java/org/apache/solr/handler/dataimport/TestXPathRecordReader.java

//BODY in fact is not allowed, and should throw an Exception, or at least produce some kind of error message. Perhaps there is one in the logs?

On Wed, Mar 17, 2010 at 2:45 PM, Neil Chaudhuri nchaudh...@potomacfusion.com wrote:
> Incidentally, I tried adding this:
>
> <entity dataSource="f" processor="XPathEntityProcessor" dataField="d.text" forEach="/MESSAGE">
>   <field column="body" xpath="//BODY"/>
> </entity>
>
> But this didn't seem to change anything.

--
Lance Norskog
goks...@gmail.com
Re: XPath Processing Applied to Clob
Keep in mind that the xpath is case-sensitive. Please paste a sample XML.

What is dataField="d.text"? It does not seem to refer to anything. Where is the enclosing entity? Did you mean dataField="doc.text"?

xpath="//BODY" is a supported syntax as long as you are using Solr 1.4 or higher.

On Thu, Mar 18, 2010 at 3:15 AM, Neil Chaudhuri nchaudh...@potomacfusion.com wrote:
> Incidentally, I tried adding this:
>
> <entity dataSource="f" processor="XPathEntityProcessor" dataField="d.text" forEach="/MESSAGE">
>   <field column="body" xpath="//BODY"/>
> </entity>
>
> But this didn't seem to change anything.

--
Noble Paul | Systems Architect | AOL | http://aol.com
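Putting the points from this reply together, a sketch of how the nested configuration might look. It follows the documented FieldReaderDataSource pattern: the XPath entity sits inside the SQL entity and names the parent's column through dataField as entityName.columnName. The JDBC driver, URL, user and password below are placeholders, and the upper-case TEXT in dataField is an assumption based on the column alias in the query.

```xml
<dataConfig>
  <!-- Placeholder connection settings: driver, url, user and password are assumptions -->
  <dataSource name="db" type="JdbcDataSource"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/service"
              user="solr" password="secret"/>
  <!-- Reads XML out of a column of the parent entity's row -->
  <dataSource name="f" type="FieldReaderDataSource"/>
  <document>
    <entity name="doc" dataSource="db"
            query="SELECT d.EFFECTIVE_DT, d.ARCHIVE_ID, d.XML.getClobVal() AS TEXT FROM DOC d">
      <field column="EFFECTIVE_DT" name="effectiveDate"/>
      <field column="ARCHIVE_ID" name="id"/>
      <!-- Child entity: parses doc.TEXT for each row of the parent -->
      <entity name="body" dataSource="f"
              processor="XPathEntityProcessor"
              dataField="doc.TEXT"
              forEach="/MESSAGE">
        <field column="body" xpath="//BODY"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Two details differ from the attempt quoted above: the element is spelled dataSource (capital S), and dataField refers to the enclosing entity's name (doc) rather than the SQL table alias (d). Whether ClobTransformer is still needed on the parent depends on whether the installed FieldReaderDataSource accepts Clob values directly, which is worth checking against the Solr version in use.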