Re: xpath processing
The XPathEntityProcessor has the option to run a real XSL script at some point in its processing chain. I guess you could write an XSL stylesheet that pulls your fields out into a simpler XML, in the /a/b/c format that the XPath parser supports.

On Tue, Nov 2, 2010 at 5:37 PM, wrote:
> The namePart element in the above XML may or may not have a type attribute.
> How can I get data from the namePart element that has no type attribute?
> xpath="/mods/name/namepa...@type != 'date']" is not working. I don't get
> any errors, but there is no namePart_keyword in the index.
>
> Quoting Ken Stanley:
>> On Fri, Oct 22, 2010 at 11:52 PM, wrote:
>>> processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
>>> baseDir="C:\data\sample_records\mods\starr">
>>> processor="XPathEntityProcessor"
>>> url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
>>> transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
>>
>> The documentation says you don't need a dataSource for your
>> XPathEntityProcessor entity; in my configuration, I have mine set to the
>> name of the top-level FileListEntityProcessor. Everything else looks fine.
>> Can you provide one record from your data? Also, are you getting any errors
>> in your log?
>>
>> - Ken

--
Lance Norskog
goks...@gmail.com
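Lance's suggestion is an XSL stylesheet run by the XPathEntityProcessor. As a hedged sketch of the same flattening idea, the code below uses Python's standard xml.etree.ElementTree instead of XSLT (a swapped-in technique, not DIH's own xsl mechanism): it keeps only namePart elements that have no type attribute and copies them into a flat, namespace-free document that a simple /a/b/c path can address. The sample record is a hypothetical, trimmed reconstruction of the name entries in the record above, and the output element name namePart_keyword just reuses the field name from the question.

```python
import xml.etree.ElementTree as ET

# MODS records use this default namespace (taken from the sample record).
NS = {"m": "http://www.loc.gov/mods/v3"}

# Hypothetical, trimmed reconstruction of the record's name entries.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <name>
    <namePart>Arlen, Harold</namePart>
    <namePart type="date">1905-1986</namePart>
  </name>
  <name>
    <namePart>Mercer, Johnny</namePart>
    <namePart type="date">1909-</namePart>
  </name>
</mods>"""


def flatten_untyped_name_parts(mods_xml: str) -> str:
    """Copy every namePart without a type attribute into a flat,
    namespace-free document addressable as /mods/namePart_keyword."""
    root = ET.fromstring(mods_xml)
    flat = ET.Element("mods")
    for part in root.findall("m:name/m:namePart", NS):
        if "type" not in part.attrib:  # skip dated (typed) name parts
            ET.SubElement(flat, "namePart_keyword").text = part.text
    return ET.tostring(flat, encoding="unicode")


print(flatten_untyped_name_parts(SAMPLE))
```

With output like this, an entity could in principle use a plain path such as /mods/namePart_keyword; whether that fits a given DIH configuration is an assumption to check against your own setup.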
Re: xpath processing
http://www.loc.gov/mods/v3" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-0.xsd"> Any place I hang my hat is home St. Louis woman Any place I hang my hat is home Free an' easy that's my style Arlen, Harold 1905-1986 type="text">creator Mercer, Johnny 1909- Davison, R. Bontemps, Arna Wendell 1902-1973 Cullen, Countee 1903-1946 notated music type="code">nyu New York De Sylva, Brown & Henderson, Inc. c1946 1946 monographic 1946 1946 type="code">eng print 1 vocal score (5 p.) : ill. ; 31 cm. music by Harold Arlen ; lyrics by Johnny Mercer. For voice and piano. Includes chord symbols. Illustration by R. Davison. First line: Free an' easy that's my style. "Edward Gross presents St. Louis Woman ... Book by Arna Bontemps & Countee Cullen" -- Cover. Publisher's advertising includes musical incipits. Motion picture music Excerpts Vocal scores with piano M1 .S8 1403-4 De Sylva, Brown Henderson, Inc. Lilly Library, Indiana University Bloomington authority="marcorg">IUL encoding="marc">990316 LL-SSM-ALC4888

Above is my sample XML.

processor="FileListEntityProcessor" fileName=".*xml" recursive="true" baseDir="C:\test_xml">
processor="XPathEntityProcessor" url="${f.fileAbsolutePath}" stream="false" forEach="/mods" transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">

Above is the data config file.

The namePart element in the above XML may or may not have a type attribute. How can I get data from the namePart element that has no type attribute?

xpath="/mods/name/namepa...@type != 'date']"

This is not working. I don't get any errors, but there is no namePart_keyword in the index.

Quoting Ken Stanley:

On Fri, Oct 22, 2010 at 11:52 PM, wrote:
The documentation says you don't need a dataSource for your XPathEntityProcessor entity; in my configuration, I have mine set to the name of the top-level FileListEntityProcessor. Everything else looks fine. Can you provide one record from your data? Also, are you getting any errors in your log?

- Ken
Re: xpath processing
The XPathEntityProcessor does not do full XPath. It implements a very limited subset, intended to be very fast. You can add code in any scripting language, but that does not really perform well. Is it possible to use the RegexTransformer to find your records with regular expressions?

Ken Stanley wrote:
On Fri, Oct 22, 2010 at 11:52 PM, wrote:
The documentation says you don't need a dataSource for your XPathEntityProcessor entity; in my configuration, I have mine set to the name of the top-level FileListEntityProcessor. Everything else looks fine. Can you provide one record from your data? Also, are you getting any errors in your log?

- Ken
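To make the regular-expression idea concrete, here is a hedged Python sketch. Python's re module stands in for the Java regex that RegexTransformer would apply, and the record string is a hypothetical fragment; how the raw record text would reach the transformer as a column is not shown and is left as an assumption.

```python
import re

# Matches <namePart>...</namePart> only when the start tag carries no
# attributes, so <namePart type="date"> is skipped.
UNTYPED_NAME_PART = re.compile(r"<namePart>([^<]*)</namePart>")

# Hypothetical fragment of a MODS record.
record = (
    "<name><namePart>Arlen, Harold</namePart>"
    '<namePart type="date">1905-1986</namePart></name>'
    "<name><namePart>Davison, R.</namePart></name>"
)

print(UNTYPED_NAME_PART.findall(record))
```

This approach is brittle: an attribute-free tag written differently (extra whitespace, a namespace prefix) would be missed, which is why an XML-aware step is usually the safer choice.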
Re: xpath processing
On Fri, Oct 22, 2010 at 11:52 PM, wrote:
> processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
> baseDir="C:\data\sample_records\mods\starr">
> url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
> transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">

The documentation says you don't need a dataSource for your XPathEntityProcessor entity; in my configuration, I have mine set to the name of the top-level FileListEntityProcessor. Everything else looks fine. Can you provide one record from your data? Also, are you getting any errors in your log?

- Ken
Re: xpath processing
> processor="FileListEntityProcessor" fileName=".*xml" recursive="true"

Shouldn't this be fileName="*.xml"?

Ben

On Oct 22, 2010, at 10:52 PM, pghorp...@ucla.edu wrote:
> processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
> baseDir="C:\data\sample_records\mods\starr">
> url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
> transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
>
> Quoting Ken Stanley:
>> Parinita,
>>
>> In its simplest form, what does your entity definition for DIH look like;
>> also, what does one record from your xml look like? We need more information
>> before we can really be of any help. :)
>>
>> - Ken
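Whether *.xml is right depends on how FileListEntityProcessor interprets fileName. Assuming it is treated as a regular expression rather than a shell glob (the DIH documentation describes it as a regex pattern), a quick check with Python's re module shows the difference. Java's java.util.regex likewise rejects a pattern that begins with a bare *.

```python
import re


def is_valid_regex(pattern: str) -> bool:
    """Return True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# ".*xml" is a valid regex: any characters followed by "xml".
print(is_valid_regex(".*xml"), bool(re.search(".*xml", "record1.xml")))
# "*.xml" is a shell glob; as a regex, the leading "*" has nothing to repeat.
print(is_valid_regex("*.xml"))
# ".*xml" also matches names with no dot; an anchored pattern is stricter.
print(bool(re.search(".*xml", "notes_xml")), bool(re.search(r"\.xml$", "notes_xml")))
```

If the regex reading holds, a stricter pattern such as .*\.xml (or \.xml$, depending on whether the match is anchored) avoids picking up names that merely end in "xml".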
Re: xpath processing
processor="FileListEntityProcessor" fileName=".*xml" recursive="true" baseDir="C:\data\sample_records\mods\starr">
processor="XPathEntityProcessor" url="${f.fileAbsolutePath}" stream="false" forEach="/mods" transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">

Quoting Ken Stanley:

Parinita,

In its simplest form, what does your entity definition for DIH look like; also, what does one record from your xml look like? We need more information before we can really be of any help. :)

- Ken
Re: xpath processing
Parinita,

In its simplest form, what does your entity definition for DIH look like; also, what does one record from your XML look like? We need more information before we can really be of any help. :)

- Ken

It looked like something resembling white marble, which was probably what it was: something resembling white marble.
-- Douglas Adams, "The Hitchhiker's Guide to the Galaxy"

On Fri, Oct 22, 2010 at 8:00 PM, wrote:
> Quoting pghorp...@ucla.edu:
> Can someone help me please?
>
>> I am trying to import mods xml data in solr using the xml/http datasource
>>
>> This does not work with XPathEntityProcessor of the data import handler
>> xpath="/mods/name/namepa...@type = 'date']"
>>
>> I actually have 143 records with type attribute as 'date' for element
>> namePart.
>>
>> Thank you
>> Parinita
Re: xpath processing
Quoting pghorp...@ucla.edu:

Can someone help me please?

I am trying to import MODS XML data into Solr using the XML/HTTP data source. This does not work with the XPathEntityProcessor of the DataImportHandler:

xpath="/mods/name/namepa...@type = 'date']"

I actually have 143 records with a type attribute of 'date' on the namePart element.

Thank you
Parinita
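For comparison, a general-purpose XPath engine accepts an attribute predicate like this one as written. The hedged sketch below uses the XPath subset in Python's xml.etree.ElementTree, which supports [@attr='value'] predicates; it says nothing about what DIH's own parser accepts. One detail worth noting: MODS records declare the default namespace http://www.loc.gov/mods/v3, so a namespace-aware engine only matches once that namespace is mapped to a prefix. The record is a hypothetical, trimmed fragment.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the MODS default namespace.
NS = {"m": "http://www.loc.gov/mods/v3"}

# Hypothetical, trimmed record in the MODS default namespace.
record = ET.fromstring(
    '<mods xmlns="http://www.loc.gov/mods/v3">'
    "<name><namePart>Cullen, Countee</namePart>"
    '<namePart type="date">1903-1946</namePart></name>'
    "</mods>"
)

# Without a namespace mapping, unprefixed names match nothing here.
print(record.findall("name/namePart[@type='date']"))
# With the mapping, the predicate selects only the dated name part.
print([p.text for p in record.findall("m:name/m:namePart[@type='date']", NS)])
```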
RE: XPath Processing Applied to Clob
You could also do the XPath processing on the Oracle end using the extract or extractValue functions. Here's a good reference: http://www.psoug.org/reference/xml_functions.html

-----Original Message-----
From: Neil Chaudhuri [mailto:nchaudh...@potomacfusion.com]
Sent: Wednesday, March 17, 2010 3:24 PM
To: solr-user@lucene.apache.org
Subject: XPath Processing Applied to Clob

I am using the DataImportHandler to index 3 fields in a table: an id, a date, and the text of a document. This is an Oracle database, and the document is an XML document stored as Oracle's xmltype data type. Since this is nothing more than a fancy CLOB, I am using the ClobTransformer to extract the actual XML. However, I don't want to index/store all the XML but instead just the XML within a set of tags. The XPath itself is trivial, but it seems like the XPathEntityProcessor only works for XML file content rather than the output of a Transformer.

Here is what I currently have that fails:

Is there an easy way to do this without writing my own custom transformer?

Thanks.
Re: XPath Processing Applied to Clob
Keep in mind that the XPath is case-sensitive. Paste a sample XML.

What is dataField="d.text"? It does not seem to refer to anything. Where is the enclosing entity? Did you mean dataField="doc.text"?

xpath="//BODY" is a supported syntax as long as you are using Solr 1.4 or higher.

On Thu, Mar 18, 2010 at 3:15 AM, Neil Chaudhuri wrote:
> Incidentally, I tried adding this:
>
> dataField="d.text" forEach="/MESSAGE">
>
> But this didn't seem to change anything.
>
> Any insight is appreciated.
>
> Thanks.

--
- Noble Paul | Systems Architect | AOL | http://aol.com
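The case-sensitivity point is easy to see once the CLOB has been read into a string. This is a hedged Python illustration, not DIH code: the message layout is hypothetical, and only the MESSAGE and BODY element names come from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical CLOB contents; only the MESSAGE/BODY names come from the thread.
clob_text = "<MESSAGE><HEADER>h</HEADER><BODY>Indexed text</BODY></MESSAGE>"

message = ET.fromstring(clob_text)  # root element is MESSAGE
print(message.findtext("BODY"))     # exact case: finds the element's text
print(message.findtext("body"))     # wrong case: no match, so None
```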
Re: XPath Processing Applied to Clob
The XPath parser in the DIH is a limited implementation. The unit test program is the only enumeration (that I can find) of what it handles:
http://svn.apache.org/repos/asf/lucene/solr/trunk/contrib/dataimporthandler/src/test/java/org/apache/solr/handler/dataimport/TestXPathRecordReader.java

//BODY in fact is not allowed, and should throw an exception, or at least produce some kind of error message. Perhaps there is one in the logs?

On Wed, Mar 17, 2010 at 2:45 PM, Neil Chaudhuri wrote:
> Incidentally, I tried adding this:
>
> dataField="d.text" forEach="/MESSAGE">
>
> But this didn't seem to change anything.
>
> Any insight is appreciated.
>
> Thanks.

--
Lance Norskog
goks...@gmail.com
RE: XPath Processing Applied to Clob
Incidentally, I tried adding this:

dataField="d.text" forEach="/MESSAGE">

But this didn't seem to change anything.

Any insight is appreciated.

Thanks.

From: Neil Chaudhuri
Sent: Wednesday, March 17, 2010 3:24 PM
To: solr-user@lucene.apache.org
Subject: XPath Processing Applied to Clob

I am using the DataImportHandler to index 3 fields in a table: an id, a date, and the text of a document. This is an Oracle database, and the document is an XML document stored as Oracle's xmltype data type. Since this is nothing more than a fancy CLOB, I am using the ClobTransformer to extract the actual XML. However, I don't want to index/store all the XML but instead just the XML within a set of tags. The XPath itself is trivial, but it seems like the XPathEntityProcessor only works for XML file content rather than the output of a Transformer.

Here is what I currently have that fails:

forEach="/MESSAGE" url="${doc.text}">

Is there an easy way to do this without writing my own custom transformer?

Thanks.