Re: XPath with ExtractingRequestHandler
Hi Mike.

I am going through this too. How did you solve this?

Thanks.
Arcadius.

On 15 December 2011 12:49, Michael Kelleher mj.kelle...@gmail.com wrote:

> Yeah, I tried:
>
> //xhtml:div[@class='bibliographicData']/descendant:node()
>
> also tried
>
> //xhtml:div[@class='bibliographicData']
>
> Neither worked. The DIV I need also had an ID value, and I tried both variations on ID as well. Still nothing.
>
> XPath handling for Tika seems to be pretty basic and does not seem to support most XPath query syntax. Probably because it's using a SAX parser, I don't know. I guess I will have to write something custom to get it to do what I need it to.
>
> Thanks for the reply though. I will post a follow-up with how I fixed this.
>
> --mike
Re: XPath with ExtractingRequestHandler
Hi,

maybe I am wrong, but the // should be at the beginning of the expression, like

//xhtml:div[@class='bibliographicData']/descendant:node()

or, if you want to search for the div inside body, you have to use descendant, like

/xhtml:html/xhtml:body/descendant::xhtml:div[@class='bibliographicData']/descendant:node()

Péter

2011/12/14 Michael Kelleher mj.kelle...@gmail.com:

> I want to restrict the HTML that is returned by Tika to basically:
>
> /xhtml:html/xhtml:body//xhtml:div[@class='bibliographicData']/descendant:node()
>
> and it seems that the XPath class being used does not support the '//' syntax.
>
> Is there any way to configure Tika to use a different XPath evaluation class?

--
Péter Király
eXtensible Catalog
http://eXtensibleCatalog.org
http://drupal.org/project/xc
Re: XPath with ExtractingRequestHandler
Yeah, I tried:

//xhtml:div[@class='bibliographicData']/descendant:node()

also tried

//xhtml:div[@class='bibliographicData']

Neither worked. The DIV I need also had an ID value, and I tried both variations on ID as well. Still nothing.

XPath handling for Tika seems to be pretty basic and does not seem to support most XPath query syntax. Probably because it's using a SAX parser, I don't know. I guess I will have to write something custom to get it to do what I need it to.

Thanks for the reply though. I will post a follow-up with how I fixed this.

--mike
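
For anyone landing on this thread, here is a rough sketch of what the "something custom" route might look like. This is not the fix Mike ended up with (the thread does not say what that was), and it does not use Tika's XPath class at all. It assumes you already have the extracted XHTML as a string and filters it with Python's standard-library ElementTree, which accepts the `//` descendant search and a `[@class='...']` predicate. The sample markup and the helper name `extract_div_text` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping so the path can use the xhtml: prefix from the thread.
XHTML_NS = {"xhtml": "http://www.w3.org/1999/xhtml"}

# Hypothetical sample standing in for the XHTML produced by the extractor.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="header">Site navigation</div>
    <div class="bibliographicData" id="bib">
      <p>Title: <b>Example Book</b></p>
      <p>Author: A. Writer</p>
    </div>
  </body>
</html>"""


def extract_div_text(xhtml, css_class):
    """Return the whitespace-normalized text of every div with the given class."""
    root = ET.fromstring(xhtml)
    # './/' searches all descendants of the root, like '//' in full XPath.
    path = ".//xhtml:div[@class='%s']" % css_class
    results = []
    for div in root.findall(path, XHTML_NS):
        # itertext() walks all descendant text nodes in document order.
        text = " ".join("".join(div.itertext()).split())
        results.append(text)
    return results


if __name__ == "__main__":
    print(extract_div_text(SAMPLE, "bibliographicData"))
```

The trade-off is that the filtering happens outside the extraction step, so the text would need to be pulled out first and then sent on for indexing as a separate step. Note that the class predicate is an exact string match, so a div with several class names would need a different test.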