Re: XPath with ExtractingRequestHandler

2013-01-19 Thread Arcadius Ahouansou
Hi Mike.

I am going through this too.

How did you solve this?

Thanks.

Arcadius.


On 15 December 2011 12:49, Michael Kelleher mj.kelle...@gmail.com wrote:

 Yeah, I tried:


 //xhtml:div[@class='bibliographicData']/descendant:node()

 also tried

 //xhtml:div[@class='bibliographicData']

 Neither worked.  The DIV I need also had an ID value, and I tried both
 variations on ID as well.  Still nothing.


 XPath handling for Tika seems to be pretty basic and does not seem to
 support most XPath Query syntax.  Probably because it's using a SAX parser,
 I don't know.  I guess I will have to write something custom to get it to
 do what I need it to.

 Thanks for the reply though.

 I will post a follow up with how I fixed this.

 --mike



Re: XPath with ExtractingRequestHandler

2011-12-15 Thread Péter Király
Hi,

maybe I am wrong, but the // should be at the beginning of the
expression, like
//xhtml:div[@class='bibliographicData']/descendant:node(),
or if you want to search the div inside body, you have to use descendant like
/xhtml:html/xhtml:body/descendant::xhtml:div[@class='bibliographicData']/descendant:node()
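
For anyone who wants to try these expressions against a running Solr, here is a
minimal sketch of building the request URL. It assumes the handler is mapped at
/update/extract and uses the Solr Cell parameters extractOnly and xpath; the
host, port, and the helper name build_extract_url are placeholders for
illustration, not something taken from this thread. It only shows how the
expression gets URL-encoded, not whether Tika will accept it.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, xpath):
    """Build an extract-only request URL carrying an XPath filter.

    base_url is an assumed Solr root such as http://localhost:8983/solr.
    """
    params = {
        "extractOnly": "true",  # return the extracted XHTML instead of indexing it
        "xpath": xpath,         # expression handed to Tika's XPath matcher
    }
    # urlencode escapes the slashes, brackets, quotes and '@' in the expression.
    return base_url + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    expr = "//xhtml:div[@class='bibliographicData']/descendant:node()"
    print(build_extract_url("http://localhost:8983/solr", expr))
```

The document itself still has to be posted to that URL (for example as a file
upload), which is left out here.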

Péter

2011/12/14 Michael Kelleher mj.kelle...@gmail.com:
 I want to restrict the HTML that is returned by Tika to basically:


  /xhtml:html/xhtml:body//xhtml:div[@class='bibliographicData']/descendant:node()


 and it seems that the XPath class being used does not support the '//'
 syntax.

 Is there any way to configure Tika to use a different XPath evaluation class?





-- 
Péter Király
eXtensible Catalog
http://eXtensibleCatalog.org
http://drupal.org/project/xc


Re: XPath with ExtractingRequestHandler

2011-12-15 Thread Michael Kelleher

Yeah, I tried:

//xhtml:div[@class='bibliographicData']/descendant:node()

also tried

//xhtml:div[@class='bibliographicData']

Neither worked.  The DIV I need also had an ID value, and I tried both 
variations on ID as well.  Still nothing.



XPath handling for Tika seems to be pretty basic and does not seem to 
support most XPath Query syntax.  Probably because it's using a SAX 
parser, I don't know.  I guess I will have to write something custom to 
get it to do what I need it to.
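
A rough sketch of what "something custom" could look like, as a post-filter on
the XHTML that Tika returns rather than a change to Tika itself. This is only an
illustration under assumptions: it presumes the extracted output is well-formed
XHTML in the http://www.w3.org/1999/xhtml namespace, and the names
DivTextExtractor and extract_div_text are made up for the example. It streams
the document with a SAX handler and keeps only the text inside the first-level
match of a div with the wanted class.

```python
import xml.sax
from io import BytesIO
from xml.sax.handler import ContentHandler, feature_namespaces

XHTML_NS = "http://www.w3.org/1999/xhtml"


class DivTextExtractor(ContentHandler):
    """Collect character data found inside <div class="..."> subtrees."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while we are inside a matching div
        self.chunks = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if self.depth > 0:
            # Already inside a match: just track nesting.
            self.depth += 1
        elif (uri == XHTML_NS and local == "div"
              and attrs.get((None, "class")) == self.class_name):
            self.depth = 1

    def endElementNS(self, name, qname):
        if self.depth > 0:
            self.depth -= 1

    def characters(self, content):
        if self.depth > 0:
            self.chunks.append(content)


def extract_div_text(xhtml_bytes, class_name):
    """Return the concatenated text of all divs with the given class."""
    handler = DivTextExtractor(class_name)
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # deliver (uri, localname) pairs
    parser.setContentHandler(handler)
    parser.parse(BytesIO(xhtml_bytes))
    return "".join(handler.chunks)


if __name__ == "__main__":
    sample = (b'<html xmlns="http://www.w3.org/1999/xhtml"><body>'
              b'<div class="bibliographicData"><p>Some record</p></div>'
              b'</body></html>')
    print(extract_div_text(sample, "bibliographicData"))
```

The class test is an exact string match, so a div carrying several classes
would need a split on whitespace instead.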


Thanks for the reply though.

I will post a follow up with how I fixed this.

--mike