On 2019-01-03 6:00 a.m., Christian Grün wrote:
> If you use Java, there is quite a variety of ways to run queries. Maybe
> you could give us some insight into your use case first? For example,
> what do you want to do with the result?

Yes, bit spaghetti-ish, pardon. The notion is to first drop the
database, then populate, then query. Grabbing xml from w3schools,
popping it into a database, and running an xquery works fine.
Moving to html, it then sort of works. The db is dropped, a db is
created and then populated. Browsing in the GUI I can see, for example,
a list of book categories, so there's data to work from. (This is html
that tagsoup has fixed so that basex can parse it.)
That's really the end goal: just running XQuery against html.
The only query I can get working against the html is "text()" or
perhaps "/text()", which then returns all the html. Instead, I want to
traverse the document and pick out specific parts.
It's related, to a degree, to Selenium efforts.
---
The upshot being that the way tagsoup fixes malformed html either causes
(me) problems with running xquery queries, or, more likely, I'm not
understanding how to run xpath and xquery against the db properly.
The GUI is very interesting in this respect because it allows me to
visualize the raw data, it's "clickable", and I can type and run xpath
queries right in the GUI.
However, the *only* xpath query I can get results on is "text()". Not
so with "raw" xml from w3schools. With that xml I can drill down to
varying degrees as expected.
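
One possible explanation, and it is only an assumption since the stored
document isn't shown here, is that the cleaned-up html ends up with its
elements in the XHTML namespace (http://www.w3.org/1999/xhtml). In that
case a plain name test such as "//li" matches nothing, while a
namespace-agnostic test does. The sketch below reproduces that effect
with the JDK's own parser and XPath engine rather than with basex; the
class name NamespaceDemo, the helper count() and the sample markup are
made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceDemo {
    // Stand-in for cleaned-up html. That the real data carries the XHTML
    // namespace is an assumption, not something confirmed in this thread.
    static final String XHTML =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<body><ul><li>Fiction</li><li>Poetry</li></ul></body></html>";

    // Parses the markup namespace-aware and returns how many nodes
    // the given XPath 1.0 expression selects.
    static int count(String xml, String xpath) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(xpath, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Unprefixed name test: only matches elements in no namespace.
        System.out.println(count(XHTML, "//li"));                   // prints 0
        // Namespace-agnostic test: matches on the local name alone.
        System.out.println(count(XHTML, "//*[local-name()='li']")); // prints 2
    }
}
```

If that is what is happening, the XQuery counterpart would be a wildcard
namespace test such as "//*:li", or declaring the XHTML namespace as the
default element namespace in the query prolog.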
-------
Either tagsoup is mashing the html too extremely, or it's my lack of
knowledge.
Hey, I appreciate the input. Hope I made sense.
-Thufir