Norman Dunbar wrote:
On 31/12/10 12:47, Dilwyn Jones wrote:
Does anyone know of any software that will convert an HTML table into
auto-spaced plain text?

I'm trying to convert some HTML tables into plain text columns for
the QL. Most HTML-text converters just put a space between columns
in the table, with the result that the text looks awful. It'd be a
big job to do this by hand (scan table, pass 1, find all TR and TD
tags and keep a record of the longest in each column, scan 2,
extract the text and pad each one with spaces), and although I'm
sure I could write such a converter, why do so if there's already QL
code to do just that?
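The two-pass approach described above can be sketched roughly as follows. This is only an illustration in Python using the standard library's html.parser, not existing QL code; the names TableCollector and table_to_text and the gap parameter are made up for the example. It assumes simple tables with no nesting: pass 1 records the longest cell in each column, pass 2 pads every cell to that width.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the cell text of simple (non-nested) tables, row by row."""

    def __init__(self):
        # convert_charrefs=True means entities arrive as plain characters
        super().__init__(convert_charrefs=True)
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()
            if self._row is None:   # tolerate a cell with no opening TR
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def _close_cell(self):
        if self._cell is not None:
            # join the fragments and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None


def table_to_text(html, gap=2):
    """Return the table cells as space-padded plain text columns."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    parser._close_row()     # flush a row left open by missing end tags
    rows = parser.rows

    # Pass 1: find the longest cell in each column.
    ncols = max((len(row) for row in rows), default=0)
    widths = [0] * ncols
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    # Pass 2: pad each cell to its column width and join with a fixed gap.
    lines = []
    for row in rows:
        padded = [cell.ljust(widths[i]) for i, cell in enumerate(row)]
        lines.append((" " * gap).join(padded).rstrip())
    return "\n".join(lines)
```

Calling table_to_text() on the text of a page should give one line per table row, with the columns lined up. Nested tables would need something smarter than this.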

Don't forget, when you find a table, there is a possibility that there
is another table embedded in between the TD tags! So, you'll have to
get recursive!
Curses and Recurses! Actually, the tables concerned are pretty simple text column tables, no nests of tables, so anything which produces a file I can handle more easily with less hand conversion should be OK. Most non-space delimiters such as tabs between columns could be easily handled anyway.

Must admit I hadn't thought of the Entity characters, but even those aren't too hard to process if you make the assumption that each entity = 1 text character (which they are likely to be in the particular files I'm converting). There are loads of tags and entities which I might run into, but luckily the text tables in these files are pretty simple affairs.
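As a rough illustration of the entity point (again in Python, with display_width being a made-up helper name), the column width should be measured after each entity has been turned into its single character, not on the raw HTML source. The standard library's html.unescape does that decoding; this sketch assumes every entity in the files maps to exactly one character.

```python
import html


def display_width(cell_source):
    """Width of a cell as displayed, counting each entity as one character."""
    return len(html.unescape(cell_source))


raw = "Fish &amp; Chips"
print(len(raw), display_width(raw))   # raw source length vs displayed length
```

Measuring the raw source instead would make any column containing entities come out too wide.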

There's a PC program here http://www.nirsoft.net/utils/htmlastext.html that allows "Simple tables can be delimited by spaces, tab characters,
commas, or CRLF" so it's probably not what you need.
Had a quick look, will give this a try, looks pretty promising from the quick read. And it's free too!

Tony Firshman wrote:
... and mis-configured tables.
Beware - browsers will cope (successfully in the main) with all sorts of
bad html.
I know that to my cost when I took over Ann's craft association site. Not that the html previously used was bad, it was just optimised for Firefox and the previous webmaster didn't like IE so never checked it, despite the fact that by number, more people probably use IE than most other browsers. See an example of this (unless it's been fixed by now) by looking at Quanta website's home page in IE and Firefox to see how things can go wrong with IE!

This is terrible of course, not just because it encourages bad coding,
but also the programmer then does not see that the code is bad.
I know what you mean. Luckily, what I want to do is pretty simple-minded so the program Norman suggested MIGHT just do enough for what I want. If it doesn't, I'll just write a simple routine to cope with the basic tables I want to process. Even if I can't make a 100% perfect conversion of the tables I'll content myself with producing something that needs less work at the QL end.

... much like I suppose SuperBASIC's habit of implicitly adding missing
END statements.
Yup. Just try throwing some of those programs at SBASIC, it'll soon tell you there's something missing, though not necessarily at the correct point in the listing!

Debugging some older SuperBASIC programs like that can take quite a while.

Thanks,

Dilwyn Jones


_______________________________________________
QL-Users Mailing List
http://www.q-v-d.demon.co.uk/smsqe.htm
