Norman Dunbar wrote:
On 31/12/10 12:47, Dilwyn Jones wrote:
Does anyone know of any software that will convert an HTML table into
auto-spaced plain text?
I'm trying to convert some HTML tables into plain text columns for
the QL. Most HTML-text converters just put a space between columns
in the table, with the result that the text looks awful. It'd be a
big job to do this by hand (scan table, pass 1, find all TR and TD
tags and keep a record of the longest in each column, scan 2,
extract the text and pad each one with spaces), and although I'm
sure I could write such a converter, why do so if there's already QL
code to do just that?
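For illustration, the two-pass approach described above (pass 1: record the longest entry in each column; pass 2: pad each entry with spaces) might be sketched like this. It is in Python rather than QL code, it assumes simple tables with no nesting, and the names FlatTableParser and table_to_text are made up for this example, not taken from any existing converter.

```python
from html.parser import HTMLParser


class FlatTableParser(HTMLParser):
    """Collect cell text from simple, non-nested HTML tables."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into one character
        super().__init__(convert_charrefs=True)
        self.rows = []      # one list of cell strings per TR
        self._cell = None   # text fragments of the open cell, or None

    def _end_cell(self):
        # Close the current cell, collapsing runs of whitespace to one space
        if self._cell is not None:
            text = " ".join("".join(self._cell).split())
            self.rows[-1].append(text)
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_cell()          # tolerate a missing </td>
            self.rows.append([])
        elif tag in ("td", "th"):
            self._end_cell()
            if not self.rows:         # tolerate a missing <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._end_cell()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_text(html_source, gap=2):
    """Return the table cells as space-padded plain text columns."""
    parser = FlatTableParser()
    parser.feed(html_source)
    parser.close()
    rows = [row for row in parser.rows if row]

    # Pass 1: find the longest entry in each column
    widths = [0] * max((len(row) for row in rows), default=0)
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    # Pass 2: pad each entry to its column width and join with a gap
    lines = []
    for row in rows:
        padded = [cell.ljust(widths[i]) for i, cell in enumerate(row)]
        lines.append((" " * gap).join(padded).rstrip())
    return "\n".join(lines)
```

Calling table_to_text on the text of a saved page gives one padded line per table row. Nested tables would be flattened into the outer row, so this is only suitable for the simple case.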
Don't forget, when you find a table, there is a possibility that there
is another table embedded in between the TD tags! So, you'll have to
get recursive!
Curses and Recurses! Actually, the tables concerned are pretty simple
text column tables, no nests of tables, so anything which produces a
file I can handle more easily with less hand conversion should be OK.
Most non-space delimiters such as tabs between columns could be easily
handled anyway.
Must admit I hadn't thought of the Entity characters, but even those
aren't too hard to process if you make the assumption that each
entity=1 text character (which they are likely to be in the particular
files I'm converting). There are loads of tags and entities which I
might run into, but luckily the text tables in these files are pretty
simple affairs.
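As a small illustration of the "each entity = 1 text character" assumption, a column width can be measured after decoding the entities rather than on the raw HTML text. This is a Python sketch only; display_width is a made-up helper name, and it does not deal with mapping the decoded characters onto the QL character set.

```python
import html


def display_width(raw_cell):
    """Width of a cell once each entity is decoded to a single character."""
    return len(html.unescape(raw_cell))


raw = "&pound;5 &amp; up"
print(len(raw), display_width(raw))
```

Measuring the raw text would make this column ten characters wider than it needs to be.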
There's a PC program here
http://www.nirsoft.net/utils/htmlastext.html
that allows "Simple tables can be delimited by spaces, tab characters,
commas, or CRLF" so it's probably not what you need.
Had a quick look, will give this a try, looks pretty promising from
the quick read. And it's free too!
Tony Firshman wrote:
... and mis-configured tables.
Beware - browsers will cope (successfully in the main) with all sorts
of bad html.
I know that to my cost when I took over Ann's craft association site.
Not that the html previously used was bad, it was just optimised for
Firefox and the previous webmaster didn't like IE so never checked it,
despite the fact that by number, more people probably use IE than most
other browsers. See an example of this (unless it's been fixed by now)
by looking at Quanta website's home page in IE and Firefox to see how
things can go wrong with IE!
This is terrible of course, not just because it encourages bad coding,
but also the programmer then does not see that the code is bad.
I know what you mean. Luckily, what I want to do is pretty
simple-minded so the program Norman suggested MIGHT just do enough for
what I want. If it doesn't, I'll just write a simple routine to cope
with the basic tables I want to process. Even if I can't make a 100%
perfect conversion of the tables I'll content myself with producing
something that needs less work at the QL end.
... much like I suppose Superbasic's habit of implicitly adding missing
END statements.
Yup. Just try throwing some of those programs at SBASIC, it'll soon
tell you there's something missing, though not necessarily at the
correct point in the listing!
Debugging some older SuperBASIC programs like that can take quite a
while.
Thanks,
Dilwyn Jones
_______________________________________________
QL-Users Mailing List
http://www.q-v-d.demon.co.uk/smsqe.htm