Edward Rayl wrote:
> I just tested 0.8 with my regular news sites, and the following sites
> have parsing errors:
>
> 1. http://www.fool.com/partners/avantgo/index.htm depth=2
>
>     News items have partial URI garbage at the top of the text

I just downloaded the site. I see URI garbage only on this page:
http://www.fool.com/partners/avantgo/news/take/2002/mft/mft02121204.htm

I also see the garbage in IE and Mozilla, so I think the problem is in
this particular page rather than in the parser.

>
> 2. http://wireless.metacritic.com/avantgo/ depth=3
>
>     The actual film reviews (level 3) have a graphic and lots of
>     space, but no review text.  After that, pressing the back-arrow
>     key produces documents that are blank as well, even though they
>     were fine before.

This is a bug in font color handling. There is a redundant <font
color="#ffffff"> at the top that throws the parser off: the color is never
reset to black, as it should be. I think I just fixed this. MetaCritic
now renders well, but I'll have to test it some more.
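
For illustration only, here is one way the color reset could be handled. This
is a minimal sketch and not the actual JPluck code: the class name
FontColorStack and its methods are hypothetical, and it assumes the default
text color is black. Each <font> tag pushes a color, each </font> pops one,
and an empty stack always falls back to the default.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper (not JPluck's real class): tracks nested font colors. */
final class FontColorStack {
    /** Assumed default text color: black. */
    static final String DEFAULT_COLOR = "#000000";

    private final Deque<String> colors = new ArrayDeque<>();

    /** Call on <font ...>; a tag without a color attribute inherits the current color. */
    void open(String color) {
        colors.push(color != null ? color : current());
    }

    /** Call on </font>; a stray closing tag on an empty stack is ignored. */
    void close() {
        if (!colors.isEmpty()) {
            colors.pop();
        }
    }

    /** Color for the next run of text; the default once every <font> is closed. */
    String current() {
        return colors.isEmpty() ? DEFAULT_COLOR : colors.peek();
    }
}
```

With this approach a redundant, nested <font color="#ffffff"> only adds one
more stack entry, and the color returns to black once both tags are closed.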

The workaround for now is to set "Use Text Color" to "no"; then you will be
able to see the text. In fact, I recommend setting this flag to "no" by
default, because you cannot change the background color in the Viewer. I
received a couple of reports about missing text, and it mostly turned out to
be text that was colored white, which makes it invisible against the default
Viewer background. Colored text is also harder to read on a grayscale
device, which was the original reason I put this option in.

>
> 3. http://www.industryweek.com/avantgo/ depth=2
>
>     The document text is missing.

This page is missing a <body> tag, so the parser synthesizes one in the
event stream. However, synthesized tags appear in uppercase, and the class
that creates the Plucker document expects a lowercase tag name. I just put
in a workaround for this.
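
A workaround of this kind can be as simple as normalizing tag names before
comparing them. The sketch below is an assumption about the general approach,
not the code that went into JPluck; TagNames and its methods are hypothetical
names.

```java
import java.util.Locale;

/** Hypothetical helper: compares tag names regardless of case. */
final class TagNames {
    /** Lowercases a tag name; Locale.ROOT avoids locale-specific case rules. */
    static String normalize(String tagName) {
        return tagName.toLowerCase(Locale.ROOT);
    }

    /** True for "body", "BODY", "Body", and so on. */
    static boolean isBody(String tagName) {
        return "body".equals(normalize(tagName));
    }
}
```

Normalizing once, at the point where the parser events enter the document
builder, means the rest of the code can keep comparing against lowercase names.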

> Also, now that it appears that we have table support, are you
> considering this code to JPluck X?

Definitely. I still have to take a look at the specs that Chris sent me, but
judging from a cursory glance it shouldn't be too hard to implement.

>  I use Jason Day's SlashPluck.  Any chance you will do a JPluck X
> equivalent, or allow it to run as an external program?

I'll have to look into this. I don't read Slashdot myself. What is the
advantage that SlashPluck has over http://slashdot.org/palm/?

Anyway, JPluck will support screen-scraping and reformatting of HTML (which
is what SlashPluck seems to do) through XSL stylesheet transformation. I
have this working, but you have to edit the JXL file manually as this isn't
available in the GUI yet.
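
To give an idea of what screen-scraping through an XSL stylesheet looks like,
here is a minimal sketch using the standard javax.xml.transform API. It does
not show how JPluck wires this up or what the JXL entry looks like; the class
name XslScrape is hypothetical, and the sketch assumes the page has already
been turned into well-formed XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: applies an XSL stylesheet to a well-formed document. */
final class XslScrape {
    /** Returns the result of transforming xml with the stylesheet xsl. */
    static String transform(String xml, String xsl) {
        try {
            // Compile the stylesheet, then run the document through it.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }
}
```

A stylesheet that selects, say, only the elements holding the story text and
drops the navigation would then produce a stripped-down page for conversion.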

JPluck 0.8.1 will be out later this week. It should have the fixes for the
two issues you're having, as well as some other enhancements.


Thanks
-Laurens

_______________________________________________
plucker-dev mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev