Plucker-1.1.13.exe mirror, 1.1.14 update

2001-11-03 Thread David A. Desrosiers


I've mirrored the Windows Plucker-1.1.13.exe package onto plkr.org,
based on about a dozen emails I've gotten from people who have tried to
download it from prohosting.

Dirk, I'll work on a more secure way that you can upload the package
to the proper directory automagically, and not have to worry about
prohosting. Just have to clean up a bit of the scripts around the web stuff
first.

Did you manage to solve the bugtracker issue on your end?

I also dropped the 1.1.14 packages in there, and plopped them on the
download page. I'll make a front-page announcement later on today. I slashed
my hand pretty badly in a fall yesterday, so it's hard to type. EEK!

Let me know if I typo'd something.

I also noticed that the page renders completely broken in Mozilla,
so I'll put some time in this weekend and try to find out what happened
there. Probably my nasty nested table rendering stuff.

Incidentally, I got a message from the Beyond2000 guys about
promoting their PDA-sized page on the Plucker site, so I'm going to start a
new section there "Plucks of the Week" or something, and have some of those
sites highlighted. Anyone graphically inclined want to make a small "Plucker
Friendly Site" banner that we can pass around to these "partner" sites? I
made one awhile ago, but I'm sure we could update it a bit more.

http://plkr.org/images/pf.gif



/d





Plucking stdin

2001-11-03 Thread Bill Janssen

So here's what I'm planning:

Read all of stdin until we hit end-of-file, treat it as whatever type
is specified on the command line, and process it as the home document.

Probably need two new command-line switches, --stdin-type="foo" and
--stdin-url="foo", to allow you to specify the stdin Content-Type and
URL data.  The charset can be specified as part of --stdin-type, as
per usual.  We need a special home url to trigger this, too; perhaps
--home-url="".
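
A minimal sketch of how the charset could be pulled out of the
--stdin-type value (the helper name and default are hypothetical, not
actual parser code, and this uses modern Python idioms):

```python
def parse_content_type(value):
    """Split a Content-Type like 'text/html; charset=utf-8' into
    (media_type, charset); charset defaults to iso-8859-1."""
    parts = [p.strip() for p in value.split(';')]
    media_type = parts[0].lower()
    charset = 'iso-8859-1'
    for p in parts[1:]:
        k, _, v = p.partition('=')
        if k.strip().lower() == 'charset':
            charset = v.strip().strip('"')
    return media_type, charset
```

Reading stdin to EOF is then just sys.stdin.read(), decoded with
whatever charset this returns.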

Sound reasonable?

Bill

> Yes, I was thinking of something like that myself.  Sure, pretty easy
> to do.
> 
> Bill
> 
> > 
> > While we're on the subject of using standard redirection in the
> > parser, so we can do plucker-build [...] > foo.pdb, I wonder if we can also
> > use the opposite for input. I have a need now to do things like:
> > 
> > plucker-build [...] < $content_stream
> > 
> > ..which is raw html itself, not a url reference or a file. Possible?





Re: Support for Japanese characters.-2

2001-11-03 Thread Bill Janssen

Nori,

Is this for text documents, or do you also process S-JIS HTML?  Does
the parser work OK on S-JIS HTML?

Bill

> Hi Bill,
> 
> At Wed, 31 Oct 2001 15:59:29 PST,
> Bill Janssen wrote:
> 
> > There's a question here of what the output character set should be for
> > a Plucker doc, something we really haven't discussed.  If they are
> > going to be viewed in S-JIS environments, clearly S-JIS would be a
> > convenient encoding.
> 
>  Because the (official) Japanese Palm OS can work only with S-JIS, it is
> much easier for the Plucker viewer to draw Japanese characters with that
> encoding than with others. Of course it is possible to convert on the fly
> in the viewer, but that could slow things down. I think a Plucker doc might
> have to be encoded per language, or in Unicode for multiple languages in
> the same doc. There is one more way, which is used in emacs (mule?), but
> I'm not sure about the details.
> --
>  Nori
>   
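
For what it's worth, the decode/encode round trip Nori describes is
straightforward in modern Python, which ships a shift_jis codec (the
Python of this era needed a separate Japanese codecs package); this is
only an illustrative sketch, not viewer code:

```python
def sjis_roundtrip(raw):
    # S-JIS bytes from the (official) Japanese Palm world -> Unicode,
    # then back to S-JIS as the viewer would want to draw it
    text = raw.decode('shift_jis')
    return text, text.encode('shift_jis')
```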





Re: While we're gutting the python parser...

2001-11-03 Thread Bill Janssen

>   My goal was to set it at pluck time, to set a default font, which
> can be overridden via Preferences. If no font is specified at pluck time,
> the generic Standard font is used.

I'm still not sure how this would work.  The user already sets the
font preferred to read documents in via the Preferences panel.
Presumably this setting would always override any "default viewing
font" specified in the document.  So what's the point of putting it
in?

Bill




behavior of missing tag?

2001-11-03 Thread Bill Janssen

I see that the current way a missing tag (as in a URL like

  http://foo.bar.com/bletch.html#tag

where there's no "tag" anchor in bletch.html) is handled is to point the
link at the beginning of the page that presumably would have contained the
tag (in the above example, at paragraph 0 of bletch.html).

This is extremely counter-intuitive.  The user is jumped to some place
that may be wildly out of context (for a long page).  I'm going to
change the parser to treat such URLs as excluded, just as with any
other URL which doesn't exist.
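
The two policies can be contrasted in a small sketch (hypothetical
helper, not the actual parser code): browser_style=True gives the
current jump-to-top behavior, False gives the proposed
treat-as-excluded behavior.

```python
def resolve_fragment(url, known_anchors, browser_style=True):
    # Split off the #fragment and check it against the page's anchors.
    base, _, frag = url.partition('#')
    if not frag or frag in known_anchors:
        return url
    # Missing anchor: browsers fall back to the top of the page;
    # the proposed change would exclude the link (None) instead.
    return base if browser_style else None
```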

Bill



Re: behavior of missing tag?

2001-11-03 Thread David A. Desrosiers


> This is extremely counter-intuitive.  The user is jumped to some place
> that may be wildly out of context (for a long page).  I'm going to
> change the parser to treat such URLs as excluded, just as with any other
> URL which doesn't exist.

I have seen pages do this *INTENTIONALLY* by design. Yes, lazy HTML
authors use #top without including an <a name="top"> anchor anywhere at the
top of the content, but they know it'll jam the user back up to the top of
the page. Are we sure changing this by default is right? Shouldn't we act
like browsers would in this situation, not like an AI would?



/d





Re: While we're gutting the python parser...

2001-11-03 Thread David A. Desrosiers


> I'm still not sure how this would work.  The user already sets the font
> preferred to read documents in via the Preferences panel.

No, they don't "set" the font, they *CHANGE* the font via the
Preferences panel. What I'm suggesting is that we have a way to set a
default font for that pdb at pluck time. From there, the user can change the
font to be whatever they want, but if they beam it to someone, restore it
from a backup, whatever, the default font is "narrow" for example (if that's
what they used at pluck time).

> Presumably this setting would always override any "default viewing font"
> specified in the document.  So what's the point of putting it in?

Because it's stored in the main pdb, not in the secondary pdb, which
nobody beams across when sending content or restoring from backups, or
distributing to users.



/d





Plucking Slate.com, a Python example

2001-11-03 Thread Bill Janssen

The MSN change has affected Slate.com, an online magazine owned by MS.
The re-styling is so bad that I figured I'd start plucking it instead
of looking at it in a browser.  Unfortunately, it's in UTF-8 and
XHTML, and contains a number of the standard "odd" characters.  I
wrote a little csh/Python script to convert it, and thought it might
be of interest to others, if only as an example of how Python 2's
Unicode and regexp support works.

Apparently the various re.sub calls can be amalgamated into one big
one, but my head would explode if I then tried to debug the resulting
RE.

Bill

#!/bin/csh

setenv PATH /sbin:/usr/sbin:/usr/bin:/etc:/usr/ccs/bin:/usr/ucb:/usr/openwin/bin
setenv PATH /import/netpbm/sparc-sun-solaris2/bin:$PATH
source /import/python-1.5.2/top/enable
source /import/Plucker/1.1.13/top/enable

/import/python-2.1/python <<'EOF'
import sys, re, time, urllib
input = urllib.urlopen('http://slate.msn.com/toolbar.aspx?id=toc&action=print')
uline = unicode(input.read(), 'utf-8')
input.close()
# first remove various non-Latin1 punctuation
uline = re.sub(u'\u2019', "'", uline)
uline = re.sub(u'\u2018', "`", uline)
uline = re.sub(u'\u201a', ",", uline)
uline = re.sub(u'\u201c', "``", uline)
uline = re.sub(u'\u201d', "''", uline)
uline = re.sub(u'\u2013', "-", uline)
uline = re.sub(u'\u2014', "--", uline)
uline = re.sub(u'\u2026', "...", uline)
# we don't know about XHTML yet, so translate anchors to HTML
uline = re.sub(u'', '', uline)
# remove advertisements
uline = re.sub(u'(?si)', '', uline)
# remove any trailing javascript
uline = re.sub(u'(?si)', '', uline)
# remove any tracking images
uline = re.sub(u'', '', uline)
# set the title to something meaningful
timestamp = time.strftime("%m/%d/%y, %I:%M %p", time.localtime(time.time()))
uline = re.sub(u'.+?', 'Slate Magazine, ' + timestamp + '', uline)
uline = re.sub(u'Slate.com', 'Slate Magazine' + timestamp + '', uline, 1)
# change the indicated charset
uline = re.sub(u'charset=utf-8', "charset=iso-8859-1", uline)
# and output the results
output = open('/tmp/slate.html', 'w')
# write it out, replacing any non-Latin-1 characters remaining with '?'
output.write(uline.encode('iso8859-1', 'replace'))
output.close()
EOF

plucker-build --verbosity=0 --zlib-compression -H file:/tmp/slate.html \
    -f SlateMagazine -N "Slate Magazine" -M 1 --bpp=0 -p ~/.plucker
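
The punctuation-replacement step of the script, pulled out as a
self-contained sketch in modern Python (plain str.replace is enough for
these fixed strings, so re isn't actually needed):

```python
# Common non-Latin-1 punctuation mapped to ASCII approximations,
# mirroring the re.sub calls in the script above
PUNCT = {
    '\u2019': "'", '\u2018': '`', '\u201a': ',',
    '\u201c': '``', '\u201d': "''", '\u2013': '-',
    '\u2014': '--', '\u2026': '...',
}

def downconvert(text):
    for ch, repl in PUNCT.items():
        text = text.replace(ch, repl)
    # anything still outside Latin-1 becomes '?'
    return text.encode('iso8859-1', 'replace').decode('iso8859-1')
```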




Re: Plucking Slate.com, a Python example

2001-11-03 Thread David A. Desrosiers


> The MSN change has affected Slate.com, an online magazine owned by MS.

Even their own page has broken links:
http://slate.msn.com/?id=111843 (many of these are broken now)

I used to use this link:
http://slate.msn.com/Code/TodaysPapers/TodaysPapersHHF.asp

That's also broken. I'll keep digging though. It's there.

Personally, I'm all for not supporting this at all, and letting
their readership drop like a rock. I don't like supporting a company that
consistently lies to its readers, then slaps those of us supporting
standards in the face like they've done recently.

Let them hemorrhage on their own. They're doing a good job of that.

/d





Re: Support for Japanese characters.-2

2001-11-03 Thread Nori Kanazawa

Hi Bill,

At Fri, 2 Nov 2001 19:31:30 PST,
Bill Janssen wrote:

> Is this for text documents, or do you also process S-JIS HTML?  Does
> the parser work OK on S-JIS HTML?

 What do you mean by S-JIS HTML?
# Sorry, I am not so familiar with HTML.
--
 Nori



Color support in CVS

2001-11-03 Thread Robert O'Connor

Code has been added to CVS for the viewer and the parser to support color
elements.

It was a somewhat difficult merge--I probably missed at least one thing
somewhere along the way.

If someone could test it out and see what needs correcting, that would be
great; especially check that Bill's charset work is still working as
expected.

Best wishes,
Robert




Re: behavior of missing tag?

2001-11-03 Thread MJ Ray

>   http://foo.bar.com/bletch.html#tag
> but there's no "tag" in bletch.html) is handled is to point the link
> to the beginning of the page which presumably would have contained the
> link (in the above example, to paragraph 0 of bletch.html).
[...]
> change the parser to treat such URLs as excluded, just as with any
> other URL which doesn't exist.

I have to support the other message which said that we should support
current (bad) practice, rather than taking a hard line on semantic bugs. I'm
all for excluding browser-specific features, but jumping to the top of the
page instead of dropping the link when an id is missing is arguably in the
spirit of the spec, if not the letter.
-- 
MJR