Re: Once again, we revisit the missing --stay{$FOO} options

2001-10-15 Thread Alys

On Sun, Oct 14, 2001 at 10:50:57PM -0700, David A. Desrosiers wrote:
[snip]
>   In any case, there's a missing option here (and always has been
> missing); --stayondomain. With --stayondomain=slashdot.org, for example,
> images.slashdot.org, www.slashdot.org, banjo.slashdot.org, and slashdot.org
> can be maintained, and you can "package up" the content so that it never
> leaves this domain. I could spider it to a maxdepth of 100, and be assured
> that it would never get out of hand and go offsite (yes, the file would be
> large, but it would be very self-contained).
[snip]
>   --stayondomain: Will never leave the network you specify, so that
>  www.foo.com, images.foo.com, and foo.com will all be
>  assumed to be included in the same "pluck". Content
>  from all "member domains" will be included.


Ummm... I've had some mostly-completed code that does pretty much
that sitting on my PC for a while now. I got to the stage of testing
it and thinking about the finer points but then was distracted by
an increase in my workload that still hasn't let up much. Sorry...

A diff -ur output is below, in case you find it of any use. The diff
command compares these two directories:
/usr/lib/python1.5/site-packages/PyPlucker-1.1-pure/  # unmodified code
/usr/lib/python1.5/site-packages/PyPlucker/   # my version

The changes have been made on what's probably an old version now:
Spider.py   $Id: Spider.py,v 1.31 2001/02/08 22:20:37 janssen Exp $

I'd be happy to make the same changes to the newest version if you
want me to, but I wouldn't be able to start for at least three weeks
(a massive project deadline is coming up). You might already have
written better code yourself though; this was my first attempt
at Python.

It has been tested a bit, but could probably do with some more. I've
been using my modified version every day for quite a while now with
no problems, but I haven't actually used the new stayondomain option
much at all.

The code also includes stayoffdomain and stayoffhost options which
I found useful for downloading the slashdot home page and the
non-slashdot articles without the slashdot comment pages.

There's also a small, unrelated hack to allow date/time information
in the filename and database name; I've left that in the diff output
to avoid messing up the line numbering.

In my code, the domain name is assumed to be the host
name without its first element.  e.g., for the host name
www.healthywaterways.env.qld.gov.au, the domain name is
healthywaterways.env.qld.gov.au. I did it that way instead of letting
the domain name be the last two parts of the host name, because that
obviously wouldn't work for hosts like
www.healthywaterways.env.qld.gov.au (you really DON'T want to
download all the Australian government web pages at once...).

The only exception to this is if the host name contains only two
parts, in which case the domain name will be the same as the host
name.  e.g., if the host name is specified as either www.cnet.com
or cnet.com, then the domain name will be cnet.com.
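
The rule described above can be sketched in a few lines. This is only an illustration in present-day Python, not the code from the diff (which targets Python 1.5); the function name `domain_of` is made up for this sketch, and the lower-casing is an addition.

```python
def domain_of(host):
    """Return the domain for a host name: the host minus its first element.

    A host name with only two elements is treated as its own domain.
    """
    # Host names are case-insensitive, so normalise first (an addition
    # in this sketch, not something the message above specifies).
    parts = host.lower().split(".")
    if len(parts) <= 2:
        return ".".join(parts)
    return ".".join(parts[1:])


print(domain_of("www.healthywaterways.env.qld.gov.au"))
print(domain_of("www.cnet.com"))
print(domain_of("cnet.com"))
```

Both `www.cnet.com` and `cnet.com` come out as `cnet.com`, which is the two-part exception described above.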

This is probably okay as a first approximation for the domain for
most web sites, but it would be nice to be able to give the user
some control over this. I can think of a few ways of doing this;
the user could:

1. specify the domain name explicitly (he could then
choose, for example, healthywaterways.env.qld.gov.au or
env.qld.gov.au);

2. specify how many elements should be chopped off the
front of the host to make the domain (e.g., chop off
1 element to get healthywaterways.env.qld.gov.au or 2
elements to get env.qld.gov.au);

3. specify how many elements counting back from the end
of the host are used to make the domain (e.g., 3 elements
to get qld.gov.au, 4 elements to get env.qld.gov.au);

4. any of the above.

I'm not at all sure which of those options should be coded,
or what would be the least confusing way to present them in the
config files and command line parameters. I could come up with a
system for specifying them myself, but I have little experience
with designing software for mass use and so my system might not be
intuitive to anyone but myself. :)
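
To make the three choices listed above concrete, here is a rough sketch of each one as a small helper. None of these names (`on_domain`, `chop_front`, `keep_back`) exist in Plucker; they are hypothetical and only show how the options would behave.

```python
def on_domain(host, domain):
    """Option 1: the user names the domain explicitly.

    A host is on the domain if it equals it or ends with "." + domain.
    """
    host, domain = host.lower(), domain.lower()
    return host == domain or host.endswith("." + domain)


def chop_front(host, count):
    """Option 2: drop `count` elements from the front of the host name."""
    parts = host.split(".")
    if not 0 <= count < len(parts):
        raise ValueError("count must leave at least one element")
    return ".".join(parts[count:])


def keep_back(host, count):
    """Option 3: keep the last `count` elements of the host name."""
    parts = host.split(".")
    if not 1 <= count <= len(parts):
        raise ValueError("count must be between 1 and the number of elements")
    return ".".join(parts[-count:])


host = "www.healthywaterways.env.qld.gov.au"
print(chop_front(host, 1), chop_front(host, 2))
print(keep_back(host, 3), keep_back(host, 4))
print(on_domain("images.slashdot.org", "slashdot.org"))
```

Option 1 is the only one that needs no counting from the user, which may make it the least confusing to expose on the command line.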

Anyways, below my sig is the diff output, in case it's of any use.
Don't hesitate to ask me if you have any questions.

Alys

--
Alice Harris
Internet Services, CITEC, Brisbane, Australia
+61 7 322 22578
[EMAIL PROTECTED], [EMAIL PROTECTED]



diff -ru /usr/lib/python1.5/site-packages/PyPlucker/Spider.py /usr/lib/python1.5/site-packages/PyPlucker-1.1-pure/Spider.py
--- /usr/lib/python1.5/site-packages/PyPlucker/Spider.py   Fri May 25 10:31:25 2001
+++ /usr/lib/python1.5/site-packages/PyPlucker-1.1-pure/Spider.py   Thu May 31 09:23:59 2001
@@ -1,19 +1,10 @@
 #!/usr/bin/env python
-# TODO: check what happens if OFF and ON are used together for both HOST and DOMAIN
-# TODO: check combinations of d

Re: making plucker-build easier to use?

2001-10-15 Thread Bill Janssen

> Is there a way to tell Python to output this in binary mode?

Yes, I think I can do that.  Creating another handle on stdout with

  newhandle = os.fdopen(sys.stdout.fileno(), "wb")

should do the trick, even on Windows.

If you want to try it, change line 494 in parser/python/PyPlucker/Writer.py
from

self._pdb_file = prc.File (sys.stdout, read=0, write=1)

to

self._pdb_file = prc.File (os.fdopen(sys.stdout.fileno(), 'wb'), read=0, write=1)

Seems to work for me.
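
For anyone who wants to try the idea outside of Writer.py, here is a small stand-alone sketch in present-day Python. The helper name `binary_handle` is made up for this example, and `closefd=False` is an addition so that closing the new handle does not also close the underlying descriptor.

```python
import os


def binary_handle(fd):
    """Wrap an existing file descriptor in a binary-mode file object."""
    # "wb" turns off newline translation, so a 0A byte is not
    # expanded to 0D 0A on Windows.
    # closefd=False leaves the descriptor open when the wrapper is closed.
    return os.fdopen(fd, "wb", closefd=False)
```

Calling `binary_handle(sys.stdout.fileno())` gives a handle whose `write()` takes bytes, which is what the database writer needs.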

Bill



Re: added charset info to plucker doc; metadata record type

2001-10-15 Thread Bill Janssen

> Getting the default charset (no charset set on the command line or in
> plucker.ini) works fine with Python 2.0b1, but it does not work with
> Python 1.5.2. Do you have any idea why?

No, it doesn't work on UNIX either; the locale.getlocale() call isn't
in 1.5.2.  I just punt on that case, and say that the default charset
is unknown, which is a case we have to handle anyway.  Presumably
people are moving to Python 2.0, 2.1, and real soon now 2.2.
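
The "punt" described above might look roughly like this. It is a sketch rather than the code that was checked in, and the function name `default_charset` is an assumption.

```python
import locale


def default_charset():
    """Return the locale's charset name, or None if it cannot be determined."""
    # locale.getlocale() is not available in Python 1.5.2, so probe for it
    # and treat its absence as "charset unknown".
    if not hasattr(locale, "getlocale"):
        return None
    try:
        locale.setlocale(locale.LC_ALL, "")
        return locale.getlocale()[1]
    except (locale.Error, ValueError):
        # An unsupported or malformed locale setting is also "unknown".
        return None
```

Callers have to cope with `None` in any case, since a machine may simply have no charset configured.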

Bill




Re: making plucker-build easier to use?

2001-10-15 Thread Bill Janssen

> Why should the *status messages* go to stderr (AFAIK stderr is for
> error messages)?

That's the UNIX way.  The idea is that the output of one program is
the input of another program, so only the 'main output' gets written
to stdout, while various messages get sent to stderr, so that it can
be separately re-directed.

Now, we could argue about what the 'main output' of Plucker is, but
I'm proposing that it's the document, not the status messages :-).

In any case, I haven't removed any existing functionality, just taken
a couple of error cases and given them useful meaning, and made it a
bit easier for UNIX-heads to grok.
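
The split described above, with the document on stdout and everything else on stderr, can be sketched like this. The function name and parameters are hypothetical and are not taken from the Plucker parser.

```python
import sys


def write_outputs(document, status, out=None, err=None):
    """Send the document bytes to `out` and the status lines to `err`."""
    out = sys.stdout.buffer if out is None else out
    err = sys.stderr if err is None else err
    # Status messages go to stderr so they never mix with the document.
    for line in status:
        err.write(line + "\n")
    # The 'main output' goes to stdout, where it can be redirected or piped.
    out.write(document)
    out.flush()
```

With this arrangement, `plucker-build foo.txt >foo.pdb` captures only the document, while the status lines still show up on the terminal.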

Bill




Re: added charset info to plucker doc; metadata record type

2001-10-15 Thread Dirk Heiser

"Bill" == Bill Janssen <[EMAIL PROTECTED]> writes:

Bill>   locale.setlocale(locale.LC_ALL, "")
Bill>   encoding = locale.getlocale()[1]

Bill> If that also fails to produce a charset, the default charset is left
Bill> unassigned.  (Using a Solaris 2.6 machine in California, the standard
Bill> operation for me is to have no default charset.)  Note that on POSIX
Bill> machines, the locale setting can be manipulated via the LANG
Bill> environment variable.

Bill> I'm uncertain of what to do on Windows machines; I'm experimenting with
Bill> one right now to see what kind of possibilities there are.

Getting the default charset (no charset set on the command line or in
plucker.ini) works fine with Python 2.0b1, but it does not work with
Python 1.5.2. Do you have any idea why?

cu,
 Dirk

-- 
Permanent URLs to the latest Version (1.1.13) of the Plucker Windows installer
 - For the Webpage: http://www.dirk-heiser.de/plucker
 - Direct Download: http://www.dirk-heiser.de/plucker/plucker.exe [2.08MB]



Re: making plucker-build easier to use?

2001-10-15 Thread Dirk Heiser

"Dirk" == Dirk Heiser <[EMAIL PROTECTED]> writes:

Bill>>   plucker-build foo.txt >foo.pdb

A short note: I have tried the latest CVS version that does this, but
it fails on Windows because every "0A" byte is replaced with "0D 0A"
there.

Is there a way to tell Python to output this in binary mode?


cu,
 Dirk




Re: making plucker-build easier to use?

2001-10-15 Thread Dirk Heiser

"Bill" == Bill Janssen <[EMAIL PROTECTED]> writes:

Bill>   plucker-build foo.txt >foo.pdb

Bill> or

Bill>   plucker-build http://www.iana.org/assignments/character-sets >csets.pdb

Bill> ??

Just a question: why is it an improvement to send the binary data we
want in a file to the screen and then redirect the screen to a file? And
where do we get the status messages (will these messages be stored in
a file ;-)?

BTW: how does it work if the parser calls a before_command and the
called program writes to standard out?

Bill>   If no 'doc_file' is given, the DB is written to stdout.  Verbosity
Bill>   is set to zero.  (Bit of an issue there; we send a lot of status
Bill>   stuff to stdout right now that should go to stderr.)  

Why should the *status messages* go to stderr (AFAIK stderr is for
error messages)?


But maybe that is "the way" of doing it in Linux ;-)

cu,
 Dirk




RE: Once again, we revisit the missing --stay{$FOO} options

2001-10-15 Thread Robert O'Connor


>   --stayondomain: Will never leave the network you specify, so that
>  www.foo.com, images.foo.com, and foo.com will all be
>  assumed to be included in the same "pluck". Content
>  from all "member domains" will be included.
>
>   Sound feasible?

An excellent idea. Am surprised that something this useful isn't in the
parser already.

Best wishes,
Robert




Re: Plucker conduit

2001-10-15 Thread Dirk Heiser

"aardvarko" == aardvarko <[EMAIL PROTECTED]> writes:


>> Is anyone maintaining bmp2tbmp?

Yes, here

aardvarko> I E-mailed the creator asking about 16-bit support around
aardvarko> two months ago and haven't received any response.

Sorry, I could not find your mail in my archive :-( But I have just uploaded
version 1.1.13 with support for bpp16 (using the PIL Palm plugin).

cu,
 Dirk




ANN: Plucker Installer for Windows Version 1.1.13 is now available!

2001-10-15 Thread Dirk Heiser

Plucker Installer for Windows Version 1.1.13 is now available from
  - http://www.dirk-heiser.de/plucker/Plucker-1.1.13.exe (2.08 MB)


Please note:

 - The file is stored on the free web space provided by prohosting.
   Because they want to show you some ads, you need to open the URL
   in your browser and may have to switch off your download manager.
   (Sorry, but I currently do not own other web space.)
   
 - This is the first version that uses the PIL library to convert the
   images to the Palm format (ImageMagick is no longer used). In case of
   problems converting images, please report the bug with an example
   and I will try to fix it fast.

  
What's new:   [+] Added feature
              [-] Bug fixed
              [*] Improved/changed feature


  1.1.13:
   All Files updated to the ones from "plucker_bin-1.1.13.tar.bz2".
Installer:
 [+] Added readme for pluck-comics.
 [+] "Whats new in Plucker" page added.
 [*] Some spelling errors fixed.
 [-] Now sets the "compression" key (DOC or ZLIB) and no longer "zlib_compression".
 [*] Now uses PIL for the image conversion and no longer ImageMagick.
Files:
 [-] Updated DBTool (Delete Database) to use the new doc_file and doc_name
 ini keys.
 [+] Added URL for the plucker-list archive to the help file.
 [+] Now supports bpp16 (High Color).
 NOTE: If you set bpp16 and still see grayscale pictures, it's because
 the pictures are too large (in bytes) and are converted to grayscale. See
 the "try_reduce_bpp" ini key. Also please note that tbmp_compression is
 not supported in bpp16.
 [*] Better handling of transparent pictures. (Sometimes a picture painted
 black on a transparent background ended up as a black-on-black picture;
 that's now fixed.)

  1.1.11SR1:
Installer:
 [-] Installer failed to update the bmp_to_tbmp_parameter key.
 This caused the conversion of images to stop working.
Files:
 [-] Boolean ini keys did not work if specified as true or false;
 only 1 or 0 worked.
 [-] The category key is now empty by default and not set to "Unfiled".
 [*] Help files updated:
 - Added links to some tools (INIPluck/Sitescooper)
 - Added link to http://bugs.plkr.org






RE: Plucker conduit

2001-10-15 Thread aardvarko

> According to http://sourceforge.net/projects/netpbm/, netpbm
> will run on both Windows and Linux, and the 16-bit support is
> already in netpbm (as is support for the bmp format).

I was referring to the Win32 executable 'Bmp2Tbmp.exe', made by Dirk Heiser
(isn't he a listmember and Plucker contributor?)
(http://www.dirk-heiser.de/Bmp2Tbmp/) - does the netpbm version work as a
swap-in replacement?

--
-aardvarko
http://aardvarko.com
webmaster at aardvarko dot com


> -Original Message-
> From: Bill Janssen [mailto:[EMAIL PROTECTED]]
> Sent: Monday, October 15, 2001 13:49
> To: Plucker Development List
> Subject: Re: Plucker conduit




Re: Plucker conduit

2001-10-15 Thread Bill Nalen/Towers Perrin


Right.  I've built one piece of it, but the configuration on a non-*nix
system is complicated.  Then I've got to figure out how to stream to memory
instead of disk.  Writes to stdout seem to be scattered throughout the code
that I've looked at.

Bill




   

   

From: Bill Janssen
Date: 10/15/2001 02:48 PM
To: Plucker Development List <[EMAIL PROTECTED]>
cc: (bcc: Bill Nalen/Towers Perrin)
Please respond to Plucker Development List

   

   




According to http://sourceforge.net/projects/netpbm/, netpbm
will run on both Windows and Linux, and the 16-bit support is
already in netpbm (as is support for the bmp format).

Bill

> > Is anyone maintaining bmp2tbmp?
>
> I E-mailed the creator asking about 16-bit support around two months ago and
> haven't received any response.
>
> --
> -aardvarko







Re: Plucker conduit

2001-10-15 Thread Bill Janssen

According to http://sourceforge.net/projects/netpbm/, netpbm
will run on both Windows and Linux, and the 16-bit support is
already in netpbm (as is support for the bmp format).

Bill

> > Is anyone maintaining bmp2tbmp?
> 
> I E-mailed the creator asking about 16-bit support around two months ago and
> haven't received any response.
> 
> --
> -aardvarko



RE: Plucker conduit

2001-10-15 Thread aardvarko

> Is anyone maintaining bmp2tbmp?

I E-mailed the creator asking about 16-bit support around two months ago and
haven't received any response.

--
-aardvarko
http://aardvarko.com
webmaster at aardvarko dot com


> -Original Message-
> From: Bill Janssen [mailto:[EMAIL PROTECTED]]
> Sent: Monday, October 15, 2001 11:57
> To: Plucker Development List
> Subject: Re: Plucker conduit




Re: Question about plucker 1.1.13

2001-10-15 Thread Bill Janssen

> I am seeing this error:
> 
> Runtime error parsing document http://www.pokemon.com/images/nintendo_logo.gif: 
> Can't determine image size from output of ImageMagick 'identify' program:  /ldat
> [EMAIL PROTECTED] 85x21+0+0 PseudoClass 32c 610b GIF 0.0u 0:00
> 

This looks like a bug in the parsing of that output line.  I've fixed it,
and checked it in.  I've also added the size checking on the images.

Bill



update question

2001-10-15 Thread Larry W. Virden

Just a few minutes ago, Chris Hawks posted a fix to the ImageMagick identify
problem that plucker 1.1.13 has with the older identify.

However, I'm uncertain if just modifying the installed .py file is sufficient.
When I did that, it _looks_ as if plucker no longer is plucking...  Is
there some additional step I need to take?
-- 
Never apply a Star Trek solution to a Babylon 5 problem.
Larry W. Virden  <http://www.purl.org/NET/lvirden/>
Even if explicitly stated to the contrary, nothing in this posting should 
be construed as representing my employer's opinions.
-><-



Re: Plucker conduit

2001-10-15 Thread Bill Nalen/Towers Perrin


I was going to start working on including the netpbm code directly into my
code so it wouldn't have to save to disk as an intermediate step.  I found
the netpbm code very difficult to compile under Windows though.

Bill




   

   

From: Bill Janssen
Date: 10/15/2001 12:56 PM
To: Plucker Development List <[EMAIL PROTECTED]>
cc: (bcc: Bill Nalen/Towers Perrin)
Please respond to Plucker Development List

   

   




> I've also got the images working using calls to imagemagick and
> bmp2tbmp for now.

Is anyone maintaining bmp2tbmp?  The code in netpbm and PIL is to be
preferred right now.  Alternatively, someone could port my netpbm code
to work in ImageMagick natively.

Bill









Re: mailto links update

2001-10-15 Thread MJ Ray

"David A. Desrosiers" <[EMAIL PROTECTED]> writes:

>   By non-standard you must mean "not used often". They're used quite a
> bit in academia, and are also in rfc2368.
>   http://www.ics.uci.edu/pub/ietf/uri/rfc2368.txt

I'll read that RFC.  It seems my knowledge has been obsoleted again.
I remember a bulletin from either WaSP or IACT being particularly
vicious about them in the past.

> > On a semi-related point, is anyone else seeing plucker occasionally
> > goof when presented with a long URL to copy to memo?
>   How long?

Around the 200-character length.  It seems to "lose" the front of the
URL.  I'm not sure which version of the viewer the afflicted palm is
using, though.

-- 
MJR

Do you need advice about the Internet or particular net services?  Why
not talk to my employers?  See http://www.luminas.co.uk/ for details.



Re: Plucker Desktop GUI Manager

2001-10-15 Thread MJ Ray

Andy Rabagliati <[EMAIL PROTECTED]> writes:

> Well, I am fond of sitescooper, because perl is less esoteric than
> python, and sitescooper can slice and dice better than the plucker
> frontend - picking out certain portions of a page, for example, or
> converting the URL to the "print format" on the fly.

I feel that I simply have to comment about perl being described as
"less esoteric than python".  How do you decide that?  I've seen some
really disgusting perl, but only mildly distasteful python.  Both have
their flaws, though.

> Scoop ... transcode ... view.
> My vote would be for "channel" or "scoop".

Would we risk confusing the issues if we call them "scoops" instead of
"plucks"?




Re: Plucker conduit

2001-10-15 Thread Bill Nalen/Towers Perrin


Sorry, my mistake, it just uses the BSD style socket api calls, which I
think are fairly standard across platforms as opposed to the Windows socket
library calls.



   

   

From: "David A. Desrosiers"
Date: 10/15/2001 12:28 PM
To: Plucker Development List <[EMAIL PROTECTED]>
cc: (bcc: Bill Nalen/Towers Perrin)
Subject: Re: Plucker conduit
Please respond to Plucker Development List

   

   





> Although I've done this for my company, the code should be GPL still
> since it's just a port (spelling mistakes and all :-)

   You mentioned there's some BSD code in there. Is that code under the
BSD license? Or is it *nix code, developed under BSD, using the GPL
license?



/d









Re: Question about plucker 1.1.13

2001-10-15 Thread Chris Hawks

---On Mon, 15 Oct 2001 12:33:22 -0400 (EDT),  Larry W. Virden said

> I updated today to the latest version of plucker (I somehow missed the 
> announcement for it until now).
> 
> I am seeing this error:
> 
> Runtime error parsing document http://www.pokemon.com/images/nintendo_logo.gif:
> 
> Can't determine image size from output of ImageMagick 'identify' program: 
> /ldat
> [EMAIL PROTECTED] 85x21+0+0 PseudoClass 32c 610b GIF 0.0u 0:00

I just upgraded recently as well and saw the error. Change line #135(?) in
the ImageParser.py file from:
match = re.search(r"\s([0-9]+)x([0-9]+)\s", info)
to:
match = re.search(r"\D([0-9]+)x([0-9]+)\D", info)

evidently, different versions of identify format the output differently.
This should work in any case.
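
To see why the second pattern helps, the two expressions can be tried side by side. The `info` string below is a hypothetical 'identify' output line modelled on the error message quoted above (the real file name is not shown in full there), so treat it as an assumption.

```python
import re

# Hypothetical 'identify' output line, modelled on the quoted error message.
info = "nintendo_logo.gif 85x21+0+0 PseudoClass 32c 610b GIF 0.0u 0:00"

old = re.search(r"\s([0-9]+)x([0-9]+)\s", info)
new = re.search(r"\D([0-9]+)x([0-9]+)\D", info)

# The old pattern fails here: "21" is followed by "+", not by whitespace.
print(old)

# The new pattern only requires a non-digit on either side of WIDTHxHEIGHT.
if new:
    width, height = int(new.group(1)), int(new.group(2))
    print(width, height)
```

Note that `\D` still needs one character on each side of the size, so a line that begins or ends with the size itself would not match either pattern.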

> I am plucking a document for a Palm Vx running PalmOS 3.5.3 .  I currently
> have plucker set for 4 bit gray (as I think that's all the Palm Vx
> supports).  I also have plucker set up to use netpbm2 .

Although this is in the ImageMagick conversion routines, _not_ netpbm2.

--re: Question about plucker 1.1.13
Chris

Christopher R. Hawks Software Engineer
Syscon Plantstar a Division of Syscon International
-
> Linux is not user-friendly. 
It _is_ user-friendly.  It is not ignorant-friendly and idiot-friendly.
-- Seen somewhere on the net







Re: Attempt to report broken link fails...

2001-10-15 Thread David A. Desrosiers


> When I attempted to follow the advice of the plkr.org web site in
> reporting a broken link, I got this error msg...

Fixed. anti-SirCam measures.


/d





Re: Plucker conduit

2001-10-15 Thread Bill Janssen

> I've also got the images working using calls to imagemagick and
> bmp2tbmp for now.

Is anyone maintaining bmp2tbmp?  The code in netpbm and PIL is to be
preferred right now.  Alternatively, someone could port my netpbm code
to work in ImageMagick natively.

Bill





Re: mailto links update

2001-10-15 Thread Bill Janssen

> 
>   This may be in Chris' court, and if he doesn't have time to look at
> it, I'll poke around in emailform.c, but.. subject and mailto modifiers are
> ignored when they're parsed into Plucker. Example:
> 
>   <a href="mailto:[EMAIL PROTECTED]?Subject=help">Help</a>
> 
>   When clicked in Plucker, brings up the proper "To:" address, pulled
> from the record, but does not populate either the Cc: or Subject: of the
> form displayed. A feature add? With the addition of this 'feature', we could
> start populating some "online" bug reports for sites, and content providers
> could "load" mailto links with useful information.

I think this is a bug, not a feature -- Plucker should definitely
parse mailto URLs properly.  Let's report it, and I'll take a look at
it.

Bill



Re: 1.1.13 fall down, go boom

2001-10-15 Thread Bill Janssen

> The Plucker document you have created is not according to the spec.
> Some simple math will tell you why. At 8bpp you can (at this moment)
> only use quite small images, since they MUST be less than 64k
> uncompressed (the viewer will still not display the image if it is
> more than 6 bytes, though.)

We've been meaning to put in a check for this for months.  I'll do
it today.
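
A rough version of such a check could look like the sketch below. It is not the check that went into the parser; the function names are made up, and the size estimate ignores the bitmap header and any row padding, so it is only an approximation of the 64k limit quoted above.

```python
MAX_IMAGE_BYTES = 64 * 1024  # the "less than 64k uncompressed" limit quoted above


def uncompressed_size(width, height, bpp):
    """Approximate pixel-data size in bytes (no header, no row padding)."""
    row_bytes = (width * bpp + 7) // 8  # round each row up to whole bytes
    return row_bytes * height


def fits_in_record(width, height, bpp):
    """Return True if the image stays under the 64k limit."""
    return uncompressed_size(width, height, bpp) < MAX_IMAGE_BYTES


print(fits_in_record(160, 160, 8))
print(fits_in_record(320, 320, 8))
```

This is the "simple math" in the quoted message: at 8bpp each pixel costs a byte, so a 320x320 image is already well past the limit, while the same image at 4bpp would fit.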

Bill



Attempt to report broken link fails...

2001-10-15 Thread Larry W. Virden

When I attempted to follow the advice of the plkr.org web site in
reporting a broken link, I got this error msg...


Forwarded mail follows:
From MAILER-DAEMON  Mon Oct 15 12:40:11 2001
Date: Mon, 15 Oct 2001 12:40:11 -0400 (EDT)
From: Mail Delivery Subsystem <[EMAIL PROTECTED]>
Message-Id: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
MIME-Version: 1.0
Subject: Returned mail: see transcript for details
Auto-Submitted: auto-generated (failure)

This is a MIME-encapsulated message

--f9FGeB713383.1003164011/srv01.cas.org

The original message was received at Mon, 15 Oct 2001 12:39:59 -0400 (EDT)
from lwv26awu [134.243.55.72]

   - The following addresses had permanent fatal errors -
<[EMAIL PROTECTED]>
(reason: 550 5.1.1 <[EMAIL PROTECTED]>... User unknown)

   - Transcript of session follows -
... while talking to gnu-designs.com.:
>>> RCPT To:<[EMAIL PROTECTED]>
<<< 550 5.1.1 <[EMAIL PROTECTED]>... User unknown
550 5.1.1 <[EMAIL PROTECTED]>... User unknown

--f9FGeB713383.1003164011/srv01.cas.org
Content-Type: message/delivery-status

Reporting-MTA: dns; srv01.cas.org
Received-From-MTA: DNS; lwv26awu
Arrival-Date: Mon, 15 Oct 2001 12:39:59 -0400 (EDT)

Final-Recipient: RFC822; [EMAIL PROTECTED]
Action: failed
Status: 5.1.1
Remote-MTA: DNS; gnu-designs.com
Diagnostic-Code: SMTP; 550 5.1.1 <[EMAIL PROTECTED]>... User unknown
Last-Attempt-Date: Mon, 15 Oct 2001 12:40:11 -0400 (EDT)

--f9FGeB713383.1003164011/srv01.cas.org
Content-Type: message/rfc822

Return-Path: <[EMAIL PROTECTED]>
Received: from lwv26awu.cas.org (lwv26awu [134.243.55.72])
    by srv01.cas.org (8.10.2+Sun/m4_8.9.3/CAS_MAIL_HUB-1.14) with ESMTP id f9FGdx713381
    for <[EMAIL PROTECTED]>; Mon, 15 Oct 2001 12:39:59 -0400 (EDT)
Received: from cas.org (localhost [127.0.0.1])
    by lwv26awu.cas.org (8.10.2+Sun/m4_8.9.3/CAS_CLIENT-1.17) with ESMTP id f9FGdvN12406
    for <[EMAIL PROTECTED]>; Mon, 15 Oct 2001 12:39:57 -0400 (EDT)
Sender: [EMAIL PROTECTED]
Message-ID: <[EMAIL PROTECTED]>
Date: Mon, 15 Oct 2001 12:39:57 -0400
From: "Larry W. Virden" <[EMAIL PROTECTED]>
Organization: Nedriv Software and Shoe Shiners, Uninc.
X-Mailer: Mozilla 4.78 [en] (X11; U; SunOS 5.8 sun4u)
X-Accept-Language: en
MIME-Version: 1.0
To: [EMAIL PROTECTED]
Subject: BROKEN LINK
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Broken link found on Monday, October 15, 2001 - 09:39:10 AM (Link was:
/index.pl/links clicked from: http://www.plkr.org/)

-- 
"I know of vanishingly few people ... who choose to use ksh." "I'm a
minority!"
mailto:[EMAIL PROTECTED]> http://www.purl.org/NET/lvirden/>
Even if explicitly stated to the contrary, nothing in this posting
should be construed as representing my employer's opinions.
-><-

--f9FGeB713383.1003164011/srv01.cas.org--




Note on the plucker requirements doc

2001-10-15 Thread Larry W. Virden


In the plucker requirements document, the link for ImageMagick should now
change to http://www.imagemagick.org/ .



Question about plucker 1.1.13

2001-10-15 Thread Larry W. Virden

I updated today to the latest version of plucker (I somehow missed the 
announcement for it until now).

I am seeing this error:

Runtime error parsing document http://www.pokemon.com/images/nintendo_logo.gif: 
Can't determine image size from output of ImageMagick 'identify' program:  /ldat
[EMAIL PROTECTED] 85x21+0+0 PseudoClass 32c 610b GIF 0.0u 0:00

xv reports this image as being GIF89, 5 bits per pixel, interlaced.
 (610 bytes)
85x21 resolution
No cropping
Expansion: 108.24% x 100% (92x21)
16 colors

I am uncertain what, if anything, I am supposed to do with this error.

I am plucking a document for a Palm Vx running PalmOS 3.5.3 .  I currently
have plucker set for 4 bit gray (as I think that's all the Palm Vx
supports).  I also have plucker set up to use netpbm2 .



Re: Plucker conduit

2001-10-15 Thread David A. Desrosiers


> Although I've done this for my company, the code should be GPL still
> since it's just a port (spelling mistakes and all :-)

You mentioned there's some BSD code in there. Is that code under the
BSD license? Or is it *nix code, developed under BSD, using the GPL license?



/d





Re: mailto links update

2001-10-15 Thread David A. Desrosiers


> If you change "Subject" to "subject" it will work just fine, i.e. it is
> a minor "case sensitive" problem in the parser. Take a look at the
> 'parse' function in PluckerDocs.py if you want to fix it.

Hrm, thanks Mike, I'll try it.



/d






Re: http unknown error

2001-10-15 Thread Zailong Bian

404 is document not found. 

If that is not really the case (test it with your browser), you can assume the
parser is broken and cannot get the URL right from the file... something to do
with the proxy?

Zailong
--- [EMAIL PROTECTED] wrote:
> Hi,
> 
> I am trying to use Plucker for the first time and I'm getting the same
> error as Anthony.  I couldn't find his response nor a final resolution, so
> I thought I'd stand in.
> 
> First, I'm running Windows NT 4.0 sp 6 on a WAN using a proxy server.  I
> did install the latest Windows installer.
> 
> When I right-clicked the home.html file and selected "Convert to Plucker",
> as you suggested, I got the following messages (for privacy/security, I
> replaced my proxy info with "myProxy.com:"):
> 
> Working for pluckerdir C:\Program Files\Plucker\Daily.DB
> Using proxy 'myProxy.com:'
> Processing file:C:\Program Files\Plucker\Daily.DB\home.html.
>0 collected, 0 still to do
>   Retrieved ok
> Processing http://www.dispatch.com/football/football.php.
>1 collected, 0 still to do
>   Retrieved failed: 404 -- [Errno url error] unknown url type: 'http'
> 
> Writing out collected data...
> Writing document 'Daily' to file C:\Program
> Files\Plucker\Daily.DB\home.html.pdb
> 
> Converted file:C:\Program Files\Plucker\Daily.DB\home.html
> Wrote 1 <= plucker:/~special~/index
> Wrote 2 <= file:C:\Program Files\Plucker\Daily.DB\home.html
> Wrote 3 <= plucker:/~special~/pluckerlinks
> Wrote 12 <= plucker:/~special~/links1
> Done!
> Press any key to continue...
> 
> Any clues?
> 
> Thanks,
> Gary
> 
> -
> "Anthony" == Anthony Schellenberg <[EMAIL PROTECTED]> writes:
> 
> 
> Anthony> Hi, I've been using plucker for a couple of months. I just
> Anthony> reimaged my hard drive yesterday and today I was setting up my
> Anthony> plucker stuff. I set up the ini files, added my http proxy
> Anthony> server info, and copied in my home.html file exactly as it
> Anthony> was before. When I try to build my file I get this error
> Anthony> message for every link I have on my home page:
> 
> 
> Are you running on Windows? (You say INI file, so I guess yes.) Have you
> tried to use the latest installer (see my sig)?
> 
> 
> Anthony> Retrieved failed: 404 -- [Errno url error] unknown url type:
> 'http'
> 
> 
> Try to use the right click menu item "convert to plucker" on an HTML
> file in the Explorer and tell me the result please.
> 
> 
> cu,
>  Dirk
> 
> 





Re: mailto links update

2001-10-15 Thread Michael Nordström

On Sun, Oct 14, 2001, David A. Desrosiers wrote:

>   <a href="mailto:[EMAIL PROTECTED]?Subject=help">Help</a>

If you change "Subject" to "subject" it will work just fine, i.e.
it is a minor "case sensitive" problem in the parser. Take a look
at the 'parse' function in PluckerDocs.py if you want to fix it.
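
For illustration, a case-insensitive treatment of the header names could look like the sketch below. It is not the 'parse' function from PluckerDocs.py; the name `parse_mailto` and the example address are assumptions, and it is written for present-day Python.

```python
from urllib.parse import parse_qsl, unquote


def parse_mailto(url):
    """Split a mailto URL into the address and a dict of header fields.

    Header names are lower-cased, so "Subject" and "subject" are equivalent.
    """
    scheme, _, rest = url.partition(":")
    if scheme.lower() != "mailto":
        raise ValueError("not a mailto URL: %r" % url)
    address, _, query = rest.partition("?")
    # parse_qsl follows form-encoding rules (for example "+" becomes a
    # space), which is good enough for a sketch but not strict mailto parsing.
    headers = {name.lower(): value for name, value in parse_qsl(query)}
    return unquote(address), headers


print(parse_mailto("mailto:someone@example.com?Subject=help"))
```

Lower-casing the names once at parse time means the rest of the code only ever has to look for "subject" and "cc".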

/Mike



Re: Plucker Desktop GUI Manager

2001-10-15 Thread Michael Nordström

On Sat, Oct 13, 2001, David A. Desrosiers wrote:
> I'm not sure that this should be inside the Plucker cvs directly,
> even though it's a supporting tool, it's not part of the same codebase.

Isn't that the reason why we have a tools directory? It could be
added to that dir together with a Makefile. Since it will use the
Python parser I would say it is close enough to be "part of the
codebase".

/Mike



Re: http unknown error

2001-10-15 Thread Gary . Reiner

Hi,

I am trying to use Plucker for the first time and I'm getting the same
error as Anthony.  I couldn't find his response or a final resolution, so
I thought I'd step in.

First, I'm running Windows NT 4.0 SP6 on a WAN using a proxy server.  I
did install the latest Windows installer.

When I right-clicked the home.html file and selected "Convert to Plucker",
as you suggested, I got the following messages (for privacy/security, I
replaced my proxy info with "myProxy.com:"):

Working for pluckerdir C:\Program Files\Plucker\Daily.DB
Using proxy 'myProxy.com:'
Processing file:C:\Program Files\Plucker\Daily.DB\home.html.
   0 collected, 0 still to do
  Retrieved ok
Processing http://www.dispatch.com/football/football.php.
   1 collected, 0 still to do
  Retrieved failed: 404 -- [Errno url error] unknown url type: 'http'

Writing out collected data...
Writing document 'Daily' to file C:\Program Files\Plucker\Daily.DB\home.html.pdb

Converted file:C:\Program Files\Plucker\Daily.DB\home.html
Wrote 1 <= plucker:/~special~/index
Wrote 2 <= file:C:\Program Files\Plucker\Daily.DB\home.html
Wrote 3 <= plucker:/~special~/pluckerlinks
Wrote 12 <= plucker:/~special~/links1
Done!
Press any key to continue...

Any clues?

Thanks,
Gary

-
"Anthony" == Anthony Schellenberg <[EMAIL PROTECTED]> writes:


Anthony> Hi, I've been using plucker for a couple of months. I just
Anthony> reimaged my hard drive yesterday and today I was setting up my
Anthony> plucker stuff. I set up the ini files, added my http proxy
Anthony> server info, and copied in my home.html file exactly as it
Anthony> was before. When I try to build my file I get this error
Anthony> message for every link I have on my home page:


Are you running on Windows? (You say INI file, so I guess yes.) Have you
tried to use the latest installer (see my sig)?


Anthony> Retrieved failed: 404 -- [Errno url error] unknown url type: 'http'


Try the right-click menu item "Convert to Plucker" on an HTML
file in Explorer and tell me the result, please.


cu,
 Dirk






Re: Plucker Desktop GUI Manager

2001-10-15 Thread Larry W. Virden


From: MJ Ray <[EMAIL PROTECTED]>

>"Robert O'Connor" <[EMAIL PROTECTED]> writes:
>
>> I am certainly open to what to call the things other than channels; it is
>> easier to specify the term now than more time spent rewriting things. [...]
>
>I vote for "a pluckable (site)", rather than a channel.  

Well, if we are looking for terms, what kinds of things does one pluck
in the real world?

Feathers (of a bird while preparing for consumption)
Strings (of an instrument while performing music)
Marks (as in victims of a crime)

I think the term is also used in magic, as one 'plucks things from thin air'...


-- 
Never apply a Star Trek solution to a Babylon 5 problem.
Larry W. Virden  <http://www.purl.org/NET/lvirden/>
Even if explicitly stated to the contrary, nothing in this posting should 
be construed as representing my employer's opinions.
-><-



Plucker conduit

2001-10-15 Thread Bill Nalen/Towers Perrin


After reading the many emails regarding syncing lately, I thought I'd
reiterate the project I'm working on.  I have an almost fully ported
version of the Plucker parser that works standalone or as a full Palm
conduit.  I've done this for Windows, but the only code that is
Windows-specific is the page fetching (and I've got some BSD code in there
for non-Windows builds).  It seems to work pretty well.  I'm at the point
where I'm testing and cleaning up mistakes in the HTML parsing
part.  I've also got images working using calls to ImageMagick and
bmp2tbmp for now.  Those are the only dependencies, too.  The rest of the
code is self-contained in the exe or conduit DLL.

Is there any interest in this?  I've done this because we want to sync some
of our internal web content to the Palm and our users are not savvy enough
to install Python, Tcl, and the Plucker stuff.  Although I've done this for
my company, the code should still be GPL since it's just a port (spelling
mistakes and all :-)

Bill





Send me please the latest version of Plucker

2001-10-15 Thread Ciorici, Vitalii

I do not have access to the Internet, only to email. Please send me the zip
file with Plucker for Windows. Thank you!

Best regards, 
   Vitali CIORICI




Re: Once again, we revisit the missing --stay{$FOO} options

2001-10-15 Thread Kjetil Torgrim Homme

"David A. Desrosiers" <[EMAIL PROTECTED]> writes:

>   Here's what I propose:
> 
>   --stayonhost: Will not ever leave the FQDN you specify in your -H
>                 syntax.
> 
>   --staybelow: (should take a URI as an argument, not a URL) Will
>                restrict ascension to the supplied URL as a parent.

Is this different from maxdepth 1?  I'm a bit confused about what you
said about up and down.  I think --stayonhost should be a special case
of --staybelow=http://fqdn/.  Hmm, perhaps not: what should be done
about references to ftp://fqdn/?  With --stayonhost they should be
included, with --staybelow they should not.

>   --stayondomain: Will never leave the network you specify, so that
>                   www.foo.com, images.foo.com, and foo.com will all be
>                   assumed to be included in the same "pluck". Content
>                   from all "member domains" will be included.

I hacked in this feature in WWWOFFLE, for the same reasons (well, IGN
rather than slashdot).
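
To make the difference between the three options concrete, here is a rough
Python sketch of how each test could be written. This is only my reading of
the proposal, not code from Plucker's Spider.py or from WWWOFFLE; the
function names are invented, and it uses the modern urllib.parse module.

```python
from urllib.parse import urlsplit


def stay_on_host(url, host):
    """--stayonhost: same FQDN, any scheme (so ftp://fqdn/ is included)."""
    return (urlsplit(url).hostname or "") == host.lower()


def stay_below(url, base):
    """--staybelow: same scheme, host and port, and a path under base."""
    u, b = urlsplit(url), urlsplit(base)
    if (u.scheme, u.hostname, u.port) != (b.scheme, b.hostname, b.port):
        return False
    base_path = b.path or "/"
    # Compare against the base as a directory so /docs does not match /docs2.
    base_dir = base_path if base_path.endswith("/") else base_path + "/"
    path = u.path or "/"
    return path == base_path or path.startswith(base_dir)


def stay_on_domain(url, domain):
    """--stayondomain: the domain itself or any host that ends in .domain."""
    host = urlsplit(url).hostname or ""
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


print(stay_on_host("ftp://fqdn/pub/", "fqdn"))
print(stay_below("ftp://fqdn/pub/", "http://fqdn/"))
print(stay_on_domain("http://images.slashdot.org/a.gif", "slashdot.org"))
```

With these definitions --stayonhost is not quite a special case of
--staybelow=http://fqdn/, because only the former accepts ftp://fqdn/.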


Kjetil T.



Re: Plucker Desktop GUI Manager

2001-10-15 Thread Andy Rabagliati

On Sun, 14 Oct 2001, Robert O'Connor wrote:

> > > All the other GUI subscribed portable Palm web systems I have ever seen
> > > use the term channel, probably for easier first user/migration, and the
> > > fact that new people are familiar with the concept of a 'channel'
> >
> > The only one I know of that does this is AvantGo.
> >
> >  Sitescooper uses "scoops" or "nightly scoops", for example. I'm not
> > aware of any others that actually ACTIVELY pull content from sites
> > and parse it for a digestible view on the Palm.

Scoop has the connotation of News, and fresh news at that. However,
many Plucker users are not native English speakers, and to them it
may sound like Flooz.

> I certainly am not a lawyer, but I don't think there can possibly be a
> trademark on the term 'channel'. SiteScooper isn't renowned for being
> the easiest of the cabal to use.

Well, I am fond of sitescooper, because Perl is less esoteric than
Python, and sitescooper can slice and dice better than the Plucker
frontend - picking out certain portions of a page, for example, or
converting the URL to the "print format" on the fly.

> >  With Plucker, the user initiates a script/gui/whatever, which goes
> > out on the LIVE internet, gathers content from live sites, which may
> > be up, down, or slow, and then parses it locally, which creates a
> > file they have to sync to their Palm.
> 
> There is certainly a different architecture, in an AvantGo push vs. Plucker
> pull model. From the casual user point of view though, the results on the
> Palm look indistinguishable.

The results are indistinguishable, but, surfing from a country where
we pay for phone calls (South Africa), I *really* like to be able to
do the scooping from a cronjob, and the syncing separately.

> >  Calling what we do a "channel" is misleading in this aspect.
> > Channels in the vein of telecommunications or television, indicates
> > that you have something "canned" for your viewing pleasure (or
> > displeasure of late), which has been audited/edited to suit your
> > medium. This doesn't apply in our case, for a majority (most) of our
> > users who use Plucker.
> 
> I disagree with you on this. This is exactly what Plucker is doing,
> and what it is supposed to be doing. I tell it the source of the
> information, then it is auditing out the javascript, the HTML
> comments, and the other crap I don't want, and formatting it to suit
> my medium which is a Palm pilot with a small screen and low processing
> power.

The power is in your hands - where it is not on TV. However, a TV
channel represents the only power the viewer has - to change channels.

> I am quite happy myself with a full-client side solution. That certainly
> shouldn't exclude other mechanisms which would be useful for others, similar
> to the way we have parsers for python, but also different languages.

I personally must have an agent to do the transcoding.

Scoop ... transcode ... view.

My vote would be for "channel" or "scoop".

Cheers,    Andy!

-- 
Free Palm game - http://www.wizzy.com/owari/



Re: mailto links update

2001-10-15 Thread David A. Desrosiers


> I'm mixed about this.  Those modifiers are distinctly non-standard and
> I'm unsure that they're used in any consistent manner by the world at
> large.

By non-standard you must mean "not used often". They're used quite a
bit in academia, and are also specified in RFC 2368.

http://www.ics.uci.edu/pub/ietf/uri/rfc2368.txt

I also found a few other random references:
http://www.sightspecific.com/~mosh/WWW_FAQ/mailsubj.html
http://lists.w3.org/Archives/Public/www-talk/1998JulAug/0002.html
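
As a quick illustration of the syntax those references describe, here is a
small Python sketch that builds a mailto URL with header fields after the
"?". The helper name build_mailto and the example.com address are
assumptions for the example; nothing here comes from the Plucker sources.

```python
from urllib.parse import quote


def build_mailto(address, **headers):
    """Build mailto:address?name=value&... with percent-encoded values.

    Hypothetical helper. Spaces become %20 here, not "+".
    """
    url = "mailto:" + quote(address, safe="@")
    query = "&".join(
        "%s=%s" % (quote(name, safe=""), quote(value, safe=""))
        for name, value in headers.items()
    )
    return url + "?" + query if query else url


print(build_mailto("help@example.com", subject="Need help"))
```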

> On a semi-related point, is anyone else seeing plucker occasionally
> goof when presented with a long URL to copy to memo?

How long?



/d





Re: mailto links update

2001-10-15 Thread MJ Ray

"David A. Desrosiers" <[EMAIL PROTECTED]> writes:

>   This may be in Chris' court, and if he doesn't have time to look at
> it, I'll poke around in emailform.c, but.. subject and mailto modifiers are
> ignored when they're parsed into Plucker. Example:

I'm mixed about this.  Those modifiers are distinctly non-standard and
I'm unsure that they're used in any consistent manner by the world at
large.

On a semi-related point, is anyone else seeing plucker occasionally
goof when presented with a long URL to copy to memo?

-- 
MJR