Re: HTML to text

2003-06-30 Thread Mark Overmeer
* Ian Malpass ([EMAIL PROTECTED]) [030629 20:59]:
 I'm trying to write something that converts HTML into nicely formatted
 text.

 $fh->open("lynx -dump -stdin < temp.txt |")

HTML::FormatText produces a nice result.
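For the HTML::FormatText route, here is a minimal sketch. It assumes HTML::TreeBuilder and HTML::FormatText are installed from CPAN. The helper name `html_to_text` is made up, and the margin values are only illustrative:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Convert an HTML string to plain text, roughly like `lynx -dump`
# but without a numbered list of links at the end.
sub html_to_text {
    my ($html) = @_;
    my $tree      = HTML::TreeBuilder->new_from_content($html);
    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 72);
    my $text      = $formatter->format($tree);
    $tree->delete;    # free the parse tree (it has circular references)
    return $text;
}

print html_to_text('<html><body><p>Hello world</p></body></html>');
```

Out of the box this produces no lynx-style link list. Getting one means subclassing the formatter.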
-- 
   MarkOv   %-]


drs Mark A.C.J. Overmeer                MARKOV Solutions
   [EMAIL PROTECTED]  [EMAIL PROTECTED]
http://Mark.Overmeer.net   http://solutions.overmeer.net



Re: The Community Guide to Birmingham

2003-06-30 Thread David Hodgkinson
On Sunday, June 29, 2003, at 03:35 PM, Jody Belka wrote:
Right now there isn't any content whatsoever on the site, so anyone in the
Birmingham area, or who knows the Birmingham area, please feel welcome to
come along and add something.

Jody,

There's a very active Brum community on Ecademy (yeah, I know) run by
Donato Esposito who is a top bloke, you might tap into that.
Dave




Re: Using LWP for protected pages

2003-06-30 Thread Piers
Colin Magee said:

 Maybe I can't do this using this module? Does anyone have experience of
 trying to access protected pages via a login who could help point me in
 the right direction?

 Thanks
 Colin Magee

Well, time to jump in, I think. Hello list!



Nothing to do with LWP, I'm afraid, but in the end the most foolproof
route is to start with a sniffer (you can get e.g. Ethereal for Windows, I
believe) and see exactly what is being sent with the query for the
protected page.

Things to think about are Referer: headers, cookies set via JavaScript and
hidden form fields.
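In LWP terms, here is a minimal sketch of handling the Referer and hidden-field cases. It assumes the login page's HTML has already been fetched. HTML::Form (from the libwww-perl family) carries hidden inputs through into the request it builds. The field names `user` and `pass` and the helper `build_login_request` are hypothetical:

```perl
use strict;
use warnings;
use HTML::Form;

# Build the request a browser would send from a login page. HTML::Form
# keeps the form's hidden fields; the Referer header is added by hand.
# The field names 'user' and 'pass' are hypothetical: check the real form.
sub build_login_request {
    my ($html, $page_url, $user, $pass) = @_;
    my $form = HTML::Form->parse($html, $page_url)
        or die "no form found on $page_url\n";
    $form->value(user => $user);
    $form->value(pass => $pass);
    my $req = $form->make_request;         # HTTP::Request, hidden fields included
    $req->header(Referer => $page_url);    # some sites check where you came from
    return $req;
}
```

To send the request, something like `my $ua = LWP::UserAgent->new(cookie_jar => {}); my $res = $ua->request($req);` keeps cookies across the bounces. Be aware of two gaps:

- By default LWP only follows redirects for GET and HEAD, so you may need `push @{ $ua->requests_redirectable }, 'POST';`.
- Cookies set by JavaScript still have to be copied in by hand. This is where the sniffer earns its keep.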

I speak from experience of trying (and succeeding in the end) to automate
sending a text via o2.co.uk's text service. I didn't use LWP (preferring
to roll my own for my 'home' projects); I used more 'raw' Perl, wrapping
OpenSSL's s_client command as a co-process.

There is _no_ more bizarre, obscure, over-complex (and technically broken)
login process than o2's web site. After logging in you get bounced (via
_both_ 302 redirects, and javascript automatic form submissions) in turn
to about three different login servers. Some of the cookies set have a
path or domain component that is so far as I could see illegal. You have
to go SSL and not SSL. Anyway it was fun but now you get about 5 free
texts a month, not worth it!


-
Piers
(a different one!)






Re: Using LWP for protected pages

2003-06-30 Thread Aaron Trevena
On Sun, 29 Jun 2003, Leon Brocard wrote:
  Anyone got any time to write a Javascript library and integrate it into
  WWW::Mechanize?

 Handily, the mozilla guys went and wrote a JavaScript library for us:
 http://www.mozilla.org/js/spidermonkey/

 And waddaya know, it already has a Perl wrapper:
 http://search.cpan.org/author/CLAESJAC/JavaScript/


You can also use the web scraping proxy mentioned in last month's Dr. Dobb's
(source code available at the site as usual). I was thinking of using it to
build a test suite for our site (regression testing is a long and tedious
job, about 70% of which could be automated) as well as for monitoring
(checking that a quote or result appears rather than a 500, etc.).
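For the monitoring half, the check comes down to two conditions: the status is a success, and the expected text is present. Here is a minimal sketch, assuming HTTP::Response from libwww-perl. The name `page_ok` and the example pattern are hypothetical:

```perl
use strict;
use warnings;
use HTTP::Response;

# A page is "up" only if the status is 2xx AND the text we expect is there,
# so an error page served with a 200 status still counts as a failure.
sub page_ok {
    my ($res, $expected) = @_;
    return 0 unless $res->is_success;
    my $body = $res->decoded_content // '';
    return $body =~ $expected ? 1 : 0;
}
```

In a cron job this might look like `my $res = LWP::UserAgent->new(timeout => 30)->get($url); warn "$url looks broken\n" unless page_ok($res, qr/Your quote/);`. Here `$url` and the pattern stand in for whatever your site actually shows.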

A.

-- 
Aaron J Trevena - Perl Hacker, Kung Fu Geek, Internet Consultant
AutoDia --- Automatic UML and HTML Specifications from Perl, C++
and Any Datasource with a Handler. http://droogs.org/autodia



Re: HTML to text

2003-06-30 Thread Struan Donald
* at 30/06 10:32 +0200 Mark Overmeer said:
 * Ian Malpass ([EMAIL PROTECTED]) [030629 20:59]:
  I'm trying to write something that converts HTML into nicely formatted
  text.
 
  $fh->open("lynx -dump -stdin < temp.txt |")
 
 HTML::FormatText produces a nice result.

Although in order to get it to produce lynx like output some
subclassing is needed to handle links:

package HTML::MyFormatText;

use strict;
use URI::WithBase;
use base qw(HTML::FormatText);

sub configure {
    my $self = shift;
    my $hash = shift;

    # a base uri so we can resolve relative uris

    $self->{base} = $hash->{base};
    delete $hash->{base};
    # keep everything up to the last slash, i.e. the page's directory
    $self->{base} =~ s#(.*?)/[^/]*$#$1/#;
    $self->SUPER::configure($hash);
}

sub a_start {
    my $self = shift;
    my $node = shift;
    my $href = $node->attr('href');
    # named anchors (<a name="...">) have no link to list
    return $self->SUPER::a_start($node) unless defined $href && length $href;
    # local urls are no use so we have to make them absolute
    if ($href =~ m#^http:|^mailto:#) {
        push @{$self->{_links}}, $href;
    } else {
        my $u = URI::WithBase->new($href, $self->{base});
        push @{$self->{_links}}, $u->abs();
    }
    # print the link's number before its text, lynx style
    $self->out( '[' . $#{$self->{_links}} .'] ' );

    $self->SUPER::a_start($node);
}

sub html_end {
    my $self = shift;
    if ( $self->{_links} ) {
        $self->nl; $self->nl; # be tidy
        $self->goto_lm;
        # list every collected link with its number
        for (0 .. $#{$self->{_links}}) {
            $self->goto_lm;
            $self->out("[$_] " . $self->{_links}->[$_]);
            $self->nl;
        }
    }
    $self->SUPER::end();
}
1;

__END__
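The substitution in `configure` keeps everything up to the last slash, so the base becomes the page's directory. Here is a small core-Perl check of that behaviour. The `base_dir` wrapper and the URLs are made up for illustration:

```perl
use strict;
use warnings;

# Same substitution as in configure(): strip the final path segment but
# keep the trailing slash, so relative links resolve against the directory.
sub base_dir {
    my ($url) = @_;
    (my $base = $url) =~ s#(.*?)/[^/]*$#$1/#;
    return $base;
}

print base_dir('http://example.com/docs/page.html'), "\n";   # http://example.com/docs/
```

One edge case is worth knowing: a bare host such as `http://example.com` has no path, so the last slash is the one in `//` and the base collapses to `http://`. Pass a URL with at least a trailing `/`.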

HTH 

Struan



(no subject)

2003-06-30 Thread Daniel Glyn-Jones
Calling all Mod Perl developers..


Working for this groundbreaking independent company, this is an excellent
position for a talented web developer with the full range of skills and
experience. The ideal candidate will have flawless Perl/mod_perl (5.005+),
Apache & Linux skills. Day to day you will work on all aspects of a project,
often unsupervised & always to very definite specifications and in very
definite timescales. This is a business that prides itself on its staff.
Excellent career potential. Send your CV today for further details.

This, though a highly technical position, is one that will involve you closely
with the day-to-day activities of the business itself; you can expect to
help in the company's strategic decisions on how it will define itself
in the marketplace. Expect ongoing project activity and, as such, a role
where no two weeks are the same.

Working from an attractive west country location is another huge plus with
this company. Salary banding is between £25,000 and £30,000.
Please contact Daniel if you have further queries.

Best regards

Daniel.



Daniel Glyn-Jones
Senior Consultant
ElanIT
1st Floor New Minster House
27-29 Baldwin Street
Bristol
BS1 1LT
[EMAIL PROTECTED]
Tel:  0117 9309700
Fax: 0117 9304205
Website: www.ElanIT.co.uk



***
This email and any files transmitted with it are confidential, also intended solely 
for the use of the individual or entity to whom they are addressed.
If you have recieved this email in error please notify your system manager.




This email has been scanned for all viruses by the MessageLabs Email
Security System. For more information on a proactive email security
service working around the clock, around the globe, visit
http://www.messagelabs.com




Re: (no subject)

2003-06-30 Thread Nicholas Clark
On Mon, Jun 30, 2003 at 10:48:46AM +0100, Daniel Glyn-Jones wrote:

http://london.pm.org/about/faq.html

How do I advertise a job on the list?

If you're a recruiter, you don't. Anyone else should advertise it with
a subject line containing [JOB].

Not having a subject at all is a really bad plan.

 Working from an attractive west country location is another huge plus with

Near Budgens in Bradford upon Avon?

There was an amazing amount of information missing from your message.

 ***
 This email and any files transmitted with it are confidential, also intended solely 
 for the use of the individual or entity to whom they are addressed.
 If you have recieved this email in error please notify your system manager.
 

You are aware that the list is publicly archived?

BTW it's received.

Nicholas Clark



RE: (no subject)

2003-06-30 Thread Daniel Glyn-Jones
Hi Nicholas, 

I appreciate your feedback on this one. First time using the site was just
now so apologies if I've blundered ! 
As a recruitment agency, would you recommend that I do not advertise via
this site at all or just that I make sure that I identify this in the
subject line ? 

Thanks for your help, 

Daniel.

Daniel Glyn-Jones
Senior Consultant
ElanIT
1st Floor New Minster House
27-29 Baldwin Street
Bristol
BS1 1LT
[EMAIL PROTECTED]
Tel:  0117 9309700
Fax: 0117 9304205
Website: www.ElanIT.co.uk


-Original Message-
From: Nicholas Clark [mailto:[EMAIL PROTECTED]
Sent: Monday, June 30, 2003 11:26 AM
To: [EMAIL PROTECTED]
Subject: Re: (no subject)






Re: (no subject)

2003-06-30 Thread Nicholas Clark
On Mon, Jun 30, 2003 at 11:30:02AM +0100, Daniel Glyn-Jones wrote:
 Hi Nicholas, 
 
 I appreciate your feedback on this one. First time using the site was just
 now so apologies if I've blundered ! 
 As a recruitment agency, would you recommend that I do not advertise via
 this site at all or just that I make sure that I identify this in the
 subject line ? 

As a recruitment agency I would recommend that you do not advertise via
this list, even for jobs in London or its environs. I'd encourage you
to advertise on the perl jobs list ( [EMAIL PROTECTED] )

see 
  http://lists.perl.org/showlist.cgi?name=jobs
and the connected site
  http://jobs.perl.org/

All perl job adverts are most welcome there.

Nicholas Clark



Re: HTML to text

2003-06-30 Thread alex
On Sun, 2003-06-29 at 20:00, Ian Malpass wrote:
 I'm trying to write something that converts HTML into nicely formatted
 text.
 
 How can I improve it? I haven't really found anything on CPAN to do what I
 want (there are remove HTML tags scripts and thing, but nothing with the
 formatting power of lynx that I can see).

I use HTML::TokeParser for turning html into text, for mailouts.  I'm
very happy with it, but perhaps I need more control over the output than
you.

I pass the output from HTML::TokeParser through Text::Autoformat to make sure
it's all pretty and formatted to the 72nd column.
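Text::Autoformat is a CPAN module. If you only need the 72-column wrapping, the core Text::Wrap module is a lighter stand-in. This is a swap, not what alex uses, and it won't do Autoformat's list and quote detection. A minimal sketch; the name `wrap72` is made up:

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Reflow each blank-line-separated paragraph to at most 72 characters
# per line. Text::Wrap's $columns is one more than the longest line it makes.
sub wrap72 {
    my ($text) = @_;
    local $Text::Wrap::columns = 73;
    return join "\n\n",
        map {
            my $p = $_;
            $p =~ s/\s+/ /g;      # collapse internal newlines so lines reflow
            $p =~ s/^ | $//g;     # trim the ends
            wrap('', '', $p);
        }
        grep { /\S/ } split /\n\s*\n/, $text;
}
```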

There's also HTML::TokeParser::Simple, but I decided to avoid it, and
define a few constants to make the code simpler instead.

Here's the code I wrote, in the form of a Template::Toolkit plugin - 
you'll need to extract the bits you want, or start from an example in
the documentation instead.  

My code shows links in square brackets after the link text, converts
headings to uppercase, and puts asterisks around bold text. Other tags
are ignored and removed. It also turns root-relative links (those starting
with /) into absolute ones using the domain passed as the plugin's first
parameter, which you'll probably want to remove.



package state51::Template::Plugin::HTMLToker;
use strict;

use base 'state51::Template::Plugin::Base';

# token types returned by HTML::TokeParser::get_token
use constant START_TAG   => 'S';
use constant END_TAG     => 'E';
use constant TEXT        => 'T';
use constant COMMENT     => 'C';
use constant DECLARATION => 'D';
use constant PROCESS     => 'PI';

use HTML::TokeParser;

##

sub htmltotext {
    my ($self, $html) = @_;

    my $domain = ($self->params->[0] || '');

    my $toker = HTML::TokeParser->new(\$html)
      or die "couldn't tokeparse that html";

    my $result = '';

    my @links;
    my $upper = 0;
    while (my $token = $toker->get_token) {
        my $type = $token->[0];

        if ($type eq START_TAG or $type eq END_TAG) {
            my $tag = $token->[1];
            if ($tag eq 'b') {
                # *bold*: one asterisk for the start tag, one for the end
                $result .= '*';
            }
            elsif ($tag eq 'br') {
                $result .= "\n"
                  if $type eq START_TAG;
            }
            elsif ($tag =~ /^h\d+$/) {
                # uppercase everything inside a heading
                if ($type eq START_TAG) {
                    ++$upper;
                }
                else {
                    $upper = 0;
                }
            }
            elsif ($tag eq 'a') {
                if ($type eq START_TAG) {
                    my $attr = $token->[2];
                    if (exists $attr->{href}) {
                        push @links, $attr->{href};
                    }
                }
                else {
                    # on </a>, append the link in square brackets
                    if (scalar @links) {
                        my $link = pop @links;
                        if ($link =~ m,^/,) {
                            $link = "http://$domain$link";
                        }
                        $result .= (' [ '
                                    . $link
                                    . ' ]'
                                   );
                    }
                }
            }
        }
        elsif ($type eq TEXT) {
            my $text = $token->[1];
            if ($upper) {
                $text = uc($text);
            }
            $result .= $text;
        }
    }
    return $result;
}

##

1;
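The link handling above only rewrites links that start with `/`, so a link like `faq.html` stays relative. If you know the URL of the page rather than just its domain, the URI module (CPAN) can resolve every form. A minimal sketch; `absolute_link` is a made-up helper name:

```perl
use strict;
use warnings;
use URI;

# Resolve any href (absolute, root-relative or document-relative)
# against the URL of the page it came from.
sub absolute_link {
    my ($href, $page_url) = @_;
    return URI->new_abs($href, $page_url)->as_string;
}

print absolute_link('faq.html', 'http://london.pm.org/about/'), "\n";
```

In the plugin, something like `absolute_link($link, "http://$domain/")` could replace the `m,^/,` branch. This assumes the pages live at the site root.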

-- 
alex [EMAIL PROTECTED]




UK money, again (again)

2003-06-30 Thread Paul Mison
On 26/06/2003 at 10:19 -0300, Luis Campos de Carvalho wrote:

  This is the first time I meet a monetary system that is not based on
  the relation
  100 - 50 - 20 - 10 - 5 - 1 - 0.50 - 0.25 - 0.10 - 0.01

As other people have mentioned, although not explicitly, the British 
pound (and the Euro) have different sub-unit currency subdivisions, 
i.e.:

100 50 20 10 5 2 1

http://www.royalmint.com/talk/specifications.asp
http://www.eurocoins.co.uk/ireland.html
as opposed to the US model:

100 50 25 10 5 1

http://www.usmint.gov/faqs/circulating_coins/index.cfm?action=faq_circulating_coin

Of course, the US has to give their coins cutesy names, just to 
confuse people; a habit that's thankfully died out here (cf previous 
discussion of florins).

I vaguely recall seeing a survey that recommended an 18/100 unit coin 
as the optimum for currencies, but the mental arithmetic would be 
horrific. I don't know if they pronounced on whether 20 is better 
than 25 or not, but it's interesting that the US doesn't issue 25 
dollar bills.

--
:: paul
:: compiles with canadian cs1471 protocol


Re: UK money, again

2003-06-30 Thread Paul Mison
On 26/06/2003 at 15:47 +0100, Iain Tatch wrote:
On Thursday, June 26, 2003, 3:27:21 PM, Nicholas Clark wrote:
Has the inscription "Standing on the shoulders of giants" around the edge.

I think this one's broke. It's got "Deoxyribonucleic Acid" written round
the edge. And a rather cool double helix printed on the tails side. Hmm, I
quite like that. I'll try to remember to put it to one side.

It's a special commemorative edition. They come out periodically for 
high-value coins (these days, that's the £2 and 50p) to mark 
some anniversary. This one is for the 50th anniversary of the 
decoding of the structure of, um, well, DNA.

http://www.royalmint.com/news/pnewsitem.asp?news_id=19

Pound coins have their own rotating series of national designs, the 
newest set of which (using bridges, just like Euro notes) have been 
previewed:

http://www.timesonline.co.uk/article/0,,2-718623,00.html

http://2lmc.org/spool/id/2806 has more coin geeking and a slight jab 
at the lack of interesting bridges in Northern Ireland.

--
:: paul
:: compiles with canadian cs1471 protocol


Re: UK money, again

2003-06-30 Thread Dave Cross

From: Paul Mison [EMAIL PROTECTED]
Date: 6/30/03 1:57:25 PM

 Pound coins have their own rotating series of national 
 designs, the newest set of which (using bridges, just like
 Euro notes) have been previewed:

 http://www.timesonline.co.uk/article/0,,2-718623,00.html

IIRC, one of Ian McEwan's novels (I think it was The Child in Time[1])
features a character who sat on the board that approved these
designs.

Dave...

[1] Which I heartily recommend if you haven't already read it[2].
[2] In fact, read all[3] of McEwan's books whilst you're at it.
The man's a bloody star.
[3] Except perhaps Atonement. Not enjoying that as much as
the others.
-- 
http://www.dave.org.uk

Let me see you make decisions, without your television
   - Depeche Mode (Stripped)







Re: UK money, again (again)

2003-06-30 Thread Nicholas Clark
On Mon, Jun 30, 2003 at 02:52:53PM +0100, Paul Mison wrote:

 As other people have mentioned, although not explicitly, the British 
 pound (and the Euro) have different sub-unit currency subdivisions, 
 ie:
 
 100 50 20 10 5 2 1

 as opposed to the US model:
 
 100 50 25 10 5 1

 horrific. I don't know if they pronounced on whether 20 is better 
 than 25 or not, but it's interesting that the US doesn't issue 25 
 dollar bills.

My experience was that 25 sucks. When calculating amounts above 10 cents
I had to keep track of both units and tens changing when I added/removed
a 25 cent coin from an amount. Adding/removing 20 only changes the tens.

Likewise I found the lack of a US 2 cent coin really really annoying,
because I had to deal with up to 4 coins just to get the last few cents
right.
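Nicholas's point about the missing 2 cent coin is easy to check with greedy change-making: always take the largest coin that fits. A minimal sketch in core Perl. The name `make_change` is made up, and the US list includes the rarely seen 50 cent coin, as in Paul's list:

```perl
use strict;
use warnings;

# Greedy change-making: repeatedly take the largest coin that fits.
# Returns the coins used for $amount (in pence or cents).
sub make_change {
    my ($amount, @coins) = @_;
    my @used;
    for my $coin (sort { $b <=> $a } @coins) {
        while ($amount >= $coin) {
            push @used, $coin;
            $amount -= $coin;
        }
    }
    return @used;
}

my @uk = (50, 20, 10, 5, 2, 1);
my @us = (50, 25, 10, 5, 1);
for my $amount (4, 9, 99) {
    printf "%2d: UK %s | US %s\n", $amount,
        join('+', make_change($amount, @uk)),
        join('+', make_change($amount, @us));
}
#  4: UK 2+2 | US 1+1+1+1
#  9: UK 5+2+2 | US 5+1+1+1+1
# 99: UK 50+20+20+5+2+2 | US 50+25+10+10+1+1+1+1
```

Without a 2 cent coin, the last few cents can take up to four coppers. With one, they never take more than two.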

Nicholas Clark



[Cool] [Nice] Dave's Recursive Footnotes

2003-06-30 Thread Luis Campos de Carvalho
Dave Cross wrote:
From: Paul Mison [EMAIL PROTECTED]
Date: 6/30/03 1:57:25 PM
[body removed]

[1] Which I heartily recommend if you haven't already read it[2].
[2] In fact, read all[3] of McEwan's books whilst you're at it.
The man's a bloody star.
[3] Except perhaps Atonement. Not enjoying that as much as
the others.
  Hey! Look at it! It's recursive footnotes! Is there any Perl module 
to handle this? (Perhaps in TeX?)

  Thank you Dave! It was nice and cool!
  []'z!
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
  Luis Campos de Carvalho
  Computer Scientist,
  Unix Sys Admin & Certified Oracle DBA
  http://br.geocities.com/monsieur_champs/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



Re: UK Money, again

2003-06-30 Thread Piers Cawley
muppet [EMAIL PROTECTED] writes:

 David Cantrell said:
 The hundredweight is 112 lbs, or 8 stone, or 1/20 ton.

 suddenly i have a new understanding of "weighin' in at nineteen stone", from
 "Whole Lotta Rosie".  indeed, that is a whole lot of woman.  wow.

There was apparently an occasion when some eejits at MIT demanded that
all classes be taught using the furlong/stone/fortnight system of
measurements...

-- 
Piers



Linux firewall / web server

2003-06-30 Thread Martin Bower
I'm going to build a Linux firewall & web server at home (not necessarily 
the same box) and wondered if anyone can advise on the best route to go.

I've seen Smoothwall, but would I be better off hardening a Linux install? If 
so, which flavour?
I'll be installing Apache, mod_perl, + db on the same box, so maybe this 
will influence your recommendation?

any ideas/tips welcomed.




Re: Linux firewall / web server

2003-06-30 Thread Jason Clifford
On Mon, 30 Jun 2003, Martin Bower wrote:

 I'm going to build a Linux firewall & web server at home (not necessarily 
 the same box) and wondered if anyone can advise of the best route to go.
 
 I've seen smoothwall,  but would I be better hardening a linux install ?  if 
 so, which flavour ?
 I'll be installing Apache, mod_perl, + db on the same box, so maybe this 
 will influence your recommendation ?

As far as I know none of the pre-cooked firewall distros are suitable if 
you want to install a fully functioning webserver on the box - certainly 
Smoothwall/IPCop are not.

What the best approach will be depends, as ever, upon what you really want 
to do with it, what you want to protect and the level of protection you 
want.

I suspect you'll need to roll your own.

Jason Clifford
-- 
UKFSN.ORG   Finance Free Software while you surf the 'net
http://www.ukfsn.org/   ADSL Available Now




Re: Linux firewall / web server

2003-06-30 Thread Luis Campos de Carvalho
Martin Bower wrote:
I'm going to build a Linux firewall & web server at home (not 
necessarily the same box) and wondered if anyone can advise of the best 
route to go.
 I've seen smoothwall,  but would I be better hardening a linux install
 ?  if so, which flavour ?
  Hello, Martin.

  I know nothing about Smoothwall.
  I would tell you to harden a Linux box.
  If I were you, I would use a Debian Linux distro. It has good 
maintenance facilities built in, and is easier to set up as a server box 
(no X, nor other non-server services).

I'll be installing Apache, mod_perl, + db on the same box, so maybe this 
will influence your recommendation ?
  Maybe you will have some trouble with Debian and X Windows; I always 
need about an hour to set up X when I install Debian 
from scratch. I don't think that this will hurt you that much.

  Just my two pence.
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
  Luis Campos de Carvalho
  Computer Scientist,
  Unix Sys Admin & Certified Oracle DBA
  http://br.geocities.com/monsieur_champs/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=



Hundredweight was Re: UK Money, again

2003-06-30 Thread Steve Mynott
Roger Horne wrote:

On Fri 27 Jun, Philip Newton wrote:

 

You have: cwt
You want:
   Definition: hundredweight = 100 pounds = 45.359237 kg
which sounds as if it *is* 100 somethings.


But is wrong. There are 112 pounds in a hundredweight (or were when I was at
school). 

See http://home.clara.net/brianp/weights.html

You are both right; it depends on whether you are talking about an American or 
an English hundredweight.

GNU units has 'brhundredweight' defined whereas the FreeBSD 4.5 units(1) 
doesn't (and probably should).

http://www.bartleby.com/61/55/H0325500.html

1. A unit of weight in the U.S. Customary System equal to 100 pounds (45.36 
kilograms). Also called cental, short hundredweight.
2. A unit of weight in the British Imperial System equal to 112 pounds (50.80 kilograms).
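The two definitions differ only in the number of pounds, since both use the same avoirdupois pound (exactly 0.45359237 kg). Here is a small core-Perl sketch. The `%lb_per` table and `to_kg` are illustrative; they are not taken from units(1):

```perl
use strict;
use warnings;

# Pounds per unit; the pound itself is exactly 0.45359237 kg.
use constant KG_PER_LB => 0.45359237;

my %lb_per = (
    stone               => 14,
    short_hundredweight => 100,     # US, also called cental
    long_hundredweight  => 112,     # British Imperial, = 8 stone
    short_ton           => 2000,    # = 20 short hundredweight
    long_ton            => 2240,    # = 20 long hundredweight
);

sub to_kg {
    my ($n, $unit) = @_;
    die "unknown unit '$unit'\n" unless exists $lb_per{$unit};
    return $n * $lb_per{$unit} * KG_PER_LB;
}

printf "%-20s %8.3f kg\n", $_, to_kg(1, $_)
    for qw(short_hundredweight long_hundredweight);
```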

-- Steve




Re: UK money, again

2003-06-30 Thread Tom Lancaster
On Mon, Jun 30, 2003 at 04:35:54PM +0100, Andy Mendelsohn wrote:
 
 On Monday, June 30, 2003, at 03:07  pm, Dave Cross wrote:
 [3] Except perhaps Atonement. Not enjoying that as much as
 the others.
 -- 
 
 
 Oh no, keep at it Dave, it has a great ending. The missus and I read it 
 out loud to each other, a chapter at a time.
 I think, along with The Child in Time, it's now my favourite McEwan.
 

I have to agree. For me it's the most horrifying of all his books. Like a slow-mo 
plane crash.

Tom

-- 
Manly's Maxim: Logic is a systematic method of coming to the wrong conclusion with 
confidence.




Re: Linux firewall / web server

2003-06-30 Thread Chris Benson
On Mon, Jun 30, 2003 at 03:58:36PM +, Martin Bower wrote:
 I'm going to build a Linux firewall & web server at home (not necessarily 
 the same box) and wondered if anyone can advise of the best route to go.
 
 I've seen smoothwall,  but would I be better hardening a linux install ?  if 
 so, which flavour ?
 I'll be installing Apache, mod_perl, + db on the same box, so maybe this 

Uhm, this sort of contradicts "(not necessarily the same box)" above.
If you want a minimal-maintenance, standalone firewall, ISTM that
Smoothwall is as easy as it gets ...

But as others have said, if you're putting a webserver on it, then you
have to go with plan B.

 will influence your recommendation ?

The distro you're most familiar with?  I use SuSE because I'm used to
it.  But I'm forcing myself to try Debian again, again :-)

Another suggestion:-

There's a Knoppix-based (which is itself Debian based) distro, 
Knoppix-STD (nope, it stands for Security Tools Distribution) that has
two sorts of firewall as well as honey-pots, ID-tools and so forth.
It can run from CDROM which is nice from a recoverability PoV.

http://www.knoppix-std.org/
 
 any ideas/tips welcomed.

I had a lot of difficulty thinking about the f/wall rules for a system
acting as f/wall and server until I separated the data streams (and
set up a table/chain for each stream):

inet    -> f/wall
inet    -> internal network
inet    -> dmz network
dmz     -> inet
dmz     -> internal
dmz     -> f/wall
int'l   -> f/wall
int'l   -> dmz
int'l   -> inet
f/wall  -> inet
f/wall  -> dmz
f/wall  -> internal

I've a perl script that writes a set of iptables commands from a
simplified config file ...
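Chris's script isn't shown, so here is a minimal sketch of the same idea. It makes two assumptions:

- The config has a made-up format, one rule per line: `src dst proto port action`.
- The interface names are made up: eth0 = inet, eth1 = internal, eth2 = dmz.

The script creates one chain per stream, which for four zones gives the twelve streams listed above. It hooks each chain into INPUT, OUTPUT or FORWARD, adds a state rule so replies get back in, and finishes each chain with a DROP:

```perl
use strict;
use warnings;

# Hypothetical zone -> interface map; 'fw' is the firewall box itself.
my %iface   = (inet => 'eth0', int => 'eth1', dmz => 'eth2');
my @zones   = qw(inet int dmz fw);
my %is_zone = map { $_ => 1 } @zones;

# Turn "src dst proto port action" lines into iptables commands,
# using one user-defined chain per src->dst stream.
sub iptables_commands {
    my ($config) = @_;
    my %rules;    # chain name => list of rule fragments
    for my $line (split /\n/, $config) {
        $line =~ s/#.*//;    # strip comments
        next unless $line =~ /\S/;
        my ($src, $dst, $proto, $port, $action) = split ' ', $line;
        die "bad line: $line\n"
            unless defined $action && $is_zone{$src} && $is_zone{$dst};
        my $match = $proto eq 'all' ? '' : "-p $proto ";
        $match .= "--dport $port " if $proto ne 'all' && $port ne '-';
        push @{ $rules{"$src-$dst"} }, "${match}-j $action";
    }

    my @cmds;
    for my $src (@zones) {
        for my $dst (@zones) {
            next if $src eq $dst;
            my $chain = "$src-$dst";
            push @cmds, "iptables -N $chain";
            # Send this stream's packets from the built-in chain into ours.
            if ($dst eq 'fw') {
                push @cmds, "iptables -A INPUT -i $iface{$src} -j $chain";
            } elsif ($src eq 'fw') {
                push @cmds, "iptables -A OUTPUT -o $iface{$dst} -j $chain";
            } else {
                push @cmds, "iptables -A FORWARD -i $iface{$src} -o $iface{$dst} -j $chain";
            }
            # Let replies to already-accepted connections through.
            push @cmds, "iptables -A $chain -m state --state ESTABLISHED,RELATED -j ACCEPT";
            push @cmds, "iptables -A $chain $_" for @{ $rules{$chain} || [] };
            push @cmds, "iptables -A $chain -j DROP";    # default deny for the stream
        }
    }
    return @cmds;
}

my $config = <<'END';
# src  dst   proto  port  action
inet   dmz   tcp    80    ACCEPT
int    inet  all    -     ACCEPT
int    fw    tcp    22    ACCEPT
END
print "$_\n" for iptables_commands($config);
```

The script only prints commands, so review them before feeding them to a shell. A real setup also needs default policies, loopback rules and NAT, which are left out here.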
-- 
Chris Benson