Re: HTML to text
* Ian Malpass ([EMAIL PROTECTED]) [030629 20:59]:
> I'm trying to write something that converts HTML into nicely formatted text.

  $fh->open("lynx -dump -stdin <temp.txt |");

HTML::FormatText produces a nice result.
-- 
MarkOv %-]
drs Mark A.C.J. Overmeer, MARKOV Solutions
[EMAIL PROTECTED] [EMAIL PROTECTED]
http://Mark.Overmeer.net http://solutions.overmeer.net
Re: The Community Guide to Birmingham
On Sunday, June 29, 2003, at 03:35 PM, Jody Belka wrote:
> Right now there isn't any content whatsoever on the site, so anyone in the Birmingham area, or who knows the Birmingham area, please feel welcome to come along and add something.

Jody,

There's a very active Brum community on Ecademy (yeah, I know) run by Donato Esposito, who is a top bloke; you might tap into that.

Dave
Re: Using LWP for protected pages
Colin Magee said:
> Maybe I can't do this using this module? Does anyone have experience of trying to access protected pages via a login who could help point me in the right direction?
> Thanks
> Colin Magee

Well, time to jump in I think. Hello list!

Nothing to do with LWP I'm afraid, but in the end the most foolproof route is to start with a sniffer (you can get e.g. Ethereal for Windows, I believe) and see exactly what is being sent with the request for the protected page. Things to think about are Referer: headers, cookies set via JavaScript, and hidden form fields.

I speak from experience of trying (and in the end succeeding) to automate sending a text via o2.co.uk's text service. I didn't use LWP (preferring to roll my own for my 'home' projects); I used more 'raw' Perl, wrapping openssl's s_client command as a co-process. There is _no_ more bizarre, obscure, over-complex (and technically broken) login process than o2's web site. After logging in you get bounced (via _both_ 302 redirects and JavaScript automatic form submissions) in turn to about three different login servers. Some of the cookies set have a path or domain component that is, as far as I could see, illegal. You have to switch between SSL and non-SSL. Anyway, it was fun, but now you only get about 5 free texts a month, so it's not worth it!

- Piers (a different one!)
Re: Using LWP for protected pages
On Sun, 29 Jun 2003, Leon Brocard wrote:
> Anyone got any time to write a JavaScript library and integrate it into WWW::Mechanize?

Handily, the Mozilla guys went and wrote a JavaScript library for us:
http://www.mozilla.org/js/spidermonkey/
And waddaya know, it already has a Perl wrapper:
http://search.cpan.org/author/CLAESJAC/JavaScript/

You can also use the web scraping proxy mentioned in last month's Dr. Dobb's (source code available at the site as usual). I was thinking of using it to build a test suite for our site (regression testing is a long and tedious job that could be 70% automated) as well as for monitoring (checking that a quote or result appears rather than a 500 error, etc.).

A.
-- 
Aaron J Trevena - Perl Hacker, Kung Fu Geek, Internet Consultant
AutoDia --- Automatic UML and HTML Specifications from Perl, C++
and Any Datasource with a Handler. http://droogs.org/autodia
Re: HTML to text
* at 30/06 10:32 +0200 Mark Overmeer said:
> * Ian Malpass ([EMAIL PROTECTED]) [030629 20:59]:
> > I'm trying to write something that converts HTML into nicely formatted text.
>
>   $fh->open("lynx -dump -stdin <temp.txt |");
>
> HTML::FormatText produces a nice result.

Although, to get it to produce lynx-like output, some subclassing is needed to handle links:

package HTML::MyFormatText;

use strict;
use URI::WithBase;
use base qw(HTML::FormatText);

sub configure {
    my $self = shift;
    my $hash = shift;

    # a base uri so we can resolve relative uris
    $self->{base} = $hash->{base};
    delete $hash->{base};
    # strip the last path segment: http://host/dir/page.html -> http://host/dir/
    $self->{base} =~ s#(.*?)/[^/]*$#$1/#;

    $self->SUPER::configure($hash);
}

sub a_start {
    my $self = shift;
    my $node = shift;

    # named anchors (<a name=...>) have no href, so don't give them a number
    my $href = $node->attr('href');
    return $self->SUPER::a_start($node) unless defined $href && length $href;

    # local urls are no use so we have to make them absolute
    if ($href =~ m#^http:|^mailto:#) {
        push @{$self->{_links}}, $href;
    } else {
        my $u = URI::WithBase->new($href, $self->{base});
        push @{$self->{_links}}, $u->abs();
    }
    # print the link's number in front of the link text, as lynx does
    $self->out( '[' . $#{$self->{_links}} . '] ' );
    $self->SUPER::a_start($node);
}

sub html_end {
    my $self = shift;

    # list every collected link, numbered, after the document text
    if ( $self->{_links} ) {
        $self->nl; $self->nl; # be tidy
        $self->goto_lm;
        for (0 .. $#{$self->{_links}}) {
            $self->goto_lm;
            $self->out("[$_] " . $self->{_links}->[$_]);
            $self->nl;
        }
    }
    $self->SUPER::end();
}

1;
__END__

HTH
Struan
(no subject)
Calling all mod_perl developers...

Working for this groundbreaking independent company, this is an excellent position for a talented web developer with the full range of skills and experience. The ideal candidate will have flawless Perl/mod_perl (5.005+), Apache and Linux skills. Day to day you will work on all aspects of a project, often unsupervised, always to very definite specifications and to very definite timescales. This is a business that prides itself on its staff. Excellent career potential. Send your CV today for further details.

Though this is a highly technical position, it will involve you closely with the day-to-day activities of the business itself; you can expect to help in the company's strategic decisions about how it will define itself in the marketplace. Expect ongoing project activity, and as such a role where no two weeks are the same. Working from an attractive west country location is another huge plus with this company. Salary banding is between £25,000 and £30,000.

Please contact Daniel if you have further queries.

Best regards,
Daniel.

Daniel Glyn-Jones
Senior Consultant
ElanIT
1st Floor New Minster House
27-29 Baldwin Street
Bristol BS1 1LT
[EMAIL PROTECTED]
Tel: 0117 9309700
Fax: 0117 9304205
Website: www.ElanIT.co.uk

*** This email and any files transmitted with it are confidential, also intended solely for the use of the individual or entity to whom they are addressed. If you have recieved this email in error please notify your system manager.

This email has been scanned for all viruses by the MessageLabs Email Security System. For more information on a proactive email security service working around the clock, around the globe, visit http://www.messagelabs.com
Re: (no subject)
On Mon, Jun 30, 2003 at 10:48:46AM +0100, Daniel Glyn-Jones wrote:

http://london.pm.org/about/faq.html

  How do I advertise a job on the list?

  If you're a recruiter, you don't. Anyone else should advertise it with a subject line containing [JOB].

Not having a subject at all is a really bad plan.

> Working from an attractive west country location is another huge plus with

Near Budgens in Bradford upon Avon? There was an amazing amount of information missing from your message.

> *** This email and any files transmitted with it are confidential, also intended solely for the use of the individual or entity to whom they are addressed. If you have recieved this email in error please notify your system manager.

You are aware that the list is publicly archived? BTW it's received.

Nicholas Clark
RE: (no subject)
Hi Nicholas,

I appreciate your feedback on this one. First time using the site was just now, so apologies if I've blundered!

As a recruitment agency, would you recommend that I do not advertise via this site at all, or just that I make sure that I identify this in the subject line?

Thanks for your help,
Daniel.

-----Original Message-----
From: Nicholas Clark [mailto:[EMAIL PROTECTED]]
Sent: Monday, June 30, 2003 11:26 AM
To: [EMAIL PROTECTED]
Subject: Re: (no subject)
[body removed]
Re: (no subject)
On Mon, Jun 30, 2003 at 11:30:02AM +0100, Daniel Glyn-Jones wrote:
> Hi Nicholas, I appreciate your feedback on this one. First time using the site was just now, so apologies if I've blundered!
> As a recruitment agency, would you recommend that I do not advertise via this site at all, or just that I make sure that I identify this in the subject line?

As a recruitment agency, I would recommend that you do not advertise via this list, even for jobs in London or its environs.

I'd encourage you to advertise on the Perl jobs list ([EMAIL PROTECTED]); see
http://lists.perl.org/showlist.cgi?name=jobs
and the connected site
http://jobs.perl.org/
All Perl job adverts are most welcome there.

Nicholas Clark
Re: HTML to text
On Sun, 2003-06-29 at 20:00, Ian Malpass wrote:
> I'm trying to write something that converts HTML into nicely formatted text.
> How can I improve it? I haven't really found anything on CPAN to do what I want (there are "remove HTML tags" scripts and things, but nothing with the formatting power of lynx that I can see).

I use HTML::TokeParser for turning HTML into text, for mailouts. I'm very happy with it, but perhaps I need more control over the output than you do. I pass the output from TokeParser through Text::Autoformat to make sure it's all pretty and formatted to the 72nd column. There's also HTML::TokeParser::Simple, but I decided to avoid it and instead defined a few constants to make the code simpler.

Here's the code I wrote, in the form of a Template::Toolkit plugin; you'll need to extract the bits you want, or start from an example in the documentation instead. My code shows links in square brackets after the link text, converts headings to uppercase, and puts asterisks around bold text. Other tags are ignored and removed. It also does some stuff to resolve site-relative links (those starting with /), which you'll probably want to remove.
package state51::Template::Plugin::HTMLToker;

use strict;
use base 'state51::Template::Plugin::Base';

# token types returned by HTML::TokeParser->get_token
use constant START_TAG   => 'S';
use constant END_TAG     => 'E';
use constant TEXT        => 'T';
use constant COMMENT     => 'C';
use constant DECLARATION => 'D';
use constant PROCESS     => 'PI';

use HTML::TokeParser;

##

sub htmltotext {
    my ($self, $html) = @_;
    my $domain = ($self->params->[0] || '');

    my $toker = HTML::TokeParser->new(\$html)
        or die "couldn't tokeparse that html";

    my $result = '';
    my @links;
    my $upper = 0;

    while (my $token = $toker->get_token) {
        my $type = $token->[0];

        if ($type eq START_TAG or $type eq END_TAG) {
            my $tag = $token->[1];

            if ($tag eq 'b') {
                # *bold*
                $result .= '*';
            } elsif ($tag eq 'br') {
                $result .= "\n" if $type eq START_TAG;
            } elsif ($tag =~ /^h\d+$/) {
                # headings are upper-cased
                if ($type eq START_TAG) { ++$upper; } else { $upper = 0; }
            } elsif ($tag eq 'a') {
                if ($type eq START_TAG) {
                    my $attr = $token->[2];
                    if (exists $attr->{href}) {
                        push @links, $attr->{href};
                    }
                } else {
                    if (scalar @links) {
                        my $link = pop @links;
                        # make site-relative links absolute
                        if ($link =~ m,^/,) {
                            $link = "http://$domain$link";
                        }
                        $result .= (' [ ' . $link . ' ]');
                    }
                }
            }
        } elsif ($type eq TEXT) {
            my $text = $token->[1];
            if ($upper) { $text = uc($text); }
            $result .= $text;
        }
    }

    return $result;
}

##

1;
-- 
alex
[EMAIL PROTECTED]
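Alex's plugin hands its output to Text::Autoformat, which lives on CPAN rather than in the core distribution. If plain wrapping at 72 columns is all you need, the core Text::Wrap module can stand in; this is a swapped-in technique and it doesn't do Autoformat's cleverer handling of lists and quoted text. A minimal sketch, assuming paragraphs are separated by blank lines (wrap72 is a made-up helper name):

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Re-wrap plain text so that no line is longer than 72 characters.
# Text::Wrap makes lines at most $columns - 1 characters long, hence 73.
# Paragraphs (separated by blank lines) are wrapped one at a time so the
# blank lines between them survive.
sub wrap72 {
    my ($text) = @_;
    local $Text::Wrap::columns = 73;
    return join "\n\n", map { wrap('', '', $_) } split /\n\s*\n/, $text;
}

my $para = join ' ', ('lorem ipsum dolor sit amet') x 10;
print wrap72("$para\n\n$para"), "\n";
```

You'd call wrap72 on the string htmltotext returns, in place of the Autoformat step.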
UK money, again (again)
On 26/06/2003 at 10:19 -0300, Luis Campos de Carvalho wrote:
> This is the first time I've met a monetary system that is not based on the relation 100 - 50 - 20 - 10 - 5 - 1 - 0.50 - 0.25 - 0.10 - 0.01

As other people have mentioned, although not explicitly, the British pound (and the Euro) have different sub-unit currency subdivisions, i.e.:

  100 50 20 10 5 2 1

http://www.royalmint.com/talk/specifications.asp
http://www.eurocoins.co.uk/ireland.html

as opposed to the US model:

  100 50 25 10 5 1

http://www.usmint.gov/faqs/circulating_coins/index.cfm?action=faq_circulating_coin

Of course, the US has to give their coins cutesy names, just to confuse people; a habit that's thankfully died out here (cf. the previous discussion of florins).

I vaguely recall seeing a survey that recommended an 18/100 unit coin as the optimum for currencies, but the mental arithmetic would be horrific. I don't know if they pronounced on whether 20 is better than 25 or not, but it's interesting that the US doesn't issue 25 dollar bills.
-- 
:: paul ::
compiles with canadian cs1471 protocol
Re: UK money, again
On 26/06/2003 at 15:47 +0100, Iain Tatch wrote:
On Thursday, June 26, 2003, 3:27:21 PM, Nicholas Clark wrote:
Has the inscription "Standing on the shoulders of giants" around the edge.
I think this one's broke. It's got "Deoxyribonucleic Acid" written round the edge. And a rather cool double helix printed on the tails side.
Hmm, I quite like that. I'll try to remember to put it to one side.

It's a special commemorative edition. They come out periodically for high-value coins (these days, that's £2 and 50p) to mark some anniversary. This one is for the 50th anniversary of the decoding of the structure of, um, well, DNA.
http://www.royalmint.com/news/pnewsitem.asp?news_id=19

Pound coins have their own rotating series of national designs, the newest set of which (using bridges, just like Euro notes) have been previewed:
http://www.timesonline.co.uk/article/0,,2-718623,00.html

http://2lmc.org/spool/id/2806 has more coin geeking and a slight jab at the lack of interesting bridges in Northern Ireland.
-- 
:: paul ::
compiles with canadian cs1471 protocol
Re: UK money, again
From: Paul Mison [EMAIL PROTECTED]
Date: 6/30/03 1:57:25 PM
> Pound coins have their own rotating series of national designs, the newest set of which (using bridges, just like Euro notes) have been previewed:
> http://www.timesonline.co.uk/article/0,,2-718623,00.html

IIRC, one of Ian McEwan's novels (I think it was The Child in Time[1]) features a character who sat on the board that approved these designs.

Dave...

[1] Which I heartily recommend if you haven't already read it[2].
[2] In fact, read all[3] of McEwan's books whilst you're at it. The man's a bloody star.
[3] Except perhaps Atonement. Not enjoying that as much as the others.
-- 
http://www.dave.org.uk

"Let me see you make decisions, without your television"
- Depeche Mode (Stripped)
Re: UK money, again (again)
On Mon, Jun 30, 2003 at 02:52:53PM +0100, Paul Mison wrote:
> As other people have mentioned, although not explicitly, the British pound (and the Euro) have different sub-unit currency subdivisions, i.e.:
>   100 50 20 10 5 2 1
> as opposed to the US model:
>   100 50 25 10 5 1
> horrific. I don't know if they pronounced on whether 20 is better than 25 or not, but it's interesting that the US doesn't issue 25 dollar bills.

My experience was that 25 sucks. When calculating amounts above 10 cents, I had to keep track of both the units and the tens changing whenever I added or removed a 25-cent coin. Adding or removing 20 only changes the tens.

Likewise, I found the lack of a US 2-cent coin really, really annoying, because I had to deal with up to 4 coins just to get the last few cents right.

Nicholas Clark
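To put numbers on the coin-counting point, here's a minimal sketch that counts the coins a "largest coin first" (greedy) change-maker hands out under each system, using the sub-£1 and sub-$1 coins listed above. greedy_change is a made-up helper; for both of these coin sets the greedy answer is also the fewest coins possible.

```perl
use strict;
use warnings;

# Greedy change-making: keep taking the largest coin that still fits.
# Returns the list of coins used for $amount (in pence or cents).
sub greedy_change {
    my ($amount, @coins) = @_;
    my @used;
    for my $coin (sort { $b <=> $a } @coins) {
        while ($amount >= $coin) {
            push @used, $coin;
            $amount -= $coin;
        }
    }
    return @used;
}

my @uk = (50, 20, 10, 5, 2, 1);    # UK coins below one pound, in pence
my @us = (50, 25, 10, 5, 1);       # US coins below one dollar, in cents

for my $amount (4, 9, 45, 99) {
    my @uk_used = greedy_change($amount, @uk);
    my @us_used = greedy_change($amount, @us);
    printf "%2d: UK %d coins (%s), US %d coins (%s)\n",
        $amount, scalar @uk_used, "@uk_used", scalar @us_used, "@us_used";
}
```

For 4 the UK needs two coins (2 + 2) against four US cents, and for 99 it's six coins (50 20 20 5 2 2) against eight (50 25 10 10 1 1 1 1); the missing 2-cent coin accounts for most of the gap.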
[Cool] [Nice] Dave's Recursive Footnotes
Dave Cross wrote:
> From: Paul Mison [EMAIL PROTECTED]
> Date: 6/30/03 1:57:25 PM
> [body removed]
> [1] Which I heartily recommend if you haven't already read it[2].
> [2] In fact, read all[3] of McEwan's books whilst you're at it. The man's a bloody star.
> [3] Except perhaps Atonement. Not enjoying that as much as the others.

Hey! Look at it! It's recursive footnotes!
Is there any Perl module to handle this? (Perhaps in TeX?)

Thank you Dave! It was nice and cool!

[]'z!
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Luis Campos de Carvalho
Computer Scientist, Unix Sys Admin
Certified Oracle DBA
http://br.geocities.com/monsieur_champs/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Re: UK Money, again
muppet [EMAIL PROTECTED] writes:
> David Cantrell said:
> > The hundredweight is 112 lbs, or 8 stone, or 1/20 ton.
> suddenly i have a new understanding of "weighin' in at nineteen stone", from "Whole Lotta Rosie". indeed, that is a whole lot of woman. wow.

There was apparently an occasion when some eejits at MIT demanded that all classes be taught using the furlong/stone/fortnight system of measurements...
-- 
Piers
Linux firewall / web server
I'm going to build a Linux firewall and web server at home (not necessarily the same box) and wondered if anyone can advise on the best route to go. I've seen Smoothwall, but would I be better off hardening a Linux install? If so, which flavour? I'll be installing Apache, mod_perl and a database on the same box, so maybe this will influence your recommendation?

Any ideas/tips welcomed.
Re: Linux firewall / web server
On Mon, 30 Jun 2003, Martin Bower wrote:
> I'm going to build a Linux firewall and web server at home (not necessarily the same box) and wondered if anyone can advise of the best route to go. I've seen smoothwall, but would I be better hardening a linux install ? if so, which flavour ? I'll be installing Apache, mod_perl, + db on the same box, so maybe this will influence your recommendation ?

As far as I know, none of the pre-cooked firewall distros are suitable if you want to install a fully functioning web server on the box; certainly Smoothwall and IPCop are not.

What the best approach will be depends, as ever, upon what you really want to do with it, what you want to protect and the level of protection you want. I suspect you'll need to roll your own.

Jason Clifford
-- 
UKFSN.ORG Finance Free Software while you surf the 'net
http://www.ukfsn.org/
ADSL Available Now
Re: Linux firewall / web server
Martin Bower wrote:
> I'm going to build a Linux firewall and web server at home (not necessarily the same box) and wondered if anyone can advise of the best route to go. I've seen smoothwall, but would I be better hardening a linux install ? if so, which flavour ?

Hello, Martin.

I know nothing about Smoothwall. I would tell you to harden a Linux box. If I were you, I would use a Debian Linux distro. It has good maintenance facilities built in, and is easier to set up as a server box (no X, nor other non-server services).

> I'll be installing Apache, mod_perl, + db on the same box, so maybe this will influence your recommendation ?

Maybe you will have some trouble with Debian and the X Window System; I always need about an hour to set up X when I install a Debian Linux from scratch. I don't think that this will hurt you that much.

Just my two pence.
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Luis Campos de Carvalho
Computer Scientist, Unix Sys Admin
Certified Oracle DBA
http://br.geocities.com/monsieur_champs/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Hundredweight (was Re: UK Money, again)
Roger Horne wrote:
> On Fri 27 Jun, Philip Newton wrote:
> > You have: cwt
> > You want:
> >         Definition: hundredweight = 100 pounds = 45.359237 kg
> > which sounds as if it *is* 100 somethings.
> But is wrong. There are 112 pounds in a hundredweight (or were when I was at school). See http://home.clara.net/brianp/weights.html

You are both right; it depends on whether you are talking about an American or an English hundredweight. GNU units has 'brhundredweight' defined, whereas the FreeBSD 4.5 units(1) doesn't (and probably should).

http://www.bartleby.com/61/55/H0325500.html
1. A unit of weight in the U.S. Customary System equal to 100 pounds (45.36 kilograms). Also called cental, short hundredweight.
2. A unit of weight in the British Imperial System equal to 112 pounds (50.80 kilograms).
-- 
Steve
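The arithmetic behind the two definitions, as a small sketch: with the international pound fixed at exactly 0.45359237 kg, the 100 lb (short) and 112 lb (long) hundredweights come out at 45.359237 kg and 50.80234544 kg, and 112 lb is also David Cantrell's "8 stone, or 1/20 ton" from earlier in the thread. The hash keys short/long are just labels chosen for the example.

```perl
use strict;
use warnings;

# The international avoirdupois pound is defined as exactly 0.45359237 kg.
use constant KG_PER_LB => 0.45359237;

my %cwt_lb = (
    short => 100,    # US customary hundredweight (cental)
    long  => 112,    # British Imperial hundredweight
);

# 112 lb is also 8 stone of 14 lb, and 1/20 of a 2240 lb long ton.
die "stone mismatch\n" unless 8 * 14 == $cwt_lb{long};
die "ton mismatch\n"   unless 2240 / 20 == $cwt_lb{long};

for my $kind (sort keys %cwt_lb) {
    printf "%-5s hundredweight = %d lb = %.8f kg\n",
        $kind, $cwt_lb{$kind}, $cwt_lb{$kind} * KG_PER_LB;
}
```

The dictionary's 45.36 kg and 50.80 kg are these same figures rounded to two decimal places.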
Re: UK money, again
On Mon, Jun 30, 2003 at 04:35:54PM +0100, Andy Mendelsohn wrote:
> On Monday, June 30, 2003, at 03:07 pm, Dave Cross wrote:
> > [3] Except perhaps Atonement. Not enjoying that as much as the others.
> Oh no, keep at it Dave, it has a great ending. The missus and I read it out loud to each other, a chapter at a time. I think, along with The Child in Time, it's now my favourite McEwan.

I have to agree. For me it's the most horrifying of all his books. Like a slow-mo plane crash.

Tom
-- 
Manly's Maxim: Logic is a systematic method of coming to the wrong conclusion with confidence.
Re: Linux firewall / web server
On Mon, Jun 30, 2003 at 03:58:36PM +, Martin Bower wrote:
> I'm going to build a Linux firewall and web server at home (not necessarily the same box) and wondered if anyone can advise of the best route to go. I've seen smoothwall, but would I be better hardening a linux install ? if so, which flavour ?
> I'll be installing Apache, mod_perl, + db on the same box, so maybe this

Uhm, this sort of contradicts "(not necessarily the same box)" above. If you want a minimal-maintenance, standalone firewall, ISTM that Smoothwall is as easy as it gets... But as others have said, if you're putting a web server on it, then you have to go with plan B.

> will influence your recommendation ?

The distro you're most familiar with? I use SuSE because I'm used to it. But I'm forcing myself to try Debian again, again :-)

Another suggestion: there's a Knoppix-based (Knoppix is itself Debian-based) distro, Knoppix-STD (nope, it stands for Security Tools Distribution), that has two sorts of firewall as well as honeypots, intrusion-detection tools and so forth. It can run from CD-ROM, which is nice from a recoverability PoV.
http://www.knoppix-std.org/

> any ideas/tips welcomed.

I had a lot of difficulty thinking about the f/wall rules for a system acting as both f/wall and server until I separated the data streams (and set up a table/chain for each stream):

  inet   -> f/wall
  inet   -> internal network
  inet   -> dmz network
  dmz    -> inet
  dmz    -> internal
  dmz    -> f/wall
  int'l  -> f/wall
  int'l  -> dmz
  int'l  -> inet
  f/wall -> inet
  f/wall -> dmz
  f/wall -> internal

I've a Perl script that writes a set of iptables commands from a simplified config file...
-- 
Chris Benson
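Chris's script isn't shown, so here is a rough sketch of the one-chain-per-stream idea (not his code). Everything specific is hypothetical: a three-interface box with inet on eth0, dmz on eth1 and the internal network on eth2; "fw" standing for the firewall itself; a config format of one "src dst proto port" line per allowed service; and chain names like inet-dmz. It only prints commands, so they can be reviewed before being fed to a shell, and it is nowhere near a complete ruleset (no loopback rules, NAT or default policies).

```perl
use strict;
use warnings;

# Hypothetical zone -> interface map; 'fw' stands for the firewall box itself.
my %iface = (inet => 'eth0', dmz => 'eth1', int => 'eth2');
my @zones = ('fw', sort keys %iface);    # fw, dmz, inet, int

# Turn a simplified config ("src dst proto port" per line, # comments) into
# iptables commands: one user-defined chain per src->dst stream, hooked into
# INPUT, OUTPUT or FORWARD, accepting replies plus the listed services and
# dropping everything else.
sub gen_rules {
    my ($config) = @_;
    my %allow;    # "src-dst" => [ [proto, port], ... ]

    for my $raw (split /\n/, $config) {
        (my $line = $raw) =~ s/#.*//;    # strip comments
        next unless $line =~ /\S/;       # skip blank lines
        my ($src, $dst, $proto, $port) = split ' ', $line;
        die "bad config line: $raw\n" unless defined $port;
        for my $zone ($src, $dst) {
            die "unknown zone: $zone\n" unless $zone eq 'fw' || exists $iface{$zone};
        }
        push @{ $allow{"$src-$dst"} }, [ $proto, $port ];
    }

    my @cmds;
    for my $src (@zones) {
        for my $dst (@zones) {
            next if $src eq $dst;
            my $chain = "$src-$dst";
            push @cmds, "iptables -N $chain";
            if ($dst eq 'fw') {
                push @cmds, "iptables -A INPUT -i $iface{$src} -j $chain";
            } elsif ($src eq 'fw') {
                push @cmds, "iptables -A OUTPUT -o $iface{$dst} -j $chain";
            } else {
                push @cmds, "iptables -A FORWARD -i $iface{$src} -o $iface{$dst} -j $chain";
            }
            # let replies to already-permitted connections back through
            push @cmds, "iptables -A $chain -m state --state ESTABLISHED,RELATED -j ACCEPT";
            for my $rule (@{ $allow{$chain} || [] }) {
                my ($proto, $port) = @$rule;
                push @cmds, "iptables -A $chain -p $proto --dport $port -j ACCEPT";
            }
            push @cmds, "iptables -A $chain -j DROP";
        }
    }
    return @cmds;
}

my $config = <<'END';
# src  dst   proto  port
inet   dmz   tcp    80     # public web server
int    fw    tcp    22     # ssh to the firewall from inside
int    inet  tcp    80
END

print "$_\n" for gen_rules($config);
```

With the sample config, the inet-dmz chain accepts replies and TCP port 80 and drops the rest; the other eleven chains pass replies plus whatever the config lists for them. Keeping the streams as separate chains means each pair in the list above can be reasoned about on its own.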