Re: Hi, how to extract five texts on each side of an URI? I post my own perl script and its use.

2006-11-14 Thread Jenda Krynicky
From: 辉 王 [EMAIL PROTECTED]
>    my $text;
>    for my $left_index (1..WIDTH) {
>        last if $start_index < $left_index;
>        $text .= $texts_arr[$start_index - $left_index] . ' ';
>    }
>    $text .= join(' ', @texts_arr[$start_index..$end_index]) . ' ';
>    for my $right_index (1..WIDTH) {
>        last if $end_index + $right_index > $#texts_arr;
>        $text .= $texts_arr[$end_index + $right_index] . ' ';
>    }
>    $text_hash{$url} = $text;

As far as I can tell this could easily be rewritten with no loops. If
I understand it correctly you want to get all the texts from
$start_index-WIDTH to $end_index+WIDTH, so something like:


my $left_index = $start_index - WIDTH;
$left_index = 0 if $left_index < 0;
my $right_index = $end_index + WIDTH;
$right_index = $#texts_arr if $right_index > $#texts_arr;

my $text = join(' ', @texts_arr[$left_index .. $right_index]);

should do what you are after. There are probably other things, but
this caught my eye.
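
The clamped-slice idea above can be checked on a small sample. This is
only a sketch: the sub name context_text, the sample array, and the
WIDTH of 2 are made up for illustration, and a single space is assumed
as the join separator.

```perl
use strict;
use warnings;

use constant WIDTH => 2;    # assumed value; the thread asks for five

# Hypothetical helper: join the texts from $start_index-WIDTH to
# $end_index+WIDTH, clamping both ends to the bounds of the array.
sub context_text {
    my ($texts, $start_index, $end_index) = @_;

    my $left_index = $start_index - WIDTH;
    $left_index = 0 if $left_index < 0;

    my $right_index = $end_index + WIDTH;
    $right_index = $#$texts if $right_index > $#$texts;

    return join(' ', @{$texts}[$left_index .. $right_index]);
}

my @texts_arr = qw(a b c d e f g);

print context_text(\@texts_arr, 3, 3), "\n";    # window in the middle
print context_text(\@texts_arr, 0, 1), "\n";    # clamped on the left
print context_text(\@texts_arr, 5, 6), "\n";    # clamped on the right
```

One difference from the loop version is worth knowing: the loop appends
the left-hand texts nearest-first, while the slice keeps them in
document order and leaves no trailing space.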

Jenda
= [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery


--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response




Re: Hi, how to extract five texts on each side of an URI? I post my own perl script and its use.

2006-11-11 Thread Robin Sheat
On Sunday 12 November 2006 13:17, 辉 王 wrote:
> I can make my program do its job at last, but it runs slowly.
> Can anybody tell me how to improve the running speed of this
> program? Thanks.
Have you had a look with the Perl profiler to see which bits are slow?
That way you know what to look at to make it run faster. See
perldoc Devel::DProf for more information.
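
Once profiling points at a suspect, the core Benchmark module can
compare two candidate rewrites directly. The sketch below is an
illustration only, not a measurement of the posted program: the sample
data, the sub names, and the iteration count are all assumptions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

use constant WIDTH => 5;

# Made-up sample data standing in for the extracted page texts.
my @texts_arr = map { "text$_" } 0 .. 999;
my ($start_index, $end_index) = (500, 501);

# Variant 1: build the window by appending one element at a time.
sub with_loop {
    my $text = '';
    for my $i ($start_index - WIDTH .. $end_index + WIDTH) {
        next if $i < 0 || $i > $#texts_arr;
        $text .= $texts_arr[$i] . ' ';
    }
    $text =~ s/ \z//;    # drop the trailing separator
    return $text;
}

# Variant 2: clamp the bounds once and join an array slice.
sub with_slice {
    my $left  = $start_index - WIDTH;
    my $right = $end_index + WIDTH;
    $left  = 0           if $left < 0;
    $right = $#texts_arr if $right > $#texts_arr;
    return join ' ', @texts_arr[$left .. $right];
}

# Run each variant 200_000 times and print a comparison table.
cmpthese(200_000, { loop => \&with_loop, slice => \&with_slice });
```

The numbers depend on the machine and the data, so treat the table as a
hint about relative cost rather than a result.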

-- 
Robin [EMAIL PROTECTED] JabberID: [EMAIL PROTECTED]

Hostes alienigeni me abduxerunt. Qui annus est?

PGP Key 0xA99CEB6D = 5957 6D23 8B16 EFAB FEF8  7175 14D3 6485 A99C EB6D




RE: Hi, how to extract five texts on each side of an URI? I post my own perl script and its use.

2006-11-11 Thread Charles K. Clarkson
Hui Wang mailto:[EMAIL PROTECTED] wrote:

: Can anybody tell me how to improve the running speed of this
: program? Thanks.

I don't know if this is faster, but it is a more accurate
solution. Your submitted code failed under some untested
circumstances. I created another page similar to the CPAN page you
used and fed it more complicated tests.

Chakrabarti placed relevance on distance from the link. I
changed your report to reflect this relevance. Instead of
squashing all text together, it now shows a report of text token
relevance. This change allowed me to test more thoroughly as well.
Here is the sample report for one link with multiple texts inside
the anchor.

http://www.clarksonenergyhomes.com/scripts/index.html
-5: 3401 MB 280 mirrors
-4: 5501 authors 10789 modules
-3: Welcome to CPAN! Here you will find All Things Perl.
-2: Browsing
-1: Perl modules
 0: Perl
 0: scripts
+1: Perl binary distributions (ports)
+2: Perl source code
+3: Perl recent arrivals
+4: recent
+5: Perl modules
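
A report in the shape shown above can be produced by labelling each
text with its distance from the anchor. This is a guess at the
approach, not the code from chakrabarti.pm: the sub name
distance_report, its arguments, and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return report lines for the texts within $width
# positions of the anchor, which spans $start..$end in @$texts.
# Texts inside the anchor get distance 0; neighbours get -N or +N.
sub distance_report {
    my ($texts, $start, $end, $width) = @_;

    my $first = $start - $width;
    $first = 0 if $first < 0;
    my $last = $end + $width;
    $last = $#$texts if $last > $#$texts;

    my @lines;
    for my $i ($first .. $last) {
        my $distance = $i < $start ? $i - $start
                     : $i > $end   ? $i - $end
                     :               0;
        # '%+d' prints an explicit sign; zero is padded to line up instead.
        my $label = $distance ? sprintf('%+d', $distance) : ' 0';
        push @lines, "$label: $texts->[$i]";
    }
    return @lines;
}

my @texts = ('Browsing', 'Perl modules', 'Perl', 'scripts',
             'Perl binary distributions (ports)', 'Perl source code');

# The anchor covers indexes 2 and 3; show two texts on each side.
print "$_\n" for distance_report(\@texts, 2, 3, 2);
```

Keeping the distance on each line, instead of joining everything into
one string, is what lets a later step weight texts by how close they
are to the link.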

You can find the modified code here (for a short time):

Script: http://www.clarksonenergyhomes.com/chakrabarti.txt
Module: http://www.clarksonenergyhomes.com/chakrabarti.pm


HTH,

Charles K. Clarkson
--
Mobile Homes Specialist
Free Market Advocate
Web Programmer

254 968-8328

http://www.clarksonenergyhomes.com/

Don't tread on my bandwidth. Trim your posts.


