Bizarre problem: Known good script (in 2011) fails to work in 2012

2012-01-04 Thread Hamann, T.D. (Thomas)
Hi,

I am having a rather unusual problem with a script that I wrote last year to 
clean unwanted contents out of UTF-8 encoded text files. It worked fine in the 
past, but when I try to run it now I get an error message and somehow all 
newlines are removed from the resulting file. Nothing was changed between 2011 
and 2012 in the script, which I give below:

#!/usr/bin/perl
# filecleaner.plx

use warnings;
use strict;
use utf8;
use open ':encoding(utf8)';

my $source = shift @ARGV;
my $destination = shift @ARGV;

open IN, "<$source" or die "Can't read source file $source: $!\n";
open OUT, ">$destination" or die "can't write on file $destination: $!\n";

while (<IN>) {
# Replaces all tab-characters with spaces:
s/\t/ /g;
# Replaces all hyphens that are both preceded and trailed by a space by
# long dashes preceded and trailed by a space:
s/ - / — /g; 
# Removes the leading space(s) from a variety of unwanted combinations:
s/( +)( |\.|,|:|;|\!|\]|\)|\n)/$2/g;
# Removes multiple dots:
s/\.+/./g;
# Removes multiple commas:
s/,+/,/g;
# Removes multiple colons:
s/:+/:/g;
# Removes multiple semi-colons:
s/;+/;/g;
# Removes commas before dots:
s/(,+)(\.)/$2/g;
# Removes the trailing spaces and dots behind two types of brackets:
s/(\(|\[)( +|\.+)/$1/g;
# Removes empty sets of brackets:
s/(\(|\[)(\)|\])//g;
# Removes whitespace at beginning of line:
s/^\s+//;
# Removes whitespace at end of line:
s/\s+$//;
# Prints all non-empty lines to file:
if (!/^\s*$/) {
print OUT $_;
}
}

close IN;
close OUT;

The error message (Malformed UTF-8 character (unexpected continuation byte 
0x97, with no preceding start byte) at filecleaner.plx line 23) seems to refer 
to the long dash in line 23. This was copied out of a UTF-8 encoded file in 
2011. If I change that to another UTF-8 long dash copied from another UTF-8 
file downloaded off the internet, the error message goes away. However, if I 
copy the dash out of a supposedly UTF-8 encoded file made in Word I get the 
error message.

With the dash fixed, however, the newlines still get stripped out of the file, 
which leaves me at a complete loss, since nothing in the code ought to chomp 
off newline characters.

What could cause such behaviour? Corrupt script file? Corrupted perl 
installation? Some stupid recent Windows update that screwed up UTF-8 and/or 
file handling in Windows XP? Before I went on Christmas holidays things were 
fine...

Any ideas?

Thanks,

Thomas
--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




RE: Bizarre problem: Known good script (in 2011) fails to work in 2012

2012-01-04 Thread Hamann, T.D. (Thomas)
Okay, some further testing using a family member's Windows XP PC and a fresh 
install of ActivePerl seems to have revealed the culprit:

Changing 
s/\s+$//;
to:
s/(\s+$)(\n)/$2/;

fixed the issue. 

Since the script worked fine until about 3 weeks ago and I copied the original 
code from http://www.perlmonks.org/?node_id=2258, I can only surmise that 
Microsoft must have changed the way Windows XP deals with newlines in a very 
recent update. Which they could have communicated with the outer world. :( 

(oh well, another reason to dislike Microsoft, I guess).

Now for another question: How much code will this change break? 

Thomas







Re: segmentation fault

2012-01-04 Thread Motaz SAAD
On Dec 24 2011, 9:07 pm, shlo...@shlomifish.org (Shlomi Fish) wrote:
 Hi Motaz,

 On Thu, 22 Dec 2011 10:57:48 -0800 (PST)
 Motaz SAAD <motaz.s...@gmail.com> wrote:
  Hello,

  Thanks very much, it is really helpful tool.

 You're welcome.

  my script spend 10 min running until I get segmentation fault error,
  but when I traced my script and it spend 2 days and still running !!!
  I run tracing using -d flag (perl -d:Trace p.pl)
  is this normal ?

 Well, the -d:Trace flag slows down the execution, but it shouldn't be such a
 dramatic difference. I guess you've ran into a Heisenbug:

 http://en.wikipedia.org/wiki/Unusual_software_bug#Heisenbug

 I'm not sure what's causing it, but I guess you can try doing manual traces
 using prints.

 Regards,

         Shlomi Fish

 --
 -
 Shlomi Fish      http://www.shlomifish.org/
 Chuck Norris/etc. Facts -http://www.shlomifish.org/humour/bits/facts/

 And the top story for today: wives live longer than husbands because they are
 not married to women.
     — Colin Mochrie in Who’s Line is it, Anyway?

 Please reply to list if it's a mailing list post -http://shlom.in/reply.

Hello,
Thanks for the reply,

I traced the code with print statements. I also ran the script with the -d
option, and the debugger pointed to the line
  while(defined($frPage = $frPages->next)) {
which causes the segmentation fault!
It is very strange that this statement worked thousands of times and then
caused a segmentation fault. I am afraid it is a memory problem or
a bug in the CPAN Parse::MediaWikiDump package.

Any comments or tips will be appreciated.

thanks
best regards,
Motaz






Re: Bizarre problem: Known good script (in 2011) fails to work in 2012

2012-01-04 Thread Jim Gibson

At 11:27 AM + 1/4/12, Hamann, T.D. (Thomas) wrote:
Hi,

I am having a rather unusual problem with a script that I wrote
last year to clean unwanted contents out of UTF-8 encoded text
files. It worked fine in the past, but when I try to run it now I
get an error message and somehow all newlines are removed from the
resulting file. Nothing was changed between 2011 and 2012 in the
script, which I give below:

#!/usr/bin/perl
# filecleaner.plx

use warnings;
use strict;
use utf8;
use open ':encoding(utf8)';

my $source = shift @ARGV;
my $destination = shift @ARGV;

open IN, "<$source" or die "Can't read source file $source: $!\n";
open OUT, ">$destination" or die "can't write on file $destination: $!\n";

while (<IN>) {
# Replaces all tab-characters with spaces:
s/\t/ /g;
# Replaces all hyphens that are both preceded and trailed by a space by
# long dashes preceded and trailed by a space:
s/ - / — /g;
# Removes the leading space(s) from a variety of unwanted combinations:
s/( +)( |\.|,|:|;|\!|\]|\)|\n)/$2/g;


Character classes can save you some typing and improve readability, 
and it is not necessary to capture what you don't want:


s/ +([ .,:;!\])\n])/$1/g;

# Removes multiple dots:
s/\.+/./g;
# Removes multiple commas:
s/,+/,/g;
# Removes multiple colons:
s/:+/:/g;
# Removes multiple semi-colons:
s/;+/;/g;
# Removes commas before dots:
s/(,+)(\.)/$2/g;


You have already replaced successive commas with a single comma, so + 
isn't needed here.


# Removes the trailing spaces and dots behind two types of brackets:
s/(\(|\[)( +|\.+)/$1/g;
# Removes empty sets of brackets:
s/(\(|\[)(\)|\])//g;
# Removes whitespace at beginning of line:
s/^\s+//;
# Removes whitespace at end of line:
s/\s+$//;


Whitespace includes the new line character!
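
A minimal sketch of that point. The sample string and the `[ \t]+$` variant are illustrative assumptions, not part of the original script; the variant is just one way to trim trailing blanks while leaving the line terminator alone.

```perl
use strict;
use warnings;

my $line = "some text   \n";

# \s matches "\n", so this removes the trailing spaces AND the newline.
(my $stripped = $line) =~ s/\s+$//;

# Hypothetical alternative: strip only spaces and tabs; "$" can match just
# before a final newline, so the newline itself is left in place.
(my $kept = $line) =~ s/[ \t]+$//;

print 'after s/\s+$//:    ', ($stripped =~ /\n/ ? 'newline kept' : 'newline removed'), "\n";
print 'after s/[ \t]+$//: ', ($kept     =~ /\n/ ? 'newline kept' : 'newline removed'), "\n";
```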

# Prints all non-empty lines to file:
if (!/^\s*$/) {
print OUT $_;
}
}

close IN;
close OUT;

The error message (Malformed UTF-8 character (unexpected continuation
byte 0x97, with no preceding start byte) at filecleaner.plx line 23)
seems to refer to the long dash in line 23. This was copied out of a
UTF-8 encoded file in 2011. If I change that to another UTF-8 long
dash copied from another UTF-8 file downloaded off the internet, the
error message goes away. However, if I copy the dash out of a
supposedly UTF-8 encoded file made in Word I get the error message.



Sounds like the Word long dash isn't valid UTF-8.
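
One way to check this is to look at the bytes. The sketch below uses the core Encode module to show how the em dash (U+2014) is encoded in UTF-8 and in Windows-1252; the single byte 0x97 from the error message is the Windows-1252 form. That the dash pasted from Word was actually saved as Windows-1252 is an assumption here, not something the thread confirms.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Render a byte string as space-separated hex values.
sub hex_bytes { join ' ', map { sprintf '%02X', ord } split //, shift }

my $em_dash      = "\x{2014}";                    # U+2014 EM DASH
my $utf8_bytes   = encode('UTF-8',  $em_dash);    # multi-byte sequence
my $cp1252_bytes = encode('cp1252', $em_dash);    # single byte

print "UTF-8:  ", hex_bytes($utf8_bytes),   "\n";
print "cp1252: ", hex_bytes($cp1252_bytes), "\n";

# A lone 0x97 byte is a continuation byte with no start byte, so a strict
# UTF-8 decode of it fails.
my $lone_byte = "\x97";
my $decoded_ok = eval {
    decode('UTF-8', $lone_byte, Encode::FB_CROAK | Encode::LEAVE_SRC);
    1;
};
print "lone 0x97 decodes as UTF-8: ", ($decoded_ok ? "yes" : "no"), "\n";
```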

With the dash fixed, however, the newlines still get stripped out of 
the file, which leaves me at a complete loss, since nothing in the 
code ought to chomp off newline characters.



I suggest you chomp the input and add a newline when you print.
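
A minimal sketch of that suggestion. The helper name `clean_line` is hypothetical, and only the two whitespace-trimming substitutions from the script are shown.

```perl
use strict;
use warnings;

# Hypothetical helper: take the line terminator off first, apply the edits,
# then add "\n" back only when something is left to print.
sub clean_line {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^\s+//;    # whitespace at beginning of line
    $line =~ s/\s+$//;    # whitespace at end of line (newline already gone)
    return length $line ? "$line\n" : '';
}

print clean_line("   some text   \n");
print clean_line("   \n");    # prints nothing: the line is empty after trimming
```

In the script's loop this would amount to a `chomp;` at the top of the `while` block and `print OUT "$_\n";` at the bottom.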

What could cause such behaviour? Corrupt script file? Corrupted perl 
installation? Some stupid recent Windows update that screwed up 
UTF-8 and/or file handling in Windows XP? Before I went on Christmas 
holidays things were fine...


Can't help you there.

--
Jim Gibson
j...@gibson.org





Re: Bizarre problem: Known good script (in 2011) fails to work in 2012

2012-01-04 Thread Rob Dixon

On 04/01/2012 14:02, Hamann, T.D. (Thomas) wrote:


Okay, some further testing using a family member's Windows XP PC and
a  fresh install of ActivePerl seems to have revealed the culprit:

Changing
 s/\s+$//;
to:
s/(\s+$)(\n)/$2/;

fixed the issue.

Since the script worked fine until about 3 weeks ago and I copied
the  original code from http://www.perlmonks.org/?node_id=2258, I can only
surmise that Microsoft must have changed the way Windows XP deals with
newlines in a very recent update. Which they could have communicated
with the outer world. :(

(oh well, another reason to dislike Microsoft, I guess).

Now for another question: How much code will this change break?


I'm afraid something else must have changed, as /\s+/ has always matched
HT, LF, CR, FF, and space, so that line would always remove a trailing
newline.

In your eagerness to find fuel for your hatred for Microsoft you are
forgetting that Perl normalizes all native file records so that they end
with \n when they are read from the file. Such arbitrary nonsense
impedes proper bug-fixing and has no place on this list - Microsoft is
not a football team.

The usual solution to your problem is to 'chomp' the line terminator
from the end of the line before applying the edits, and then adding it
back again on output.

Precisely why your program has changed behaviour I cannot tell, but be
assured that the code you show has always removed trailing newlines and
the problem must lie elsewhere.
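
The line-ending normalization mentioned above can be seen with a small experiment. This is only a sketch: the temporary file and its contents are made up for illustration, and the `:crlf` layer is requested explicitly so the result is the same on any platform (on Windows that layer is normally applied to text-mode handles by default).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write two CRLF-terminated records in raw mode so the bytes land unchanged.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "first\r\nsecond\r\n";
close $tmp_fh or die "Can't close $tmp_name: $!";

# Read them back through the :crlf layer: each record now ends in "\n" only.
open my $in, '<:crlf', $tmp_name or die "Can't read $tmp_name: $!";
my @lines = <$in>;
close $in;

for my $line (@lines) {
    printf "%-8s ends with CR? %s\n", $line =~ /^(\w+)/, ($line =~ /\r/ ? 'yes' : 'no');
}
```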

Rob





perl equivelent of which in bash

2012-01-04 Thread Jim Green
Greetings!
Basically I need a Perl equivalent of the "which" command in bash, which gives 
me the binary path. I need this because I run my script on different systems 
and I want the binary path adjusted automatically.

Thanks!
Jim.






Re: perl equivelent of which in bash

2012-01-04 Thread John SJ Anderson
On Wednesday, January 4, 2012 at 16:37 , Jim Green wrote:
 Greetings!
 basically I need a perl equivalent in bash that does which and gives me the 
 binary path. I need this because I run my script in different systems I want 
 the binary automatically adjusted.
 

Does https://metacpan.org/module/File::Which look like it would fit the bill?

chrs,
john.

 
 
 
 




Re: perl equivelent of which in bash

2012-01-04 Thread Robert Wohlfarth
On Wed, Jan 4, 2012 at 3:37 PM, Jim Green <student.northwest...@gmail.com> wrote:

 basically I need a perl equivalent in bash that does which and gives me
 the binary path. I need this because I run my script in different systems I
 want the binary automatically adjusted.


A quick search of CPAN for the word "which" turned up
File::Which <http://search.cpan.org/~adamk/File-Which-1.09/lib/File/Which.pm>.
It may meet your needs.

-- 
Robert Wohlfarth


Re: perl equivelent of which in bash

2012-01-04 Thread Shawn H Corey

On 12-01-04 04:37 PM, Jim Green wrote:

Greetings!
basically I need a perl equivalent in bash that does which and gives me the 
binary path. I need this because I run my script in different systems I want 
the binary automatically adjusted.

Thanks!
Jim.




Perhaps FindBin is the module you want. It finds the directory that contains 
the Perl script. See `perldoc FindBin` for details.


FindBin is a standard module and is installed when the rest of Perl is. 
For a list of standard pragmas and modules, see `perldoc perlmodlib`.
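
A minimal sketch of what FindBin reports, using its exported `$Bin` and `$Script` variables. Note that these describe the running Perl script, not an arbitrary program on the PATH, so this may or may not be what the question is after.

```perl
use strict;
use warnings;
use FindBin qw($Bin $Script);

# $Bin is the directory the running script was started from;
# $Script is the script's base name.
print "script directory: $Bin\n";
print "script name:      $Script\n";
```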



--
Just my 0.0002 million dollars worth,
  Shawn

Programming is as much about organization and communication
as it is about coding.

Never give up your dreams.  Give up your goals, plans,
strategy, tactics, and anything that's not working but never
give up your dreams.

http://www.youtube.com/watch?v=cM5A1K6TxxM
Never, never, never give up.
  Winston Churchill





files checksum perl program help

2012-01-04 Thread ram ram
Hi ,
   Wish you a Very Happy and Wonderful New Year.
I am a beginner in Perl programming. 

I request your help to write a Perl program that takes 2 different directories 
on 2 different servers and checks whether these 2 directories contain the 
same files with the same checksums.
Can anybody help me with this?

Regards,
Ramp


Re: files checksum perl program help

2012-01-04 Thread Jim Gibson
On 1/4/12 Wed  Jan 4, 2012  3:38 PM, ram ram <ram_p...@yahoo.com>
scribbled:

 Hi ,
    Wish you a Very Happy and Wonderful New Year.
 I am a beginner in perl programming.
 
 I request your help to write a perl program that takes 2 different directories
 on 2 different servers and find all the files in these 2 directories have the
 same files with same checksums.
 can anybody help me on this?

How are you going to access the files on the two servers? NFS? SMB? FTP?
SCP?

What kind of checksum do you want to use? Are the checksums already
computed, or do you need to compute them yourself?
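
Once those questions are answered, the comparison itself can be fairly short. The sketch below is only one possible shape, under loudly stated assumptions: both directories are reachable as ordinary local paths (for example over NFS or SMB mounts), only plain files directly inside each directory are compared, and SHA-256 from the core Digest::SHA module is the checksum. The helper names `checksums_for` and `compare_dirs` are hypothetical.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Spec;

# Map each plain file name in $dir to its SHA-256 hex digest.
sub checksums_for {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Can't open directory $dir: $!";
    my %sum;
    for my $name (readdir $dh) {
        my $path = File::Spec->catfile($dir, $name);
        next unless -f $path;    # skip ".", "..", and subdirectories
        $sum{$name} = Digest::SHA->new(256)->addfile($path, 'b')->hexdigest;
    }
    closedir $dh;
    return \%sum;
}

# Return one message per file that is missing on one side or differs.
sub compare_dirs {
    my ($left, $right) = @_;
    my ($l, $r) = (checksums_for($left), checksums_for($right));
    my %names = (%$l, %$r);
    my @differences;
    for my $name (sort keys %names) {
        if    (!exists $r->{$name})        { push @differences, "$name: only in $left" }
        elsif (!exists $l->{$name})        { push @differences, "$name: only in $right" }
        elsif ($l->{$name} ne $r->{$name}) { push @differences, "$name: checksums differ" }
    }
    return @differences;
}

# Usage (assumed): perl compare.pl /path/to/dir1 /path/to/dir2
if (@ARGV == 2) {
    my @differences = compare_dirs(@ARGV);
    print @differences ? map { "$_\n" } @differences : "directories match\n";
}
```

If the servers can only be reached over FTP or SCP, the files (or precomputed checksum lists) would have to be fetched first; that part is not covered here.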







Re: perl equivelent of which in bash

2012-01-04 Thread Jim Green
Thank you! I should have searched cpan.

Jim.






Re: perl equivelent of which in bash

2012-01-04 Thread John W. Krahn

Jim Green wrote:

Greetings!


Hello,


basically I need a perl equivalent in bash that does which and gives
me the binary path. I need this because I run my script in different
systems I want the binary automatically adjusted.



use Env q/@PATH/;

my $file = shift or exit 1;

my @which_files = grep -x, map "$_/$file", @PATH;
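
The same idea can be wrapped in a small function using only core modules. This is a sketch, not a replacement for File::Which: the name `which_all` is hypothetical, `File::Spec->path` is used to split the PATH portably, and Windows executable extensions (.exe, .bat) are not handled.

```perl
use strict;
use warnings;
use File::Spec;

# Return every executable plain file called $program found on the PATH,
# in PATH order (the first element is what "which" would report).
sub which_all {
    my ($program) = @_;
    return grep { -f $_ && -x _ }
           map  { File::Spec->catfile($_, $program) }
           File::Spec->path;
}

my $program = shift @ARGV;
if (defined $program) {
    my @found = which_all($program);
    print @found ? "$found[0]\n" : "$program not found in PATH\n";
}
```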




John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.   -- Albert Einstein





Re: Bizarre problem: Known good script (in 2011) fails to work in 2012

2012-01-04 Thread John W. Krahn

Hamann, T.D. (Thomas) wrote:

Hi,


Hello,

I see that you've found the problem but I'd like to make some comments.



I am having a rather unusual problem with a script that I wrote last
year to clean unwanted contents out of UTF-8 encoded text files. It
worked fine in the past, but when I try to run it now I get an error
message and somehow all newlines are removed from the resulting file.
Nothing was changed between 2011 and 2012 in the script, which I give
below:

#!/usr/bin/perl
# filecleaner.plx

use warnings;
use strict;
use utf8;
use open ':encoding(utf8)';

my $source = shift @ARGV;
my $destination = shift @ARGV;


It might be better to have some error checking here:

@ARGV == 2 or die "usage: filecleaner.plx <source file name> <destination file name>\n";


my ( $source, $destination ) = @ARGV;



open IN, "<$source" or die "Can't read source file $source: $!\n";
open OUT, ">$destination" or die "can't write on file $destination: $!\n";

while (<IN>) {
 # Replaces all tab-characters with spaces:
 s/\t/ /g;


Replacing single characters would be better using the tr/// operator:

  tr/\t/ /;



 # Replaces all hyphens that are both preceded and trailed by a space by
 # long dashes preceded and trailed by a space:
 s/ - / — /g;
 # Removes the leading space(s) from a variety of unwanted combinations:
 s/( +)( |\.|,|:|;|\!|\]|\)|\n)/$2/g;


It is better to use a character class instead of alternation for single 
character alternatives:


   s/( +)([ .,:;!\])\n])/$2/g;

And you don't need to capture $1 if you are not going to use it:

   s/ +([ .,:;!\])\n])/$1/g;

Nor do you need to capture anything at all:

   s/ +(?=[ .,:;!\])\n])//g;
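
A quick way to convince yourself that the forms above behave alike is to run them on the same input. The sample sentence is made up for illustration.

```perl
use strict;
use warnings;

my $text = "Hello , world  !  (see  note )\n";

# Original: alternation with two captures.
(my $alternation = $text) =~ s/( +)( |\.|,|:|;|\!|\]|\)|\n)/$2/g;

# Character class with a single capture.
(my $char_class = $text) =~ s/ +([ .,:;!\])\n])/$1/g;

# Lookahead: the following character is checked but not consumed,
# so nothing has to be captured and put back.
(my $lookahead = $text) =~ s/ +(?=[ .,:;!\])\n])//g;

print $alternation, $char_class, $lookahead;
```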



 # Removes multiple dots:
 s/\.+/./g;
 # Removes multiple commas:
 s/,+/,/g;
 # Removes multiple colons:
 s/:+/:/g;
 # Removes multiple semi-colons:
 s/;+/;/g;


Those four substitution operators can be replaced with one transliteration:

  tr/.,:;//s;
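
A small side-by-side sketch of the two tr/// suggestions against the original substitutions; the sample string is an assumption for illustration. With an empty replacement list, tr/// maps each listed character to itself, and the /s modifier squeezes runs of the same character down to one.

```perl
use strict;
use warnings;

my $line = "Wait...,,\twhat;;; now::";

# tr/// version: one pass for the tab, one pass to squeeze repeated punctuation.
(my $with_tr = $line) =~ tr/\t/ /;
$with_tr =~ tr/.,:;//s;

# s/// version from the original script.
(my $with_s = $line) =~ s/\t/ /g;
$with_s =~ s/\.+/./g;
$with_s =~ s/,+/,/g;
$with_s =~ s/:+/:/g;
$with_s =~ s/;+/;/g;

print "$with_tr\n$with_s\n";
```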



 # Removes commas before dots:
 s/(,+)(\.)/$2/g;


Again, no need to capture anything:

   s/,+(?=\.)//g;



 # Removes the trailing spaces and dots behind two types of brackets:


This removes trailing spaces OR dots, not trailing spaces AND dots


 s/(\(|\[)( +|\.+)/$1/g;


   s/(?<=[(\[])( +|\.+)//g;
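
Note that the bracket comes before the text being removed, so the capture-free form needs a lookbehind, `(?<=...)`, rather than a lookahead: a lookahead would require the next character to be a bracket and a space or dot at the same time, which can never match. A minimal sketch with a made-up sample string:

```perl
use strict;
use warnings;

my $text = "( text) and [...more]";

# Original form: capture the bracket and put it back.
(my $captured = $text) =~ s/(\(|\[)( +|\.+)/$1/g;

# Lookbehind form: the bracket must precede the match but is not part of it.
(my $lookbehind = $text) =~ s/(?<=[(\[])(?: +|\.+)//g;

# Lookahead form, for comparison only: it never matches, so nothing changes.
(my $lookahead = $text) =~ s/(?=[(\[])(?: +|\.+)//g;

print "$captured\n$lookbehind\n$lookahead\n";
```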



 # Removes empty sets of brackets:
 s/(\(|\[)(\)|\])//g;


   s/\(\)|\[\]//g;



 # Removes whitespace at beginning of line:
 s/^\s+//;
 # Removes whitespace at end of line:
 s/\s+$//;
 # Prints all non-empty lines to file:
 if (!/^\s*$/) {


   if ( /\S/ ) {



 print OUT $_;
 }
}

close IN;
close OUT;




John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.   -- Albert Einstein
