hi,how to get the correct statistic

2006-02-21 Thread Joodhawk Lin
Hi all,
 
I copied the source from
http://www.planet-source-code.com/vb/scripts/ShowCodeAsText.asp?txtCodeId=481&lngWId=6
The script merges two or more text files into one more manageable
file. It also removes duplicates and comments (lines starting with #).
 
I ran it in the command prompt on Windows Server 2003:
---
C:\Inetpub\wwwroot\cgi-bin\test>perl WorldListMerger.pl
Moth Merger 1.0 Wordlist Merger
Please specify an output file:test7
Please specify an input file:a.txt
5 added to test7
1 duplicates in a.txtAdd another file?(y/n):y
Please specify an input file:b.txt
7 added to test7
2 duplicates in b.txtAdd another file?(y/n):y
Please specify an input file:c.txt
1 added to test7
0 duplicates in c.txtAdd another file?(y/n):n
---
 
Contents of a.txt:
a
b
c
#comments by joodhawk.
d
a
c
Contents of b.txt:
e
e
f
g
h
i
j
g
f
Contents of c.txt:
zzz
 
The result is incorrect. For example, a.txt clearly contains 2 duplicates
(a and c), but the script reports only 1.
How can I correct this?
 
Thanks in advance.


RE: hi,how to get the correct statistic

2006-02-21 Thread Charles K. Clarkson
Joodhawk Lin wrote:
: Hi all,
:
: I copied the source from
: http://www.planet-source-code.com/vb/scripts/ShowCodeAsText.asp?txtCodeId=481&lngWId=6
: The script merges two or more text files into one more
: manageable file. It also removes duplicates and comments
: (lines starting with #).
[snip]
:
: The result is incorrect. For example, a.txt clearly contains
: 2 duplicates, but the script reports only 1.

Not necessarily. One of the duplicate words in a.txt is
on the last line of the file. Does that line end with a
newline character? Many text files do not. If it doesn't, this
script will chop() the 'c' off the end of the word, leaving an
empty string. That will not match the earlier 'c' line, where
chop() removed only the line ending. ('c' ne '')
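
To illustrate the difference, here is a minimal sketch. It is not taken from the script in question; the helper names via_chop and via_chomp and the sample data are made up for the example.

```perl
use strict;
use warnings;

# chop() always removes the last character, whatever it is.
sub via_chop  { my ($s) = @_; chop $s;  return $s }

# chomp() removes a trailing newline only if one is present.
sub via_chomp { my ($s) = @_; chomp $s; return $s }

# Hypothetical data: an earlier "c" line, then a final "c" with no newline.
for my $line ("c\n", "c") {
    printf "chop: '%s'  chomp: '%s'\n", via_chop($line), via_chomp($line);
}
# prints:
#   chop: 'c'  chomp: 'c'
#   chop: ''  chomp: 'c'
```

With chop() the final word becomes an empty string, so it is counted as a new word instead of a duplicate.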

Also, we cannot tell from your example whether there is
stray whitespace in the files. The dated code you are using
does not check for line endings (it uses chop()) and it does
not strip whitespace characters. The very fact that you
didn't mention whitespace in your message leads me
to believe it may be there.
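
One way to check the data is sketched below. The sub name find_problems and the sample string are assumptions for illustration. It reads the content through an in-memory filehandle and reports lines with leading or trailing whitespace and lines without a line ending.

```perl
use strict;
use warnings;

# Return a list of messages describing stray whitespace and
# missing line endings in the given file content.
sub find_problems {
    my ($content) = @_;
    my @problems;
    my $n = 0;

    open my $fh, '<', \$content or die "Cannot read in-memory data: $!";
    while (my $line = <$fh>) {
        $n++;
        # Strip the line ending, remembering whether there was one.
        my $has_ending = ($line =~ s/\r?\n\z//);
        push @problems, "line ${n}: no line ending" unless $has_ending;
        push @problems, "line ${n}: stray whitespace" if $line =~ /^\s|\s\z/;
    }
    close $fh;
    return @problems;
}

# Hypothetical content: trailing space on line 2, no newline after line 3.
print "$_\n" for find_problems("a\nb \nc");
# prints:
#   line 2: stray whitespace
#   line 3: no line ending
```

To check a real file, you could read it into a string first and pass that string to the sub.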


: how to correct it ?

Rewrite it.

The script was probably written as a utility for a very
short-term need and was probably not meant to be publicly
used or traded. The author does not verify I/O operations,
uses chop() where chomp() is more appropriate, has no error
checking, does not use lexical variables, and seems a little
unorganized.

My advice would be to check your data files first to be
certain your perceived errors are real errors, and to stay
away from this script if you are planning to put this into a
production environment. Write your own script that follows
more modern Perl standards and checks for stray whitespace
characters and a missing line ending on the last line.
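
As a starting point, a rewrite might look something like the sketch below. It is not a drop-in replacement for the original script. The sub name merge_wordlists, the choice to take file names as arguments instead of prompting, and the choice to treat a word as a duplicate if it appeared in any earlier input file are all assumptions. The demo writes a copy of your a.txt data, without a final newline, to a temporary directory.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Merge the input files into $out_path, skipping comments, blank
# lines, and words already written. Returns a hash reference that
# maps each input path to [added, duplicates].
sub merge_wordlists {
    my ($out_path, @in_paths) = @_;
    my (%seen, %stats);

    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    for my $in_path (@in_paths) {
        my ($added, $dups) = (0, 0);
        open my $in, '<', $in_path or die "Cannot read $in_path: $!";
        while (my $word = <$in>) {
            $word =~ s/^\s+|\s+$//g;    # trims whitespace and any line ending
            next if $word eq '' or $word =~ /^#/;
            if ($seen{$word}++) { $dups++ }
            else                { print {$out} "$word\n"; $added++ }
        }
        close $in;
        $stats{$in_path} = [$added, $dups];
    }
    close $out or die "Cannot close $out_path: $!";
    return \%stats;
}

# Demo with the a.txt data from the question, no final newline.
my $dir    = tempdir(CLEANUP => 1);
my $a_path = "$dir/a.txt";
open my $fh, '>', $a_path or die "Cannot write $a_path: $!";
print {$fh} "a\nb\nc\n#comments by joodhawk.\nd\na\nc";
close $fh or die "Cannot close $a_path: $!";

my $stats = merge_wordlists("$dir/merged.txt", $a_path);
printf "%d added, %d duplicates\n", @{ $stats->{$a_path} };
# prints: 4 added, 2 duplicates
```

Because the whitespace is trimmed with a substitution instead of chop(), the last line is handled the same way whether or not it ends with a newline.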


HTH,

Charles K. Clarkson


