RE: Looking for suggestions - how to sort & merge comma separated quoted string files

2004-11-26 Thread McGlinchy, Alistair

> The problem is that some of the double-quoted strings contain 
> newlines. OK, there are a lot of newlines in these quoted 
> strings - and they have to remain in the data.

You could be lucky and get away with setting the record separator ($/) to
"\"\n", a double quote followed by a newline. It depends on how well you
can trust your source to have a quote followed by \n only at the end of a
record, and never inside a field (for example right after an embedded
quote). If this isn't good enough, check out Text::CSV and similar modules
on CPAN.



$/ = "\"\n";    # record separator: a double quote followed by a newline
while (<DATA>) {
    print "Record $. is: $_";
}
__DATA__
"1101","today is the best 
day of my life","old","no"
Evil case with \n being the first char of a quoted string" 
"1102","He said \"
Is this a case we may have to deal with?\""
"1103","She replied ""
No, but perhaps this style of escaping is possible."""
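
If the "\"\n" separator turns out to be too fragile, one alternative is to join physical lines until the double quotes balance. The following is only a sketch: read_record is a hypothetical helper, and it assumes embedded quotes are escaped by doubling (""), as in record 1103 above. It would not cope with the backslash style shown in record 1102.

```perl
use strict;
use warnings;

# Hypothetical helper: read one logical record from a filehandle by joining
# physical lines until the number of double quotes seen so far is even.
# Assumes quotes inside fields are escaped by doubling ("").
sub read_record {
    my ($fh) = @_;
    my $record = '';
    while (defined(my $line = <$fh>)) {
        $record .= $line;
        my $quotes = ($record =~ tr/"//);      # count the quotes so far
        return $record if $quotes % 2 == 0;    # balanced: record is complete
    }
    return length($record) ? $record : undef;  # leftover partial record, or EOF
}

# Sample data in an in-memory filehandle so the sketch is self-contained.
my $sample = join '',
    qq{"1101","today is the best \n},
    qq{day of my life","old","no"\n},
    qq{"1103","She replied ""\n},
    qq{No, but this style is possible."""\n};

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @records;
while (defined(my $rec = read_record($fh))) {
    push @records, $rec;
}
close $fh;

printf "Read %d records\n", scalar @records;
```

Each element of @records keeps its embedded newlines, so the records can be written back out unchanged.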



Perl-Win32-Users mailing list
To unsubscribe:

Looking for suggestions - how to sort & merge comma separated quoted string files

2004-11-24 Thread Kalarness, Bill

I have three data files, each consisting of records of comma-separated,
double-quoted strings.  The first data "field" of each record contains a
unique sequence number.  All the records in the files contain the same
number of "fields".

Step 1: I need to merge two of these files of records together, sorted
on the sequence number (into a separate output file).
Step 2: I need to compare this merged output file (call it A) with the
third input file.  This time, I need to look for missing sequence
numbers - (they increment by 1) - and insert dummy records for these
missing sequence numbers into the merged output file (A).
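
Once the records have been read correctly, the two steps could be sketched roughly as follows. The hashes, the list of sequence numbers taken from the third file, and the two-field dummy record layout are all assumptions made for illustration; the real records have about 20 fields.

```perl
use strict;
use warnings;

# Hypothetical inputs: each hash maps a sequence number to its raw record
# text, as might be produced by parsing the first two input files.
my %file1 = (1101 => qq{"1101","a"\n}, 1104 => qq{"1104","d"\n});
my %file2 = (1102 => qq{"1102","b\nb"\n});

# Hypothetical: the sequence numbers found in the third input file.
my @file3_seqs = (1101 .. 1104);

# Step 1: merge the two sets of records (sequence numbers are unique).
my %merged = (%file1, %file2);

# Step 2: add a dummy record for every sequence number that the third file
# has but the merged set lacks. The dummy layout is an assumption.
for my $seq (@file3_seqs) {
    $merged{$seq} = qq{"$seq",""\n} unless exists $merged{$seq};
}

# Emit the records sorted numerically on the sequence number.
my @output = map { $merged{$_} } sort { $a <=> $b } keys %merged;
print @output;
```

Keeping everything in a hash is reasonable here only because the files are small; larger inputs would call for a streaming merge instead.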

I wouldn't think this is too difficult, but...

The problem is that some of the double-quoted strings contain newlines.
OK, there are a lot of newlines in these quoted strings - and they have
to remain in the data.

So, when I open a filehandle and try to split the file contents on the
commas into an array, I only get the first line of file data.

Here's my code  (try not to laugh too hard!)

  select STDOUT;
  select STDERR;

  $InFile1 = (qq/$ENV{'TEMP'}\\InFile1.txt/);

open INFILE1, "< $InFile1";
@InArray1 = split(",", <INFILE1>);
print "did it\n";
close INFILE1;

foreach $InArray1 (@InArray1) {
    print $InArray1;
}

My resulting output is:

C:\Temp>perl -w
did it

The contents of my test file (InFile1.txt) are:

"1101","today is the best 
day of my life","old","no"
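
A likely reason only the first line shows up: split evaluates its second argument in scalar context, so a filehandle read there returns a single physical line. The sketch below illustrates this with the test data held in a string (an in-memory filehandle stands in for InFile1.txt) and shows one way to read the whole file at once by temporarily undefining $/.

```perl
use strict;
use warnings;

# The test file's contents, held in a string so the sketch is self-contained.
my $data = qq{"1101","today is the best \nday of my life","old","no"\n};

# split evaluates its second argument in scalar context, so <$fh> returns
# only the first physical line of the file.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @first_line_only = split /,/, <$fh>;
close $fh;

# Undefining $/ inside the do block makes <$fh> return the entire file as
# one string, embedded newlines included.
open $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $whole_file = do { local $/; <$fh> };
close $fh;

print scalar(@first_line_only), " pieces from the first line\n";
print length($whole_file), " characters in the whole file\n";
```

Slurping is practical here because each data file is only a few megabytes.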

This is just a tiny test file that I tossed together.  The real data
consists of ~20 fields per record.  Of these, there are maybe 8-12
fields that could contain embedded newlines.  There could be multiple
newlines in any of these fields.
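
Given one complete record, the fields can be pulled apart with a regular expression that treats newlines like any other character inside the quotes. This is a sketch only: split_fields is a hypothetical helper, and it assumes every field is double-quoted and that a literal quote inside a field is written as two quotes ("").

```perl
use strict;
use warnings;

# Hypothetical helper: split one complete record into its fields.
# Assumes every field is double-quoted and that a literal quote inside a
# field is written as two quotes ("").
sub split_fields {
    my ($record) = @_;
    my @fields;
    # Match one quoted field at a time, followed by a comma, a line end,
    # or the end of the string. [^"] also matches embedded newlines.
    while ($record =~ /\G"((?:[^"]|"")*)"(?:,|\r?\n|\z)/g) {
        my $field = $1;
        $field =~ s/""/"/g;    # undo the doubled-quote escaping
        push @fields, $field;
    }
    return @fields;
}

my @fields = split_fields(
    qq{"1101","today is the best \nday of my life","old","no"\n}
);
print scalar(@fields), " fields\n";
```

The first element of the returned list is the sequence number, which is what the sort and the gap check need.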

Also, the actual data files are not huge.  Each is under 7 MB.

I did look around to see if there were any modules available that would
help me out.  I looked at File::Sort, Sort::Merge, and File::MergeSort,
but I'm not sure how to get past this first hurdle.  These modules either
read the file contents line by line, or leave the input mechanism
completely open, so I would need to supply my own.

Any assistance is appreciated! :)

Best Regards,

