RE: Looking for suggestions - how to sort & merge comma separated quoted string files

2004-11-26 Thread McGlinchy, Alistair
Bill,

> The problem is that some of the double-quoted strings contain 
> newlines. OK, there are a lot of newlines in these quoted 
> strings - and they have to remain in the data.

You could be lucky and get away with setting the record delimiter to
qq("\n). It depends on how well you can trust your source to have a
"\n only at the end of a record and not after an embedded quote. If this
isn't good enough, check out Text::CSV and similar modules on CPAN.

Cheers,

Alistair


# Treat a double quote followed by a newline as the end of a record.
$/ = qq(\"\n);
while (<DATA>) {
    print "Record $. is :" . $_;
}

__DATA__ 
"1100","","new","yes"
"1101","today is the best 
day of my life","old","no"
"1102","
Evil case with \n being the first char of a quoted string" 
"1102","He said \"
Is this a case we may have to deal with?\""
"1103","She replied ""
No, but perhaps this style of escaping is possible."""
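
If the "\n delimiter turns out not to be trustworthy, one alternative is to keep reading physical lines until the double quotes balance. What follows is only a minimal sketch, not a replacement for Text::CSV: read_record and parse_fields are hypothetical helper names, and it assumes embedded quotes are doubled ("") rather than backslash-escaped, so the \" case in the sample data would still need separate handling.

```perl
use strict;
use warnings;

# Hypothetical helper: read one logical record from $fh, joining physical
# lines until the number of double quotes seen so far is even.
sub read_record {
    my ($fh) = @_;
    my $record = '';
    while (defined(my $line = <$fh>)) {
        $record .= $line;
        my $quotes = () = $record =~ /"/g;    # count quotes in the buffer
        return $record if $quotes % 2 == 0;   # balanced: record is complete
    }
    # End of file: hand back a trailing partial record, if there is one.
    return length($record) ? $record : undef;
}

# Hypothetical helper: split one complete record into unquoted field values.
# Assumes every field is double-quoted and embedded quotes are doubled.
sub parse_fields {
    my ($record) = @_;
    $record =~ s/\r?\n\z//;                   # drop the record terminator
    my @fields;
    while ($record =~ /\G"((?:[^"]|"")*)"(?:,|\z)/g) {
        (my $value = $1) =~ s/""/"/g;         # undo the doubled quotes
        push @fields, $value;
    }
    return @fields;
}

# Usage sketch on an in-memory string holding two records, the second of
# which contains an embedded newline.
my $sample = qq("1100","","new","yes"\n"1101","today is the best \nday of my life","old","no"\n);
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
while (defined(my $record = read_record($fh))) {
    my @fields = parse_fields($record);
    print scalar(@fields), " fields, sequence number $fields[0]\n";
}
close $fh;
```

Counting quotes keeps the embedded newlines in the data, since lines are concatenated unchanged before the fields are split.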


___
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


Looking for suggestions - how to sort & merge comma separated quoted string files

2004-11-24 Thread Kalarness, Bill
Hi,

I have three files of data which consist of records of double-quoted
strings that are comma separated.  The first data "field" for each
record contains a unique sequence number.  All the records in the files
contain the same number of "fields".

Step 1: I need to merge two of these files of records together, sorted
on the sequence number (into a separate output file).
Step 2: I need to compare this merged output file (call it A) with the
third input file.  This time, I need to look for missing sequence
numbers - (they increment by 1) - and insert dummy records for these
missing sequence numbers into the merged output file (A).
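
Once the records are parsed into arrays, the two steps might be sketched as follows. This is only one reading of Step 2: it assumes that every sequence number between the lowest and highest number seen in A or in the third file should exist in A, and that a dummy record is the sequence number followed by empty fields. The names merge_sorted and fill_gaps and the sample records are hypothetical, and each record is assumed to be an array reference whose first element is the sequence number.

```perl
use strict;
use warnings;

# Hypothetical helper for Step 1: merge two lists of records and sort them
# numerically on the sequence number held in element 0.
sub merge_sorted {
    my ($first, $second) = @_;
    my @sorted = sort { $a->[0] <=> $b->[0] } (@$first, @$second);
    return @sorted;
}

# Hypothetical helper for Step 2: walk the range of sequence numbers spanned
# by the merged records and the reference records, and supply a dummy record
# wherever the merged list has none.
sub fill_gaps {
    my ($merged, $reference, $field_count) = @_;
    my %have;
    $have{ $_->[0] + 0 } = $_ for @$merged;
    my @numbers = sort { $a <=> $b } map { $_->[0] + 0 } (@$merged, @$reference);
    return () unless @numbers;
    my @filled;
    for my $n ($numbers[0] .. $numbers[-1]) {
        if (exists $have{$n}) {
            push @filled, $have{$n};
        }
        else {
            # Dummy record: the sequence number plus empty fields.
            push @filled, [ $n, ('') x ($field_count - 1) ];
        }
    }
    return @filled;
}

# Usage sketch with made-up two-field records.
my @file1 = ( [ '1100', 'new' ], [ '1103', 'old' ] );
my @file2 = ( [ '1101', 'old' ] );
my @file3 = ( [ '1102', 'new' ], [ '1104', 'new' ] );

my @merged = merge_sorted(\@file1, \@file2);
my @filled = fill_gaps(\@merged, \@file3, 2);
print join(',', map { $_->[0] } @filled), "\n";
```

With files under 7Mb, holding all the records in memory and sorting them in one pass is likely simpler than a streaming merge.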

I wouldn't think this is too difficult, but...

The problem is that some of the double-quoted strings contain newlines.
OK, there are a lot of newlines in these quoted strings - and they have
to remain in the data.

So, when I open a filehandle and try to split the file contents on the
commas into an array, I only get the first line of file data.

Here's my code  (try not to laugh too hard!)


  select STDOUT;
  $|=1;
  select STDERR;
  $|=1;


  $InFile1 = (qq/$ENV{'TEMP'}\\InFile1.txt/);

open INFILE1, "< $InFile1";
@InArray1 = split(",", <INFILE1>);
print "did it\n";
close INFILE1;

foreach $InArray1(@InArray1)
{
print $InArray1;
}

My resulting output is:

C:\Temp>perl -w migration3.pl
did it
"1100""""new""yes"

The contents of my test file (InFile1.txt) are:

"1100","","new","yes"
"1101","today is the best 
day of my life","old","no"
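
A likely explanation for seeing only the first line: split evaluates its second argument in scalar context, so a filehandle read placed there (such as a read from INFILE1) returns just the next line rather than the whole file. A minimal sketch of the difference, using an in-memory string holding the same two records as the test file (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $text = qq("1100","","new","yes"\n"1101","today is the best \nday of my life","old","no"\n);

# Scalar context: the read inside split returns only the first physical line.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @first_line_only = split /,/, <$fh>;
close $fh;

# Slurp mode: with $/ undefined, a single read returns the whole file.
open $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $whole = do { local $/; <$fh> };
close $fh;
my @all_pieces = split /,/, $whole;

print scalar(@first_line_only), " pieces from the first line\n";
print scalar(@all_pieces), " pieces from the whole file\n";
```

Even with the whole file in hand, splitting on bare commas is not a real parse: a comma inside a quoted string would also split, and the last field of one record runs into the first field of the next.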


This is just a tiny test file that I tossed together.  The real data
consists of ~20 fields per record.  Of these, there are maybe 8-12
fields that could contain embedded newlines.  There could be multiple
newlines in any of these fields.

Also, the actual data files are not huge.  Each is under 7Mb.


I did look around to see if there were any modules available that would
help me out.  I looked at File::Sort, Sort::Merge, and File::MergeSort.
I'm not sure how to get past this first hurdle.  These modules are
either looking at the file contents line by line or the input mechanism
is completely open, and I would need to supply my own.

Any assistance is appreciated! :)

Best Regards,

-Bill

