Re: Data file with records that span two lines - REVISITED

Perl Noob Tue, 19 Jan 2010 15:04:31 -0800

> On 1/19/2010 12:09 AM, Perl Noob wrote:
>> I have a data file with thousands of records.  The problem is that
>> each record in the data file spans two lines.  I want to write a
>> Perl script that puts each record on a single line.  The file looks
>> like this:
>>
>> RECORD1FIELD1  RECORD1FIELD2     RECORD1FIELD3  RECORD1FIELD3
>>            RECORD1FIELD4          RECORD1FIELD5
>>
>> RECORD2FIELD1  RECORD2FIELD2     RECORD2FIELD3  RECORD2FIELD3
>>            RECORD2FIELD4          RECORD2FIELD5
>>
>>   . . .
>>
>> What I want is this:
>>
>> RECORD1FIELD1  . . .RECORD1FIELD5
>> RECORD2FIELD1  . . .RECORD2FIELD5
>>
>>
>> The second line of each record actually has a bunch of spaces before
>> the first field.  I thought I could exploit this with:
>>
>> s/\n                                //gi;
>>
>> What I thought would happen is that the script would look for a
>> newline followed by a run of spaces and delete only those.  But that
>> didn't work.
>>
>> Using a hex editor I saw that each new line was 0D 0A. I then tried:
>>
>> s/\x0D\x0A//gi;
>>
>> That didn't work either.
>>
>> I just want to move the second line of each record to the end of the
>> first.  It seems so simple, but I am exhausted from trying different
>> things.
>>
>
> I see a couple of choices.  Your example data seems to have an
> extra newline between logical records.  If that's true, then
> you can read them as paragraphs, e.g.,
>
>     #!/usr/bin/perl
>
>     use warnings;
>     use strict;
>
>     $/ = "\n\n";  # one of the paragraph modes
>
>     while( <DATA> ) {
>         my @fields = split;
>         print "@fields\n";
>     }
>
>
>     __DATA__
>     RECORD1FIELD1  RECORD1FIELD2     RECORD1FIELD3  RECORD1FIELD3
>               RECORD1FIELD4          RECORD1FIELD5
>
>     RECORD2FIELD1  RECORD2FIELD2     RECORD2FIELD3  RECORD2FIELD3
>               RECORD2FIELD4          RECORD2FIELD5
>
>
> If the apparent extra newline was not intentional, then
> you could simply read two lines at a time, e.g.,
>
>     #!/usr/bin/perl
>
>     use warnings;
>     use strict;
>
>     while( <DATA> ) {
>         $_ .= <DATA>;
>         my @fields = split;
>         print "@fields\n";
>     }
>
>
>     __DATA__
>     RECORD1FIELD1  RECORD1FIELD2     RECORD1FIELD3  RECORD1FIELD3
>               RECORD1FIELD4          RECORD1FIELD5
>     RECORD2FIELD1  RECORD2FIELD2     RECORD2FIELD3  RECORD2FIELD3
>               RECORD2FIELD4          RECORD2FIELD5
>
>
> --
> Brad


I am AMAZED at the help available in this forum.  It is an awesome
resource.  I can see, though, that my situation needs to be stated
more clearly.

The file's layout is not uniform throughout.  I WISH I only had to
deal with every other line, but the problem is not quite that simple.
The records I need always have the same format, but they are not
neatly placed on alternating lines.  What they have in common is that
the first line of each record ends with an end-of-line marker, and the
following line begins with 65 spaces.  Here is a better sample of what
I described:

_______BEGIN SAMPLE DATA FILE_________________
RandomJunkNothingImportantMoreJunk
StuffthatdoesntmatterWhocaresaboutthis
RECORD1FIELD1(3 spaces)RECORD1FIELD2(3 spaces)RECORD1FIELD3(newline)
(65 spaces)RECORD1FIELD4(12 spaces)RECORD1FIELD5
RECORD2FIELD1(3 spaces)RECORD2FIELD2(3 spaces)RECORD2FIELD3(newline)
(65 spaces)RECORD2FIELD4(12 spaces)RECORD2FIELD5
RandomJunkNothingImportantMoreJunk
StuffthatdoesntmatterWhocaresaboutthis
MoreJunkThatDoesntmatterStuffIdontwantWhocaresaboutthis
RECORD3FIELD1(3 spaces)RECORD3FIELD2(3 spaces)RECORD3FIELD3(newline)
(65 spaces)RECORD3FIELD4(12 spaces)RECORD3FIELD5
RECORD4FIELD1(3 spaces)RECORD4FIELD2(3 spaces)RECORD4FIELD3(newline)
(65 spaces)RECORD4FIELD4(12 spaces)RECORD4FIELD5
RECORD5FIELD1(3 spaces)RECORD5FIELD2(3 spaces)RECORD5FIELD3(newline)
(65 spaces)RECORD5FIELD4(12 spaces)RECORD5FIELD5
RECORD6FIELD1(3 spaces)RECORD6FIELD2(3 spaces)RECORD6FIELD3(newline)
(65 spaces)RECORD6FIELD4(12 spaces)RECORD6FIELD5
___________END SAMPLE DATA FILE ____________________


You will notice in the sample above that the only thing the usable
records have in common is the (newline) followed by (65 spaces).
Therefore, if I could find a way to do a search and replace like

s/(newline)(65spaces)//gi;

that would be great.  I just need to find each (newline) followed by
(65 spaces) and delete it.  I am just not sure how to do that.  My
brain hurts.
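
One way to get the search and replace described above, as a minimal
sketch rather than a definitive answer.  It assumes the whole file
fits in memory, and it assumes the earlier attempts failed because the
file was read one line at a time, in which case the newline and the 65
spaces never sit in the same string.  The helper name
join_continuations and the sample data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: delete every line break that is followed by
# 65 spaces, so a continuation line is pulled up onto the line
# before it.  \r?\n matches both 0D 0A and plain 0A line endings.
sub join_continuations {
    my ($text) = @_;
    $text =~ s/\r?\n {65}//g;
    return $text;
}

# Made-up data in the shape of the sample file, held in one string.
my $sample = "RandomJunkNothingImportant\n"
           . "R1F1   R1F2   R1F3\n"
           . ( ' ' x 65 ) . "R1F4            R1F5\n"
           . "MoreJunk\n";

# The junk lines pass through untouched; the record becomes one line.
print join_continuations($sample);
```

To run this on a real file, the file would first have to be read into
a single string (for example by putting "local $/;" before the read
from the filehandle) and that string passed to the helper.  Replacing
with a single space instead of nothing would keep FIELD3 and FIELD4
from running together.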

