Re: regex - no field separator

2005-08-18 Thread John Doe
Keenan, Greg John (Greg)** CTR ** wrote on Thursday, 18 August 2005 05:34:
> Hi,
>
> I have the following data that I'm trying to parse into an array.  There
> are 19 fields but with hosts 5 & 6 fields 6 & 7 do not have any space
> between them.  This is how I get it from the OS and have no control over
> it.
>
> The maximum length for field 6 is 7 chars and field 7 is 6 chars.
>
> 200508171648 host1.dom.com 0 0 14 2166 623 8 4 12 0 0 0 35 131 14 0 0 100
> 200508171648 host2.dom.com 0 0 0 265 7563 5 3 8 0 0 0 34 66 7 0 0 100
> 200508171648 host3.dom.com 0 0 0 461 8112 4 0 6 0 0 0 53 84 9 0 0 100
> 200508171648 host4.dom.com 0 0 0 46 9468 5 3 9 0 0 0 39 75 8 0 2 98
> 200508171648 host5.dom.com 0 1 0 7008342480 3 0 0 0 0 0 0 41 8 0 2 98
> 200508171648 host6.dom.com 0 1 0 8936445548 3 0 0 0 0 0 0 14 5 0 0 100
>
> I have tried the following, and several other combos, with no luck.  It
> matches the first 4 lines but fails for the last 2 because they appear to
> have only 18 fields I assume.
> @oput = /(\d+) (.+\..+\..+) (\d+) (\d+) (\d+) (\d{2,7}) (\d{2,6}) (\d+)
> (\d+) (\d+) (\d+) (\d+) (\d+) (\d+) (\d+) (\d+) (\d+) (\d+) (\d+)/;

The (\d{2,7}) (\d{2,6}) part requires a space between the two groups, so it 
can't match the concatenated fields 6 & 7.

> Can someone point me in the right direction please?

The biggest problem might be that you don't know where to split, e.g., 
8936445548 in the last line into two numbers if all you know is that field 6 
is 2 to 7 digits and field 7 is 2 to 6 digits long: any split from 4+6 to 7+3 
digits fits those limits.
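To make the ambiguity concrete, here is a small Perl sketch (candidate_splits is just an illustrative name, not something from the thread) that lists every split of the merged token allowed by the length limits alone:

```perl
use strict;
use warnings;

# Return every way to cut a merged digit string into field 6 and field 7,
# using only the stated limits (field 6: 2-7 chars, field 7: 2-6 chars).
sub candidate_splits {
    my ($merged) = @_;
    my @splits;
    for my $len6 ( 2 .. 7 ) {
        my $len7 = length($merged) - $len6;
        next if $len7 < 2 || $len7 > 6;    # field 7 would be out of range
        push @splits, [ substr( $merged, 0, $len6 ), substr( $merged, $len6 ) ];
    }
    return @splits;
}

printf "%s + %s\n", @$_ for candidate_splits('8936445548');
```

It prints four candidates (8936 + 445548, 89364 + 45548, 893644 + 5548 and 8936445 + 548), so the length limits by themselves cannot pick one; some extra rule about the data is needed.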

The rest of the problem, as I understand it, is that the boundary between 
fields 6 & 7 (2166 623 in the first line, 8936445548 in the last) cannot be 
detected "locally", only by looking at the whole line. (I think this is a bug 
in the application producing the lines.)

So you have to take the whole line into account to extract fields 6 & 7.

There is surely a way to do that in a single regex.
Also, consider using split on records that have field separators (here, 
basically, the separator is missing in only one place).
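One possible single-regex version, sketched under an assumption: when fields 6 & 7 are merged, field 7 takes as many trailing digits as it can (at most 6), which matches the rule Greg gives later in the thread. The names $line_re and parse_line are illustrative only. Anchoring with ^ and $ and requiring exactly 12 numbers after field 7 is what forces the lazy field 6 to back off to the real space on lines where the fields are separate:

```perl
use strict;
use warnings;

# Assumption: a merged field 7 is as long as possible (6 digits).
# The lazy \d{2,7}? keeps field 6 short so field 7 takes the trailing
# digits; the $ anchor demands exactly 12 numbers after field 7, which
# rejects any split that leaves the real separating space unconsumed.
my $line_re = qr/^(\d+) (\S+) (\d+) (\d+) (\d+) (\d{2,7}?) ?(\d{2,6}) ((?:\d+ ){11}\d+)$/;

sub parse_line {
    my ($line) = @_;
    $line =~ s/\s+\z//;                    # drop newline and trailing blanks
    my @fields = $line =~ $line_re or return;
    my $tail = pop @fields;                # fields 8-19, still space separated
    return ( @fields, split ' ', $tail );
}

for my $line (
    '200508171648 host1.dom.com 0 0 14 2166 623 8 4 12 0 0 0 35 131 14 0 0 100',
    '200508171648 host6.dom.com 0 1 0 8936445548 3 0 0 0 0 0 0 14 5 0 0 100',
) {
    print join( ';', parse_line($line) ), "\n";
}
```

For the two sample lines it yields 2166/623 and 8936/445548 as fields 6 and 7. It works, but the split-based approaches discussed in this thread are easier to read and maintain.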

One basic idea to parse the lines could be:

1. split the line on space into an array.
2. Count the number of entries
3.a) 19 entries: fields 6 & 7 contained separately
3.b) 18 entries: fields 6 & 7 concatenated, handle separately
4. Handle the host field(s) and adjust the array according to the number of 
entries.
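Steps 1 to 3 of this plan could look roughly like the following sketch. classify is a hypothetical helper; step 3.b only reports the merged case here, since deciding where to cut the merged token still needs a rule about the data:

```perl
use strict;
use warnings;

# Steps 1-3: split a line and classify it by its number of entries.
sub classify {
    my ($line) = @_;
    my @fields = split ' ', $line;        # step 1: split on whitespace
    my $count  = @fields;                 # step 2: count the entries
    return $count == 19 ? 'separate'      # step 3.a: fields 6 & 7 apart
         : $count == 18 ? 'merged'        # step 3.b: fields 6 & 7 joined
         :                'unexpected';   # anything else is an error
}

my @lines = (
    '200508171648 host4.dom.com 0 0 0 46 9468 5 3 9 0 0 0 39 75 8 0 2 98',
    '200508171648 host5.dom.com 0 1 0 7008342480 3 0 0 0 0 0 0 41 8 0 2 98',
);
print classify($_), "\n" for @lines;
```

With these two sample lines it prints "separate" and then "merged".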

>
> Thanks Greg.

joe

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
 




RE: regex - no field separator

2005-08-18 Thread Wagner, David --- Senior Programmer Analyst --- WGO
Keenan, Greg John (Greg)** CTR ** wrote:
> Hi,
> 
> I have the following data that I'm trying to parse into an array. 
> There are 19 fields but with hosts 5 & 6 fields 6 & 7 do not have any
> space between them.  This is how I get it from the OS and have no
> control over it. 
> 
> The maximum length for field 6 is 7 chars and field 7 is 6 chars.
> 
> 200508171648 host1.dom.com 0 0 14 2166 623 8 4 12 0 0 0 35 131 14 0 0 100
> 200508171648 host2.dom.com 0 0 0 265 7563 5 3 8 0 0 0 34 66 7 0 0 100
> 200508171648 host3.dom.com 0 0 0 461 8112 4 0 6 0 0 0 53 84 9 0 0 100
> 200508171648 host4.dom.com 0 0 0 46 9468 5 3 9 0 0 0 39 75 8 0 2 98
> 200508171648 host5.dom.com 0 1 0 7008342480 3 0 0 0 0 0 0 41 8 0 2 98
> 200508171648 host6.dom.com 0 1 0 8936445548 3 0 0 0 0 0 0 14 5 0 0 100 
> 
> I have tried the following, and several other combos, with no luck. 
> It matches the first 4 lines but fails for the last 2 because they
> appear to have only 18 fields I assume.
> @oput = /(\d+) (.+\..+\..+) (\d+) (\d+) (\d+) (\d{2,7}) (\d{2,6})
> (\d+) (\d+) (\d+) (\d+) (\d+) (\d+) (\d+) (\d+) (\d+) (\d+) (\d+)
> (\d+)/; 
> 
You are working much too hard to capture the data. Use split like:

@oput = split (/\s+/,$_);
You say the two fields total at most 13 characters, but in this case you have 
10. How do you identify which field is full? Once you know that, extracting 
the fields is straightforward. But first you have to work out how to tell, for 
these 10 characters, what the proper split is.

Wags ;)
> Can someone point me in the right direction please?
> 
> Thanks Greg.



***
This message contains information that is confidential
and proprietary to FedEx Freight or its affiliates.
It is intended only for the recipient named and for
the express purpose(s) described therein.
Any other use is prohibited.
***


 




Re: regex - no field separator

2005-08-18 Thread Bryan R Harris

> You are working much too hard to capture the data. Use split like:
> 
> @oput = split (/\s+/,$_);


@oput = split;

...works too.  By default, split operates on $_, and splits on whitespace,
discarding any initial whitespace.

I owe dinner to whoever originally made that decision.
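A small sketch of the leading-whitespace difference Bryan mentions; the two leading spaces on the sample line are added for illustration, since Greg's data has none:

```perl
use strict;
use warnings;

# With leading whitespace, split /\s+/ yields an empty first field,
# while the default split (same as split ' ', $_) skips it.
$_ = '  200508171648 host1.dom.com 0 0 14';

my @regex_split   = split /\s+/, $_;
my @default_split = split;

print scalar(@regex_split),   " fields, first is '$regex_split[0]'\n";
print scalar(@default_split), " fields, first is '$default_split[0]'\n";
```

The /\s+/ version returns 6 fields with an empty first one, while the default split returns 5, starting with the timestamp.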

- B




 




RE: regex - no field separator

2005-08-18 Thread Keenan, Greg John (Greg)** CTR **
-----Original Message-----
>From: Wagner, David --- Senior Programmer Analyst --- WGO
>[mailto:[EMAIL PROTECTED] 
>Sent: Friday, 19 August 2005 3:21 AM
>To: Keenan, Greg John (Greg)** CTR **; beginners@perl.org
>Subject: RE: regex - no field seperator
>
>   You are working much too hard to capture the data. Use split like:
>
>   @oput = split (/\s+/,$_);
>You say it is a total of 13 characters, but in this case you have 10
>characters. How do you identify which field is full? Once you do that
>then the ability to get it can be done. But you have to first
>identify how to know out say in this case the 10 chaacters what
>the proper split is?

Fields 6 & 7 are each at least 2 chars long, with a maximum of 7 and 6 chars
respectively, but the only time fields 6 & 7 merge is when field 7 has reached
its maximum length of 6 chars.
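Under that rule, field 7 is simply the last 6 characters of the merged token. A minimal sketch using substr with negative offsets (unmerge is a hypothetical name):

```perl
use strict;
use warnings;

# Split a merged fields-6-and-7 token, assuming field 7 is exactly the
# last 6 characters (the rule stated above).
sub unmerge {
    my ($merged) = @_;
    return ( substr( $merged, 0, -6 ),    # everything except the last 6 chars
             substr( $merged, -6 ) );     # the last 6 chars
}

for my $merged (qw(7008342480 8936445548)) {
    my ( $f6, $f7 ) = unmerge($merged);
    print "$merged -> field 6 = $f6, field 7 = $f7\n";
}
```

It reports 7008 and 342480 for host5, and 8936 and 445548 for host6.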

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>




RE: regex - no field separator

2005-08-18 Thread Wagner, David --- Senior Programmer Analyst --- WGO
Keenan, Greg John (Greg)** CTR ** wrote:
> -----Original Message-----
>> From: Wagner, David --- Senior Programmer Analyst --- WGO
>> [mailto:[EMAIL PROTECTED]
>> Sent: Friday, 19 August 2005 3:21 AM
>> To: Keenan, Greg John (Greg)** CTR **; beginners@perl.org
>> Subject: RE: regex - no field seperator
>> 
>>  You are working much too hard to capture the data. Use split like:
>> 
>>  @oput = split (/\s+/,$_);
>> You say it is a total of 13 characters, but in this case you have 10
>> characters. How do you identify which field is full? Once you do that
>> then the ability to get it can be done. But you have to first
>> identify how to know out say in this case the 10 chaacters what
>> the proper split is?
> 
> Fields 6 & 7 could be a minimum of 2 chars or 7 & 6 chars
> respectively but the only time fields 6 & 7 merge is if field 7 has
> reached its maximum length of 6 chars.

Here is a shot:

#!perl

use strict;
use warnings;
my @oput = ();

while ( <DATA> ) {
    chomp;
    @oput = split (/\s+/,$_);

    if ( scalar(@oput) != 19 ) {    # missing a field, make one assumption
        if ( length($oput[5]) > 6 ) {    # you have two fields combined
            # stated: fields only combine if field 7 has 6 characters
            my $MyLen = length($oput[5]) - 6;

            my $MyWrkFld = substr($oput[5],$MyLen);    # copy the last 6 chars
            substr($oput[5],$MyLen,6) = '';            # delete those 6 chars
            splice(@oput,6,0,$MyWrkFld);               # insert as field 7
        }
    }
    printf "<%s>\n", join(';', @oput);
}
 
__DATA__
200508171648 host1.dom.com 0 0 14 2166 623 8 4 12 0 0 0 35 131 14 0 0 100
200508171648 host2.dom.com 0 0 0 265 7563 5 3 8 0 0 0 34 66 7 0 0 100
200508171648 host3.dom.com 0 0 0 461 8112 4 0 6 0 0 0 53 84 9 0 0 100
200508171648 host4.dom.com 0 0 0 46 9468 5 3 9 0 0 0 39 75 8 0 2 98
200508171648 host5.dom.com 0 1 0 7008342480 3 0 0 0 0 0 0 41 8 0 2 98
200508171648 host6.dom.com 0 1 0 8936445548 3 0 0 0 0 0 0 14 5 0 0 100 

Output:
<200508171648;host1.dom.com;0;0;14;2166;623;8;4;12;0;0;0;35;131;14;0;0;100>
<200508171648;host2.dom.com;0;0;0;265;7563;5;3;8;0;0;0;34;66;7;0;0;100>
<200508171648;host3.dom.com;0;0;0;461;8112;4;0;6;0;0;0;53;84;9;0;0;100>
<200508171648;host4.dom.com;0;0;0;46;9468;5;3;9;0;0;0;39;75;8;0;2;98>
<200508171648;host5.dom.com;0;1;0;7008;342480;3;0;0;0;0;0;0;41;8;0;2;98>
<200508171648;host6.dom.com;0;1;0;8936;445548;3;0;0;0;0;0;0;14;5;0;0;100>






Re: regex - no field separator

2005-08-18 Thread John W. Krahn
Keenan, Greg John (Greg)** CTR ** wrote:
>>From: Wagner, David --- Senior Programmer Analyst --- WGO
>>
>>  You are working much too hard to capture the data. Use split like:
>>
>>  @oput = split (/\s+/,$_);
>>You say it is a total of 13 characters, but in this case you have 10
>>characters. How do you identify which field is full? Once you do that
>>then the ability to get it can be done. But you have to first
>>identify how to know out say in this case the 10 chaacters what
>>the proper split is?
> 
> Fields 6 & 7 could be a minimum of 2 chars or 7 & 6 chars respectively but
> the only time fields 6 & 7 merge is if field 7 has reached its maximum
> length of 6 chars.

Well then, that should be easy enough.  :-)


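# assumption: input comes from the files named on the command line
# (or STDIN), so $ARGV holds the name of the current file
while ( <> ) {

    my @oput = split;

    if ( @oput == 18 ) {
        # fields 6 & 7 merged: field 7 is the last 6 characters
        splice @oput, 5, 1, $oput[ 5 ] =~ /(.+)(.{6})/;
    }
    elsif ( @oput != 19 ) {
        warn "Error in $ARGV line $. - wrong number of input fields.\n";
        next;
    }

    do_something_with( @oput );
}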
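One edge case worth guarding, sketched here as a variant rather than a correction: if an 18-field line ever has a field 6 shorter than 7 characters, /(.+)(.{6})/ fails and returns an empty list in list context, so the splice silently deletes field 6 and leaves 17 fields. The hypothetical fix_fields below also enforces the 2-7 digit limit on field 6 and returns an empty list instead of guessing (the hostX line is made up for the demonstration):

```perl
use strict;
use warnings;

# Hypothetical variant: returns the 19 fields, or an empty list if the
# line cannot be repaired under the "field 7 is 6 digits" rule.
sub fix_fields {
    my @oput = @_;
    if ( @oput == 18 ) {
        # field 6: 2-7 digits, field 7: exactly 6 digits
        my @parts = $oput[5] =~ /^(\d{2,7})(\d{6})$/ or return;
        splice @oput, 5, 1, @parts;
    }
    return @oput == 19 ? @oput : ();
}

my @fixed = fix_fields( split ' ',
    '200508171648 host5.dom.com 0 1 0 7008342480 3 0 0 0 0 0 0 41 8 0 2 98' );
print join( ';', @fixed ), "\n";

my @bad = fix_fields( split ' ',
    '200508171648 hostX.dom.com 0 1 0 12345 3 0 0 0 0 0 0 41 8 0 2 98' );
my $status = @bad ? 'fixed' : 'rejected';
print "$status\n";
```

For host5 it prints the same 19 fields as David's script; the hostX line, whose 5-digit token cannot hold a 6-digit field 7, is rejected.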




John
-- 
use Perl;
program
fulfillment

 




RE: regex - no field separator

2005-08-18 Thread Keenan, Greg John (Greg)** CTR **
-----Original Message-----
From: John W. Krahn [mailto:[EMAIL PROTECTED] 
Sent: Friday, 19 August 2005 10:26 AM
To: Perl Beginners
Subject: Re: regex - no field seperator


Thanks to David & John for their excellent solutions.  I've learnt a little
bit more about Perl & regexes over the last few days.
