RE: Split based on length

NYIMI Jose (BMB) Mon, 22 Sep 2003 01:20:10 -0700

I don't know which version of Perl you are using, but I have just
rerun both the script I sent you and the one with your modifications.
Both work well, and neither dies with the fatal error
"x outside of string" that you described.
Below is my perl -v output:
C:\>perl -v

This is perl, v5.6.1 built for MSWin32-x86-multi-thread
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2001, Larry Wall

Binary build 631 provided by ActiveState Tool Corp. http://www.ActiveState.com
Built 17:16:22 Jan  2 2002

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'.  If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.

José.

-----Original Message-----
From: zsdc [mailto:[EMAIL PROTECTED] 
Sent: Saturday, September 20, 2003 10:06 AM
To: NYIMI Jose (BMB)
Cc: [EMAIL PROTECTED]
Subject: Re: Split based on length

NYIMI Jose (BMB) wrote:

> I have another request: "code review" :-)
> Below is my final code.
> I'm sure that you guys will find
> some better style to write it ...

This code doesn't work with the input you sent me. It dies processing 
the third line of input (i.e. the first one which is not skipped) with 
fatal error "x outside of string" when get_fields calls unpack.

 > sub get_fields{
 >   my($str)=@_;
 >   my $format="$index_len $name_len $way_len";
 >   $format.=" $delim_len $meas_len" x $nof_meas;
 >   unpack($format,$str); # <<<=== it dies here
 > }

 From perldiag manpage: 'x outside of string (F) You had a pack template 
that specified a relative position after the end of the string being 
unpacked. See "pack" in perlfunc.'
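
To make the perldiag wording concrete, here is a minimal sketch of how 
that error can arise. The template and the sample strings are made up 
for illustration and are not taken from your data, and try_unpack is a 
hypothetical helper. On a recent perl, an A field quietly accepts a 
string that ends early, while an x that has to skip past the end of the 
string is fatal:

```perl
use strict;
use warnings;

# Hypothetical template: 3 characters, skip 1 delimiter, 2 characters.
my $template = 'A3 x1 A2';

# Return the unpacked fields joined with '|', or the error text
# if unpack dies.
sub try_unpack {
    my ($tmpl, $str) = @_;
    my @fields = eval { unpack $tmpl, $str };
    return $@ ? "died: $@" : join '|', @fields;
}

print try_unpack($template, 'abc-de'), "\n";  # full-length record
print try_unpack($template, 'abc-d'),  "\n";  # last A field is short: still fine
print try_unpack($template, 'abc'),    "\n";  # nothing left for x1 to skip: dies
```

If your input behaves like the third call, a chunk that is shorter than 
the template expects would be one possible explanation, but that is 
only a guess without seeing the data.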

Unfortunately I cannot help you with the debugging right now, but I hope 
others who helped you with unpack might be more up to date with your 
code and data.

Still, as you asked about style, I made a few, mostly cosmetic, changes 
to your code. The output is identical: it does exactly the same thing as 
your original code, even reproducing the fatal error, but it might be 
somewhat cleaner to work with. I also demonstrated some magic with 
print. So here's my refactored version of your code:

#!/usr/bin/perl

use strict;
use warnings;

#------------------
# Global variables
#------------------

my $group_len =  64;
my $index_len = 'A1';
my $name_len  = 'A6';
my $way_len   = 'A3';
my $delim_len = 'x1';
my $meas_len  = 'A5';
my $nof_meas  =  9;

#-------------
# Subroutines
#-------------

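sub split_len {
   # Cut $str into chunks of at most $len characters;
   # the last chunk may be shorter.
   my ($str, $len) = @_;
   return $str =~ /.{1,$len}/g;
}

sub get_fields {
   # Use the argument if one was passed, otherwise $_
   # (a plain "shift || $_" would mishandle the string "0").
   my $str = @_ ? shift : $_;
   my $format = "$index_len $name_len $way_len";
   $format .= " $delim_len $meas_len" x $nof_meas;
   return unpack $format, $str; # <- DIES HERE: x outside of string
}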

#------
# Main
#------

($,, $\) = ("\t", "\n");

while (<>) {
     next if $. < 3;
     /(\d{2}-\d{2}-\d{4}:\d{3})\s+(\d{2})(.+)/ or die "Bad input\n";
     my ($date, $dummy, $str) = ($1, $2, $3);
     for (split_len $str, $group_len) {
        print $date, get_fields;
     }
}

__END__

I'm sure it's going to get much cleaner in a while, when other people 
change a few other things. Ask if you're not sure how the print works 
here, but you might first read about "$," and "$\" (the output field 
separator and the output record separator) in perldoc perlvar.
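
In case it helps before you open perlvar, here is a small sketch of what 
those two variables do. The render helper and the sample values are 
hypothetical; it prints to an in-memory filehandle (which needs perl 5.8 
or later) only so that the result can be looked at as a string:

```perl
use strict;
use warnings;

# Print @items under the given output field separator ($,) and
# output record separator ($\), and return what print produced.
sub render {
    my ($ofs, $ors, @items) = @_;
    my $out = '';
    open my $fh, '>', \$out or die "Cannot open in-memory handle: $!";
    {
        local $, = $ofs;   # inserted between the items of one print
        local $\ = $ors;   # appended once at the end of each print
        print {$fh} @items;
    }
    close $fh;
    return $out;
}

# Same separators as in the script: tab between fields, newline at the end.
my $line = render("\t", "\n", '22-09-2003:001', 'A', 'NAME', 'WAY');
print $line;
```

With both variables set, a bare "print $date, get_fields;" produces one 
tab-separated line per call, without any join or explicit "\n".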

Note that this code still dies on unpack, just like yours, so you still 
have to debug it anyway. Speaking of unpack, I wouldn't use it here. For 
simple things it's great, but with your data the format gets quite 
complicated. Take a look at these CPAN modules:

Parse::FixedLength
Data::FixedFormat
Text::FixedLength
Text::FixedLength::Extra
AnyData
AnyData::Format::Fixed
DBD::AnyData

I'm not sure which one would be best for you, but they all help handle 
fixed-length record data, which is exactly what you are doing manually.

For example, DBD::AnyData (which would probably be overkill here, but 
is definitely worth knowing about) is a driver for DBI. You can use the 
data as if it were a table in an SQL database. It uses SQL::Statement as 
its SQL parsing and processing engine.

The AnyData module helps you manipulate data in many formats (fixed 
length, tab delimited, pipe delimited, passwd style, CSV, XML, HTML 
tables, vertical/paragraph text files, etc.). You can use the data as if 
it were a %hash or manipulate it with SQL queries through DBI and 
DBD::AnyData. AnyData also makes converting between formats very easy.

Parse::FixedLength is very powerful. You define a parser object, which 
can then parse data. You can subclass Parse::FixedLength and make using 
it in your main program very easy: just call the new method on your 
parser class and feed the new object with data.

Data::FixedFormat is easier to use, and its "count" keyword would 
probably be ideal for your needs here as a replacement for the 
get_fields subroutine, since you want to extract three different fields 
and then a list of many fields of the same length.

Text::FixedLength is an old-school non-OO module, which hasn't been 
updated for five years but is still very functional, and there's a more 
recent Text::FixedLength::Extra extending its functionality.

Well, I guess TMTOWTDI. If you want to choose the best one for you and 
avoid problems with manually parsing similar data in the future, then go 
to http://search.cpan.org/, find the modules I listed, read their 
description, synopsis, then maybe examples and other parts of the 
manual, and you should more or less know which one you like most.
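
If you stay with plain unpack in the meantime, one possible guard is to 
pad every chunk to the full template width before unpacking. This is 
only a sketch under the assumption that a short final chunk is what 
triggers the error; get_fields_padded and the sample record are 
hypothetical, and the template is built the same way as in get_fields 
above (1 + 6 + 3 + 9 * (1 + 5) = 64 characters):

```perl
use strict;
use warnings;

# Same layout as get_fields: A1 A6 A3 followed by nine (x1 A5) groups.
my $format = 'A1 A6 A3' . ' x1 A5' x 9;
my $width  = 64;

# Pad a short chunk with spaces up to the template width, then unpack.
# The A fields strip trailing spaces, so padded fields come back empty.
sub get_fields_padded {
    my ($str) = @_;
    $str .= ' ' x ($width - length $str) if length($str) < $width;
    return unpack $format, $str;
}

# Made-up 16-character chunk: index, name, way and a single measurement.
my @fields = get_fields_padded('1NAME  WAY 11111');
print join('|', @fields), "\n";
```

Whether empty trailing fields are acceptable in your output is a 
separate question; skipping short chunks entirely would be another 
option.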

-- 
ZSDC Perl and Systems Security Consulting
