Re: FW: byte-reversing nightmare

2003-09-21 Thread Michael D Schleif
$Bill Luebkert <[EMAIL PROTECTED]> [2003:09:21:20:33:05-0700] scribed:
> [EMAIL PROTECTED] wrote:
> 
> > It seems to me that the fastest way to solve this
> > problem would be with a transform, i.e. tr or y, operator.
> > That way all the computational work could be done by
> > perl at compile time. It might be interesting to write
> > a perl script to generate the two strings used by the tr
> > operator.
> 
> My test showed the array to be fastest, with the hash and tr about the same.
> The hash and array have the initial overhead of creating the table and
> the tr needs to do the same (but in a prior run so it can be hard-coded
> to avoid the slow eval).  I tried to use the same binary to integer code
> in all cases to even the overhead of converting from binary data.


Yes, of course, an array lookup is going to be the fastest, and certainly
faster than converting each byte ad hoc.

However, didn't you forget to re-pack your results?  With this method
you first unpack, then do the lookup, and finally re-pack each byte,
one at a time; that surrounding overhead is far from optimal.
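For reference, those three steps can at least be applied to a whole buffer at once instead of one byte per call. A minimal sketch, assuming a table built the same way as in the benchmark; the names `@rev` and `reverse_bits_table` are hypothetical:

```perl
use strict;
use warnings;

# 256-entry table: $rev[$n] is byte value $n with its bit order reversed.
my @rev;
for my $byte (0 .. 255) {
    my $r = 0;
    for my $i (0 .. 7) {
        $r |= (0x80 >> $i) if $byte & (1 << $i);
    }
    $rev[$byte] = $r;
}

# Unpack the whole buffer to integers, look each one up in @rev,
# and re-pack all the results with a single pack call.
sub reverse_bits_table {
    my ($buf) = @_;
    return pack 'C*', map { $rev[$_] } unpack 'C*', $buf;
}

my $out = reverse_bits_table("\x01\x12\xF0");
print join(' ', map { sprintf '%02x', $_ } unpack 'C*', $out), "\n";    # prints 80 48 0f
```

This still builds a Perl list with one element per byte, so it is worth timing against the pack/unpack idiom below before choosing.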

With this:

   pack('B*', unpack( 'b*', $b ));

the whole conversion happens inside [un]pack.  Because the B and b
templates still work one byte at a time internally, you can safely read
large chunks at a time, and that keeps the overhead to a minimum.
Isn't that so?

   while (read IN, my $b, $chunk)

What do you think?
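A quick sketch to check both claims: that `pack('B*', unpack('b*', ...))` reverses the bits inside each byte, and that converting chunk by chunk gives the same result as converting the whole buffer, since `b`/`B` never move bits across a byte boundary. The sub name `reverse_bits_packed` and the 3-byte chunk size are arbitrary choices for illustration:

```perl
use strict;
use warnings;

# unpack 'b*' lists each byte's bits low-to-high; pack 'B*' reads each
# group of 8 as high-to-low, so every byte comes back bit-reversed.
sub reverse_bits_packed {
    my ($buf) = @_;
    return pack 'B*', unpack 'b*', $buf;
}

my $data = join '', map { chr($_) } 0 .. 255;

# Convert in small chunks, as a read() loop would, then compare with
# converting the whole buffer in one call.
my $chunked = '';
for (my $pos = 0; $pos < length $data; $pos += 3) {
    $chunked .= reverse_bits_packed(substr $data, $pos, 3);
}
my $whole = reverse_bits_packed($data);

print $chunked eq $whole ? "same\n" : "different\n";    # prints "same"
printf "%02x\n", ord reverse_bits_packed("\x01");      # prints 80
```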

-- 
Best Regards,

mds
mds resource
877.596.8237
-
Dare to fix things before they break . . .
-
Our capacity for understanding is inversely proportional to how much
we think we know.  The more I know, the more I know I don't know . . .
--




Re: FW: byte-reversing nightmare

2003-09-21 Thread $Bill Luebkert
[EMAIL PROTECTED] wrote:

> It seems to me that the fastest way to solve this
> problem would be with a transform, i.e. tr or y, operator.
> That way all the computational work could be done by
> perl at compile time. It might be interesting to write
> a perl script to generate the two strings used by the tr
> operator.

My test showed the array to be fastest, with the hash and tr about the same.
The hash and array have the initial overhead of creating the table and
the tr needs to do the same (but in a prior run so it can be hard-coded
to avoid the slow eval).  I tried to use the same binary to integer code
in all cases to even the overhead of converting from binary data.
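Before comparing timings, it helps to confirm that the methods agree. A minimal sketch that checks the hash and array tables, built with the same loop as the script that follows, against an independent reference computed with `pack`/`unpack` for all 256 byte values:

```perl
use strict;
use warnings;

# Build both tables with the same bit loop as the benchmark script.
my (%hash, @array);
for my $byte (0 .. 255) {
    my $r = 0;
    for my $i (0 .. 7) {
        $r |= (0x80 >> $i) if $byte & (1 << $i);
    }
    $hash{$byte}  = $r;
    $array[$byte] = $r;
}

# Independent reference: unpack 'b8' gives the bits low-to-high and
# pack 'B8' reads them back high-to-low, i.e. a bit-reversed byte.
my $mismatches = 0;
for my $byte (0 .. 255) {
    my $want = ord(pack('B8', unpack('b8', chr($byte))));
    $mismatches++ if $hash{$byte} != $want || $array[$byte] != $want;
}
print "mismatches: $mismatches\n";    # prints "mismatches: 0"
```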

# My CPU time results: A1=8.82, A2=44.80, A3=6.42, A4=8.45

use strict;
use Benchmark;

my %hash = ();
my @array = ();

# build a 256 entry hash and a 256 entry array, both mapping each
# byte value to its bit-reversed value

for (0 .. 255) {
  my $b = 0;
  for (my $ii = 0; $ii < 8; $ii++) {
    if ($_ & (1 << $ii)) {
      $b |= (0x80 >> $ii);
    }
  }
  $hash{$_} = $b;
  $array[$_] = $b;
}

# create a binary string for data

my $bindat = ''; $bindat .= pack 'C', $_ for 0 .. 255;

my $bin;
my $res;
my $count = 1;    # raise this; a single run is too short to time reliably
my $results = timethese($count, {
  'A1' => sub {   # hash lookup
    foreach (0 .. 255) {
      $bin = unpack 'C', substr $bindat, $_, 1;
      $res = $hash{$bin};
    }
  },
  'A2' => sub {   # reverse the bits on the fly
    foreach (0 .. 255) {
      $res = 0;
      $bin = unpack 'C', substr $bindat, $_, 1;
      for (my $ii = 0; $ii < 8; $ii++) {
        if ($bin & (1 << $ii)) {
          $res |= (0x80 >> $ii);
        }
      }
    }
  },
  'A3' => sub {   # array lookup
    foreach (0 .. 255) {
      $bin = unpack 'C', substr $bindat, $_, 1;
      $res = $array[$bin];
    }
  },
  'A4' => sub {   # tr on the raw byte: tr must see the character itself,
                  # not the number unpack 'C' returns, and its search list
                  # must be written \x00-\xff
    foreach (0 .. 255) {
      $res = substr $bindat, $_, 1;
      $res =~
      tr/\x00-\xff/\x00\x80\x40\xc0\x20\xa0\x60\xe0\x10\x90\x50\xd0\x30\xb0\x70\xf0\x08\x88\x48\xc8\x28\xa8\x68\xe8\x18\x98\x58\xd8\x38\xb8\x78\xf8\x04\x84\x44\xc4\x24\xa4\x64\xe4\x14\x94\x54\xd4\x34\xb4\x74\xf4\x0c\x8c\x4c\xcc\x2c\xac\x6c\xec\x1c\x9c\x5c\xdc\x3c\xbc\x7c\xfc\x02\x82\x42\xc2\x22\xa2\x62\xe2\x12\x92\x52\xd2\x32\xb2\x72\xf2\x0a\x8a\x4a\xca\x2a\xaa\x6a\xea\x1a\x9a\x5a\xda\x3a\xba\x7a\xfa\x06\x86\x46\xc6\x26\xa6\x66\xe6\x16\x96\x56\xd6\x36\xb6\x76\xf6\x0e\x8e\x4e\xce\x2e\xae\x6e\xee\x1e\x9e\x5e\xde\x3e\xbe\x7e\xfe\x01\x81\x41\xc1\x21\xa1\x61\xe1\x11\x91\x51\xd1\x31\xb1\x71\xf1\x09\x89\x49\xc9\x29\xa9\x69\xe9\x19\x99\x59\xd9\x39\xb9\x79\xf9\x05\x85\x45\xc5\x25\xa5\x65\xe5\x15\x95\x55\xd5\x35\xb5\x75\xf5\x0d\x8d\x4d\xcd\x2d\xad\x6d\xed\x1d\x9d\x5d\xdd\x3d\xbd\x7d\xfd\x03\x83\x43\xc3\x23\xa3\x63\xe3\x13\x93\x53\xd3\x33\xb3\x73\xf3\x0b\x8b\x4b\xcb\x2b\xab\x6b\xeb\x1b\x9b\x5b\xdb\x3b\xbb\x7b\xfb\x07\x87\x47\xc7\x27\xa7\x67\xe7\x17\x97\x57\xd7\x37\xb7\x77\xf7\x0f\x8f\x4f\xcf\x2f\xaf\x6f\xef\x1f\x9f\x5f\xdf\x3f\xbf\x7f\xff/;
    }
  },
});

__END__
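Because `tr` translates a whole string in one pass, the same table could also be applied to each chunk read from a file instead of one byte at a time; that variant was not timed in this thread. A sketch, assuming a hypothetical helper `reverse_bits_file` and a default chunk size of 32768 bytes (the figure used elsewhere in the thread); the replacement list is generated and compiled once with a string `eval` rather than pasted in:

```perl
use strict;
use warnings;

# Generate the 256-entry bit-reversed replacement list as \xNN escapes.
my $to = '';
for my $byte (0 .. 255) {
    my $r = 0;
    for my $i (0 .. 7) {
        $r |= (0x80 >> $i) if $byte & (1 << $i);
    }
    $to .= sprintf '\\x%02x', $r;
}

# tr lists are fixed at compile time, so compile the translation once.
my $reverse_bits = eval
    "sub { my \$s = shift; \$s =~ tr/\\x00-\\xff/$to/; return \$s }"
    or die $@;

# Hypothetical helper: copy $infile to $outfile, bit-reversing every
# byte, reading $chunk bytes at a time.
sub reverse_bits_file {
    my ($infile, $outfile, $chunk) = @_;
    $chunk ||= 32768;
    open my $in,  '<', $infile  or die "Cannot read $infile: $!";
    open my $out, '>', $outfile or die "Cannot create $outfile: $!";
    binmode $in;
    binmode $out;
    while (read $in, my $buf, $chunk) {
        print {$out} $reverse_bits->($buf);
    }
    close $in;
    close $out or die "Cannot close $outfile: $!";
}
```

For example, `reverse_bits_file('in.bin', 'out.bin');`, where both file names are placeholders.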


-- 
  ,-/-  __  _  _ $Bill LuebkertMailto:[EMAIL PROTECTED]
 (_/   /  )// //   DBE CollectiblesMailto:[EMAIL PROTECTED]
  / ) /--<  o // //  Castle of Medieval Myth & Magic http://www.todbe.com/
-/-' /___/_<_http://dbecoll.tripod.com/ (My Perl/Lakers stuff)

___
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


Re: FW: byte-reversing nightmare

2003-09-21 Thread cjolley
> Ed Chester wrote:
>
>>>Hi all.
>>>I have a problemette with the following scripting issue.
>>>It's not primarily a technical problem... I quickly wrote a
>>>script that
>>>did exactly what I want, if possibly inelegantly. The problem
>>>I have is that a colleague wrote (get this) a _Visual Basic_
>>>program that does the same thing about 100 times
>>>faster. I am devastated.
>>>The truly sad thing is that I'm too focussed on the first
>>>solution to work out a faster way of doing it.
>>>
>>>Here's the problem:
>>>Got a large file (>1Mb) containing binary data.
>>>Bytes are in the correct order.
>>>Each byte is in little-endian bit order (lsb first).
>>>Want to write an output file with bytes in the same order, but with
>>> each byte big-endianed.
>>>

It seems to me that the fastest way to solve this
problem would be with a transform, i.e. tr or y, operator.
That way all the computational work could be done by
perl at compile time. It might be interesting to write
a perl script to generate the two strings used by the tr
operator.
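Following that suggestion, a minimal sketch of such a generator: it prints a `tr` statement (operating on a placeholder variable `$buf`) whose replacement list maps every byte 0x00..0xff to its bit-reversed value, so the table is fixed at compile time in whatever script it is pasted into:

```perl
use strict;
use warnings;

# Collect the bit-reversed value of every byte 0x00..0xff as a \xNN escape.
my @escapes;
for my $byte (0 .. 255) {
    my $r = 0;
    for my $i (0 .. 7) {
        $r |= (0x80 >> $i) if $byte & (1 << $i);
    }
    push @escapes, sprintf '\\x%02x', $r;
}

# Emit a ready-to-paste tr statement; $buf is a placeholder variable name.
my $statement = '$buf =~ tr/\\x00-\\xff/' . join('', @escapes) . "/;\n";
print $statement;
```

Only the replacement list needs generating; the search list is simply the range `\x00-\xff`.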

>>>Sounds simple. So I broke out my old friend Bit::Vector, and
>>>here's what I did (surrounding stuffs snipped):
>>>
>>>while (read LL, my $b, 1) {
>>>my $h = unpack "H*", $b;
>>>my $b_h = Bit::Vector -> new_Hex(8,$h) -> to_Bin;
>>>my $r = reverse($b_h);
>>>print OF pack "H*", Bit::Vector -> new_Bin(8,$r) -> to_Hex;
>>>}
>>>
>>>LL is my filehandle, $b is current byte, OF is output filehandle,
>>>$r the byte in the order I want it for output.
>>>
>>>Now I realise that the cause of my woe is that I'm using
>>>Bit::Vector to do something that it's really overqualified
>>>for, and also that
>>>I convert to and from binary and hex unnecessarily. I can probably
>>>skip packing and unpacking, probably not use Bit::Vector, and
>>>have a much faster solution - but nothing I've tried has
>>>worked. I also
>>>failed spectacularly to use Bit::Vector's Reverse method.
>>>Thinking it was clever, I also tried doing things with
>>>bitwise & and | to
>>>map bytes into the other endian form. No joy.
>>>
>>>SO - given I know somebody out there has done this before,
>>>and that you all are laughing at the type changes going on in
>>>the code above, can somebody take me out of my misery and
>>>suggest improvements so I don't have to spend my weekend
>>>feeling like I've allowed VB to do a
>>>better job than Perl?
>
> Try this:
>
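> use strict;
>
> # create a byte conversion hash - one time so cheap
>
> my %hash;
> for (0 .. 255) {
>   my $b = 0;
>   for (my $ii = 0; $ii < 8; $ii++) {
>     if ($_ & (1 << $ii)) {
>       $b |= (0x80 >> $ii);
>     }
>   }
>   $hash{$_} = $b;
> }
>
> # open files ($infile and $outfile are placeholder names,
> # taken here from the command line)
>
> my ($infile, $outfile) = @ARGV;
> open LL, '<', $infile or die "Cannot read $infile: $!";
> binmode LL;
> open FF, '>', $outfile or die "Cannot create $outfile: $!";
> binmode FF;
>
> # convert each byte using hash (the hash holds numbers, so chr them)
>
> while (read LL, my $b, 1) {
>   print FF chr $hash{ord $b};
> }
>
> close LL;
> close FF;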
>



Re: FW: byte-reversing nightmare

2003-09-19 Thread Michael D Schleif
Michael D Schleif <[EMAIL PROTECTED]> [2003:09:19:22:58:53-0500] scribed:
> while (read IN, my $b, $chunk) {
>    print OUT pack('H*', unpack( 'h*', $b ))
> }

Or, if I misunderstood, and you want to reverse the order of the bits in
each byte rather than swap its nybbles, then replace that pack/unpack
with this:

   print OUT pack('B*', unpack( 'b*', $b ));
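To see the difference between the two pack/unpack pairs, apply both to the byte 0x12 (binary 00010010): swapping nybbles gives 0x21, while reversing the bit order gives 0x48 (binary 01001000). A minimal sketch:

```perl
use strict;
use warnings;

my $byte = "\x12";    # 0001 0010

# 'h*' lists the low nybble first; 'H*' packs the high nybble first.
my $nybbles_swapped = pack 'H*', unpack 'h*', $byte;

# 'b*' lists the bits low-to-high; 'B*' packs them high-to-low.
my $bits_reversed = pack 'B*', unpack 'b*', $byte;

printf "nybble swap: %02x\n", ord $nybbles_swapped;    # prints 21
printf "bit reverse: %02x\n", ord $bits_reversed;      # prints 48
```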

I don't know what files others are testing with, but I get this:

# time ~/endian.plx ./test.in
little endian

real    0m0.911s
user    0m0.430s
sys     0m0.050s


hth



Re: FW: byte-reversing nightmare

2003-09-19 Thread Michael D Schleif
Ed Chester <[EMAIL PROTECTED]> [2003:09:19:11:15:38+0100] scribed:


> > Here's the problem: 
> > Got a large file (>1Mb) containing binary data. 
> > Bytes are in the correct order. 
> > Each byte is in little-endian bit order (lsb first).
> > Want to write an output file with bytes in the same order, but with 
> >  each byte big-endianed. 
> > 
> > Sounds simple. So I broke out my old friend Bit::Vector, and 
> > here's what I did (surrounding stuffs snipped):
> > 
> > while (read LL, my $b, 1) {
> > my $h = unpack "H*", $b;
> > my $b_h = Bit::Vector -> new_Hex(8,$h) -> to_Bin;
> > my $r = reverse($b_h);
> > print OF pack "H*", Bit::Vector -> new_Bin(8,$r) -> to_Hex;
> > }
> > 
> > LL is my filehandle, $b is current byte, OF is output filehandle, 
> > $r the byte in the order I want it for output. 


Is this what you want?

  BEGIN  
#! /usr/bin/perl -w

use strict;

# Determine the endianness of your system
my $is_little_endian = unpack( 'c', pack( 's', 1 ) );
print $is_little_endian ? "little" : "BIG", " endian\n";

# Try different chunk sizes
my $chunk = 32768;
my $infile = shift || "/home/mds/binary";
my $outfile = "./test.out";
-s $outfile && unlink $outfile;

open IN, '<', $infile
   or die "\n\tERROR: Cannot read '$infile' : $! \n\n";
binmode(IN);
open OUT, '>>', $outfile
   or die "\n\tERROR: Cannot create '$outfile' : $! \n\n";
binmode(OUT);   # without this, Win32 Perl writes each "\n" byte as "\r\n"
while (read IN, my $b, $chunk) {
   # 'h*' lists each byte low nybble first and 'H*' packs high nybble
   # first, so this swaps the two nybbles of every byte
   print OUT pack('H*', unpack( 'h*', $b ));
}
close IN;
close OUT;

exit 0;
   END   

# ls -al
total 2088
drwxr-sr-x    3 mds  mds     4096 Sep 19 22:52 .
drwxr-sr-x   12 mds  mds     4096 Sep 19 17:03 ..
-rwxr-xr-x    1 mds  mds  1037480 Jul 18 09:51 test.in
-rw-r--r--    1 mds  mds  1037480 Sep 19 22:52 test.out

# time ~/endian.plx ./test.in
little endian

real    0m0.338s
user    0m0.130s
sys     0m0.030s


hth
