Splitting A large data file

2002-10-23 Thread Kipp, James
I am working on a Windows NT box and I don't have the luxury of any file
splitting utilities. We have a data file with fixed-length records. I was
wondering what the most efficient way of splitting the file into 5 smaller
files would be. Thought ( Hoping :-) ) someone out there may have done
something like this.


Thanks !!


-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Re: Splitting A large data file

2002-10-23 Thread Jenda Krynicky
From: "Kipp, James" <[EMAIL PROTECTED]>
> I am working on a Windows NT box and I don't have the luxury of any
> file splitting utilities. We have a data file with fixed length
> records. I was wondering the most efficient way of splitting the file
> into 5 smaller files. Thought ( Hoping :-) ) some one out there may
> have done something like this.

# untested code !!!
# please add error checking !!!
use strict;
my $filename = ...;   # path of the data file to split
my $record_length = ...;
my $num_parts = 5;

my $chunk = 1024 * $record_length;
# or something else. I just want the $chunk to be a nice number
# yet be sure the chunk contains complete records
# I assume the $chunk will be much smaller than the size of the
# whole file.

my $file_size = -s $filename;
my $chunks_in_part = int($file_size / ($chunk * $num_parts));

open IN, "< $filename" or die "Can't open $filename: $!";
binmode(IN);

my $buff;
foreach my $part (1 .. $num_parts) {
    open OUT, "> $filename.$part" or die "Can't create $filename.$part: $!";
    binmode(OUT);
    for (my $i = 1; $i <= $chunks_in_part; $i++) {
        sysread IN, $buff, $chunk;
        syswrite OUT, $buff;   # write the data that was read, not the chunk size
    }
    if ($part == $num_parts) { # write the rest to the last file
        while (sysread IN, $buff, $chunk) {
            syswrite OUT, $buff;
        }
    }
    close OUT;
}
close IN;


I think you get the idea. Simply ... read the file in chunks (N*4KB 
at least) that contain whole records and use sysread() and 
syswrite(). 
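
To make the "whole records" arithmetic concrete, here is a rough sketch of how the part sizes could be worked out up front. The part_sizes() helper and the numbers in the usage line are made up for illustration and are not part of the code above. Unlike the chunk-based loop, it spreads the leftover records over the first parts instead of putting the whole remainder in the last file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the size in bytes of each part so that
# every part holds a whole number of fixed-length records.
sub part_sizes {
    my ($file_size, $record_length, $num_parts) = @_;
    die "record length must be positive\n" unless $record_length > 0;
    die "file size is not a multiple of the record length\n"
        if $file_size % $record_length;

    my $records = $file_size / $record_length;
    my $base    = int($records / $num_parts);   # records every part gets
    my $extra   = $records % $num_parts;        # leftover records

    # the first $extra parts take one additional record each
    return map {
        my $count = $base + ($_ <= $extra ? 1 : 0);
        $count * $record_length;
    } 1 .. $num_parts;
}

# e.g. 1003 records of 80 bytes each, split 5 ways
print join(", ", part_sizes(1003 * 80, 80, 5)), "\n";
```

The sizes always add up to the original file size, so each one could be used as the number of bytes to copy into the corresponding part.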

Jenda
= [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery






Re: Splitting A large data file

2002-10-23 Thread Todd W

James Kipp wrote:
> I am working on a Windows NT box and I don't have the luxury of any file
> splitting utilities. We have a data file with fixed length records. I was
> wondering the most efficient way of splitting the file into 5 smaller files.
> Thought ( Hoping :-) ) some one out there may have done something like this.
>
> Thanks !!


#!/usr/bin/perl -w

use strict;

# call new() with named args found in init() to override defaults
my( $fSpliter ) = Text::FileSplitter->new();

$fSpliter->split();

print( "done!\n" );

package Text::FileSplitter;
use strict;
use IO::File;

sub new {
  my($class, %args) = @_;
  my($self) = bless( { %args }, $class );
  $self->init();
  return( $self );
}

sub init {
  my($self) = shift(); my($filehandles) = [];

  $self->{ file } ||= './splitfile.txt';
  $self->{ output_prefix } ||= ( ($self->{ file } =~ /(\w+)/) and $1 );
  $self->{ file_count }  ||= 5;
  $self->{ record_length }  ||= 10;

  $self->{ fh } = IO::File->new( "< $self->{ file }" )
    or die("open $self->{ file }: $!");

  foreach ( 1 .. $self->{ file_count } ) {
    push(
      @{ $filehandles },
      IO::File->new("> $self->{ output_prefix }.$_")
    );
  }
  $self->{ ofh } = $filehandles;

}

sub split {
  my($self) = shift(); my($buffer);
  my($counter) = 0;
  while ( sysread $self->{ fh }, $buffer, $self->{ record_length } ) {
    $self->{ ofh }[ $counter % $self->{ file_count } ]->print( $buffer );
    $counter++;
  }
}

HTH

Todd W.






RE: Splitting A large data file

2002-10-23 Thread Javeed SAR
How to combine it back after split???

Regards 
j@veed








Re: Splitting A large data file

2002-10-23 Thread Tanton Gibbs
perl -pe "" filename1 filename2 filename3 ... > catted_file
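
If the result has to be byte-for-byte exact on Windows, a script version that sets binmode may be safer than the one-liner. This is only a minimal sketch: the cat_files() name and the 64 KB buffer size are my own assumptions, and the file names in the usage comment are the placeholders from the command above.

```perl
use strict;
use warnings;

# Hypothetical helper: append the given part files, in order, to $target.
sub cat_files {
    my ($target, @parts) = @_;
    open(my $out, '>', $target) or die "Can't write $target: $!";
    binmode($out);                       # no CRLF translation on Windows
    foreach my $part (@parts) {
        open(my $in, '<', $part) or die "Can't read $part: $!";
        binmode($in);
        my $buff;
        # copy the part in 64 KB pieces so large files are never held in memory
        while (read($in, $buff, 64 * 1024)) {
            print {$out} $buff or die "Write to $target failed: $!";
        }
        close($in);
    }
    close($out) or die "Can't close $target: $!";
}

# usage: cat_files('catted_file', 'filename1', 'filename2', 'filename3');
```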
- Original Message -
From: "Javeed SAR" <[EMAIL PROTECTED]>
To: "Todd W" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; "James Kipp"
<[EMAIL PROTECTED]>
Sent: Thursday, October 24, 2002 1:15 AM
Subject: RE: Splitting A large data file


> How to combine it back after split???
>
> Regards
> j@veed




Re: Splitting A large data file

2002-10-23 Thread Todd W


Javeed Sar wrote:
> How to combine it back after split???

perl -e 'print <>' splitfile.1 splitfile.2 splitfile.3 splitfile.4 splitfile.5 > splitfile.new.txt

Note that the ordering is different, because the program I posted earlier
sends the next record to the next filehandle in a circle. The above
one-liner dumps the contents of the files sequentially. But if your
datafile implementation is sound, that shouldn't matter.

Todd W.




Re: Splitting A large data file

2002-10-23 Thread Tanton Gibbs
Also, there is a CPAN utility for doing this

http://search.cpan.org/author/SDAGUE/ppt-0.12/bin/split




Re: Splitting A large data file

2002-10-23 Thread Todd W


Tanton Gibbs wrote:
> Also, there is a CPAN utility for doing this

Yeah, I forgot to mention again that I'm just playing around. Always do a
search at http://search.cpan.org/ before you design your own solution,
because chances are that someone has already solved your problem. Other
than for academic reasons, it's (almost) always better to use a solution
from CPAN, because chances are the module on CPAN is better debugged and
documented.




Todd W.






RE: Splitting A large data file

2002-10-23 Thread Javeed SAR
I split a Perl script and the split worked, but after I combined the parts
the result is not working, even though the sizes are the same.
What is the problem?

Regards 
j@veed








Re: Splitting A large data file

2002-10-23 Thread Tanton Gibbs
What happens if you diff the files?

diff file1 file2
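
On a Windows NT box without diff, the core File::Compare module can answer the same question from Perl. A minimal sketch, where the same_file() wrapper name is just something made up for the example:

```perl
use strict;
use warnings;
use File::Compare;   # core module; exports compare()

# compare() returns 0 if the files are equal, 1 if they differ, -1 on error
sub same_file {
    my ($file1, $file2) = @_;
    my $result = compare($file1, $file2);
    die "Can't compare $file1 and $file2: $!" if $result < 0;
    return $result == 0;
}

# usage: print same_file('file1', 'file2') ? "identical\n" : "different\n";
```

Two files of the same size with their records in a different order will be reported as different, which is worth checking before looking for other causes.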




RE: Splitting A large data file

2002-10-24 Thread Jenda Krynicky
> How to combine it back after split???
> 
> Regards 
> j@veed

copy /B file1.dat + file2.dat + file3.dat + file4.dat + file5.dat file.dat

Jenda




Re: Splitting A large data file

2002-10-24 Thread Todd W


Javeed Sar wrote:
> I split a Perl script and the split worked, but after I combined the parts
> the result is not working, even though the sizes are the same.
> What is the problem?

Make sure you completely read the posts. When you asked how to join them
back together I said the following would work:

perl -e 'print <>' splitfile.1 splitfile.2 splitfile.3 splitfile.4 splitfile.5 > splitfile.new.txt

but I also said this:

> note the ordering is different, because the program below (snipped for
> this post) sends the next record to the next filehandle in a circle. The
> above one liner dumps the contents of the files sequentially. But if
> your datafile implementation is sound that shouldnt matter.

Hence your result file was the same size, but not the "same file".

If you need to get the same file back out and you used the
Text::FileSplitter package I posted, you need to do the reverse of what it
does. Create an array of the filehandles, loop over them in a circle,
and print each record to your one big datafile.
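
Something along these lines might do it. It is an untested-in-anger sketch, not part of the package I posted: the interleave_files() name is made up, and it assumes the parts were written one fixed-length record at a time in round-robin order, starting with the first part.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild the original file from parts that were
# written round-robin, one fixed-length record at a time.
sub interleave_files {
    my ($target, $record_length, @parts) = @_;

    my @in;
    foreach my $part (@parts) {
        open(my $fh, '<', $part) or die "Can't read $part: $!";
        binmode($fh);
        push @in, $fh;
    }
    open(my $out, '>', $target) or die "Can't write $target: $!";
    binmode($out);

    my $counter = 0;
    my $buffer;
    while (1) {
        # take the next record from the next part, going round in a circle
        my $fh  = $in[ $counter % @in ];
        my $got = read($fh, $buffer, $record_length);
        die "Read failed: $!" unless defined $got;
        last if $got == 0;   # the first part to run dry marks the end
        print {$out} $buffer or die "Write to $target failed: $!";
        $counter++;
    }

    close($_) for @in;
    close($out) or die "Can't close $target: $!";
}

# usage with the defaults of the splitter:
# interleave_files('splitfile.new.txt', 10, map { "splitfile.$_" } 1 .. 5);
```

The record length and the order of the part files have to match what was used for the split, otherwise the records come back in the wrong order again.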

Todd W.






RE: Splitting A large data file

2002-10-25 Thread Kipp, James
Thanks. I have been away the last couple of days. Question: this is not a
binary file, so is binmode still needed?





RE: Splitting A large data file

2002-10-25 Thread Kipp, James
My thoughts exactly. I could spend a whole day figuring it out, but the
chances that one of you fine perlers already did it are good !!

> -Original Message-
> From: Todd W [mailto:trw3@;uakron.edu]
> Sent: Thursday, October 24, 2002 1:48 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Splitting A large data file
>
> Yeah, I forgot to mention again that I'm just playing around. Always do a
> search at http://search.cpan.org/ before you design your own solution,
> because chances are that someone has already solved your problem. Other
> than for academic reasons, it's (almost) always better to use a solution
> from CPAN, because chances are the module on CPAN is better debugged and
> documented.
> 
>
> 






RE: Splitting A large data file

2002-10-25 Thread Kipp, James
This is an interesting approach. Of course my Perl OOP skills have
atrophied, so I have a few questions if you don't mind. 
> 
> #!/usr/bin/perl -w
> 
> use strict;
> 
> # call new() with named args found in init() to override defaults
> my( $fSpliter ) = Text::FileSplitter->new();

Is Text:: a pre existing class (from CPAN or something) or are we
originating this class?
If this is our own new class, I should have $PERLIB\text\FileSplitter.pm ?

Override defaults like this? 
my( $fSpliter ) = Text::FileSplitter->new( file=>'split.txt',
record_length=>50 );

Thanks 
Jim






Re: Splitting A large data file

2002-10-25 Thread Todd Wade

- Original Message -
From: "Kipp, James" <[EMAIL PROTECTED]>
To: "'Todd W'" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Friday, October 25, 2002 12:13 PM
Subject: RE: Splitting A large data file


> This is an interesting approach. Of course my Perl OOP skills have
> atrophied, so I have a few questions if you don't mind.
> >
> > #!/usr/bin/perl -w
> >
> > use strict;
> >
> > # call new() with named args found in init() to override defaults
> > my( $fSpliter ) = Text::FileSplitter->new();
>
> Is Text:: a pre existing class (from CPAN or something) or are we
> originating this class?
> If this is our own new class, I should have $PERLIB\text\FileSplitter.pm ?

The class name is just something I made up, and I wrote the entire package
myself. Usually what I do for general purpose modules I write is find the
path to a directory called site_perl in the @INC array. On my system it's
called /usr/lib/perl5/site_perl. I then put a directory called "My" in that
directory. Then I put all the custom extension modules I write in there.
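
If you are not sure where site_perl lives on your box, a few lines of Perl will tell you. This is just a sketch; the site_perl_dirs() name is made up, and on some installations the site directory may not have "site_perl" in its path at all, in which case nothing is printed.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the site_perl directories out of a list of
# paths, defaulting to the directories this perl searches for modules.
sub site_perl_dirs {
    my @dirs = @_ ? @_ : @INC;
    return grep { m{site_perl} } @dirs;
}

print "$_\n" for site_perl_dirs();
```

If you'd rather not touch site_perl, putting `use lib 'some/dir';` at the top of the driver script adds that directory to @INC instead.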

There are a couple of problems with this code, the largest one being that if
you run it from a directory that you do not have write permissions to, it
will fail on the first write with an unhelpful 'Can't call method "print" on
an undefined value' error. That's because when we opened the output files for
writing, we didn't check to see if the files actually opened.
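
The missing check could be added along these lines. Treat it as a sketch rather than a drop-in fix: open_outputs() is a hypothetical helper, not something in the package as posted, and the error message wording is my own.

```perl
use strict;
use warnings;
use IO::File;

# Hypothetical replacement for the loop in init(): open every output
# file and die with the reason if any of them can't be opened.
sub open_outputs {
    my ($output_prefix, $file_count) = @_;
    my @filehandles;
    foreach my $n ( 1 .. $file_count ) {
        my $name = "$output_prefix.$n";
        # IO::File->new returns undef when the open fails
        my $fh = IO::File->new( $name, 'w' )
            or die("open $name for writing: $!");
        push( @filehandles, $fh );
    }
    return \@filehandles;
}

# inside init() this could become:
#   $self->{ ofh } = open_outputs( $self->{ output_prefix }, $self->{ file_count } );
```

That way the script stops at the first file it can't create and says why, instead of carrying on with undefined filehandles.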

Make sure you read the inline comments below

> Override defaults like this?
> my( $fSpliter ) = Text::FileSplitter->new( file=>'split.txt',
> record_length=>50 );

Yeah you got it. Below is what the driver file would look like:

#!/usr/bin/perl -w
use strict;
use My::Text::FileSplitter;

my($FSplitter) = My::Text::FileSplitter->new(
    file          => 'split.txt', # override the default filename splitfile.txt
    output_prefix => 'split',     # this would be the default if you omitted it
    file_count    => 10,          # split the file into 10 smaller files
    record_length => 1024,        # the size in bytes of each record
);

$FSplitter->split(); # actually split the file

print( "Done!\n" ); # all done

Everything below here is in the file called
/usr/lib/perl5/site_perl/My/Text/FileSplitter.pm

package My::Text::FileSplitter;
#   ^^^ note how I changed this. the rest is the same.
use strict;
use IO::File;

sub new {
   my($class, %args) = @_;
   my($self) = bless( { %args }, $class );
   $self->init();
   return( $self );
}

sub init {
   my($self) = shift(); my($filehandles) = [];

   $self->{ file } ||= './splitfile.txt';
   $self->{ output_prefix } ||= ( ($self->{ file } =~ /(\w+)/) and $1 );
   $self->{ file_count }  ||= 5;
   $self->{ record_length }  ||= 10;

   $self->{ fh } = IO::File->new( "< $self->{ file }" )
     or die("open $self->{ file }: $!");

   foreach ( 1 .. $self->{ file_count } ) {
     push(
       @{ $filehandles },
       IO::File->new("> $self->{ output_prefix }.$_")
     );
   }
   $self->{ ofh } = $filehandles;
}

sub split {
   my($self) = shift(); my($buffer);
   my($counter) = 0;
   while ( sysread $self->{ fh }, $buffer, $self->{ record_length } ) {
     $self->{ ofh }[ $counter % $self->{ file_count } ]->print( $buffer );
     $counter++;
   }
}

1;

