Re: Help! Perl Substitution

2004-02-19 Thread Jeff Westman
WC -Sx- Jones <[EMAIL PROTECTED]>
wrote:

> Jeff Westman wrote:
> 
> > When I ran this 
> > 
> >    $ perl -ne 's/|^NEWLINE^|^/\n/g;print' loadFile
> 
> The program loads the ENTIRE loadfile and then splits characters at
> whitespace "between" characters and then prints every character
> followed by a newline.
> So, how big is loadfile?

28M.

Isn't this pretty inefficient?  Perl die-hards use Perl's s///
over the traditional sed, but sed doesn't load the entire file
into memory, hence the name, Stream EDitor.  Sure, once in memory it
runs fast, but it gave me quite unexpected results.

Thanks to all who helped :)


-Jeff

__
Do you Yahoo!?
Yahoo! Mail SpamGuard - Read only the mail you want.
http://antispam.yahoo.com/tools

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
 




Re: Help! Perl Substitution

2004-02-19 Thread WC -Sx- Jones
Jeff Westman wrote:

> When I ran this
> 
>    $ perl -ne 's/|^NEWLINE^|^/\n/g;print' loadFile

The program loads the ENTIRE loadfile and then splits characters at
whitespace "between" characters and then prints every character followed
by a newline.

So, how big is loadfile?

__Sx___
print (pack('c5',(41*2),sqrt(7056),(unpack('c',H)-2),oct(115),10));
 



Re: Help! Perl Substitution

2004-02-19 Thread Jeff Westman
Paul Johnson <[EMAIL PROTECTED]> wrote:

> On Thu, Feb 19, 2004 at 04:36:55PM -0800, david wrote:
> > Jeff Westman wrote:
> > 
> > > I'm trying to help out another developer with a mini-Perl script.
> > > He has a file that contains one very long line, about 28M in size.
> > > He needs to do a replacement of all occurrences of
> > > 
> > > |^NEWLINE^|^
> > > 
> > > to a literal newline (HPUX, 0x0a or "\n").
> > > 
> > > When I ran this
> > > 
> > >    $ perl -ne 's/|^NEWLINE^|^/\n/g;print' loadFile
> > > 
> > > it choked and gave me
> > > 
> > > Out of memory during "large" request for 1073745920 bytes, total
> > > sbrk() is 604078796 bytes at -e line 1, <> line 1.
> > > 
> > 
> > if your system does not have enough memory to read in one large chunk,
> > you can easily break the input up by reading smaller chunks to process.
> > for example, the following reads 4k at a time and processes them:
> > 
> > [panda]# perl -ne 'BEGIN{$/=\10} s/\|\^NEWLINE\^\|\^/\n/g; print' loadFile
> 
> The trouble with this approach is that you will miss any separators
> which are split.  Your example actually reads 10 bytes at a time, but
> using $/ is the right idea:
> 
>   perl -ple 'BEGIN { $/="|^NEWLINE^|^" }' loadFile
> 
> This reads "lines" separated by "|^NEWLINE^|^", chomps away the
> separator and prints the "lines" followed by a newline.
> 
> perldoc perlrun
> perldoc perlvar

Paul -- a million thanks -- your solution worked absolutely
perfectly!!


-Jeff






 




Re: Help! Perl Substitution

2004-02-19 Thread david
Paul Johnson wrote:

>> 
>> [panda]# perl -ne 'BEGIN{$/=\10} s/\|\^NEWLINE\^\|\^/\n/g; print' loadFile
> 
> The trouble with this approach is that you will miss any separators
> which are split.  Your example actually reads 10 bytes at a time, but
> using $/ is the right idea:
> 
>   perl -ple 'BEGIN { $/="|^NEWLINE^|^" }' loadFile
> 
> This reads "lines" separated by "|^NEWLINE^|^", chomps away the
> separator and prints the "lines" followed by a newline.
> 

thanks for pointing out the obvious. as soon as i hit "Send" i realized both
problems you mentioned. OP also noticed this bug. although your solution
should work, it's not fully general. why? because, depending on how the
data file is organized and how many newline marks it contains, it
can eat up a lot of memory as well. consider a loadFile that has only a
few newline marks, all near the end of the file or separated by
large chunks: Perl will keep reading and reading until it finds the next
one. since this case is unlikely, i think your solution is perfect.

david
-- 
sub'_{print"@_ ";* \ = * __ ,\ & \}
sub'__{print"@_ ";* \ = * ___ ,\ & \}
sub'___{print"@_ ";* \ = *  ,\ & \}
sub'{print"@_,\n"}&{_+Just}(another)->(Perl)->(Hacker)

 




Re: Help! Perl Substitution

2004-02-19 Thread Paul Johnson
On Thu, Feb 19, 2004 at 04:36:55PM -0800, david wrote:
> Jeff Westman wrote:
> 
> > I'm trying to help out another developer with a mini-Perl script.
> > He has a file that contains one very long line, about 28M in size.
> > He needs to do a replacement of all occurrences of
> > 
> > |^NEWLINE^|^
> > 
> > to a literal newline (HPUX, 0x0a or "\n").
> > 
> > When I ran this
> > 
> >    $ perl -ne 's/|^NEWLINE^|^/\n/g;print' loadFile
> > 
> > it choked and gave me
> > 
> > Out of memory during "large" request for 1073745920 bytes, total
> > sbrk() is 604078796 bytes at -e line 1, <> line 1.
> > 
> 
> if your system does not have enough memory to read in one large chunk, you
> can easily break the input up by reading smaller chunks to process. for
> example, the following reads 4k at a time and processes them:
> 
> [panda]# perl -ne 'BEGIN{$/=\10} s/\|\^NEWLINE\^\|\^/\n/g; print' loadFile

The trouble with this approach is that you will miss any separators
which are split.  Your example actually reads 10 bytes at a time, but
using $/ is the right idea:

  perl -ple 'BEGIN { $/="|^NEWLINE^|^" }' loadFile

This reads "lines" separated by "|^NEWLINE^|^", chomps away the
separator and prints the "lines" followed by a newline.

perldoc perlrun
perldoc perlvar

-- 
Paul Johnson - [EMAIL PROTECTED]
http://www.pjcj.net

 




Re: Help! Perl Substitution

2004-02-19 Thread david
Jeff Westman wrote:

> I'm trying to help out another developer with a mini-Perl script.
> He has a file that contains one very long line, about 28M in size.
> He needs to do a replacement of all occurrences of
> 
> |^NEWLINE^|^
> 
> to a literal newline (HPUX, 0x0a or "\n").
> 
> When I ran this
> 
>    $ perl -ne 's/|^NEWLINE^|^/\n/g;print' loadFile
> 
> it choked and gave me
> 
> Out of memory during "large" request for 1073745920 bytes, total
> sbrk() is 604078796 bytes at -e line 1, <> line 1.
> 

if your system does not have enough memory to read in one large chunk, you
can easily break the input up by reading smaller chunks to process. for
example, the following reads 4k at a time and processes them:

[panda]# perl -ne 'BEGIN{$/=\10} s/\|\^NEWLINE\^\|\^/\n/g; print' loadFile

david
-- 
sub'_{print"@_ ";* \ = * __ ,\ & \}
sub'__{print"@_ ";* \ = * ___ ,\ & \}
sub'___{print"@_ ";* \ = *  ,\ & \}
sub'{print"@_,\n"}&{_+Just}(another)->(Perl)->(Hacker)

 




Help! Perl Substitution

2004-02-19 Thread Jeff Westman
I'm trying to help out another developer with a mini-Perl script. 
He has a file that contains one very long line, about 28M in size. 
He needs to do a replacement of all occurrences of

|^NEWLINE^|^

to a literal newline (HPUX, 0x0a or "\n").

When I ran this 

   $ perl -ne 's/|^NEWLINE^|^/\n/g;print' loadFile

it choked and gave me

Out of memory during "large" request for 1073745920 bytes, total
sbrk() is 604078796 bytes at -e line 1, <> line 1.

Any suggestions would be extremely helpful.


Thanks

Jeff
