RE: Survey : Max size allowable for slurping files

2004-01-22 Thread Dan Muey
> Dan Anderson wrote:
>
> > Very true.  But you also need to look at what you're doing.  A spider
> > that indexes or collates pages across several sites might need to
> > slurp up a large number of pages -- which even at a few kilobytes
> > apiece would be costly on system resources.
>
> Ironically this is the one time I could see slurping as not working
> too.  Has anyone hit the limit for open file descriptors?  I know it is
> OS dependent and pretty damn high, but on a nice enough system with a
> nice enough app I suspect you could hit it.  At that point you would
> have to slurp or close a filehandle...
>
> Dan always comes up with the good discussion topics ;-)...

I try! 
DMuey

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
 




Re: Survey : Max size allowable for slurping files

2004-01-22 Thread Wiggins d'Anconia
Dan Anderson wrote:

> Very true.  But you also need to look at what you're doing.  A spider
> that indexes or collates pages across several sites might need to slurp
> up a large number of pages -- which even at a few kilobytes apiece
> would be costly on system resources.

Ironically this is the one time I could see slurping as not working too.
Has anyone hit the limit for open file descriptors?  I know it is OS
dependent and pretty damn high, but on a nice enough system with a nice
enough app I suspect you could hit it.  At that point you would have to
slurp or close a filehandle...
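
A minimal sketch of the "slurp or close a filehandle" point, assuming the
job is simply to visit many files in turn.  Each file is slurped and its
handle closed before the next one is opened, so only one descriptor is
held for these files at a time.  The name total_bytes is hypothetical and
not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: sum the sizes of many files by slurping them one at
# a time. Each handle is closed before the next open, so the loop never
# holds more than one descriptor for these files.
sub total_bytes {
    my @paths = @_;
    my $total = 0;
    for my $path (@paths) {
        open my $fh, '<', $path or die "Cannot open $path: $!";
        my $content = do { local $/; <$fh> };   # slurp this one file
        close $fh;                              # release the descriptor now
        $total += length $content if defined $content;
    }
    return $total;
}
```

Memory use here is bounded by the largest single file rather than by the
sum of all files, which is the trade-off being discussed.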

Dan always comes up with the good discussion topics ;-)...

http://danconia.org

 



Re: Survey : Max size allowable for slurping files

2004-01-22 Thread Dan Anderson
On Thu, 2004-01-22 at 18:21, Dan Anderson wrote:
> On Thu, 2004-01-22 at 17:59, James Edward Gray II wrote:
> > On Jan 22, 2004, at 4:12 PM, Tim Johnson wrote:
> > 
> > > Here's another argument against slurping:  When you slurp a file all at
> > > once, even if your program isn't using up much of the CPU, on many
> > > machines it will slow down performance considerably if you slurp a
> > > large file (large, of course, is still sometimes relative).  If that
> > > is the only thing you are running at the time, it may not make much
> > > of a difference, but it is usually not a good idea to assume that.
> > 
> > The flip side of that argument.  A quote from the earlier posted 
> > article:
> > 
> > "Another major win for slurping over line by line is speed. Perl's IO 
> > system (like many others) is slow. Calling <> for each line requires a 
> > check for the end of line, checks for EOF, copying a line, munging the 
> > internal handle structure, etc. Plenty of work for each line read in. 
> > On the other hand, slurping, if done correctly, will usually involve 
> > only one I/O call and no extra data copying. The same is true for 
> > writing files to disk, and we will cover that as well."  --Uri Guttman
> 
> 
> Just to add my $0.02, while you are likely to see your machine slow to a
> halt if you slurp too big a file, there is no guarantee that the extra
> overhead required for going line by line will be noticed, especially if
> you're doing enough other things on every line.

I just thought of a really good example to add.  Let's say you're
migrating from Database A to Database B.  And, because the SQL dump of
database A does something that breaks standards or doesn't work in
database B (e.g. MySQL's AUTO_INCREMENT), you decide to create a Perl
script to transform the SQL.

You'd have a large number of operations per line (relative to the cost
of reading in a file line by line), and if -- for instance -- you passed
it around your department and somebody tried using it with a database
dump which was several gigabytes (or possibly even terabytes if you work
at a data warehouse), you would be asking for trouble if the script
slurped its input.
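
A minimal sketch of the line-by-line version of such a script.  The names
transform_line and transform_dump are hypothetical, and the substitution
is only an illustration, not a real migration rule.  The point is that
memory use is bounded by the longest line, not by the size of the dump.

```perl
use strict;
use warnings;

# Hypothetical per-line rewrite. The replacement text is only an example;
# a real migration would need rules for the actual target database.
sub transform_line {
    my ($line) = @_;
    $line =~ s/\bAUTO_INCREMENT\b/GENERATED BY DEFAULT AS IDENTITY/gi;
    return $line;
}

# Copy $in to $out one line at a time, transforming each line on the way.
# Only the current line is held in memory.
sub transform_dump {
    my ($in, $out) = @_;
    while (defined(my $line = <$in>)) {
        print {$out} transform_line($line);
    }
    return;
}

# Possible usage as a filter:
#   transform_dump(\*STDIN, \*STDOUT);
```

This only works when every rewrite fits on a single line.  A statement
that spans several lines would need a different record separator or a
small buffer.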

On the other hand, somebody mentioned that slurping web pages is fine
because very few browsers are going to be set to receive 100 GB web
pages.

Very true.  But you also need to look at what you're doing.  A spider
that indexes or collates pages across several sites might need to slurp
up a large number of pages -- which even at a few kilobytes apiece
would be costly on system resources.

-Dan


 




Re: Survey : Max size allowable for slurping files

2004-01-22 Thread Dan Anderson
On Thu, 2004-01-22 at 17:59, James Edward Gray II wrote:
> On Jan 22, 2004, at 4:12 PM, Tim Johnson wrote:
> 
> > Here's another argument against slurping:  When you slurp a file all at
> > once, even if your program isn't using up much of the CPU, on many
> > machines it will slow down performance considerably if you slurp a
> > large file (large, of course, is still sometimes relative).  If that
> > is the only thing you are running at the time, it may not make much
> > of a difference, but it is usually not a good idea to assume that.
> 
> The flip side of that argument.  A quote from the earlier posted 
> article:
> 
> "Another major win for slurping over line by line is speed. Perl's IO 
> system (like many others) is slow. Calling <> for each line requires a 
> check for the end of line, checks for EOF, copying a line, munging the 
> internal handle structure, etc. Plenty of work for each line read in. 
> On the other hand, slurping, if done correctly, will usually involve 
> only one I/O call and no extra data copying. The same is true for 
> writing files to disk, and we will cover that as well."  --Uri Guttman


Just to add my $0.02, while you are likely to see your machine slow to a
halt if you slurp too big a file, there is no guarantee that the extra
overhead required for going line by line will be noticed, especially if
you're doing enough other things on every line.

-Dan


 




Re: Survey : Max size allowable for slurping files

2004-01-22 Thread James Edward Gray II
On Jan 22, 2004, at 4:12 PM, Tim Johnson wrote:

> Here's another argument against slurping:  When you slurp a file all at
> once, even if your program isn't using up much of the CPU, on many
> machines it will slow down performance considerably if you slurp a
> large file (large, of course, is still sometimes relative).  If that is
> the only thing you are running at the time, it may not make much of a
> difference, but it is usually not a good idea to assume that.

The flip side of that argument.  A quote from the earlier posted
article:

"Another major win for slurping over line by line is speed. Perl's IO 
system (like many others) is slow. Calling <> for each line requires a 
check for the end of line, checks for EOF, copying a line, munging the 
internal handle structure, etc. Plenty of work for each line read in. 
On the other hand, slurping, if done correctly, will usually involve 
only one I/O call and no extra data copying. The same is true for 
writing files to disk, and we will cover that as well."  --Uri Guttman
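
For readers who haven't seen the two styles side by side, here is a
minimal sketch.  The names slurp_file and count_lines are hypothetical.
The slurp uses the common idiom of undefining $/ so that one <$fh> returns
the whole file.  Whether that is actually faster for a given job depends
on the file size and the per-line work, so measure before assuming.

```perl
use strict;
use warnings;

# Slurp: with $/ undefined, a single readline returns the rest of the file.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                      # undef for the rest of this sub
    my $content = <$fh>;
    close $fh;
    return defined $content ? $content : '';
}

# Line by line: one readline per line, only the current line is kept.
sub count_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $n = 0;
    while (defined(my $line = <$fh>)) {
        $n++;
    }
    close $fh;
    return $n;
}
```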

James

 



RE: Survey : Max size allowable for slurping files

2004-01-22 Thread Dan Muey
> Here's another argument against slurping:  When you slurp a 
> file all at once, even if your program isn't using up much of 
> the CPU, on many machines it will slow down performance 
> considerably if you slurp a large file (large, of course, is 
> still sometimes relative).  If that is the only thing you are 
> running at the time, it may not make much of a difference, 
> but it is usually not a good idea to assume that.

Good argument, that's the kind of thing I was looking for. :)

 




RE: Survey : Max size allowable for slurping files

2004-01-22 Thread Dan Muey

> > Good comparison, I never see advice to use no warnings and no strict
> > though :)
>
> I've actually seen it a few times in code, but it's usually surrounded
> by:
>
> ##
> ##
> #WARNING!!
> ##
> # Warnings / Strict turned off here because you know what you're doing,
> # right?
>
> :-D

Tee hee hee :)

> 
> -Dan

 




RE: Survey : Max size allowable for slurping files

2004-01-22 Thread Tim Johnson
 
I advise people to use "no warnings qw(uninitialized)" from time to
time, and it usually sparks a backlash of "Don't do that!" emails, but
no one has been able to actually give me a good reason why not.  I think
it's a similar situation.  90% of the time, you can do it with no
problems, but most of the intended audience may not be able to
understand when to say when, so the popular consensus is that it's best
to just tell people not to do it.  
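
A minimal sketch of what that looks like when it is kept to a small
scope.  The name join_fields is hypothetical.  Because the pragma is
lexically scoped, it only affects the enclosing block, and warnings stay
on for the rest of the file.

```perl
use strict;
use warnings;

# Hypothetical helper: join fields where some may legitimately be undef.
sub join_fields {
    my @fields = @_;
    no warnings qw(uninitialized);   # in effect only until the end of this sub
    return join ',', @fields;        # undef is treated as an empty string
}
```

Outside join_fields an undef in a string operation would still warn, which
is usually what you want.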

Here's another argument against slurping:  When you slurp a file all at
once, even if your program isn't using up much of the CPU, on many
machines it will slow down performance considerably if you slurp a large
file (large, of course, is still sometimes relative).  If that is the
only thing you are running at the time, it may not make much of a
difference, but it is usually not a good idea to assume that.

-Original Message-
From: Dan Muey [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 22, 2004 1:49 PM
To: Dan Anderson
Cc: Perl Beginners Mailing List
Subject: RE: Survey : Max size allowable for slurping files


Good comparison, I never see advice to use no warnings and no strict
though :)







RE: Survey : Max size allowable for slurping files

2004-01-22 Thread Dan Anderson
On Thu, 2004-01-22 at 16:49, Dan Muey wrote:
> > > But I see a lot of "don't slurp that" and I was hoping for more
> > > clear reasons/situations to or not to slurp so people posting code
> > > can have a better idea why a person said:
> > > "do(n't) slurp your file here"
> > >
> > > Basically we need to explain why more:
> > >
> > > - Don't slurp this because it's STDIN and it may be huge, so huge in
> > >   fact it could overload your system.
> > > - If this is an html file you'd probably be safe slurping it up to
> > >   ease its processing.
> >
> > I think it's like using a "no warnings" or "no strict" pragma
> > to do some dangerous things because you know what you're
> > doing.  It's there for people when they get advanced enough
> > to need it, but it's not a good idea to encourage its use on
> > a beginners list.
> >
>
> Good comparison, I never see advice to use no warnings and no strict though :)

I've actually seen it a few times in code, but it's usually surrounded
by:

##
##
#WARNING!!
##
# Warnings / Strict turned off here because you know what you're doing,
# right?

:-D

-Dan


 




RE: Survey : Max size allowable for slurping files

2004-01-22 Thread Dan Muey
> > But I see a lot of "don't slurp that" and I was hoping for more
> > clear reasons/situations to or not to slurp so people posting code
> > can have a better idea why a person said:
> > "do(n't) slurp your file here"
> >
> > Basically we need to explain why more:
> >
> > - Don't slurp this because it's STDIN and it may be huge, so huge in
> >   fact it could overload your system.
> > - If this is an html file you'd probably be safe slurping it up to
> >   ease its processing.
>
> I think it's like using a "no warnings" or "no strict" pragma
> to do some dangerous things because you know what you're
> doing.  It's there for people when they get advanced enough
> to need it, but it's not a good idea to encourage its use on
> a beginners list.
>

Good comparison, I never see advice to use no warnings and no strict though :)

> -Dan
> 
> 

 




RE: Survey : Max size allowable for slurping files

2004-01-22 Thread Dan Anderson
On Thu, 2004-01-22 at 16:32, Dan Muey wrote:
> > On Thu, 2004-01-22 at 16:16, Dan Muey wrote:
> > > > On Thu, 2004-01-22 at 13:18, Dan Muey wrote:
> > > > > There are always comments like "you can slurp the file as long
> > > > > as it's not too big" or "be careful not to slurp a really big
> > > > > file or you'll be in trouble".
> > > > 
> > > > I'd like to add that some of it depends on swap space.  I've
> > > > slurped well past physical memory and most of it went to 
> > > > swap.  Although the script was significantly slower it still
> > > > ran.  However, if you get to a certain point your machine -- 
> > > > no matter what OS you are running -- will crash and burn.  Of 
> > > > course, this is *if* you can get to that level. 
> > > > Users of *BSD systems with limit installed know that if your 
> > > > process eats too much memory IT will die and not the system.
> > > 
> > > Good info Dan, I'm surprised more folks aren't adding their .02
> > > since it seems (to me anyway) like people are just as religious 
> > > about slurping as they are strict and warnings.
> > 
> > I think a lot of it is a problem of how exactly to answer.  
> > There will always be situations where slurping is a great 
> > idea and situations where slurping is a horrible idea.  I 
> > wish it were possible to give a better example like, "Use 
> > formula _ to calculate whether or not you can slurp 
> > safely".  But there are just too many variables that change 
> > from computer to computer and program to program to say anything
> > besides: "If you slurp watch the resources your program is 
> > using and kill it off before it DOSes your computer".
> 
> Yeah it's tough because it is so vague, that's what I was hoping to clarify.
> It's easy to say use strict because 
> 
> But I see a lot of "don't slurp that" and I was hoping for more
> clear reasons/situations to or not to slurp so people posting code can
> have a better idea why a person said:
> "do(n't) slurp your file here"
>
> Basically we need to explain why more:
>
> - Don't slurp this because it's STDIN and it may be huge, so huge in
>   fact it could overload your system.
> - If this is an html file you'd probably be safe slurping it up to ease
>   its processing.

I think it's like using a "no warnings" or "no strict" pragma to do some
dangerous things because you know what you're doing.  It's there for
people when they get advanced enough to need it, but it's not a good
idea to encourage its use on a beginners list.

-Dan


 




RE: Survey : Max size allowable for slurping files

2004-01-22 Thread Dan Muey
> On Thu, 2004-01-22 at 16:16, Dan Muey wrote:
> > > On Thu, 2004-01-22 at 13:18, Dan Muey wrote:
> > > > There are always comments like "you can slurp the file as long
> > > > as it's not too big" or "be careful not to slurp a really big
> > > > file or you'll be in trouble".
> > > 
> > > I'd like to add that some of it depends on swap space.  I've
> > > slurped well past physical memory and most of it went to 
> > > swap.  Although the script was significantly slower it still
> > > ran.  However, if you get to a certain point your machine -- 
> > > no matter what OS you are running -- will crash and burn.  Of 
> > > course, this is *if* you can get to that level. 
> > > Users of *BSD systems with limit installed know that if your 
> > > process eats too much memory IT will die and not the system.
> > 
> > Good info Dan, I'm surprised more folks aren't adding their .02
> > since it seems (to me anyway) like people are just as religious 
> > about slurping as they are strict and warnings.
> 
> I think a lot of it is a problem of how exactly to answer.  
> There will always be situations where slurping is a great 
> idea and situations where slurping is a horrible idea.  I 
> wish it were possible to give a better example like, "Use 
> formula _ to calculate whether or not you can slurp 
> safely".  But there are just too many variables that change 
> from computer to computer and program to program to say anything
> besides: "If you slurp watch the resources your program is 
> using and kill it off before it DOSes your computer".

Yeah it's tough because it is so vague, that's what I was hoping to clarify.
It's easy to say use strict because 

But I see a lot of "don't slurp that" and I was hoping for more
clear reasons/situations to or not to slurp so people posting code can
have a better idea why a person said:
"do(n't) slurp your file here"

Basically we need to explain why more:

- Don't slurp this because it's STDIN and it may be huge, so huge in
  fact it could overload your system.
- If this is an html file you'd probably be safe slurping it up to ease
  its processing.
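
One way to picture the STDIN case is a slurp that refuses to grow past a
cap.  This is only a sketch under assumptions: the name slurp_with_limit
and the 64 KB chunk size are made up for illustration.  It reads in chunks
and dies as soon as the input turns out to be bigger than the caller is
willing to hold.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp from $fh, but die once more than $max_bytes
# have been read. At most $max_bytes plus one chunk is ever held in memory.
sub slurp_with_limit {
    my ($fh, $max_bytes) = @_;
    my $data = '';
    while (1) {
        # Append up to 64 KB at the current end of $data.
        my $got = read($fh, $data, 65536, length $data);
        die "read failed: $!" unless defined $got;
        last if $got == 0;                       # end of input
        die "input exceeds $max_bytes bytes\n" if length($data) > $max_bytes;
    }
    return $data;
}

# Possible usage:
#   my $html = slurp_with_limit(\*STDIN, 10 * 1024 * 1024);
```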

Oh well we'll see...
> 
> -Dan
> 
> 

 




RE: Survey : Max size allowable for slurping files

2004-01-22 Thread Dan Anderson
On Thu, 2004-01-22 at 16:16, Dan Muey wrote:
> > On Thu, 2004-01-22 at 13:18, Dan Muey wrote:
> > > There are always comments like "you can slurp the file as
> > > long as it's not too big" or "be careful not to slurp a
> > > really big file or you'll be in trouble".
> > 
> > I'd like to add that some of it depends on swap space.  I've 
> > slurped well past physical memory and most of it went to 
> > swap.  Although the script was significantly slower it still
> > ran.  However, if you get to a certain point your machine -- 
> > no matter what OS you are running -- will crash and burn.  Of 
> > course, this is *if* you can get to that level. 
> > Users of *BSD systems with limit installed know that if your 
> > process eats too much memory IT will die and not the system.
> 
> Good info Dan, I'm surprised more folks aren't adding their .02 
> since it seems (to me anyway) like people are just as religious 
> about slurping as they are strict and warnings.

I think a lot of it is a problem of how exactly to answer.  There will
always be situations where slurping is a great idea and situations where
slurping is a horrible idea.  I wish it were possible to give a better
example like, "Use formula _ to calculate whether or not you can
slurp safely".  But there are just too many variables that change from
computer to computer and program to program to say anything
besides: "If you slurp, watch the resources your program is using and
kill it off before it DoSes your computer".

-Dan


 




RE: Survey : Max size allowable for slurping files

2004-01-22 Thread Dan Muey
> On Thu, 2004-01-22 at 13:18, Dan Muey wrote:
> > There are always comments like "you can slurp the file as
> > long as it's not too big" or "be careful not to slurp a
> > really big file or you'll be in trouble".
> 
> I'd like to add that some of it depends on swap space.  I've 
> slurped well past physical memory and most of it went to 
> swap.  Although the script was significantly slower it still
> ran.  However, if you get to a certain point your machine -- 
> no matter what OS you are running -- will crash and burn.  Of 
> course, this is *if* you can get to that level. 
> Users of *BSD systems with limit installed know that if your 
> process eats too much memory IT will die and not the system.

Good info Dan, I'm surprised more folks aren't adding their .02 
since it seems (to me anyway) like people are just as religious 
about slurping as they are strict and warnings.

Thanks
Dan

> 
> -Dan
> 
> 

 




Re: Survey : Max size allowable for slurping files

2004-01-22 Thread Dan Anderson
On Thu, 2004-01-22 at 13:18, Dan Muey wrote:
> There are always comments like "you can slurp the file as 
> long as it's not too big" or "be careful not to slurp a
> really big file or you'll be in trouble".

I'd like to add that some of it depends on swap space.  I've slurped
well past physical memory and most of it went to swap.  Although the
script was significantly slower it still ran.  However, if you get to a
certain point your machine -- no matter what OS you are running -- will
crash and burn.  Of course, this is *if* you can get to that level. 
Users of *BSD systems with limit installed know that if your process
eats too much memory IT will die and not the system.

-Dan


 




RE: Survey : Max size allowable for slurping files

2004-01-22 Thread Dan Muey
> On Jan 22, 2004, at 12:18 PM, Dan Muey wrote:
> 
> > There are always comments like "you can slurp the file as long as
> > it's not too big" or "be careful not to slurp a really big file or
> > you'll be in trouble".
> >
> > So what I'd like to survey is what would be the safest max size
> > of a file that one should ever slurp and why?
> > (e.g. if you have 128 MB of RAM and try to slurp a 768 MB file that'd
> > cause issues)
> > (e.g. if the max file size on your system is 2 GB you may have issues
> > slurping a 4 GB file.)

I also found this article from the author of File::Slurp, Uri Guttman:
http://www.perl.com/pub/a/2003/11/21/slurp.html

Very informative indeed.

> 
> This question is pretty hardware dependent.  On my Dual G5 2 GHz
> with 2

M Dual G5, ohhohohho
I just got a 17" G4 PowerBook which lets me develop Perl stuff 
wherever I go and run it in apache right then and there.

> GB RAM, I don't have to worry too much about what I slurp.  That won't
> be the case for a lot of machines though.  I can even imagine
> situations where it wouldn't be wise to slurp big files, even if the
> machine could handle it.
>
> If I had to come up with a solid "guideline" to tell people, it would
> probably be don't worry too much about slurping a file that's a fourth
> of your RAM or less.  I must stress that is a "guideline" though, not a
> safe rule.
> 
> Generally, my decision process goes like this.
> 
> Do I only need one line at a time?  If yes, don't slurp.
> 
> Could I read a "group of lines" at a time?  (Generally with something 
> like "paragraph mode".)  If yes, go that way.
> 
> Is what I'm doing a lot easier if I slurp the file?  If no, DON'T DO 
> IT!  There's no point.
> 
> If yes, I finally examine whether there is a good reason I shouldn't
> slurp the file.  (Execution hardware not up to it.  Multiple copies of
> the process will be run in parallel.  Whatever.)
> 
> By this point, if I haven't talked myself out of it, I slurp the file.
> 
> How much you know can be a big factor too.  If you're going to run the
> script on your machine, once every night as a cron job or as a one shot
> data munge, you know a lot and should feel pretty safe.  If you're
> going to upload the script to the CPAN and encourage people to run it
> everywhere all the time, even on their toaster, try to keep the
> memory/processor footprint as reasonable as possible, which may rule
> out slurping.
>
> I think the important thing to stress is that it's a choice.  It often
> makes things easier, so don't be ashamed to make that choice, when it
> does and won't hurt anything.  However, be aware that it COULD be a bad
> choice.  Think it through.
> 

Good info James. Thanks

> James

 




Re: Survey : Max size allowable for slurping files

2004-01-22 Thread James Edward Gray II
On Jan 22, 2004, at 12:18 PM, Dan Muey wrote:

> There are always comments like "you can slurp the file as
> long as it's not too big" or "be careful not to slurp a
> really big file or you'll be in trouble".
>
> So what I'd like to survey is what would be the safest
> max size of a file that one should ever slurp and why?
> 	(e.g. if you have 128 MB of RAM and try to slurp a 768 MB file that'd
> cause issues)
> 	(e.g. if the max file size on your system is 2 GB you may have issues
> slurping a 4 GB file.)

This question is pretty hardware dependent.  On my Dual G5 2 GHz with 2
GB RAM, I don't have to worry too much about what I slurp.  That won't
be the case for a lot of machines though.  I can even imagine
situations where it wouldn't be wise to slurp big files, even if the
machine could handle it.

If I had to come up with a solid "guideline" to tell people, it would 
probably be don't worry too much about slurping a file that's a fourth 
of your RAM or less.  I must stress that is a "guideline" though, not a 
safe rule.
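
As a rough illustration of that guideline (and only the guideline, not a
safety guarantee), a check might look like the sketch below.  The name
ok_to_slurp is hypothetical, and the amount of RAM is passed in by the
caller because finding it out is system specific.

```perl
use strict;
use warnings;

# Hypothetical check for the "a fourth of your RAM or less" guideline.
# $ram_bytes is supplied by the caller; this sketch does not detect it.
sub ok_to_slurp {
    my ($path, $ram_bytes) = @_;
    return 0 unless -f $path;          # only plain files have a useful size
    my $size = (-s $path) || 0;        # -s is false for an empty file
    return $size <= $ram_bytes / 4 ? 1 : 0;
}

# Possible usage, assuming a machine with 2 GB of RAM:
#   slurp it if ok_to_slurp($file, 2 * 1024**3), otherwise go line by line
```

It says nothing about what else is running or how many copies of the
script run in parallel, so treat a "yes" as a hint.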

Generally, my decision process goes like this.

Do I only need one line at a time?  If yes, don't slurp.

Could I read a "group of lines" at a time?  (Generally with something 
like "paragraph mode".)  If yes, go that way.
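
For anyone who hasn't used paragraph mode, here is a minimal sketch.
Setting $/ to the empty string makes each read return one block of lines
separated from the next by one or more blank lines.  The name
each_paragraph is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback once per paragraph read from $fh.
# Only one paragraph is held in memory at a time.
sub each_paragraph {
    my ($fh, $callback) = @_;
    local $/ = '';                 # paragraph mode, also seen by the callback
    while (defined(my $para = <$fh>)) {
        chomp $para;               # in paragraph mode this strips all trailing newlines
        $callback->($para);
    }
    return;
}

# Possible usage:
#   each_paragraph(\*STDIN, sub { print length($_[0]), "\n" });
```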

Is what I'm doing a lot easier if I slurp the file?  If no, DON'T DO 
IT!  There's no point.

If yes, I finally examine whether there is a good reason I shouldn't
slurp the file.  (Execution hardware not up to it.  Multiple copies of
the process will be run in parallel.  Whatever.)

By this point, if I haven't talked myself out of it, I slurp the file.

How much you know can be a big factor too.  If you're going to run the 
script on your machine, once every night as a cron job or as a one shot 
data munge, you know a lot and should feel pretty safe.  If you're 
going to upload the script to the CPAN and encourage people to run it 
everywhere all the time, even on their toaster, try to keep the 
memory/processor footprint as reasonable as possible, which may rule 
out slurping.

I think the important thing to stress is that it's a choice.  It often 
makes things easier, so don't be ashamed to make that choice, when it 
does and won't hurt anything.  However, be aware that it COULD be a bad 
choice.  Think it through.

James
