Re: 60 hours divided by 60 is 2 minutes?

2002-10-28 Thread Terry Vogelaar
What Rob Cozens, David Vaughan and Jan Schenkel wrote was of course
much better than what I wrote in my script. I thought I needed to use
"repeat with i = 2 to the number of lines..." instead of "repeat for
each", because in a repeat for each loop you cannot look back at the
previous line. But of course you can: just keep each line in a
variable for the next pass through the loop. I didn't think of that.
Thanks.
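
For the archive, the trick in its minimal form (a sketch; the
variable names are illustrative):

put numToChar(30) into tPrev -- sentinel that matches no real line
repeat for each line tLine in tData
   if tLine is not tPrev then put tLine & return after tNew
   put tLine into tPrev -- "the previous line" is just this variable
end repeat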




Re: 60 hours divided by 60 is 2 minutes?

2002-10-28 Thread Rob Cozens
Any thoughts on how this is possible and what we can learn from it when
making programs?


Hi Terry,

Before dealing specifically with 1 @ 60 hours vs 60 @ 2 minutes, I
need to know whether you are actually deleting each duplicate line on
the fly. Have you tried building a new list instead:

function purgeDuplicates @textData
   put empty into newData
   put numToChar(30) into lastLine -- any char not in the first line of textData
   repeat for each line thisLine in textData
      if thisLine = lastLine then next repeat
      put thisLine & return after newData
      put thisLine into lastLine
   end repeat
   return newData -- or write as a command and "put newData into textData"
end purgeDuplicates

I'd be curious to know what algorithm you used and what times the 
above handler produces.

Other things to look at:

1.  Is it possible you are maxed out in actual RAM and spending a lot 
of time reading from/writing to virtual memory?

2.  Are you passing the 55K lines of text by value or reference?

3.  Have you tried writing your handler inline with the handler that 
reads in the data so it needn't be passed at all?

Eg:
put URL (whatever) into textData
put empty into newData
put numToChar(30) into lastLine -- any char not in the first line
repeat for each line thisLine in textData
   if thisLine = lastLine then next repeat
   put thisLine & return after newData
   put thisLine into lastLine
end repeat
put newData into textData

instead of

put URL (whatever) into textData
put purgeDuplicates(textData) into textData

(although if textData is passed by reference, the impact of item 
three is negligible).
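
By the way, it is the "@" in the parameter declaration above that
makes the argument pass by reference. A minimal sketch, with made-up
handler names:

on processByValue pData
   -- pData is a private copy; changes here never reach the caller
end processByValue

on processByReference @pData
   -- pData is the caller's own variable: no copy of the 5MB text is
   -- made, and changes here are seen by the caller
end processByReference
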
--

Rob Cozens
CCW, Serendipity Software Company
http://www.oenolog.com/who.htm

"And I, which was two fooles, do so grow three;
Who are a little wise, the best fooles bee."

from "The Triple Foole" by John Donne (1572-1631)


Re: 60 hours divided by 60 is 2 minutes?

2002-10-28 Thread Robert Brenstein
> Hi all,
> 
> I have a 5MB file with about 55,000 lines that need to be processed
> by a script. A simple script that deletes a line if the previous
> line has the same contents. That takes more than 60 hours to
> complete. So I thought I'd divide the file into smaller files of
> about one 60th of the total number of lines. But instead of the
> expected hour of processing time, it took 2 minutes for each file
> to complete.
> 
> I understand processes are faster with less data in memory, but I
> never would have thought the difference would be this big.
> 
> Any thoughts on how this is possible and what we can learn from it
> when making programs?
> 
> Terry



My first guess would be that with the full file he gets into memory
swapping, which is a speed killer. Rev loads all your stack data into
memory, after all. You can test this by setting your virtual memory
to be just 1 MB over the physical RAM.
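
Another way to find the wall: time the same handler on growing slices
of the data. Roughly linear growth means you are fine; an abrupt jump
at some size suggests swapping. A rough sketch (the file name is
hypothetical, and purgeDuplicates is Rob's handler from earlier in
this thread):

put URL "file:bigfile.txt" into tData
repeat with tCount = 10000 to the number of lines of tData step 10000
   put line 1 to tCount of tData into tChunk
   put the milliseconds into tStart
   get purgeDuplicates(tChunk)
   put tCount && the milliseconds - tStart & return after msg
end repeat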

Robert


Re: 60 hours divided by 60 is 2 minutes?

2002-10-28 Thread Richard Gaskin
David Vaughan wrote:

>> I have a 5MB file with about 55,000 lines that need to be
>> processed by a script. A simple script that deletes a line if the
>> previous line has the same contents. That takes more than 60 hours
>> to complete. So I thought I'd divide the file into smaller files
>> of about one 60th of the total number of lines. But instead of the
>> expected hour of processing time, it took 2 minutes for each file
>> to complete.
> 
> Terry
> 
> I am a bit puzzled by your result in the first place. I generated
> 55,000 lines with random data which had some chance of duplication
> in the next line. I then processed it to remove duplicates. The
> latter task took a whole four seconds. Not two minutes and not 60
> hours; for the whole file, not for one sixtieth. Were you using
> "repeat for each"?

Also, when it comes to adding or deleting, working with arrays is much
faster than with large chunks.  Remember that you can use arrays and
chunks interchangeably with the split and combine commands.
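
For instance (a rough sketch; the file name is just an illustration):

put URL "file:data.txt" into tText
split tText by return        -- tText is now an array: tText[1] is line 1, etc.
put "new text" into tText[2] -- update one element instead of one big string
combine tText by return      -- back to a single chunk of text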
 
-- 
 Richard Gaskin 
 Fourth World Media Corporation
 Custom Software and Web Development for All Major Platforms
 Developer of WebMerge 2.0: Publish any database on any site
 ___
 [EMAIL PROTECTED]   http://www.FourthWorld.com
 Tel: 323-225-3717   AIM: FourthWorldInc




Re: 60 hours divided by 60 is 2 minutes?

2002-10-28 Thread Jan Schenkel
--- MultiCopy Rotterdam-Zuid <[EMAIL PROTECTED]> wrote:
> Hi all,
> 
> I have a 5MB file with about 55,000 lines that need to be processed
> by a script. A simple script that deletes a line if the previous
> line has the same contents. That takes more than 60 hours to
> complete. So I thought I'd divide the file into smaller files of
> about one 60th of the total number of lines. But instead of the
> expected hour of processing time, it took 2 minutes for each file
> to complete.
> 
> I understand processes are faster with less data in memory, but I
> never would have thought the difference would be this big.
> 
> Any thoughts on how this is possible and what we can learn from it
> when making programs?
> 
> Terry

Hi Terry,

Though in extreme cases it might have to do with the
OS swapping the memory to disk at an incredible rate,
I'm more inclined to believe that it might have
something to do with the algorithm.
Off the top of my head, I'd process it with:

function ReadUniqueLinesFromFile pFile
  put URL pFile into tInput
  put empty into tPrevLine
  repeat for each line tLine of tInput
    if tLine <> tPrevLine then
      put tLine & return after tOutput
      put tLine into tPrevLine
    end if
  end repeat
  delete char -1 of tOutput -- drop the trailing return
  return tOutput
end ReadUniqueLinesFromFile

And that should work pretty quickly.
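
Called like this, for example (the file name is only an
illustration):

put ReadUniqueLinesFromFile("file:bigfile.txt") into tCleanText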

Hope this helped,

Jan Schenkel.

=
"As we grow older, we grow both wiser and more foolish at the same time."  (La 
Rochefoucauld)




Re: 60 hours divided by 60 is 2 minutes?

2002-10-28 Thread David Vaughan

On Friday, Oct 25, 2002, at 21:35 Australia/Sydney, MultiCopy 
Rotterdam-Zuid wrote:

> Hi all,
> 
> I have a 5MB file with about 55,000 lines that need to be processed
> by a script. A simple script that deletes a line if the previous
> line has the same contents. That takes more than 60 hours to
> complete. So I thought I'd divide the file into smaller files of
> about one 60th of the total number of lines. But instead of the
> expected hour of processing time, it took 2 minutes for each file
> to complete.

Terry

I am a bit puzzled by your result in the first place. I generated
55,000 lines with random data which had some chance of duplication in
the next line. I then processed it to remove duplicates. The latter
task took a whole four seconds. Not two minutes and not 60 hours; for
the whole file, not for one sixtieth. Were you using "repeat for each"?
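
(The test data came from something along these lines; a sketch rather
than the exact script:)

put empty into tData
repeat 55000 times
   -- only a few distinct values, so a fair share of consecutive
   -- lines come out identical
   put "value" && random(3) & return after tData
end repeat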

regards
David

> I understand processes are faster with less data in memory, but I
> never would have thought the difference would be this big.
> 
> Any thoughts on how this is possible and what we can learn from it
> when making programs?
> 
> Terry




60 hours divided by 60 is 2 minutes?

2002-10-28 Thread MultiCopy Rotterdam-Zuid
Hi all,

I have a 5MB file with about 55,000 lines that need to be processed
by a script. A simple script that deletes a line if the previous line
has the same contents. That takes more than 60 hours to complete. So
I thought I'd divide the file into smaller files of about one 60th of
the total number of lines. But instead of the expected hour of
processing time, it took 2 minutes for each file to complete.
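
(In outline, the deleting script is something like this; simplified,
and the variable name is illustrative:)

repeat with i = the number of lines of textData down to 2
   -- walk backward so a deletion does not shift the lines still to
   -- be checked
   if line i of textData = line (i - 1) of textData then
      delete line i of textData
   end if
end repeat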

I understand processes are faster with less data in memory, but I never
would have thought the difference would be this big.

Any thoughts on how this is possible and what we can learn from it when
making programs?

Terry
