Re: 60 hours divided by 60 is 2 minutes?
What Rob Cozens, David Vaughan and Jan Schenkel wrote was of course much better than what I wrote in my script. I thought I needed to use "repeat with i = 2 to the number of lines..." instead of "repeat for each", because in a repeat for each loop you cannot see the previous line. But of course you can see it, if you keep it in a variable, as their loops do! I didn't think about that. Thanks.

___
use-revolution mailing list
[EMAIL PROTECTED]
http://lists.runrev.com/mailman/listinfo/use-revolution
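The variable trick generalizes beyond Transcript. Here is a minimal Python sketch of the same single-pass idea (function name and the use of `None` as the "no previous line yet" sentinel are my own choices):

```python
def purge_duplicates(text):
    """Drop each line that is identical to the line directly above it.

    One forward pass is enough: the "previous line" the comparison
    needs is simply carried along in a variable, which is the trick
    the repeat-for-each versions in this thread use.
    """
    out = []
    prev = None  # nothing can match before the first line
    for line in text.split("\n"):
        if line != prev:
            out.append(line)
        prev = line
    return "\n".join(out)
```

Because the loop only ever appends, no line of the input is visited more than once.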
Re: 60 hours divided by 60 is 2 minutes?
> Any thoughts on how this is possible and what we can learn from it when making programs?

Hi Terry,

Before dealing specifically with 1 @ 60 hours vs 60 @ 2 minutes, I need to know: are you actually deleting each duplicate line on the fly? Have you tried building a new list instead:

  function purgeDuplicates @textData
    put empty into newData
    put numToChar(30) into lastLine -- any char not in the first line of textData
    repeat for each line thisLine in textData
      if thisLine = lastLine then next repeat
      put thisLine & return after newData
      put thisLine into lastLine
    end repeat
    return newData -- or write as a command and "put newData into textData"
  end purgeDuplicates

I'd be curious to know what algorithm you used and what times the above handler produces.

Other things to look at:

1. Is it possible you are maxed out in actual RAM and spending a lot of time reading from/writing to virtual memory?

2. Are you passing the 55K lines of text by value or by reference?

3. Have you tried writing your handler inline with the handler that reads in the data, so it needn't be passed at all? Eg:

  put URL (whatever) into textData
  put empty into newData
  put numToChar(30) into lastLine -- any char not in the first line
  repeat for each line thisLine in textData
    if thisLine = lastLine then next repeat
    put thisLine & return after newData
    put thisLine into lastLine
  end repeat
  put newData into textData

instead of

  put URL (whatever) into textData
  put purgeDuplicates(textData) into textData

(although if textData is passed by reference, the impact of item three is negligible).

--
Rob Cozens
CCW, Serendipity Software Company
http://www.oenolog.com/who.htm

"And I, which was two fooles, do so grow three;
Who are a little wise, the best fooles bee."

from "The Triple Foole" by John Donne (1572-1631)
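Rob's "build a new list instead of deleting on the fly" question is the crux of the thread. A Python sketch of the two strategies (function names and test data are my own) makes the cost difference concrete: deleting in place shifts the whole tail of the data on every removal, while rebuilding touches each line once.

```python
def dedupe_in_place(lines):
    """Delete duplicates from the list itself. Every `del` shifts the
    remaining elements down one slot, so a file full of duplicates
    costs roughly quadratic time in the number of lines."""
    i = 1
    while i < len(lines):
        if lines[i] == lines[i - 1]:
            del lines[i]  # O(n) shift on each deletion
        else:
            i += 1
    return lines

def dedupe_rebuild(lines):
    """Rob's approach: append the keepers to a fresh list.
    One linear pass, no shifting."""
    out = []
    prev = object()  # sentinel that compares unequal to any real line
    for line in lines:
        if line != prev:
            out.append(line)
        prev = line
    return out
```

Both produce the same result; only the running time differs, and the gap widens quadratically with file size.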
Re: 60 hours divided by 60 is 2 minutes?
> Hi all,
>
> I have a 5MB file with about 55,000 lines that need to be processed by a script. A simple script that deletes a line if the previous line has the same contents. That takes more than 60 hours to complete. So I thought I'd divide the file into smaller files of about one 60th of the total number of lines. But instead of the expected hour of processing time, it took 2 minutes for each file to complete.
>
> I understand processes are faster with less data in memory, but I never would have thought the difference would be this big.
>
> Any thoughts on how this is possible and what we can learn from it when making programs?
>
> Terry

My first guess would be that with the full file he gets into memory swapping, which is a speed killer. Rev loads all your stack data into memory, after all. You can test this by setting your virtual memory to be just 1 MB over the physical RAM.

Robert
Re: 60 hours divided by 60 is 2 minutes?
David Vaughan wrote:
>> I have a 5MB file with about 55,000 lines that need to be processed by a
>> script. A simple script that deletes a line if the previous line has the
>> same contents. That takes more than 60 hours to complete. So I thought I'd
>> divide the file into smaller files of about one 60th of the total number of
>> lines. But instead of the expected hour of processing time, it took 2
>> minutes for each file to complete.
>
> Terry
>
> I am a bit puzzled by your result in the first place. I generated
> 55,000 lines with random data which had some chance of duplication in
> the next line. I then processed it to remove duplicates. The latter
> task took a whole four seconds. Not two minutes and not 60 hours; for
> the whole file, not for one sixtieth. Were you using "repeat for each"?

Also, when it comes to adding or deleting, working with arrays is much faster than with large chunks. Remember that you can use arrays and chunks interchangeably with the split and combine commands.

--
Richard Gaskin
Fourth World Media Corporation
Custom Software and Web Development for All Major Platforms
Developer of WebMerge 2.0: Publish any database on any site
___
[EMAIL PROTECTED]
http://www.FourthWorld.com
Tel: 323-225-3717
AIM: FourthWorldInc
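Richard's split/combine advice maps onto list-versus-string handling in most languages: convert the big chunk to an indexed structure once, do the editing there, and join it back at the end. A loose Python analogue (sample data mine):

```python
text = "alpha\nalpha\nbeta\ngamma\ngamma"

# "split": turn the chunk into a list, which gives cheap indexed
# access, much like a Transcript array built with the split command
lines = text.split("\n")

# do the add/delete work on the list, never on the big string
deduped = [ln for i, ln in enumerate(lines)
           if i == 0 or ln != lines[i - 1]]

# "combine": join the list back into a single chunk at the end
result = "\n".join(deduped)
```

The point is that each intermediate edit stays cheap; only the final join touches the full text again.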
Re: 60 hours divided by 60 is 2 minutes?
--- MultiCopy Rotterdam-Zuid <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I have a 5MB file with about 55,000 lines that need to be processed by a
> script. A simple script that deletes a line if the previous line has the
> same contents. That takes more than 60 hours to complete. So I thought I'd
> divide the file into smaller files of about one 60th of the total number of
> lines. But instead of the expected hour of processing time, it took 2
> minutes for each file to complete.
>
> I understand processes are faster with less data in memory, but I never
> would have thought the difference would be this big.
>
> Any thoughts on how this is possible and what we can learn from it when
> making programs?
>
> Terry

Hi Terry,

Though in extreme cases it might have to do with the OS swapping the memory to disk at an incredible rate, I'm more inclined to believe that it has something to do with the algorithm. Off the top of my head, I'd process it with:

  function ReadUniqueLinesFromFile pFile
    put URL pFile into tInput
    put empty into tPrevLine
    repeat for each line tLine of tInput
      if tLine <> tPrevLine then
        put tLine & return after tOutput
        put tLine into tPrevLine
      end if
    end repeat
    delete char -1 of tOutput
    return tOutput
  end ReadUniqueLinesFromFile

And that should work pretty quickly.

Hope this helped,

Jan Schenkel.

=
"As we grow older, we grow both wiser and more foolish at the same time." (La Rochefoucauld)
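For readers outside Revolution, the contract of Jan's handler, collapsing runs of identical consecutive lines, is exactly what `itertools.groupby` does in Python. A sketch (function name mine, file reading omitted):

```python
from itertools import groupby

def unique_consecutive(lines):
    """Collapse runs of identical adjacent lines, the same contract as
    Jan's ReadUniqueLinesFromFile handler (minus the URL read)."""
    # groupby clusters equal adjacent items; keeping one key per run
    # is exactly "skip a line equal to the previous one"
    return [key for key, _ in groupby(lines)]
```

Like Jan's version, it never re-reads earlier lines, so it stays linear regardless of how many duplicates there are.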
Re: 60 hours divided by 60 is 2 minutes?
On Friday, Oct 25, 2002, at 21:35 Australia/Sydney, MultiCopy Rotterdam-Zuid wrote:

> Hi all,
>
> I have a 5MB file with about 55,000 lines that need to be processed by a script. A simple script that deletes a line if the previous line has the same contents. That takes more than 60 hours to complete. So I thought I'd divide the file into smaller files of about one 60th of the total number of lines. But instead of the expected hour of processing time, it took 2 minutes for each file to complete.
>
> Terry

I am a bit puzzled by your result in the first place. I generated 55,000 lines with random data which had some chance of duplication in the next line. I then processed it to remove duplicates. The latter task took a whole four seconds. Not two minutes and not 60 hours; for the whole file, not for one sixtieth. Were you using "repeat for each"?

regards
David

> I understand processes are faster with less data in memory, but I never would have thought the difference would be this big.
>
> Any thoughts on how this is possible and what we can learn from it when making programs?
>
> Terry
60 hours divided by 60 is 2 minutes?
Hi all,

I have a 5MB file with about 55,000 lines that need to be processed by a script. A simple script that deletes a line if the previous line has the same contents. That takes more than 60 hours to complete. So I thought I'd divide the file into smaller files of about one 60th of the total number of lines. But instead of the expected hour of processing time, it took 2 minutes for each file to complete.

I understand processes are faster with less data in memory, but I never would have thought the difference would be this big.

Any thoughts on how this is possible and what we can learn from it when making programs?

Terry
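One plausible reading of the numbers in this post, consistent with the algorithmic explanations later in the thread: the "expected hour" per piece assumes the running time grows linearly with file size, but a delete-in-place script grows roughly quadratically, and the quadratic prediction lands much closer to the observed 2 minutes. The arithmetic:

```python
# A linear algorithm would make a 1/60th-size file take 1/60th the
# time; a quadratic one predicts (1/60)**2 of the time.
full_time_min = 60 * 60                         # the 60-hour run, in minutes
linear_prediction = full_time_min / 60          # the "expected hour" per piece
quadratic_prediction = full_time_min / 60 ** 2  # one minute per piece

# The observed ~2 minutes per piece is far closer to the quadratic
# prediction than to the linear one, pointing at delete-in-place line
# handling (each deletion rescans or shifts the text) rather than at
# memory pressure alone.
print(linear_prediction, quadratic_prediction)  # 60.0 1.0
```

The residual factor of two between the quadratic prediction and the observation could then be overhead such as the swapping Robert describes, but the bulk of the 60-hour figure is explained by the algorithm's shape.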