RFE: Read and Analyze Giant Files
Jeanne: Slick solution! Think I'll try it (read and analyze data in a 300 meg file with over 3 million lines).

Welp, it sounds good but it doesn't work. Here's my code:

#!/usr/local/bin/mc
on startup
  put 0 into the_counter
  repeat
    read from stdin until "mystic_mouse"
    if the result is not empty then
      add 1 to the_counter
    else
      put the_counter
      exit repeat
    end if
  end repeat
end startup

I got: counter is zero, and the program exits with "broken pipe". I am cat'ing the input into it with a pipe. ??

Himalayan Academy Publications
Sannyasin Sivakatirswami
[EMAIL PROTECTED]
www.HinduismToday.com, www.HimalayanAcademy.com, www.Gurudeva.org, www.hindu.org
___
metacard mailing list
[EMAIL PROTECTED]
http://lists.runrev.com/mailman/listinfo/metacard
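A note on the "broken pipe" message: it is reported by the writing side of a pipeline (here, cat) when the reading process exits before it has consumed all of its input. That seems to be what happened with the script above. As far as I know, MetaCard sets `the result` to empty after a successful `read` and to "eof" only at end of input, so the first successful read takes the `else` branch, prints 0 and quits, leaving cat writing into a closed pipe. The shell sketch below reproduces only the pipe behaviour; `yes` stands in for `cat access_log` and `head -n 1` stands in for a reader that quits after its first read. It is an illustration under those assumptions, not a fix for the MetaCard script.

```shell
#!/bin/sh
# Illustration only: reproduce "broken pipe" with a reader that quits early.
# `yes` stands in for `cat access_log`; `head -n 1` stands in for a script
# that stops reading after its first successful read.
status_file=$(mktemp)

# The { ...; } group records the writer's exit status in a temp file,
# because a plain pipeline only reports the status of its last command.
first=$( { yes mystic_mouse; echo "$?" > "$status_file"; } | head -n 1 )
writer_status=$(cat "$status_file")
rm -f "$status_file"

echo "reader got: $first"
# Once head exits, the writer's next write fails. With default signal
# handling it is killed by SIGPIPE (141 = 128 + 13 in bash and dash);
# if SIGPIPE is ignored it exits non-zero with a "Broken pipe" error instead.
echo "writer exit status: $writer_status"
```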
Re: Read and Analyze Giant Files
Sannyasin Sivakatirswami wrote:
>
> We are trying to use Rev (or MC) to analyze a web site access log that is 3 million lines long, a 300 meg (or more) file.
>
> If I try a shell script (interpreted) or a Pascal program (compiled), each runs in about 2 minutes on this file, but an xTalk script takes a very long time; maybe it hangs forever? Shell and Pascal can read the file one line at a time and process each line, but I'm not sure how to do it in MC.
>
> Here's the code.
>
> shell, 2 lines
>
> #!/bin/sh
> fgrep mystic_mouse | wc -l
>
> pascal, 16 lines
>
> Program detect;
> {$H+}
> Var
>   buffer: string;
>   result: integer;
> Begin {main}
>   buffer := '';
>   result := 0;
>   While (Not(eof)) Do
>   Begin
>     Readln(buffer);
>     If (pos('mystic_mouse', buffer) > 0) Then
>       inc(result);
>   End; {file}
>   Writeln(result);
> End. {program}
>
> metacard, 13 lines
>
> #!/usr/local/bin/mc
> on startup
>   put empty into the_message
>   put 0 into the_counter
>   read from stdin until empty
>   put it into the_message
>   repeat for each line this_line in the_message
>     if (this_line contains "mystic_mouse") then
>       put the_counter + 1 into the_counter
>     end if
>   end repeat
>   put the_counter
> end startup
>
> Is there a more efficient way in Transcript to do this?
>
> Well, to try it out, I chopped the log down to half a million lines and got these times: shell script, 22 seconds; Pascal, 7 seconds; MetaCard, 13 minutes and still running ??
>
> I would like never to have to say that "Metacard/Revolution can't do this".
>
> Before I tackle it further, I thought someone might already have invented this wheel.
>
> Thanks!
Try something like:

on mouseup
  open file thefile for read
  repeat
    -- read the next 100 lines, starting where the previous read stopped,
    -- so the whole file never has to be loaded (or counted) up front
    read from file thefile for 100 lines
    put the result into theresult
    ...
    do what you need with it
    ...
    if theresult is "eof" then exit repeat
  end repeat
  close file thefile
end mouseup

--
Regards,
Pierre Sahores

Inspection académique de Seine-Saint-Denis.
Web and VPN applications and databases
Qualifying and producing competitive advantage
Re: Read and Analyze Giant Files
The problem here is most likely that you are trying to read the entire 300MB file into memory at once. Unless you have an extra 300MB block of RAM floating around, this will start beating on virtual memory pretty hard (and get increasingly slow). You'll notice that your non-MetaCard scripts read one line at a time; I would suggest doing the same. If you are just looking for occurrences of a single string, you could also try the grep command-line tool. Type "man grep" into the Terminal for syntax.

HTH,
Brian
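Brian's grep suggestion, made concrete as a minimal sketch. The function name `count_matches` is hypothetical. `grep -c` prints the number of matching lines and `-F` treats the pattern as a fixed string (what `fgrep` does), so this counts the same thing as the original `fgrep mystic_mouse | wc -l` (lines containing the string, not total occurrences) without the extra `wc` process. grep streams through its input, so memory use does not grow with the size of the log.

```shell
#!/bin/sh
# count_matches FILE: print how many lines of FILE contain "mystic_mouse".
#   -F  match a fixed string, not a regular expression (what fgrep does)
#   -c  print a count of matching lines instead of the lines themselves
count_matches() {
    grep -c -F 'mystic_mouse' "$1"
}

# Example (hypothetical path):
#   count_matches /var/log/httpd/access_log
```

Note that grep exits with status 1 when nothing matches, but with `-c` it still prints `0`, so the count can be captured either way.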
Re: Read and Analyze Giant Files
--On Thursday, November 07, 2002 18:42:42 -1000 Sannyasin Sivakatirswami <[EMAIL PROTECTED]> wrote:

> metacard, 13 lines
>
> #!/usr/local/bin/mc
> on startup
>   put empty into the_message
>   put 0 into the_counter
>   read from stdin until empty
>   put it into the_message
>   repeat for each line this_line in the_message
>     if (this_line contains "mystic_mouse") then
>       put the_counter + 1 into the_counter
>     end if
>   end repeat
>   put the_counter
> end startup
>
> is there a more efficient way in Transcript to do this?

In Transcript maybe not, but you can try MetaTalk ;-)

First of all, I don't understand how you feed the file to the script in your script (read from stdin until empty?). Since the file is so huge, you should probably use something like:

open file x for read
read from file x at for # so that you don't have to load 300MB into RAM unless you want to
# then do a repeat over the lines just read
repeat for each line i in it
  if offset("mystic_mouse", i) is not 0 then add 1 to the_counter
end repeat

Just another possibility... Even better, have the server write a log each day to keep it shorter (unless it's Google ;-).

Regards,
Andu Novac
Read and Analyze Giant Files
We are trying to use Rev (or MC) to analyze a web site access log that is 3 million lines long, a 300 meg (or more) file.

If I try a shell script (interpreted) or a Pascal program (compiled), each runs in about 2 minutes on this file, but an xTalk script takes a very long time; maybe it hangs forever? Shell and Pascal can read the file one line at a time and process each line, but I'm not sure how to do it in MC.

Here's the code.

shell, 2 lines

#!/bin/sh
fgrep mystic_mouse | wc -l

pascal, 16 lines

Program detect;
{$H+}
Var
  buffer: string;
  result: integer;
Begin {main}
  buffer := '';
  result := 0;
  While (Not(eof)) Do
  Begin
    Readln(buffer);
    If (pos('mystic_mouse', buffer) > 0) Then
      inc(result);
  End; {file}
  Writeln(result);
End. {program}

metacard, 13 lines

#!/usr/local/bin/mc
on startup
  put empty into the_message
  put 0 into the_counter
  read from stdin until empty
  put it into the_message
  repeat for each line this_line in the_message
    if (this_line contains "mystic_mouse") then
      put the_counter + 1 into the_counter
    end if
  end repeat
  put the_counter
end startup

Is there a more efficient way in Transcript to do this?

Well, to try it out, I chopped the log down to half a million lines and got these times: shell script, 22 seconds; Pascal, 7 seconds; MetaCard, 13 minutes and still running ??

I would like never to have to say that "Metacard/Revolution can't do this".

Before I tackle it further, I thought someone might already have invented this wheel.

Thanks!

Himalayan Academy Publications
Sannyasin Sivakatirswami
Editor's Assistant/Production Manager
[EMAIL PROTECTED]
www.HinduismToday.com, www.HimalayanAcademy.com,
www.Gurudeva.org, www.hindu.org
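For comparison, the structure of the Pascal program (read one line, test it, discard it, so memory use stays flat however large the log gets) can be sketched in plain shell with a `while read` loop. The function name `count_lines_containing` is hypothetical, and the sketch is only meant to make the line-at-a-time pattern concrete: an interpreted per-line shell loop is generally much slower than `fgrep`, which is why the two-line script is the practical choice.

```shell
#!/bin/sh
# count_lines_containing NEEDLE: read stdin one line at a time, like the
# Pascal program's Readln loop, and print how many lines contain NEEDLE.
count_lines_containing() {
    needle=$1
    n=0
    # IFS= and -r keep each line intact (no trimming, no backslash escapes).
    # The [ -n "$line" ] test also counts a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *"$needle"*) n=$((n + 1)) ;;   # quoted needle = literal substring match
        esac
    done
    echo "$n"
}

# Example (hypothetical path):
#   count_lines_containing mystic_mouse < access_log
```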