RFE: Read and Analyze Giant Files

2002-11-08 Thread Sannyasin Sivakatirswami
Jeanne:

Slick solution!  Think I'll try it.
(read and analyze data in 300 meg file with over 3 million lines)

Welp, it sounds good, but it doesn't work. Here's my code:

#!/usr/local/bin/mc
on startup
  put 0 into the_counter
  repeat
    read from stdin until "mystic_mouse"
    if the result is not empty then
      add 1 to the_counter
    else
      put the_counter
      exit repeat
    end if
  end repeat
end startup

What I get: the counter is zero and the program exits with "broken pipe".
I am cat'ing the input into it through a pipe.

??
Himalayan Academy Publications
Sannyasin Sivakatirswami
[EMAIL PROTECTED]
www.HinduismToday.com, www.HimalayanAcademy.com,
www.Gurudeva.org, www.hindu.org
___
metacard mailing list
[EMAIL PROTECTED]
http://lists.runrev.com/mailman/listinfo/metacard


Re: Read and Analyze Giant Files

2002-11-08 Thread Pierre Sahores
Sannyasin Sivakatirswami wrote:
> 
> We are trying to use Rev (or MC) to analyze a web site access log that is 3 million
> lines long, a 300 meg (or more) file.
>
> If I try a shell script (interpreted) or pascal program (compiled) each runs in
> about 2 minutes on this file but an xTalk script takes a very long time, maybe it
> hangs forever?  shell and pascal can read the file one line at a time and process the
> line but not sure how to do it in mc.
> 
> Here's the code
> 
> shell, 2 lines
> 
> #!/bin/sh
> fgrep mystic_mouse | wc -l
> 
> pascal, 16 lines
> 
> Program detect;
> {$H+}
> Var
>   buffer: string;
>   result: integer;
> Begin {main}
>   buffer := '';
>   result := 0;
>   While (Not(eof)) Do
>   Begin
>     Readln(buffer);
>     If (pos('mystic_mouse', buffer) > 0) Then
>       inc(result);
>   End; {file}
>   Writeln(result);
> End. {program}
> 
> metacard, 13 lines
> 
> #!/usr/local/bin/mc
> on startup
>   put empty into the_message
>   put 0 into the_counter
>   read from stdin until empty
>   put it into the_message
>   repeat for each line this_line in the_message
>     if (this_line contains "mystic_mouse") then
>       put the_counter + 1 into the_counter
>     end if
>   end repeat
>   put the_counter
> end startup
> 
> is there a more efficient way in Transcript to do this?
> 
> Well, to try it out, I chopped down the log to a half million lines, then I got
> these times:  shell script, 22 seconds.  Pascal 7 seconds.  metacard 13 minutes and
> still running ??
> 
> I would like never to have to say that "Metacard/Revolution can't do this"
> 
> Before I tackle it further I was thinking someone had already invented this wheel.
> 
> Thanks!
> Himalayan Academy Publications
> Sannyasin Sivakatirswami
> Editor's Assistant/Production Manager
> [EMAIL PROTECTED]
> www.HinduismToday.com, www.HimalayanAcademy.com,
> www.Gurudeva.org, www.hindu.org

Try something like this:

> on mouseUp
>   -- thefile is assumed to already hold the path to the log file
>   open file thefile for read
>   repeat
>     -- each read continues where the previous one stopped,
>     -- so only 100 lines are held in memory at a time
>     read from file thefile for 100 lines
>     put the result into readStatus
>     ...
>     do what you need with it
>     ...
>     -- the result is "eof" once the end of the file has been reached
>     if readStatus is not empty then exit repeat
>   end repeat
>   close file thefile
> end mouseUp

-- 
Regards, Pierre Sahores

Inspection académique de Seine-Saint-Denis.
Web and VPN applications and databases
Qualifying and producing competitive advantage
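
The batch idea above (process about 100 lines, then fetch the next 100) can also be sketched outside MetaCard. The following is a minimal Python illustration, not the poster's code; the function name `line_batches` and the batch size are assumptions made for the example. It never holds more than one batch of lines in memory.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO


def line_batches(stream: TextIO, batch_size: int = 100) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size lines from stream."""
    lines = iter(stream)  # one shared iterator, so each batch resumes where the last stopped
    while True:
        batch = list(islice(lines, batch_size))
        if not batch:  # an empty batch means the input is exhausted
            return
        yield batch


# Small in-memory stand-in for the real log file (250 hypothetical lines).
sample = io.StringIO("".join("line %d\n" % i for i in range(250)))
print([len(batch) for batch in line_batches(sample)])  # prints [100, 100, 50]
```

For a real log, an open file object would take the place of `sample`, and each batch would be scanned for the search string before the next one is read.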



Re: Read and Analyze Giant Files

2002-11-07 Thread Yennie
The problem here is most likely that you are reading the entire 300MB file into memory at once. Unless you have a spare 300MB of RAM, this will start hitting virtual memory pretty hard and get increasingly slow.

You'll notice that your non-MetaCard programs read one line at a time; I would suggest doing the same.
If you are just looking for occurrences of a single string, you could also try the grep command-line tool. Type "man grep" into the Terminal for its syntax.

HTH,
Brian
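
The advice above, to read one line at a time instead of loading the whole file, can be illustrated with a short Python sketch. This is not MetaCard code and not something posted in the thread; the function name `count_matching_lines` and the sample data are assumptions made for the example.

```python
import io
from typing import TextIO


def count_matching_lines(stream: TextIO, needle: str) -> int:
    """Count the lines that contain needle, keeping only one line in memory."""
    count = 0
    for line in stream:  # text streams yield one line per iteration
        if needle in line:
            count += 1
    return count


# Small in-memory stand-in for the real access log.
sample = io.StringIO("GET /a mystic_mouse\nGET /b\nGET /c mystic_mouse\n")
print(count_matching_lines(sample, "mystic_mouse"))  # prints 2
```

Passing `sys.stdin` or an open file object instead of `sample` would apply the same loop to a piped log, with memory use that stays flat however long the file is.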


Re: Read and Analyze Giant Files

2002-11-07 Thread andu


--On Thursday, November 07, 2002 18:42:42 -1000 Sannyasin Sivakatirswami 
<[EMAIL PROTECTED]> wrote:


> metacard, 13 lines
>
> #!/usr/local/bin/mc
> on startup
>   put empty into the_message
>   put 0 into the_counter
>   read from stdin until empty
>   put it into the_message
>   repeat for each line this_line in the_message
>     if (this_line contains "mystic_mouse") then
>       put the_counter + 1 into the_counter
>     end if
>   end repeat
>   put the_counter
> end startup
>
> is there a more efficient way in Transcript to do this?



In Transcript maybe not, but you can try MetaTalk ;-).
First of all, I don't understand how you feed the file to the script
(read from stdin until empty?). Since the file is so huge, you
should probably use something like:

open file x for read
read from file x at  for 
# so that you don't have to load 300MB into ram unless you want to.
# do a
repeat for each line this_line in it
  # test the line directly so that "it" is not overwritten inside the loop
  if offset("mystic_mouse", this_line) is not 0 then add 1 to the_counter
end repeat


Just another possibility... Even better, have the server write a new log each
day to keep each file shorter (unless it's Google ;-).


Regards, Andu Novac
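
Reading the file in fixed-size pieces, as suggested above, raises one practical point: a piece can end in the middle of a line. The Python sketch below is one way to handle that under stated assumptions; it is not code from the thread, and the function name `count_lines_chunked` and the chunk size are hypothetical. The unfinished last line of each piece is carried over and joined to the next piece.

```python
import io
from typing import BinaryIO


def count_lines_chunked(stream: BinaryIO, needle: bytes, chunk_size: int = 1 << 20) -> int:
    """Count lines containing needle while reading chunk_size bytes at a time."""
    count = 0
    tail = b""  # incomplete final line left over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes object signals end of input
            break
        lines = (tail + chunk).split(b"\n")
        tail = lines.pop()  # the last piece may be only part of a line
        count += sum(1 for line in lines if needle in line)
    if needle in tail:  # last line of the file without a trailing newline
        count += 1
    return count


# A tiny chunk size forces lines to straddle chunk boundaries.
data = io.BytesIO(b"a mystic_mouse\nb\nc mystic_mouse")
print(count_lines_chunked(data, b"mystic_mouse", chunk_size=4))  # prints 2
```

With a real log, the stream would be a file opened in binary mode, and a chunk size of around a megabyte keeps memory use small while avoiding a read call per line.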


Read and Analyze Giant Files

2002-11-07 Thread Sannyasin Sivakatirswami
We are trying to use Rev (or MC) to analyze a web site access log that is 3 million lines long, a 300 meg (or more) file.

If I try a shell script (interpreted) or pascal program (compiled) each runs in about 2 minutes on this file but an xTalk script takes a very long time, maybe it hangs forever?  shell and pascal can read the file one line at a time and process the line but not sure how to do it in mc.

Here's the code

shell, 2 lines

#!/bin/sh
fgrep mystic_mouse | wc -l

pascal, 16 lines

Program detect;
{$H+}
Var
  buffer    : string;
  result    : integer;
Begin {main}
  buffer := '';
  result := 0;
  While (Not(eof)) Do
  Begin
    Readln(buffer);
    If (pos('mystic_mouse', buffer) > 0) Then
      inc(result);
  End; {file}
  Writeln(result);
End. {program}

metacard, 13 lines

#!/usr/local/bin/mc
on startup
  put empty into the_message
  put 0 into the_counter
  read from stdin until empty
  put it into the_message
  repeat for each line this_line in the_message
    if (this_line contains "mystic_mouse") then
      put the_counter + 1 into the_counter
    end if
  end repeat
  put the_counter
end startup



is there a more efficient way in Transcript to do this? 

Well, to try it out, I chopped down the log to a half million lines, then I got these times:  shell script, 22 seconds.  Pascal 7 seconds.  metacard 13 minutes and still running ??

I would like never to have to say that "Metacard/Revolution can't do this"

Before I tackle it further I was thinking someone had already invented this wheel.



Thanks!
Himalayan Academy Publications
Sannyasin Sivakatirswami
Editor's Assistant/Production Manager
[EMAIL PROTECTED]
www.HinduismToday.com, www.HimalayanAcademy.com,
www.Gurudeva.org, www.hindu.org