Bill Moseley wrote:

At 12:00 AM 01/27/02 +0800, Stas Bekman wrote:

So we have about 3MB of source in 134 files (and more likely 6MB in 200+ files once the 2.0 docs are done). Do you think it's possible to grep through that with a reasonable response time? Remember that there will be a lot of I/O for opening and closing that many files.


It's not like mod_perl is a high-volume site. And it's running on a much faster machine than mine:

~/modperl-docs > find src -name '*.pod' | wc -l
    105

~/modperl-docs > time find src -name '*.pod' | xargs fgrep '$|' | wc -l
     23

real    0m0.033s
user    0m0.030s
sys     0m0.010s

That seems reasonable enough, even if it were ten times slower.


Hmm, you were trying this on an unloaded machine, right? If there are many parallel searches and other tasks running, this can be much, much slower, no?

Also remember that the user doesn't care about CPU time, only about elapsed wall-clock time.
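To make the wall-clock point concrete, here is a minimal sketch of how one might measure elapsed time while several searches run at once. It is only an illustration under stated assumptions: the throwaway corpus, the file names, and the figure of 8 simultaneous searches are hypothetical, and `date +%s` (whole seconds since the epoch) is assumed to be available, as it is on GNU and BSD systems. `grep -F` is used as the equivalent of fgrep.

```shell
#!/bin/sh
# Hypothetical throwaway corpus so the sketch is self-contained.
corpus=$(mktemp -d)
out=$(mktemp -d)
printf 'local $| = 1;\n' > "$corpus/a.pod"
printf 'plain text\n'    > "$corpus/b.pod"

searches=8              # assumed number of simultaneous users
start=$(date +%s)       # wall-clock start, in whole seconds

n=1
while [ "$n" -le "$searches" ]; do
    # Each background job stands in for one user's search request.
    find "$corpus" -name '*.pod' -exec grep -F -l '$|' {} + > "$out/$n.txt" &
    n=$((n + 1))
done
wait                    # block until every background search has finished

end=$(date +%s)
elapsed=$((end - start))

# Count how many searches ran and how many matching files they reported.
finished=$(ls "$out" | wc -l | tr -d ' ')
hits=$(cat "$out"/*.txt | wc -l | tr -d ' ')
echo "$finished searches, $hits hits, ${elapsed}s of wall-clock time"

rm -rf "$corpus" "$out"
```

On a corpus this small the elapsed time will round to zero; pointing `corpus` at the real src tree and raising `searches` would give a rough idea of what a user waits for under load, as opposed to the CPU time of a single run.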

Also, which OS/distro are you running this on? How does time get through the pipe? It doesn't work for me. If I try:

time find src -name '*.pod' -exec fgrep -l '$|' {} \;
src/docs/2.0/devel/debug_c/debug_c.pod
src/docs/2.0/devel/testing/testing.pod
src/docs/1.0/faqs/cgi_to_mod_perl.pod
src/docs/1.0/guide/control.pod
src/docs/1.0/guide/debug.pod
src/docs/1.0/guide/perl.pod
src/docs/1.0/guide/performance.pod
src/docs/1.0/guide/porting.pod
src/docs/1.0/guide/scenario.pod
src/docs/1.0/guide.good/control.pod
src/docs/1.0/guide.good/debug.pod
src/docs/1.0/guide.good/perl.pod
src/docs/1.0/guide.good/performance.pod
src/docs/1.0/guide.good/porting.pod
src/docs/1.0/guide.good/scenario.pod
0.120u 0.170s 0:00.31 93.5%     0+0k 0+0io 18193pf+0w

As you can see, it's much slower.
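One difference between the two commands is worth noting: `find ... -exec cmd {} \;` starts a separate process for every file, while piping to xargs (or using `-exec cmd {} +`) passes many file names to a single process. The sketch below compares the two forms on a hypothetical throwaway corpus of 50 files; the file names and counts are made up for illustration, and `grep -F` stands in for fgrep.

```shell
#!/bin/sh
# Build a small hypothetical corpus: every fifth file contains '$|'.
corpus=$(mktemp -d)
i=1
while [ "$i" -le 50 ]; do
    if [ $((i % 5)) -eq 0 ]; then
        printf 'local $| = 1;\n' > "$corpus/doc$i.pod"
    else
        printf 'nothing to see here\n' > "$corpus/doc$i.pod"
    fi
    i=$((i + 1))
done

# Per-file form: one grep process is started for each of the 50 files.
per_file=$(find "$corpus" -name '*.pod' -exec grep -F -l '$|' {} \; | sort)

# Batched form: the file names are handed to as few grep processes as possible.
batched=$(find "$corpus" -name '*.pod' -exec grep -F -l '$|' {} + | sort)

per_file_count=$(printf '%s\n' "$per_file" | wc -l | tr -d ' ')
batched_count=$(printf '%s\n' "$batched" | wc -l | tr -d ' ')
echo "per-file matches: $per_file_count, batched matches: $batched_count"

rm -rf "$corpus"
```

Both forms report the same files; only the number of processes differs. Prefixing each find command with the shell's own time facility should show how much of the gap comes from process startup rather than from the search itself.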


All the reverse indexing engines will parse on indexing, so it will always
be an issue of defining what makes up a word.

Let me ask Avi Rappoport if there's something good for searching code.

I think that Randy's setup was quite satisfying, but nextrieve was even better. What do you think about nextrieve?


I don't know much about it. It's not open source, and it's not free. I really doubt it integrates with Template Toolkit.


Ah, OK. I didn't know that.


Could we feed the pod source into Parse::RecDescent and get it to tokenize
perl code?  That would be more fun.

I guess so, but from what I know, Parse::RecDescent is not good for real-time processing because it's very slow. Remember that it stores the parse tree in Perl data structures, which is very inefficient. I don't know whether it has been rewritten to use C data structures since last year.


_____________________________________________________________________
Stas Bekman             JAm_pH      --   Just Another mod_perl Hacker
http://stason.org/      mod_perl Guide   http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]  http://ticketmaster.com http://apacheweek.com
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/

