At 21:46 -0800 on 11/11/2011, Sumtingwong wrote about Re: Large text search:
> > You can do it in BBEdit using a Perl script, but in what form do you want the results?
>
> John, thanks for your reply. I have spent my evenings attempting to write a pithy one-liner from the command line to do this, but the resolution is just not there with grep. All of the files to be searched contain paragraphs of text that are soft wrapped (I don't know if that is the correct term, sorry). I have not written any Perl in over 10 years, time to break out the books!
>
> What is needed? A frequency count of each word in the input file for each file that was searched. For example, the first word of the input file is "it". Document one is searched for "it" and it shows up 248 times. Optimal output would be (in tabbed columns):
>
> it    Document 1    248
>
> I know the output is going to be huge (as the input file is rather large), but that is fine--I just need to get to the analysis part at this point. Cheers!
I'm sorry that I cannot help you with a BBEdit-based solution, but I think you might be attempting to use the wrong tool for this project.
This is the type of project that I feel is better suited to a database like MySQL. You read the first file to populate the database. As you read each new file, you add or update its word counts. Once you are done, you can run a query on each word and it can tell you the total occurrences or the occurrences per file.
Note that populating the database needs a program that can access it, but once it is populated, the results can be pulled with any database access utility that accepts an SQL query.
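To make the idea concrete, here is a minimal sketch, offered as an illustration only. It swaps in Python's bundled sqlite3 module for MySQL so that it runs without a server. The table name `counts`, the function names, and the tokenizing rule (lowercased runs of letters and apostrophes) are all my own assumptions, so adjust them to your data. Because the text is split with a regular expression rather than line by line, soft-wrapped paragraphs should not matter.

```python
import re
import sqlite3
from collections import Counter

# Assumption: a "word" is a run of letters/apostrophes, compared case-insensitively.
WORD_RE = re.compile(r"[a-z']+")


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the counts table exists."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS counts ("
        " word TEXT NOT NULL,"
        " doc  TEXT NOT NULL,"
        " n    INTEGER NOT NULL,"
        " PRIMARY KEY (word, doc))"
    )
    return db


def add_document(db, name, text):
    """Count every word in one document and store one row per (word, doc)."""
    tally = Counter(WORD_RE.findall(text.lower()))
    db.executemany(
        "INSERT OR REPLACE INTO counts (word, doc, n) VALUES (?, ?, ?)",
        [(word, name, n) for word, n in tally.items()],
    )
    db.commit()


def report(db, words):
    """Return (word, doc, count) rows for each word of the input list, in order."""
    rows = []
    for word in words:
        rows.extend(
            db.execute(
                "SELECT word, doc, n FROM counts WHERE word = ? ORDER BY doc",
                (word.lower(),),
            )
        )
    return rows


def total(db, word):
    """Return the total occurrences of one word across all documents."""
    cur = db.execute(
        "SELECT COALESCE(SUM(n), 0) FROM counts WHERE word = ?", (word.lower(),)
    )
    return cur.fetchone()[0]


if __name__ == "__main__":
    db = open_db()
    add_document(db, "Document 1", "It is what it is.\nIt wraps softly.")
    add_document(db, "Document 2", "Nothing here, is it?")
    # Tab-separated output: word, document, count.
    for word, doc, n in report(db, ["it", "is"]):
        print(f"{word}\t{doc}\t{n}")
```

To use it on real files, you would read each file's contents and pass them to `add_document` under the file's name, then call `report` with the words from your input file. Once the table is filled, the same `SELECT` statements work from any SQL client.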