A brief overview of how I did it:
1. Loop over the files.
2. Read each file and convert it all to lowercase.
3. Remove all punctuation and replace all line breaks with spaces.
4. Create a structure and loop over the file contents, adding each word
if it is not already present or incrementing its counter if it is.
5. Loop over that structure and delete any key whose counter is greater
than 20. Otherwise, put the key into a main structure (this encompasses
all files), using the same add-or-increment approach as above.
6. Outside the file loop, loop over the main structure and delete all
keys with a counter of one.
7. Create a query via the QueryNew and QueryAddRow functions.
8. Loop over the main structure and insert the data into this query.
9. Do a query of queries to order the words and their counts properly.
10. Output the data via a cfdump.
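As a rough illustration of the steps above, here is a minimal sketch in Python rather than CFML. It is not the code that was submitted. The names count_words, rank_words and PER_FILE_LIMIT are made up for the example, the input is a list of strings instead of files on disk, and it assumes that "counter of one" in the main structure means the word survived in only one file.

```python
import re
from collections import Counter

# Assumption: a word seen more than this many times in one file is
# dropped from that file's tally only, as in the per-file delete step.
PER_FILE_LIMIT = 20


def count_words(text):
    """Lowercase the text, strip punctuation, and tally each word."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    # split() with no arguments also treats line breaks as separators.
    return Counter(cleaned.split())


def rank_words(texts):
    """Merge per-file tallies and return (word, total) pairs, ranked."""
    totals = Counter()      # word -> occurrences across all files
    files_seen = Counter()  # word -> number of files it survived in

    for text in texts:
        for word, count in count_words(text).items():
            if count > PER_FILE_LIMIT:
                continue  # too frequent in this file, skip it here
            totals[word] += count
            files_seen[word] += 1

    # Drop words that made it in from only a single file.
    kept = [(w, n) for w, n in totals.items() if files_seen[w] > 1]

    # Descending by total count, then alphabetical by word.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    sample = [
        "The cat sat. The dog!",
        "cat and dog and bird",
        "the " * 25 + "cat",
    ]
    for word, total in rank_words(sample):
        print(word, total)
```

The sort key plays the role of the query-of-queries ordering step. Tracking files_seen separately from totals is a design choice of this sketch, made so the final counts stay word occurrences rather than file counts.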
I thought about doing the database approach, but just opted for this
one. What I really wanted to do was present 3-4 solutions, but my free
time at the moment was extremely limited, so after a few days of just
sitting on my working solution I opted to send it in.
Snipe - <CF_BotMaster Network="EFNet" Channel="ColdFusion">
On Thu, 6 May 2004, I-Lin Kuo wrote:
> I thought about this and while I didn't actually do it, I'd like to compare
> approaches anyway. Here's the challenge from the website for those
> unfamiliar with the problem:
> ======
> There are nine source text files. Your challenge is to read those nine files
> and count the occurrences of each word, disregarding any punctuation or
> capitalization.
>
> Then, discard all words that occur more than 20 times within any single
> source file.
>
> Then, discard any word that only occurs within a single source file.
>
> Then, sort the remaining words in descending order of occurrence (over all
> source files), secondarily sorted by alphabetical order.
> =====
>
> I wouldn't use a pure cf approach as the three requests are easily fulfilled
> by SQL statements without me having to write explicit loops.
>
> I'd:
> 1. read the contents of each file into a variable.
> 2. with a single regular expression, replace all punctuation and white
> space characters by a space character.
> 3. Treat the variable as a list delimited by spaces and zap it with
> ListToArray(). Looping through an array is much faster than looping through
> a list.
> 4. Loop through the array and construct a structure/hash whose keys are
> the words and the values are the total number of appearances in the
> document.
> 5. Loop through the structure and write the result out in a tab-delimited
> file with three columns: filename, word, occurrences.
> 6. Do this for each file
> 7. invoke the bulk loading utility for the database to load each file into
> a table myTable, by calling java.lang.Runtime.getRuntime().exec() within cf.
> (This is going to be faster than an insert statement for each word).
> 8. Delete from myTable where occurrences > 20;
> 9. Delete from myTable where word in (SELECT word from myTable group by
> word having count(*) = 1); /*some databases won't let you run this statement
> */
> 10. SELECT * from myTable order by occurrences desc, word;
>
>
>
> I-Lin Kuo, Ann Arbor, MI
> Macromedia Certified ColdFusion 5.0 Advanced Developer
> Sun Certified Java 2 Programmer
> Ann Arbor Java Users Group (www.aajug.org) SUN Top 25 JUG
>
>
>
>
>
> >From: [EMAIL PROTECTED] (Michael Dinowitz)
> >Reply-To: [EMAIL PROTECTED]
> >To: CF-Jobs-Talk <[EMAIL PROTECTED]>
> >Subject: Mindseye test
> >Date: Thu, 6 May 2004 10:37:41 -0400
> >
> >Two weeks or so back Raymond posted a job offer from the company he worked
> >with that included a test. If anyone took that test, I'd like to trade
> >solutions so I can see how others solved the problem.
> > http://www.houseoffusion.com/lists.cfm/link=m:13:2261:3015
> >Thanks
> >--
> >Michael Dinowitz
> >House of Fusion
> >http://www.houseoffusion.com
> >Finding technical solutions to the problems you didn't know you had yet
> >
> >
> >
>
>