A brief overview of how I did it:

1. Loop over my file count.
2. Read the file and convert it all to lowercase.
3. Remove all punctuation and replace all line breaks with spaces.
4. Make a structure and loop over the file variable, adding each unique
word if not present or incrementing its counter if it is already there.
5. Loop over the structure and, if a counter is greater than 20, delete
that key.  Otherwise, put the word into a main structure (this
encompasses all files), using the same add-or-increment approach as
above.
6. Now outside my file loop, loop over that main structure and delete all
keys with a counter of one.
7. Create a query via the QueryNew and QueryAddRow functions.
8. Loop over my main structure and insert the data into this query.
9. Do a query of queries in order to do the proper ordering of the words
and their counts.
10. Output the data via a cfdump.
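
The steps above can be sketched roughly as follows. This is a Python
stand-in, not the ColdFusion that was actually submitted; the function
names count_words and rank_words are made up for illustration, and it
assumes plain ASCII text. It follows the description literally: the main
counter sums occurrences, and words whose combined counter is one are
dropped.

```python
import re
from collections import Counter

def count_words(text):
    """Lowercase one file's text, remove punctuation, and count each word."""
    # Assumption: ASCII text. Everything except letters, digits, and
    # whitespace is removed; split() then treats line breaks as spaces.
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return Counter(cleaned.split())

def rank_words(texts, per_file_limit=20):
    """Return (word, total) pairs ordered by total descending, then by word."""
    totals = Counter()
    for text in texts:
        for word, n in count_words(text).items():
            # A per-file count above the limit is discarded, not merged.
            if n <= per_file_limit:
                totals[word] += n
    # Drop every word whose combined counter is one.
    kept = {word: n for word, n in totals.items() if n > 1}
    # Sorting on (-count, word) replaces the query-of-queries ORDER BY.
    return sorted(kept.items(), key=lambda item: (-item[1], item[0]))
```

Calling rank_words with a list of nine file contents would give the final
ordered list. Note that the challenge text talks about words that occur in
only a single source file, which would need a per-word file counter rather
than the summed counter used in this sketch.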

I thought about doing the database approach, but just opted for this
one.  What I really wanted to do was present 3-4 solutions but my free
time at the moment was extremely limited so after a few days of just
sitting on my working solution I opted to just send it in.

Snipe - <CF_BotMaster Network="EFNet" Channel="ColdFusion">

On Thu, 6 May 2004, I-Lin Kuo wrote:

> I thought about this and while I didn't actually do it, I'd like to compare
> approaches anyway. Here's the challenge from the website for those
> unfamiliar with the problem:
> ======
> There are nine source text files. Your challenge is to read those nine files
> and count the occurrences of each word, disregarding any punctuation or
> capitalization.
>
> Then, discard all words that occur more than 20 times within any single
> source file.
>
> Then, discard any word that only occurs within a single source file.
>
> Then, sort the remaining words in descending order of occurrence (over all
> source files), secondarily sorted by alphabetical order.
> =====
>
> I wouldn't use a pure cf approach as the three requests are easily fulfilled
> by SQL statements without me having to write explicit loops.
>
> I'd:
>   1. read the contents of each file into a variable.
>   2. with a single regular expression, replace all punctuation and white
> space characters by a space character.
>   3. Treat the variable as a list delimited by spaces and zap it with
> ListToArray(). Looping through an array is much faster than looping through
> a list.
>   4. Loop through the array and construct a structure/hash whose keys are
> the words and the values are the total number of appearances in the
> document.
>   5. Loop through the structure and write the result out in a tab-delimited
> file with three columns: filename, word, occurrences.
>   6. Do this for each file
>   7. invoke the bulk loading utility for the database to load each file into
> a table myTable, by calling exec() on java.lang.Runtime from within cf.
> (This is going to be faster than an insert statement for each word).
>   8. DELETE FROM myTable WHERE occurrences > 20;
>   9. DELETE FROM myTable WHERE word IN (SELECT word FROM myTable GROUP BY
> word HAVING count(*) = 1); /* some databases won't let you run this
> statement */
>   10. SELECT word, SUM(occurrences) AS total FROM myTable GROUP BY word
> ORDER BY total DESC, word;
>
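
For comparison, the three SQL steps quoted above can be tried out with a
small sketch like the one below. It is a Python stand-in using the
standard sqlite3 module with an in-memory database, not ColdFusion and not
a real bulk loader: executemany takes the place of the bulk loading
utility, the function name rank_with_sql is hypothetical, and a single
table named myTable with columns filename, word, and occurrences is
assumed. The final query sums occurrences per word and sorts descending,
which is what the challenge text asks for.

```python
import sqlite3

def rank_with_sql(rows, per_file_limit=20):
    """rows: iterable of (filename, word, occurrences) tuples, one per file and word."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (filename TEXT, word TEXT, occurrences INTEGER)"
    )
    # Stand-in for the bulk load of the tab-delimited files.
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    # Discard per-file rows where the word occurs more than the limit.
    conn.execute("DELETE FROM myTable WHERE occurrences > ?", (per_file_limit,))
    # Discard words that are left in only a single source file.
    conn.execute(
        "DELETE FROM myTable WHERE word IN "
        "(SELECT word FROM myTable GROUP BY word HAVING COUNT(*) = 1)"
    )
    # Total over all files, descending by count, then alphabetical.
    result = conn.execute(
        "SELECT word, SUM(occurrences) AS total FROM myTable "
        "GROUP BY word ORDER BY total DESC, word"
    ).fetchall()
    conn.close()
    return result
```

SQLite accepts the self-referencing DELETE; as the quoted comment says,
some other databases do not.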
>
>
> I-Lin Kuo, Ann Arbor, MI
> Macromedia Certified ColdFusion 5.0 Advanced Developer
> Sun Certified Java 2 Programmer
> Ann Arbor Java Users Group (www.aajug.org) SUN Top 25 JUG
>
>
>
>
>
> >From: [EMAIL PROTECTED] (Michael Dinowitz)
> >Reply-To: [EMAIL PROTECTED]
> >To: CF-Jobs-Talk <[EMAIL PROTECTED]>
> >Subject: Mindseye test
> >Date: Thu, 6 May 2004 10:37:41 -0400
> >
> >Two weeks or so back Raymond posted a job offer from the company he worked
> >with that included a test. If anyone took that test, I'd like to trade
> >solutions so I can see how others solved the problem.
> >  http://www.houseoffusion.com/lists.cfm/link=m:13:2261:3015
> >Thanks
> >--
> >Michael Dinowitz
> >House of Fusion
> >http://www.houseoffusion.com
> >Finding technical solutions to the problems you didn't know you had yet
> >
> >
> >
>
>