Re: [algogeeks] top search queries

Srikar Mon, 31 Jan 2011 07:04:36 -0800

First Approach,

0) the queries can be considered as an infinite stream. maintain a global
count of the number of queries coming from the stream (used for calc the
frequency %).
1) maintain a min-heap (has just top 100 queries by frequency) + hash table
(where we store frequency for each word in heap). the element at the top of
the heap is the query having min-frequency.
2) once we get a query. we check if it is present in the hashtable. if yes
then update this query frequency & reorder the heap. if no then do nothing.

here I assume we want top 100 most queries. this can be easily extended to
include all queries having > 0.1% frequency.

Complexity:
    time: O(logn) + O(n) [for heap & hashtable construction] here n=100
    space: 2O(n)

Will come up with something better. If extend n to say 1,00,000, the space
complexity is going to be a problem.

Second Approach,

0) We could use only one min-heap. Each node has the (query, frequency %).
1) get the query from the stream. Traverse the heap to see if this query is
already present. If yes then re-calculate the frequency & if needed reorder
the heap. if query not present in the heap, then calc it's frequency & see
if the frequency of the node at min-heap is < the current query freq. if
(curr_query_freq > min-heap node freq.) then swap the min-heap node &
reorder the heap. else continue.

Time: O(logn) n=number of queries we want to consider.
space: O(n)

Srikar

On Mon, Jan 31, 2011 at 6:57 PM, snehal jain <learner....@gmail.com> wrote:

> please give your ideas for this
>
>
> On Fri, Jan 21, 2011 at 2:28 PM, snehal jain <learner....@gmail.com>wrote:
>
>> Magy! receives a huge amount of queries everyday. They don’t want to
>> store all of them, since many of the queries are unique (submitted
>> just one time by just one user). Anyway, for caching reason they want
>> to understand what are the queries whose frequency exceed s=0.1%. How
>> do you solve the problem? Remember we don’t want to store all the
>> queries.
>> Hint: split the stream into windows and accept some error in
>> estimation.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Algorithm Geeks" group.
>> To post to this group, send email to algogeeks@googlegroups.com.
>> To unsubscribe from this group, send email to
>> algogeeks+unsubscr...@googlegroups.com<algogeeks%2bunsubscr...@googlegroups.com>
>> .
>> For more options, visit this group at
>> http://groups.google.com/group/algogeeks?hl=en.
>>
>>
>  --
> You received this message because you are subscribed to the Google Groups
> "Algorithm Geeks" group.
> To post to this group, send email to algogeeks@googlegroups.com.
> To unsubscribe from this group, send email to
> algogeeks+unsubscr...@googlegroups.com<algogeeks%2bunsubscr...@googlegroups.com>
> .
> For more options, visit this group at
> http://groups.google.com/group/algogeeks?hl=en.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com.
To unsubscribe from this group, send email to 
algogeeks+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/algogeeks?hl=en.

Re: [algogeeks] top search queries

Reply via email to