First Approach, 0) the queries can be considered as an infinite stream. maintain a global count of the number of queries coming from the stream (used for calc the frequency %). 1) maintain a min-heap (has just top 100 queries by frequency) + hash table (where we store frequency for each word in heap). the element at the top of the heap is the query having min-frequency. 2) once we get a query. we check if it is present in the hashtable. if yes then update this query frequency & reorder the heap. if no then do nothing.
here I assume we want top 100 most queries. this can be easily extended to include all queries having > 0.1% frequency. Complexity: time: O(logn) + O(n) [for heap & hashtable construction] here n=100 space: 2O(n) Will come up with something better. If extend n to say 1,00,000, the space complexity is going to be a problem. Second Approach, 0) We could use only one min-heap. Each node has the (query, frequency %). 1) get the query from the stream. Traverse the heap to see if this query is already present. If yes then re-calculate the frequency & if needed reorder the heap. if query not present in the heap, then calc it's frequency & see if the frequency of the node at min-heap is < the current query freq. if (curr_query_freq > min-heap node freq.) then swap the min-heap node & reorder the heap. else continue. Time: O(logn) n=number of queries we want to consider. space: O(n) Srikar On Mon, Jan 31, 2011 at 6:57 PM, snehal jain <learner....@gmail.com> wrote: > please give your ideas for this > > > On Fri, Jan 21, 2011 at 2:28 PM, snehal jain <learner....@gmail.com>wrote: > >> Magy! receives a huge amount of queries everyday. They don’t want to >> store all of them, since many of the queries are unique (submitted >> just one time by just one user). Anyway, for caching reason they want >> to understand what are the queries whose frequency exceed s=0.1%. How >> do you solve the problem? Remember we don’t want to store all the >> queries. >> Hint: split the stream into windows and accept some error in >> estimation. >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Algorithm Geeks" group. >> To post to this group, send email to algogeeks@googlegroups.com. >> To unsubscribe from this group, send email to >> algogeeks+unsubscr...@googlegroups.com<algogeeks%2bunsubscr...@googlegroups.com> >> . >> For more options, visit this group at >> http://groups.google.com/group/algogeeks?hl=en. >> >> > -- > You received this message because you are subscribed to the Google Groups > "Algorithm Geeks" group. > To post to this group, send email to algogeeks@googlegroups.com. > To unsubscribe from this group, send email to > algogeeks+unsubscr...@googlegroups.com<algogeeks%2bunsubscr...@googlegroups.com> > . > For more options, visit this group at > http://groups.google.com/group/algogeeks?hl=en. > -- You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.