Re: [algogeeks] Re: top search queries
Going by the hint given in the problem Divide the stream into windows and assume that we have enough space to store all the queries from the stream window in a hash table along frequency count. Also maintain a global Hash table that will contain the frequency counts of top n queries seen so far. Make sure that n100 .Once the window is filled with enough queries, merge them into the global hash table. If a query is already available in the global hash table update the frequency count.Again this solution cannot be accurate due to limited memory and it allows for some error. larger the value of n lesser the error proneness. On Thu, Feb 3, 2011 at 9:51 PM, Srikar srikar2...@gmail.com wrote: @algoose I see what you are saying. what do you propose? checking out your link now... On Thu, Feb 3, 2011 at 11:44 AM, Algoose chase harishp...@gmail.comwrote: @Srikar In your first approach you cant simply ignore the queries that are not present in the heap because you have a stream of queries and you never know if the query that you are about to ignore is going be received frequently or not in future. Your approach is like find the top 100 queries from the stream and keep updating the frequencies of only those queries using heap and hash table. If you have to process some 1,00,000 , with a space for only 100 elements you cant find the frequencies correctly. this is a nice article related to this : http://www.americanscientist.org/issues/pub/the-britney-spears-problem On Tue, Feb 1, 2011 at 8:09 PM, sankalp srivastava richi.sankalp1...@gmail.com wrote: @guy above juver++ The solution , i don't think can get better than this , because you need to store the querries anyway (at least for the output ) -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.comalgogeeks%2bunsubscr...@googlegroups.com . For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en. -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.comalgogeeks%2bunsubscr...@googlegroups.com . For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en. -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.comalgogeeks%2bunsubscr...@googlegroups.com . For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en. -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
Re: [algogeeks] Re: top search queries
@Srikar In your first approach you cant simply ignore the queries that are not present in the heap because you have a stream of queries and you never know if the query that you are about to ignore is going be received frequently or not in future. Your approach is like find the top 100 queries from the stream and keep updating the frequencies of only those queries using heap and hash table. If you have to process some 1,00,000 , with a space for only 100 elements you cant find the frequencies correctly. this is a nice article related to this : http://www.americanscientist.org/issues/pub/the-britney-spears-problem On Tue, Feb 1, 2011 at 8:09 PM, sankalp srivastava richi.sankalp1...@gmail.com wrote: @guy above juver++ The solution , i don't think can get better than this , because you need to store the querries anyway (at least for the output ) -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.comalgogeeks%2bunsubscr...@googlegroups.com . For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en. -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
Re: [algogeeks] Re: top search queries
@algoose I see what you are saying. what do you propose? checking out your link now... On Thu, Feb 3, 2011 at 11:44 AM, Algoose chase harishp...@gmail.com wrote: @Srikar In your first approach you cant simply ignore the queries that are not present in the heap because you have a stream of queries and you never know if the query that you are about to ignore is going be received frequently or not in future. Your approach is like find the top 100 queries from the stream and keep updating the frequencies of only those queries using heap and hash table. If you have to process some 1,00,000 , with a space for only 100 elements you cant find the frequencies correctly. this is a nice article related to this : http://www.americanscientist.org/issues/pub/the-britney-spears-problem On Tue, Feb 1, 2011 at 8:09 PM, sankalp srivastava richi.sankalp1...@gmail.com wrote: @guy above juver++ The solution , i don't think can get better than this , because you need to store the querries anyway (at least for the output ) -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.comalgogeeks%2bunsubscr...@googlegroups.com . For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en. -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.comalgogeeks%2bunsubscr...@googlegroups.com . For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en. -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
[algogeeks] Re: top search queries
@guy above juver++ The solution , i don't think can get better than this , because you need to store the querries anyway (at least for the output ) -- You received this message because you are subscribed to the Google Groups Algorithm Geeks group. To post to this group, send email to algogeeks@googlegroups.com. To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.