How to efficiently get # of search results, per attribute
I'd like to implement a search across several types of entities: let's say classes, professors, and departments. I want the user to be able to enter a single, simple query without having to specify what they're looking for. Then I want the search results to look something like this:

Search results for: philosophy boyer
Found: 121 classes - 5 professors - 2 departments

search results here...

I know I could iterate through every hit returned and count them up myself, but that seems inefficient if there are lots of results. Is there some other way to get this kind of information from the search result set? My other ideas are doing a separate search for each result type, or storing the different types in different indexes. Any suggestions?

Thanks for your help!

-Chris

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: How to efficiently get # of search results, per attribute
It depends on how many results your users are looking through. Here are the two scenarios I see:

1] If you don't have that many records, you can fetch all the results and then do a post-parsing step to determine the totals.

2] If you have a lot of entries in each category and you're worried about fetching thousands of records every time, you can keep a separate index per category and search them in parallel (not Lucene Parallel Search). You can then fetch only up to 100 hits from each one (for efficiency) and still have the total from each search to display.

Either way, you can use RAMDirectory if you need more speed from the search. Whichever approach you choose, I would recommend that you sit down and do some number crunching to figure out which way to go.

Hope this helps

Nader Henein
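Nader's second scenario (one index per category, searched side by side, with a capped number of hits plus the full total per category) could be sketched roughly as follows. This is only an illustration using plain Java concurrency, not Lucene code: `PerCategorySearch`, `CategoryResult`, and `searchAll` are hypothetical names, and each "searcher" is a stand-in function that returns all matching ids for a query. A real index lookup would report its total without materializing every hit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: one searcher per category, queried concurrently.
class PerCategorySearch {

    /** Capped list of top hits plus the uncapped total for one category. */
    static final class CategoryResult {
        final List<String> topHits;
        final int totalHits;

        CategoryResult(List<String> topHits, int totalHits) {
            this.topHits = topHits;
            this.totalHits = totalHits;
        }
    }

    /**
     * Runs the same query against every category searcher in parallel.
     * Each searcher is an assumed stand-in for a per-category index and
     * returns the ids of all matching documents, best match first.
     */
    static Map<String, CategoryResult> searchAll(
            Map<String, Function<String, List<String>>> searchers,
            String query,
            int maxHitsPerCategory) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, searchers.size()));
        try {
            // Submit one task per category so the searches overlap in time.
            Map<String, Future<CategoryResult>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Function<String, List<String>>> entry : searchers.entrySet()) {
                Function<String, List<String>> searcher = entry.getValue();
                pending.put(entry.getKey(), pool.submit(() -> {
                    List<String> all = searcher.apply(query);
                    int keep = Math.min(maxHitsPerCategory, all.size());
                    return new CategoryResult(new ArrayList<>(all.subList(0, keep)), all.size());
                }));
            }
            // Collect the results in the original category order.
            Map<String, CategoryResult> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<CategoryResult>> entry : pending.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", ex);
        } catch (ExecutionException ex) {
            throw new IllegalStateException("search failed", ex.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The "Found: 121 classes - 5 professors - 2 departments" line from the original question would then come from each category's `totalHits`, while only the capped `topHits` are fetched for display.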
RE: How to efficiently get # of search results, per attribute
My Lucene application includes multi-faceted navigation that does a more complex version of the below. I've got 5 different taxonomies into which every indexed item is classified. The largest of the taxonomies has over 15,000 entries, while the other 4 are much smaller. For every search query, I determine the best small set of nodes from each taxonomy to present to the user as drill-down options, and provide counts of how many results fall under each of these nodes.

At present I only have about 25,000 indexed objects and usually no more than 1,000 results from the initial query. To determine the drill-down options and counts, I scan up to 1,000 results, computing the counts for all nodes into which these results classify. Then for each taxonomy I pick the best drill-down options available (an orthogonal set with a reasonable branching factor that covers all results) and present them with their counts. If there are more than 1,000 results, I extrapolate the computed counts to estimate the actual counts on the entire set of results.

This is all done with a single index and a single search. The total time required to perform this computation for the one large taxonomy is under 10ms, running in full debug mode in my IDE. The overall query response time is subjectively instantaneous at the UI (Google-speed or better).

So, unless some dimension of the problem is much bigger than mine, I doubt performance will be an issue.
Chuck

-----Original Message-----
From: Nader Henein
Sent: Saturday, November 13, 2004 2:29 AM
To: Lucene Users List
Subject: Re: How to efficiently get # of search results, per attribute
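The scan-and-extrapolate step Chuck describes could look something like the sketch below. It is not his code: `SampledFacetCounts` and `estimateCounts` are hypothetical names, the hits are modeled as plain lists of taxonomy node names rather than Lucene documents, and the extrapolation is assumed to be simple proportional scaling (total hits divided by hits scanned), which his message does not spell out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: count taxonomy nodes over a capped sample of hits,
// then scale the counts up when the full result set is larger than the sample.
class SampledFacetCounts {

    /**
     * @param hitNodes    for each hit, in rank order, the taxonomy nodes it is classified under
     * @param totalHits   total number of hits reported by the search
     * @param sampleLimit maximum number of hits to scan (1,000 in Chuck's description)
     * @return estimated count per node across the entire result set
     */
    static Map<String, Long> estimateCounts(List<List<String>> hitNodes, int totalHits, int sampleLimit) {
        int scanned = Math.min(sampleLimit, hitNodes.size());

        // Exact counts over the scanned prefix of the result list.
        Map<String, Long> counts = new LinkedHashMap<>();
        for (int i = 0; i < scanned; i++) {
            for (String node : hitNodes.get(i)) {
                counts.merge(node, 1L, Long::sum);
            }
        }

        // Assumed extrapolation: scale each count by totalHits / scanned.
        if (scanned > 0 && totalHits > scanned) {
            double scale = (double) totalHits / scanned;
            counts.replaceAll((node, count) -> Math.round(count * scale));
        }
        return counts;
    }
}
```

One caveat with this kind of estimate: the scanned hits are the top-ranked ones, so the scaled counts are only accurate if category membership is roughly independent of rank.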
Re: How to efficiently get # of search results, per attribute
Nader and Chuck,

Thanks for the responses, they're both helpful. My index sizes will begin on the order of 200,000 classes and 20,000 instructors (and far fewer departments), and grow over time to maybe a few million classes. Compared to some of the numbers I've seen on this mailing list, my dataset is fairly small. I think I won't worry about performance for now, unless it becomes an issue.

-Chris

On Sat, 13 Nov 2004 15:36:11 -0800, Chuck Williams wrote:
So, unless some dimension of the problem is much bigger than mine, I doubt performance will be an issue.