Thanks for the detailed reply. Our URLs are already designed so that they
represent the category they are in. I like the idea of adding a custom index
term for the category.

Thanks again,
Kenan

On Tue, Aug 4, 2009 at 10:52 PM, Dennis Kubes <[email protected]> wrote:
> Visvo.com originally was a categorized wide web search. While I don't
> think our approach was the best way to proceed in hindsight, here is what we
> did.
>
> 1) We had a mapreduce job that was run to place urls in a given category.
> The actual function for determining a category is arbitrary. We started
> with Bayesian methods based on noun phrases matched to hand built
> categories, but it could be any function you want as long as it maps url ->
> 1+ categories. Our function returned floats for categories, highest
> matching category wins.
>
> 2) The job was such that the function would pick the best category out
> of a level, then rerun on its children. The function returned a float
> value. If that value was higher than its parent's, it would continue checking
> children at the next level and so on. The idea behind this was to find the
> best category in a tree of categories.
>
> 3) If a url was in a category, it was considered to be in all of its parent
> categories. So let's say a url is in the following category:
>
> /one/two/three/four
>
> It is also considered to be in
>
> /one/two/three
> /one/two
> /one
>
> In the index we added a custom field called category and we would add the
> category it was assigned to and all of its parent categories.
>
> The UI would allow running keyword searches but also had a listing of
> categories which were links. There was some special logic to try and
> determine a relevant starting point in the category tree from the query. It
> was not very successful, so most searches started at the base of the
> category tree. Clicking on a link would run a query like this:
>
> keywords AND category=/one/two/three
>
> Which should return you categorized results. As I said, maybe not the best
> approach, but it is an approach to having a categorized result. Hope this
> helps.
>
> Dennis
>
>
> Kenan Azam wrote:
>
>> Hi,
>>
>> I am using nutch 0.8.1 to do site wide searches. I want certain results to
>> be boosted more than others, for which I have added custom index terms and
>> boosted them.
>> However, now I have the need to categorize results into categories so that
>> interesting categories are not buried deep under.
>> Has someone tried to categorize search results? For example, out of 100
>> results, 20 appear in category 1, 50 appear in category 2, and all others
>> appear in a third category?
>>
>> Thanks, Kenan.
>>
>>
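
For anyone reading this thread later, here is a minimal Python sketch of the scheme described in the quoted reply: descend the category tree while a child scores higher than its parent, store the assigned category plus all of its parents in a "category" field, then filter a keyword search by category. This is not Nutch code and not necessarily how Visvo implemented it. The function names, the in-memory index, the scores, and the example.com URLs are all made up for illustration, and the score lookup stands in for whatever classifier you use.

```python
def with_ancestors(category):
    """Return the category path followed by all of its parent paths."""
    parts = [p for p in category.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)]


def best_category(score, children, roots):
    """Pick the best category at the top level, then keep descending
    while the best child scores higher than its parent."""
    best = max(roots, key=score)
    while True:
        kids = children.get(best, [])
        if not kids:
            return best
        top = max(kids, key=score)
        if score(top) <= score(best):
            return best
        best = top


def search(index, keyword, category):
    """Toy equivalent of: keyword AND category=<category>."""
    return sorted(
        url
        for url, doc in index.items()
        if keyword.lower() in doc["text"].lower().split()
        and category in doc["category"]
    )


# Hypothetical category tree and per-category scores for a single url.
CHILDREN = {"/one": ["/one/two"], "/one/two": ["/one/two/three"]}
SCORES = {"/one": 0.2, "/other": 0.1, "/one/two": 0.5, "/one/two/three": 0.4}

assigned = best_category(SCORES.get, CHILDREN, ["/one", "/other"])

# Hypothetical in-memory "index"; the category field holds the assigned
# category and every parent of it.
index = {
    "http://example.com/a": {
        "text": "nutch category search",
        "category": with_ancestors(assigned),
    },
    "http://example.com/b": {
        "text": "nutch boosting",
        "category": with_ancestors("/other"),
    },
}

print(assigned)
print(search(index, "nutch", "/one"))
```

Because every parent path is stored in the field, a click on a higher-level category such as /one still matches urls that were assigned deeper in the tree, with no need for prefix matching at query time.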
