Thanks for the detailed reply. Our urls are already designed so that they
reflect the category they are in. I like the idea of adding a custom index
term for the category. Thanks again, Kenan.
On Tue, Aug 4, 2009 at 10:52 PM, Dennis Kubes <[email protected]> wrote:

> Visvo.com originally was a categorized wide web search.  While I don't
> think our approach was the best way to proceed in hindsight, here is what we
> did.
>
> 1) We had a mapreduce job that was run to place urls in a given category.
> The actual function for determining a category is arbitrary. We started
> with Bayesian methods based on noun phrases matched to hand built
> categories, but it could be any function you want as long as it maps url ->
> 1+ categories. Our function returned floats for categories; the highest
> matching category wins.
>
> 2) The job was set up so that the function would pick the best category at
> a given level and then rerun on that category's children. The function
> returned a float value. If that value was higher than the parent's, it would
> continue checking children at the next level, and so on. The idea behind
> this was to find the best category in a tree of categories.
>
> 3) If a url was in a category, it was considered to be in all of its parent
> categories. So let's say a url is in the following category:
>
> /one/two/three/four
>
> It is also considered to be in
>
> /one/two/three
> /one/two
> /one
>
> In the index we added a custom field called category and we would add the
> category it was assigned to and all of its parent categories.
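Steps 2 and 3 and the category field could be sketched roughly as below. This is only an illustration in Python, not the Visvo code: the `tree` dict, the `score` callback and the toy scores are hypothetical stand-ins for the real category tree and the Bayesian scorer, and "stop when no child beats its parent" is just one reading of step 2.

```python
def best_category(url, tree, score, root="/"):
    """Greedy descent: at each level take the best-scoring child and keep
    going only while it scores higher than its parent."""
    current, current_score = root, float("-inf")
    while True:
        children = tree.get(current, [])
        if not children:
            return current  # reached a leaf
        best = max(children, key=lambda c: score(url, c))
        best_score = score(url, best)
        if best_score <= current_score:
            return current  # no child beats the parent, stop here
        current, current_score = best, best_score


def with_ancestors(category):
    """Return the category followed by all of its parent categories."""
    parts = category.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(len(parts), 0, -1)]


# Hypothetical toy data: a tiny category tree and fixed scores per category.
tree = {
    "/": ["/one", "/uno"],
    "/one": ["/one/two"],
    "/one/two": ["/one/two/three"],
}
toy_scores = {"/one": 0.4, "/uno": 0.1, "/one/two": 0.6, "/one/two/three": 0.5}


def toy_score(url, category):
    # A real scorer would look at the page; this one ignores the url.
    return toy_scores.get(category, 0.0)


assigned = best_category("http://example.com/page", tree, toy_score)
category_field = with_ancestors(assigned)  # values for the "category" field
```

With these toy scores the descent stops at /one/two, because /one/two/three scores lower than its parent, and the category field gets /one/two and /one.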
>
> The UI would allow running keyword searches but also had a listing of
> categories which were links. There was some special logic to try and
> determine a relevant starting point in the category tree from the query. It
> was not very successful, so most searches started at the base of the
> category tree. Clicking on a link would run a query like this:
>
> keywords AND category=/one/two/three
>
> That should return categorized results. As I said, maybe not the best
> approach, but it is one approach to getting categorized results. Hope this helps.
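To make the "keywords AND category" query concrete, here is a small in-memory sketch in Python. It does not use the Nutch or Lucene APIs; the `search` function and the sample `docs` are made up for illustration and assume each document carries a `category` list that already contains the assigned category and all of its parents.

```python
def search(docs, keywords, category):
    """Return docs that contain every keyword and whose category field
    includes the requested category (exact match on the path)."""
    wanted = [k.lower() for k in keywords.split()]
    hits = []
    for doc in docs:
        words = doc["text"].lower().split()
        if all(k in words for k in wanted) and category in doc["category"]:
            hits.append(doc)
    return hits


# Hypothetical documents; the category lists include the parent categories.
docs = [
    {"url": "http://example.com/a", "text": "red widgets for sale",
     "category": ["/one/two/three/four", "/one/two/three", "/one/two", "/one"]},
    {"url": "http://example.com/b", "text": "blue widgets for rent",
     "category": ["/one/two", "/one"]},
    {"url": "http://example.com/c", "text": "red gadgets",
     "category": ["/one/two/three", "/one/two", "/one"]},
]

hits = search(docs, "widgets", "/one/two/three")
```

Because the parents are stored in the field, an exact match on /one/two/three also finds pages assigned to /one/two/three/four, so no prefix matching is needed at query time.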
>
> Dennis
>
>
>
> Kenan Azam wrote:
>
>> Hi,
>>
>> I am using nutch 0.8.1 to do site wide searches. I want certain results to
>> be boosted more than others, for which I have added custom index terms and
>> boosted them.
>> However, I now need to group results into categories so that interesting
>> categories are not buried deep in the results.
>> Has anyone tried to categorize search results? For example, out of 100
>> results, 20 appear in category 1, 50 appear in category 2 and all others
>> appear in a third category?
>>
>> Thanks, Kenan.
>>
>>
