Machine Learning Question

David Stuart Wed, 17 Feb 2010 01:24:19 -0800

Hi All,

I think this question is appropriate for the Mahout mailing list, but if not, 
any pointers in the right direction or advice would be welcome.


We have a taxonomy-based navigation system where items in the navigation tree 
are made up of tag-based queries (instead of natural language words), which are 
matched against content items tagged in a similar way.

So we have a taxonomy tree with queries:

Id      Label
001     Fruit (fid:123 OR fid:675) AND -fid:(324 OR 678) ...
002         Round
003             Apple
004             Orange
006         Star
007             Star fruit
....

Content pool

"Interesting article on fruit" -> tagged with (123, 234, 675)
"The mighty orange!" -> tagged with (123, 324, 678)

Hopefully you get the picture.
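
To make the matching concrete, here is a minimal sketch in Python of how the 
visible part of the Fruit query would evaluate against the two content items 
above. The TagQuery class and its any_of / none_of fields are hypothetical 
names used only for illustration, and the elided remainder of the query 
("...") is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagQuery:
    """Hypothetical model of a node query: (any_of tags) AND NOT (none_of tags)."""
    any_of: frozenset   # at least one of these tags must be present
    none_of: frozenset  # none of these tags may be present

    def matches(self, tags):
        tags = set(tags)
        return bool(tags & self.any_of) and not (tags & self.none_of)


# Visible part of node 001: (fid:123 OR fid:675) AND -fid:(324 OR 678)
fruit = TagQuery(any_of=frozenset({123, 675}), none_of=frozenset({324, 678}))

# The two items from the content pool, keyed by short labels.
content = {
    "fruit article": {123, 234, 675},
    "orange article": {123, 324, 678},
}

# Which items would surface at node 001 under this reading of the query.
surfaced = {label: fruit.matches(tags) for label, tags in content.items()}
```

Under this reading the fruit article surfaces at 001, while the orange article 
is excluded by the negated tags 324 and 678.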

Now we bake these queries into our Solr index, so instead of running the Fruit 
query at search time we have already evaluated it and just search for items in 
the index that have id 001. The reasons for doing this are not really 
important, but we have written an indexer for the purpose. Also, content items 
are multi-surfacing, so an item could appear at 001, 004 and 007.

Although the indexer is OK at doing this pre-bake job, it's not very fast, and 
as the content and tree grow it gets slower.

NOW for the actual Question!!!

Is there an ML model that can quickly classify/identify where a new (or 
retagged) piece of content fits onto the tree? Oh, the queries on the leaf 
nodes can change (less often), so a quick process to reclassify what is in 
scope for that leaf would be useful.
The reason I want this is that it would be great to have real-time feedback to 
an author applying tags to a document about where it fits in the site.
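
As a point of comparison for that real-time lookup, here is a minimal Python 
sketch of a non-ML baseline: a reverse index from tag to the nodes whose 
queries mention it, so that only candidate nodes are evaluated when a document 
is tagged. This is not a Mahout or Solr API. The names NODE_QUERIES, 
build_tag_index and classify are hypothetical, and the query for node 004 is 
invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical node queries: node id -> (any_of tags, none_of tags).
NODE_QUERIES = {
    "001": ({123, 675}, {324, 678}),  # Fruit, visible part of the query only
    "004": ({324}, set()),            # Orange, invented for illustration
}


def build_tag_index(node_queries):
    """Map each positive tag to the set of node ids whose query mentions it."""
    index = defaultdict(set)
    for node_id, (any_of, _none_of) in node_queries.items():
        for tag in any_of:
            index[tag].add(node_id)
    return index


def classify(tags, node_queries, tag_index):
    """Return the sorted node ids at which a document with these tags surfaces."""
    tags = set(tags)
    # Only nodes sharing at least one positive tag can possibly match.
    candidates = set()
    for tag in tags:
        candidates |= tag_index.get(tag, set())
    return sorted(
        node_id
        for node_id in candidates
        if tags & node_queries[node_id][0] and not tags & node_queries[node_id][1]
    )


TAG_INDEX = build_tag_index(NODE_QUERIES)
article_nodes = classify({123, 234, 675}, NODE_QUERIES, TAG_INDEX)
orange_nodes = classify({123, 324, 678}, NODE_QUERIES, TAG_INDEX)
```

When a node's query changes, only that node's entries in the tag index need to 
be rebuilt, which is the cheap reclassification step asked about above. A 
learned classifier would replace the exact set tests with a scoring function.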

Once I get this working I would love to add suggested tags or weighting based 
on content items with contextual similarity. 
I think it was Grant who was talking about a Solr external field that could be 
used to hook this together, or maybe I am mistaken.

Hope this makes sense.

Thanks for your help/advice in advance.

Regards,

Dave
