Re: dynamic categorization & transactional data

2010-03-20 Thread caman

@Grant
Less than a minute. If we retrieve the metadata from the index, we will have
to keep the index updated down to the second, and that may not scale well.
Perhaps a hybrid approach?
I will look into a classifier. Thanks.
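
One possible reading of the hybrid idea, sketched in Python rather than Solr
code: count views in memory as they happen, and periodically snapshot the
totals to a file of key=value lines, which is the layout ExternalFileField is
documented to read. The class name ViewCounter, the field name "popularity",
and the file name external_popularity are assumptions made for illustration,
and the timer that would call snapshot() every few seconds is left out.

```python
import os
import tempfile
import threading
from collections import Counter


class ViewCounter:
    """Accumulates view counts in memory and snapshots them to a key=value file."""

    def __init__(self):
        self._counts = Counter()
        self._lock = threading.Lock()

    def record_view(self, doc_id, n=1):
        # Called on every page view; cheap because it only touches memory.
        with self._lock:
            self._counts[doc_id] += n

    def get(self, doc_id):
        # Returns 0 for documents that have never been viewed.
        with self._lock:
            return self._counts[doc_id]

    def snapshot(self, path):
        """Write all counts as sorted 'doc_id=count' lines, replacing path atomically."""
        with self._lock:
            items = sorted(self._counts.items())
        directory = os.path.dirname(os.path.abspath(path))
        # Write to a temp file in the same directory, then swap it into place,
        # so a reader never sees a half-written file.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            for doc_id, count in items:
                out.write(f"{doc_id}={count}\n")
        os.replace(tmp_path, path)


# Usage with a throwaway directory (a real setup would target the index data dir).
counter = ViewCounter()
counter.record_view("doc1")
counter.record_view("doc1")
counter.record_view("doc2")
target = os.path.join(tempfile.mkdtemp(), "external_popularity")
counter.snapshot(target)
with open(target, encoding="utf-8") as handle:
    print(handle.read())
```

The index itself is untouched by view events in this sketch; only the small
side file changes, so the "updated down to the second" pressure falls on the
in-memory counter rather than on reindexing.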





Grant Ingersoll-6 wrote:
> 
> 
> On Mar 18, 2010, at 2:44 PM, caman wrote:
> 
>> 
>> 1) Took care of the first one with a Transformer.
> 
> This is often also something done by a classifier that is trained to deal
> with all the statistical variations in your text.  Tools like Weka,
> Mahout, OpenNLP, etc. can be applied here.
> 
>> 2) Any input on 2, please? I need to store the number of views and the
>> popularity with each document, and these can change pretty often. Is a
>> database recommended, or can this be updated in Solr directly? My issue
>> with the DB is that every Solr search hit would require a DB hit to
>> retrieve the metadata.
> 
> Define often, please.  Less than a minute or more than a minute?
> 
>> 
>> Any input is appreciated please
>> 
>> caman wrote:
>>> 
>>> Hello all,
>>> 
>>> Please see below. Any help is much appreciated.
>>> 1) Extracting data out of a text field to assign a category for certain
>>> configured words. E.g., if the text is "Google does it again with
>>> Android" and 'Google' and 'Android' are the configured words, I want to
>>> be able to assign the article to the tags 'Google', 'Android', and
>>> 'Technical'. Can I do this with a custom filter during analysis?
>>> Similarly, I want to set up categories for each article based on
>>> keywords in the text.
>>> 2) How about using Solr as a transactional datastore? I need to keep
>>> track of the rating for each document. Would 'ExternalFileField' be a
>>> good choice for this use case?
>>> 
>>> Thanks in advance.
>>> 
>> 
> 

-- 
View this message in context: 
http://old.nabble.com/dynamic-categorization---transactional-data-tp27790233p27970656.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: dynamic categorization & transactional data

2010-03-18 Thread Grant Ingersoll

On Mar 18, 2010, at 2:44 PM, caman wrote:

> 
> 1) Took care of the first one with a Transformer.

This is often also something done by a classifier that is trained to deal with 
all the statistical variations in your text.  Tools like Weka, Mahout, OpenNLP, 
etc. can be applied here.
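
To make the classifier idea concrete, here is a toy multinomial Naive Bayes
in plain Python. It is only a sketch of the general technique, not how Weka,
Mahout, or OpenNLP implement it; the class name, the four training sentences,
and the 'Technical' and 'Sports' labels are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()               # category -> training documents seen
        self.word_counts = defaultdict(Counter)   # category -> word -> occurrences
        self.vocabulary = set()

    def train(self, text, category):
        self.doc_counts[category] += 1
        for word in tokenize(text):
            self.word_counts[category][word] += 1
            self.vocabulary.add(word)

    def classify(self, text):
        """Return the most probable category, or None if nothing was trained."""
        total_docs = sum(self.doc_counts.values())
        # Words never seen in training carry no evidence, so they are skipped.
        words = [w for w in tokenize(text) if w in self.vocabulary]
        best_category, best_score = None, float("-inf")
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)          # log prior
            for word in words:
                score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical training data; a real system needs far more examples per category.
classifier = NaiveBayes()
classifier.train("Google does it again with Android", "Technical")
classifier.train("New Android phone released by Google", "Technical")
classifier.train("Local team wins the championship game", "Sports")
classifier.train("Coach praises team after the game", "Sports")

print(classifier.classify("Android update from Google"))
print(classifier.classify("The team lost the game"))
```

Unlike a fixed keyword list, a trained model can still pick a category when
the exact configured words are missing, as long as the surrounding vocabulary
resembles the training text.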

> 2) Any input on 2, please? I need to store the number of views and the
> popularity with each document, and these can change pretty often. Is a
> database recommended, or can this be updated in Solr directly? My issue with
> the DB is that every Solr search hit would require a DB hit to retrieve the
> metadata.

Define often, please.  Less than a minute or more than a minute?

> 
> Any input is appreciated please
> 
> caman wrote:
>> 
>> Hello all,
>> 
>> Please see below. Any help is much appreciated.
>> 1) Extracting data out of a text field to assign a category for certain
>> configured words. E.g., if the text is "Google does it again with Android"
>> and 'Google' and 'Android' are the configured words, I want to be able to
>> assign the article to the tags 'Google', 'Android', and 'Technical'. Can I
>> do this with a custom filter during analysis? Similarly, I want to set up
>> categories for each article based on keywords in the text.
>> 2) How about using Solr as a transactional datastore? I need to keep track
>> of the rating for each document. Would 'ExternalFileField' be a good
>> choice for this use case?
>> 
>> Thanks in advance.
>> 
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: dynamic categorization & transactional data

2010-03-18 Thread caman

David,

Much appreciated. This gives me enough to work with.
I missed one important point: our data changes pretty frequently, which means
we may be running deltas every 5-10 minutes. The in-memory approach should
work.
Thanks.





David Smiley @MITRE.org wrote:
> 
> You'll probably want this frequently changing popularity number to
> influence your relevancy. ExternalFileField looks like a possibility,
> though I haven't used it. Another option would be an in-memory cache that
> stores the popularity numbers for any data whose popularity has been
> updated since the last index update (say, since the previous night). On
> second thought, it may need to hold absolutely all of them, but these are
> just numbers, so no big deal? You could then write a custom "ValueSource"
> subclass that gets its data from this fast, up-to-date in-memory source.
> See FileFloatSource for an example that uses a file instead of an
> in-memory structure.
> 
> ~ David Smiley
> Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
> 
> 
> On Mar 18, 2010, at 2:44 PM, caman wrote:
> 
>> 2) Any input on 2, please? I need to store the number of views and the
>> popularity with each document, and these can change pretty often. Is a
>> database recommended, or can this be updated in Solr directly? My issue
>> with the DB is that every Solr search hit would require a DB hit to
>> retrieve the metadata.
>> 
>> Any input is appreciated please
> 
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/dynamic-categorization---transactional-data-tp27790233p27950036.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: dynamic categorization & transactional data

2010-03-18 Thread Smiley, David W.

You'll probably want this frequently changing popularity number to influence
your relevancy. ExternalFileField looks like a possibility, though I haven't
used it. Another option would be an in-memory cache that stores the popularity
numbers for any data whose popularity has been updated since the last index
update (say, since the previous night). On second thought, it may need to hold
absolutely all of them, but these are just numbers, so no big deal? You could
then write a custom "ValueSource" subclass that gets its data from this fast,
up-to-date in-memory source. See FileFloatSource for an example that uses a
file instead of an in-memory structure.
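
As a language-neutral illustration of that idea, the Python sketch below keeps
popularity values in memory and folds them into the scores of an existing
result list. It is a stand-in for the concept only: a real Solr integration
would be a Java ValueSource subclass, and the names PopularitySource and
rerank, the default value, and the log-based boost formula are all
assumptions rather than anything Solr prescribes.

```python
import math


class PopularitySource:
    """In-memory lookup of popularity values, with a default for unknown documents."""

    def __init__(self, default=0.0):
        self._values = {}
        self._default = default

    def update(self, doc_id, popularity):
        # Called whenever a document's popularity changes; no reindex involved.
        self._values[doc_id] = float(popularity)

    def value(self, doc_id):
        return self._values.get(doc_id, self._default)


def rerank(hits, source, weight=1.0):
    """Re-sort (doc_id, score) hits by score * (1 + weight * log1p(popularity))."""
    boosted = [
        (doc_id, score * (1.0 + weight * math.log1p(source.value(doc_id))))
        for doc_id, score in hits
    ]
    return sorted(boosted, key=lambda hit: hit[1], reverse=True)


# Hypothetical search results as (doc_id, relevancy score), best first.
hits = [("a", 2.0), ("b", 1.5), ("c", 1.0)]
source = PopularitySource()
source.update("c", 100)  # "c" has become popular since the last index update

for doc_id, score in rerank(hits, source):
    print(doc_id, round(score, 3))
```

The logarithm keeps a very popular document from swamping text relevancy
entirely; documents with no recorded popularity keep their original score.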

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/


On Mar 18, 2010, at 2:44 PM, caman wrote:

> 2) Any input on 2, please? I need to store the number of views and the
> popularity with each document, and these can change pretty often. Is a
> database recommended, or can this be updated in Solr directly? My issue with
> the DB is that every Solr search hit would require a DB hit to retrieve the
> metadata.
> 
> Any input is appreciated please




Re: dynamic categorization & transactional data

2010-03-18 Thread caman

1) Took care of the first one with a Transformer.
2) Any input on 2, please? I need to store the number of views and the
popularity with each document, and these can change pretty often. Is a
database recommended, or can this be updated in Solr directly? My issue with
the DB is that every Solr search hit would require a DB hit to retrieve the
metadata.

Any input is appreciated.

caman wrote:
> 
> Hello all,
> 
> Please see below. Any help is much appreciated.
> 1) Extracting data out of a text field to assign a category for certain
> configured words. E.g., if the text is "Google does it again with Android"
> and 'Google' and 'Android' are the configured words, I want to be able to
> assign the article to the tags 'Google', 'Android', and 'Technical'. Can I
> do this with a custom filter during analysis? Similarly, I want to set up
> categories for each article based on keywords in the text.
> 2) How about using Solr as a transactional datastore? I need to keep track
> of the rating for each document. Would 'ExternalFileField' be a good choice
> for this use case?
> 
> Thanks in advance.
> 

-- 
View this message in context: 
http://old.nabble.com/dynamic-categorization---transactional-data-tp27790233p27949786.html
Sent from the Solr - User mailing list archive at Nabble.com.



dynamic categorization & transactional data

2010-03-04 Thread caman

Hello all,

Please see below. Any help is much appreciated.
1) Extracting data out of a text field to assign a category for certain
configured words. E.g., if the text is "Google does it again with Android"
and 'Google' and 'Android' are the configured words, I want to be able to
assign the article to the tags 'Google', 'Android', and 'Technical'. Can I do
this with a custom filter during analysis? Similarly, I want to set up
categories for each article based on keywords in the text.
2) How about using Solr as a transactional datastore? I need to keep track of
the rating for each document. Would 'ExternalFileField' be a good choice for
this use case?
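
For point 1, the mapping from configured words to tags can be sketched in a
few lines of Python, run on the article text before the document is sent to
the index. The KEYWORD_TAGS table and the assign_tags function are
hypothetical names, and the rule that both 'Google' and 'Android' imply
'Technical' is an assumption read off the example above.

```python
import re

# Hypothetical configuration: each configured word maps to the tags it implies.
KEYWORD_TAGS = {
    "google": ["Google", "Technical"],
    "android": ["Android", "Technical"],
}


def assign_tags(text, keyword_tags=KEYWORD_TAGS):
    """Return the sorted, de-duplicated tags for every configured word found in text."""
    # Whole-word, case-insensitive matching: "Googled" does not match "google".
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    tags = set()
    for keyword, implied in keyword_tags.items():
        if keyword in words:
            tags.update(implied)
    return sorted(tags)


print(assign_tags("Google does it again with Android"))
# prints ['Android', 'Google', 'Technical']
```

The resulting list could then be stored in a multi-valued tag field on the
document; how that field is named and populated depends on the indexing setup.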

Thanks in advance.
-- 
View this message in context: 
http://old.nabble.com/dynamic-categorization---transactional-data-tp27790233p27790233.html
Sent from the Solr - User mailing list archive at Nabble.com.