Re: dynamic categorization & transactional data
@Grant

Less than a minute. If we retrieve the metadata from the index, we will have to keep the index updated down to the second, and that may not scale well. Perhaps a hybrid approach?

I will look into a classifier. Thanks.

Grant Ingersoll-6 wrote:
> Define often, please. Less than a minute or more than a minute?

-- View this message in context: http://old.nabble.com/dynamic-categorization---transactional-data-tp27790233p27970656.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: dynamic categorization & transactional data
On Mar 18, 2010, at 2:44 PM, caman wrote:

> 1) Took care of the first one by Transformer.

This is often also something done by a classifier that is trained to deal with all the statistical variations in your text. Tools like Weka, Mahout, OpenNLP, etc. can be applied here.

> 2) Any input on 2 please? I need to store # of views and popularity with
> each document and that can change pretty often. Recommended to use database
> or can this be updated to SOLr directly? My issue with DB is that with every
> SOLR search hit, will have to do DB hit to retrieve meta-data.

Define often, please. Less than a minute or more than a minute?

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
Re: dynamic categorization & transactional data
David,

Much appreciated. This gives me enough to work with. I missed one important point: our data changes pretty frequently, which means we may be running deltas every 5-10 minutes. In-memory should work.

Thanks

-- View this message in context: http://old.nabble.com/dynamic-categorization---transactional-data-tp27790233p27950036.html
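For what it's worth, here is a rough sketch of keeping the counts in memory and flushing them to a file on each delta run. It is only an illustration under stated assumptions: the class name `ViewCounterBuffer`, the document ids, and the file name are made up, and nothing here talks to Solr. The one `key=value` pair per line layout is, to the best of my knowledge, the format ExternalFileField reads; how and when Solr picks up a rewritten file depends on the version, so check the docs for yours.

```python
import os
import tempfile
from collections import Counter


class ViewCounterBuffer:
    """Accumulates view counts in memory and flushes them to a key=value file."""

    def __init__(self, totals=None):
        # Running totals per document id (hypothetical ids such as "doc1").
        self.totals = Counter(totals or {})
        self.dirty = False

    def record_view(self, doc_id, count=1):
        # Cheap in-memory update on every view; no index write is involved.
        self.totals[doc_id] += count
        self.dirty = True

    def flush(self, path):
        # Nothing changed since the last flush: leave the file alone.
        if not self.dirty:
            return False
        # Write to a temp file in the same directory, then swap it into
        # place so a reader never sees a half-written file.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            for doc_id in sorted(self.totals):
                out.write(f"{doc_id}={float(self.totals[doc_id])}\n")
        os.replace(tmp_path, path)
        self.dirty = False
        return True
```

The idea would be to call `record_view` on every hit and `flush` from whatever job runs the 5-10 minute deltas, so the index itself is not rewritten for every view.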
Re: dynamic categorization & transactional data
You'll probably want this frequently changing popularity number to influence your relevancy. ExternalFileField looks like a possibility, though I haven't used it.

Another option would be an in-memory cache that stores the popularity numbers for any data whose popularity has been updated since the last index update (say, since the previous night). On second thought, it may need to hold absolutely all of them, but these are just numbers, so no big deal? You could then write a custom "ValueSource" subclass that gets its data from this fast, up-to-date in-memory source. See FileFloatSource for an example that uses a file instead of an in-memory structure.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
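To make the in-memory idea a bit more concrete, here is a minimal sketch of the lookup such a cache would provide. It is plain Python rather than a real ValueSource subclass, and every name in it (`PopularityOverlay`, `boosted_score`, the log-based blend) is a hypothetical stand-in, not part of Solr.

```python
import math
import threading


class PopularityOverlay:
    """Fresh popularity values layered over the values from the last index build."""

    def __init__(self, indexed_values):
        # Snapshot taken at the last index update (hypothetical data).
        self._indexed = dict(indexed_values)
        # Values that have changed since that snapshot.
        self._updates = {}
        self._lock = threading.Lock()

    def update(self, doc_id, popularity):
        with self._lock:
            self._updates[doc_id] = float(popularity)

    def value(self, doc_id, default=0.0):
        # Prefer the fresh in-memory value, fall back to the indexed one.
        with self._lock:
            if doc_id in self._updates:
                return self._updates[doc_id]
            return self._indexed.get(doc_id, default)

    def reset(self, indexed_values):
        # Call after a reindex: the new snapshot replaces the pending updates.
        with self._lock:
            self._indexed = dict(indexed_values)
            self._updates.clear()


def boosted_score(text_score, popularity):
    # Hypothetical blend: the log dampens popularity so it nudges the
    # ranking instead of dominating it.
    return text_score * (1.0 + math.log1p(popularity))
```

If the cache has to hold all popularity numbers rather than only the changed ones, the same shape still works: `_indexed` is simply left empty and every value lives in `_updates`.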
Re: dynamic categorization & transactional data
1) Took care of the first one with a Transformer.
2) Any input on 2, please? I need to store the number of views and the popularity with each document, and those can change pretty often. Is it recommended to use a database, or can this be updated in Solr directly? My issue with a DB is that with every Solr search hit, I will have to do a DB hit to retrieve the metadata.

Any input is appreciated, please.

-- View this message in context: http://old.nabble.com/dynamic-categorization---transactional-data-tp27790233p27949786.html
dynamic categorization & transactional data
Hello all,

Please see below. Any help is much appreciated.

1) Extracting data out of a text field to assign a category for certain configured words. E.g., if the text is "Google does it again with Android" and 'Google' and 'Android' are the configured words, I want to be able to assign the article to the tags 'Google', 'Android', and 'Technical'. Can I do this with a custom filter during analysis? Similarly, setting up categories for each article based on keywords in the text.
2) How about using Solr as a transactional datastore? I need to keep track of the rating for each document. Would 'ExternalFileField' be a good choice for this use case?

Thanks in advance.

-- View this message in context: http://old.nabble.com/dynamic-categorization---transactional-data-tp27790233p27790233.html
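The keyword-to-tag assignment described in point 1 could look roughly like the sketch below if it is done on the document before it is sent to the index, rather than inside an analysis filter. The `KEYWORD_TAGS` table and the `assign_tags` function are hypothetical names for illustration; the mapping of both words to 'Technical' is an assumption taken from the example in the question.

```python
import re

# Hypothetical configuration: lowercase keyword -> tags to assign when it appears.
KEYWORD_TAGS = {
    "google": ["Google", "Technical"],
    "android": ["Android", "Technical"],
}


def assign_tags(text, keyword_tags=KEYWORD_TAGS):
    """Return the tags triggered by configured keywords, in first-seen order."""
    tags = []
    # Lowercase and split on word characters so matching ignores case
    # and punctuation.
    for token in re.findall(r"\w+", text.lower()):
        for tag in keyword_tags.get(token, []):
            if tag not in tags:
                tags.append(tag)
    return tags
```

The returned list could then be stored in a multi-valued tag field on the document. Exact word matching like this will miss variants and misspellings, which is where a trained classifier would come in.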