Multi tokenizer
Hi all, I need to tokenize my field on whitespace, HTML, punctuation, and apostrophes, but if I use HTMLStripStandardTokenizerFactory it strips only the HTML, not the apostrophes. If I use PatternTokenizerFactory, I don't know whether I can create a pattern that tokenizes on all of these characters (HTML, apostrophes, ...). I can match these characters with the pattern [^0-9A-Za-z], but if I use that in a filter with a replacement it breaks my text. Could you help me solve this problem? Bye
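As a standalone illustration of what that character class would do as a split rule, here is a plain-Java sketch (not Solr itself; the sample text and class name are made up for illustration):

```java
// Hypothetical standalone demo (not Solr): splitting on runs of
// non-alphanumeric characters drops HTML angle brackets, punctuation,
// and apostrophes in one pass.
import java.util.Arrays;

public class PatternSplitDemo {
    public static void main(String[] args) {
        String text = "John's dog <b>runs</b>";
        // Same character class as in the question, with + so each run
        // of separator characters yields a single split point.
        String[] tokens = text.split("[^0-9A-Za-z]+");
        System.out.println(Arrays.toString(tokens));
        // prints [John, s, dog, b, runs, b]
    }
}
```

Note the trade-off this makes: the apostrophe split turns "John's" into the tokens John and s, which may or may not be what you want.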
dismax difference between q=text:+toto AND q=toto
Hi, I would like to understand the difference between q=text:+toto and q=toto. /select?fl=*&qt=dismax&q=text:+toto finds 4 docs: <lst name="params"><str name="fl">*</str><str name="q">text: toto</str><str name="qt">dismax</str></lst> /select?fl=*&qt=dismax&q=toto finds 5682 docs: <lst name="params"><str name="fl">*</str><str name="q">toto</str><str name="qt">dismax</str></lst> My schema just has a stored text field; I don't understand this big difference. Thanks a lot for your time, -- View this message in context: http://www.nabble.com/dismax-difference-between--q%3Dtext%3A%2Btoto-AND-q%3Dtoto-tp20932303p20932303.html Sent from the Solr - User mailing list archive at Nabble.com.
Value based boosting - Design Help
We have a requirement for a keyword search in one of our projects and we are using Solr/Lucene for it. We have the data: link_id, title, url, and a collection of keywords associated with a link_id. Right now we have indexed link_id, title, url and keywords (a multivalued field) in a single index. Also, in our requirement each keyword value has a weight associated with it, and this weight is calculated based on certain factors (e.g. if the keyword exists in the title then it takes a specific weight, etc.). This weight should drive the relevancy of the search results. For example, when a user enters a keyword such as "Biology" and clicks search, we search the keywords field in the index. The document that contains the searched keyword with the higher weight should come first. E.g.: Document 1: LinkID = 100, Title = Biology, Keywords = Biology, BioNews, Bio, Bio chemistry. Document 2: LinkID = 102, Title = Nutrition, Keywords = Biology, Nutrition, Dietetics. In the above example document 1 should come first because we associate more weight with the keyword Biology for link id 100 in document 1. We understand that this weight can be applied as a boost to a field. The problem is that in Solr/Lucene we cannot associate a different boost with different values of the same field. It would be very helpful if you could provide your thoughts/inputs on how to achieve this requirement in Lucene: Do we have a way to associate a different boost with different values of the same field? Can we maintain the list of keywords associated with each link_id in a separate index, so that we can associate a weight with each keyword value? If so, how do we relate the main index and the keyword index? -- View this message in context: http://www.nabble.com/Value-based--boosting---Design-Help-tp20934304p20934304.html Sent from the Solr - User mailing list archive at Nabble.com.
Setting Request Handler
Hi, I have a request handler in my solrconfig.xml: /spellCheckCompRH. It utilizes the search component spellcheck. When I specify the following query in a browser, I get correct spelling suggestions from the file dictionary. http://localhost:8080/solr/spellCheckCompRH/?q=SolrDocs&spellcheck.q=relevancy&spellcheck=true&fl=title,score&spellcheck.dictionary=file Now I wrote a Java program to achieve the same result. Code snippet: server = new CommonsHttpSolrServer("http://localhost:8080/solr"); ... SolrQuery query = new SolrQuery(); query.setQuery("solr"); query.setFields("*,score"); query.set("qt", "spellCheckCompRH"); query.set("spellcheck", true); query.set(SpellingParams.SPELLCHECK_DICT, "file"); query.set(SpellingParams.SPELLCHECK_Q, "solt"); ... QueryResponse rsp = server.query(query); SolrDocumentList docs = rsp.getResults(); SpellCheckResponse srsp = rsp.getSpellCheckResponse(); I get documents for my query but I do not get any spelling suggestions. I think that the request handler is not getting set for the query correctly. Can someone please help? Best Regards, Mukta
Re: dismax difference between q=text:+toto AND q=toto
dismax doesn't support field selection in its query syntax, only via the qf parameter. Add debugQuery=true to see how the queries are being parsed; that'll reveal what is going on. Erik On Dec 10, 2008, at 5:07 AM, sunnyfr wrote: Hi, I would like to understand the difference between q=text:+toto and q=toto. /select?fl=*&qt=dismax&q=text:+toto finds 4 docs: <lst name="params"><str name="fl">*</str><str name="q">text: toto</str><str name="qt">dismax</str></lst> /select?fl=*&qt=dismax&q=toto finds 5682 docs: <lst name="params"><str name="fl">*</str><str name="q">toto</str><str name="qt">dismax</str></lst> My schema just has a stored text field; I don't understand this big difference. Thanks a lot for your time, -- View this message in context: http://www.nabble.com/dismax-difference-between--q%3Dtext%3A%2Btoto-AND-q%3Dtoto-tp20932303p20932303.html Sent from the Solr - User mailing list archive at Nabble.com.
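For reference, per-field boosts with dismax go in the qf parameter, which can also be set as a handler default in solrconfig.xml. A sketch, assuming the Solr 1.3-style SearchHandler configuration; the field names and boost values here are examples, not from the poster's schema:

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- per-field boosts replace field: prefixes in q -->
    <str name="qf">text^1.0 title^3.0</str>
  </lst>
</requestHandler>
```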
Re: full-import and empty ./core/data/index
On Wed, Dec 10, 2008 at 4:23 PM, Marc Sturlese [EMAIL PROTECTED] wrote: Is there any way to start Solr with the index folder empty without getting an error? What I would like to do is start with the empty folder, do a full import (which would create the index from scratch) and from there keep updating it with delta-import. At the moment I must have something in the index folder at the beginning, otherwise I get an error. You can delete the index folder (but keep the data folder) and Solr will create it at start-up. There should be no errors. -- Regards, Shalin Shekhar Mangar.
Re: Value based boosting - Design Help
On Wed, Dec 10, 2008 at 5:54 PM, ayyanar [EMAIL PROTECTED] wrote: Also, in our requirement each keyword value has a weight associated with it, and this weight is calculated based on certain factors (e.g. if the keyword exists in the title then it takes a specific weight, etc.). This weight should drive the relevancy of the search results. For example, when a user enters a keyword such as Biology and clicks search, we search the keywords field in the index. The document that contains the searched keyword with the higher weight should come first. It would be very helpful if you could provide your thoughts/inputs on how to achieve this requirement in Lucene: Do we have a way to associate a different boost with different values of the same field? So you are searching only on the keywords field and not the title field? You can search on both the title and the keywords field and provide different boosts to the title field. Why do you want to assign weights to keywords? If all keywords which are in the title are supposed to be more relevant than keywords only in the keywords field, then assigning a boost value to the title field is enough. Is there any other use case? Can we maintain the list of keywords associated with each link_id in a separate index, so that we can associate a weight with each keyword value? If so, how do we relate the main index and the keyword index? No, joins like these are not possible in Lucene/Solr. Lucene has payloads, which can be used for boosting a particular term, but that functionality is not available in Solr. Look at BoostingTermQuery in Lucene for how to use it. http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/payloads/BoostingTermQuery.html -- Regards, Shalin Shekhar Mangar.
Re: Problems with SOLR-236 (field collapsing)
The first output is from the query component. You might just need to make the collapse component first and remove the query component completely. We perform geographic searching with localsolr first (if we need to), and then try to collapse those results (if collapse=true). If we don't have any results yet, that's the only time we use the standard query component. I'm making sure we set builder.setNeedDocSet=false, and then I modified the query component to only execute when builder.isNeedDocSet=true. In the field collapsing patch that I'm using, I've got code to remove a previous 'response' from the builder.rsp so we don't have duplicates. Now, if I could get field collapsing to work properly with a docSet/docList from localsolr and also have faceting work, I'd be golden. Doug On Dec 9, 2008, at 9:37 PM, Stephen Weiss wrote: Hi Tracy, Well, I managed to get it working (I think), but the weird thing is, in the XML output it gives both recordsets (the filtered and unfiltered; filtered second). In the JSON (the one I actually use, at least) I only get the filtered results (as expected). In my core's solrconfig.xml, I added: <searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent" /> (I'm not sure if it's supposed to go anywhere in particular, but for me it's right before StandardRequestHandler) and then within StandardRequestHandler: <requestHandler name="standard" class="solr.StandardRequestHandler"> <!-- default values for query parameters --> <lst name="defaults"> <str name="echoParams">explicit</str> <!-- <int name="rows">10</int> <str name="fl">*</str> <str name="version">2.1</str> --> </lst> <arr name="components"> <str>query</str> <str>facet</str> <str>mlt</str> <str>highlight</str> <str>debug</str> <str>collapse</str> </arr> </requestHandler> Which is basically all the default values plus collapse. Not sure if this was needed for prior versions; I don't see it in any patch files (I just got a vague idea from a comment by someone else who said it wasn't working for them).
It would kinda be nice if someone working on the code might throw us a bone and say explicitly what the right options to put in the config file are (if there are even supposed to be any; for all I know, this is just a band-aid over a larger problem). I know it's not done yet, though... just a pointer for this patch would be handy. It's a really useful feature if it works (I was kind of shocked this wasn't part of the standard distribution, since it's something I had to do so often with MySQL; lucky, I guess, that it only came up now). Another issue I'm having now is that the faceting doesn't seem to change, even if I set the collapse.facet option to after... I should really try before and see what happens. Of course, I just realized the integrity of my collapse field is not so great, so I have to go back and redo the data :-) Best of luck. -- Steve On Dec 9, 2008, at 7:49 PM, Tracy Flynn (SOLR) wrote: Steve, I need this too. As my previous posting said, I adapted the 1.2 field collapsing back at the beginning of the year, so I'm somewhat familiar with it. I'll try and get a look this weekend; it's the earliest I'm likely to get spare cycles. I'll post any results. Tracy On Dec 9, 2008, at 4:18 PM, Stephen Weiss wrote: Hi, I'm trying to use field collapsing with our Solr but I just can't seem to get it to do anything. I've downloaded a dist copy of Solr 1.3 and applied Ivan de Prado's patch. Reading through the source code, the patch definitely was applied successfully (all the changes are in the right places; I've checked every single one). I've run ant clean, ant compile, and ant dist to produce the war file in the dist/ folder, then put the war file in place and restarted Jetty. According to the logs, Jetty is definitely loading the right war file. If I expand the war file and grep through the files, it would appear the collapsing code is there.
However, when I add any sort of collapse parameters (I've tried every combination of collapse=true, collapse.field=link_id, collapse.threshold=1, collapse.type=normal, collapse.info.doc=true), the result set is no different from a normal query, and there is no collapse data returned in the XML. I'm not a Java developer; this is my first time using ant, period, and I'm just following basic directions I found on Google. Here is the output of the compilation process: I really need this patch to work for a project... Can someone please tell me what I'm missing to get this to work? I can't really find any documentation beyond adding the collapse options to the query string, so it's hard to tell: is there an option in solrconfig.xml or in the core configuration that needs to be set? Am I going about this entirely the wrong way? Thanks for any advice, I
Re: How can i look for tom jerry
On Wed, Dec 10, 2008 at 5:12 PM, sunnyfr [EMAIL PROTECTED] wrote: When I look for this expression it stops the search at the &, taking that for a parameter separator, I guess. You will need to URL-encode the query parameter before you make the request: URLEncoder.encode("tom & jerry", "UTF-8"); If you are using SolrJ, it will automatically take care of this. -- Regards, Shalin Shekhar Mangar.
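A minimal standalone sketch of the encoding step, assuming the problematic query was "tom & jerry" (the ampersand appears to have been lost from the archived message):

```java
// Sketch: URL-encode the query value before building the request URL
// by hand, so the '&' inside the query is not mistaken for a
// parameter separator by the server.
import java.net.URLEncoder;

public class EncodeDemo {
    public static void main(String[] args) throws Exception {
        String q = URLEncoder.encode("tom & jerry", "UTF-8");
        System.out.println(q); // prints tom+%26+jerry
        // The encoded value can then be appended to the request URL:
        System.out.println("/select?q=" + q);
    }
}
```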
Re: full-import and empty ./core/data/index
Thanks, it did work. Shalin Shekhar Mangar wrote: On Wed, Dec 10, 2008 at 4:23 PM, Marc Sturlese [EMAIL PROTECTED] wrote: Is there any way to start Solr with the index folder empty without getting an error? What I would like to do is start with the empty folder, do a full import (which would create the index from scratch) and from there keep updating it with delta-import. At the moment I must have something in the index folder at the beginning, otherwise I get an error. You can delete the index folder (but keep the data folder) and Solr will create it at start-up. There should be no errors. -- Regards, Shalin Shekhar Mangar. -- View this message in context: http://www.nabble.com/full-import-and-empty-.-core-data-index-tp20932981p20933620.html Sent from the Solr - User mailing list archive at Nabble.com.
Can we extract contents from two Core folders
Hi All, Issue: we need to fetch data stored in different core folders. Scenario: We are storing the information in different core folders specific to website ids (such as CoreUSA, CoreUK, CoreIndia, ...). Thus information specific to a region gets stored in a specific core folder; e.g. for India-specific information, the CoreIndia folder is used. Now the requirement is that we have to access the information stored in multiple cores, that is, the CoreUSA and CoreUK folders, simultaneously. Is it possible to do so, and if so, what is the mechanism? Thanks in advance, Payal -- View this message in context: http://www.nabble.com/Can-we-extract-contents-from-two-Core-folders-tp20933745p20933745.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can we extract contents from two Core folders
On Wed, Dec 10, 2008 at 5:19 PM, payalsharma [EMAIL PROTECTED] wrote: We are storing the information in different core folders specific to website ids (such as CoreUSA, CoreUK, CoreIndia, ...). Thus information specific to a region gets stored in a specific core folder; e.g. for India-specific information, the CoreIndia folder is used. Now the requirement is that we have to access the information stored in multiple cores, that is, the CoreUSA and CoreUK folders, simultaneously. Is it possible to do so, and if so, what is the mechanism? Make two queries :-) -- Regards, Shalin Shekhar Mangar.
Re: dismax difference between q=text:+toto AND q=toto
Thanks Erik, Have a good day. Erik Hatcher wrote: dismax doesn't support field selection in its query syntax, only via the qf parameter. Add debugQuery=true to see how the queries are being parsed; that'll reveal what is going on. Erik On Dec 10, 2008, at 5:07 AM, sunnyfr wrote: Hi, I would like to understand the difference between q=text:+toto and q=toto. /select?fl=*&qt=dismax&q=text:+toto finds 4 docs: <lst name="params"><str name="fl">*</str><str name="q">text: toto</str><str name="qt">dismax</str></lst> /select?fl=*&qt=dismax&q=toto finds 5682 docs: <lst name="params"><str name="fl">*</str><str name="q">toto</str><str name="qt">dismax</str></lst> My schema just has a stored text field; I don't understand this big difference. Thanks a lot for your time, -- View this message in context: http://www.nabble.com/dismax-difference-between--q%3Dtext%3A%2Btoto-AND-q%3Dtoto-tp20932303p20932303.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/dismax-difference-between--q%3Dtext%3A%2Btoto-AND-q%3Dtoto-tp20932303p20932585.html Sent from the Solr - User mailing list archive at Nabble.com.
Error, when i update the rich text documents such as .doc, .ppt files.
Hi all, I want to index rich text documents like .doc, .xls, and .ppt files. I applied the patch for updating rich documents by following the instructions at the URL below. http://wiki.apache.org/solr/UpdateRichDocuments When I index a doc file, I get the following error in the browser. HTTP ERROR: 500 lazy loading error org.apache.solr.common.SolrException: lazy loading error at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:247) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:228) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.RichDocumentRequestHandler' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:319) at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:340) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:235) ... 21 more Caused by: java.lang.ClassNotFoundException: solr.RichDocumentRequestHandler at java.net.URLClassLoader$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:375) at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:337) at java.lang.ClassLoader.loadClassInternal(Unknown Source) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Unknown Source) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:257) ... 24 more RequestURI=/solr/update/rich Any better solutions will be appreciated. Thanks a lot, Prabhu.K -- View this message in context: http://www.nabble.com/Error%2C-when-i-update-the-rich-text-documents-such-as-.doc%2C-.ppt-files.-tp20934026p20934026.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can we extract contents from two Core folders
payalsharma wrote: Hi All, Issue: we need to fetch data stored in different core folders. Scenario: We are storing the information in different core folders specific to website ids (such as CoreUSA, CoreUK, CoreIndia, ...). Thus information specific to a region gets stored in a specific core folder; e.g. for India-specific information, the CoreIndia folder is used. Now the requirement is that we have to access the information stored in multiple cores, that is, the CoreUSA and CoreUK folders, simultaneously. Is it possible to do so, and if so, what is the mechanism? Thanks in advance, Payal Try distributed search over the cores.
Re: Can we extract contents from two Core folders
Hi, Will you please explain what exactly you mean by "distributed search over the cores"? Please provide some context around this. Thanks markrmiller wrote: payalsharma wrote: Hi All, Issue: we need to fetch data stored in different core folders. Scenario: We are storing the information in different core folders specific to website ids (such as CoreUSA, CoreUK, CoreIndia, ...). Thus information specific to a region gets stored in a specific core folder; e.g. for India-specific information, the CoreIndia folder is used. Now the requirement is that we have to access the information stored in multiple cores, that is, the CoreUSA and CoreUK folders, simultaneously. Is it possible to do so, and if so, what is the mechanism? Thanks in advance, Payal Try distributed search over the cores. -- View this message in context: http://www.nabble.com/Can-we-extract-contents-from-two-Core-folders-tp20933745p20937150.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: snappuller issue with multicore
I noticed that you are using the same rsyncd port for both cores. Do you have a scripts.conf for each core? Bill On Tue, Dec 9, 2008 at 11:40 PM, Kashyap, Raghu [EMAIL PROTECTED] wrote: Hi, We are seeing a strange behavior with snappuller. We have 2 cores: Hotel, Location. Here are the steps we perform: 1. index hotel on master server 2. index location on master server 3. execute snapshooter for hotel core on master server 4. execute snapshooter for location core on master server 5. execute snappuller from slave machines (once for the hotel core, once for the location core). However, the hotel core snapshot is pulled into the location data dir. Here are the commands that we execute in our Ruby scripts: system("solr/multicore/hotel/bin/snappuller -P 18983 -S /solr/data -M masterServer -D /solr/data/hotel") system("solr/multicore/location/bin/snappuller -P 18983 -M masterServer -S /solr/data -D /solr/data/location") Thanks, Raghu
RE: snappuller issue with multicore
Bill, Yes, I do have a scripts.conf for each core. However, all the options needed for snappuller are specified on the command line itself (-D, -S, etc.). -Raghu -Original Message- From: Bill Au [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 10, 2008 9:17 AM To: solr-user@lucene.apache.org Subject: Re: snappuller issue with multicore I noticed that you are using the same rsyncd port for both cores. Do you have a scripts.conf for each core? Bill On Tue, Dec 9, 2008 at 11:40 PM, Kashyap, Raghu [EMAIL PROTECTED] wrote: Hi, We are seeing a strange behavior with snappuller. We have 2 cores: Hotel, Location. Here are the steps we perform: 1. index hotel on master server 2. index location on master server 3. execute snapshooter for hotel core on master server 4. execute snapshooter for location core on master server 5. execute snappuller from slave machines (once for the hotel core, once for the location core). However, the hotel core snapshot is pulled into the location data dir. Here are the commands that we execute in our Ruby scripts: system("solr/multicore/hotel/bin/snappuller -P 18983 -S /solr/data -M masterServer -D /solr/data/hotel") system("solr/multicore/location/bin/snappuller -P 18983 -M masterServer -S /solr/data -D /solr/data/location") Thanks, Raghu
RE: Can we extract contents from two Core folders
-Original Message- From: payalsharma [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 10, 2008 9:11 AM To: solr-user@lucene.apache.org Subject: Re: Can we extract contents from two Core folders Hi, Will you please explain what exactly you mean by : Distributed search over the cores. Please provide some context around this. Thanks http://wiki.apache.org/solr/DistributedSearch
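A sketch of what a distributed-search request over two cores looks like, following the DistributedSearch wiki page linked above. The host, port, and core names here are hypothetical examples, not from the original setup; the shards parameter lists host:port/path entries, one per core:

```java
// Hypothetical sketch: build a distributed-search URL that queries
// two cores at once via the shards parameter.
public class ShardsUrlDemo {
    public static void main(String[] args) {
        // One host:port/path entry per core, comma-separated.
        String shards = "localhost:8983/solr/CoreUSA,localhost:8983/solr/CoreUK";
        String url = "http://localhost:8983/solr/CoreUSA/select?shards="
                + shards + "&q=some+query";
        System.out.println(url);
    }
}
```

The core the request is sent to simply coordinates the query; results come back merged from all listed shards.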
Re: Look for three words, just two are weighted ?
On Dec 10, 2008, at 9:58 AM, sunnyfr wrote: Second question: if I want to boost status_official:true^2, should I do it this way to weight the true value? Thanks. /select?fl=*&qt=dismax&q=+tom+jerry+cartoontv&qf=status_official^2.5+owner_login^10+title^3&debugQuery=true Use bq (boosting query) for boosting by status, bq=status_official:true^2, and remove it from the qf parameter. That should do the trick. Erik
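The bq parameter can also be set as a handler default in solrconfig.xml instead of being passed on every request. A sketch, assuming a Solr 1.3-style SearchHandler; field names and boost values are taken from this thread but the handler layout is an example:

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">owner_login^10 title^3</str>
    <!-- documents matching status_official:true get an additive boost -->
    <str name="bq">status_official:true^2</str>
  </lst>
</requestHandler>
```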
Re: Look for three words, just two are weighted ?
Yes, but when I check the debug output, there is no weight for it. /select?fl=*&qt=dismax&q=+tom+jerry+cartoontv&bq=status_official:true^12&qf=owner_login^10+title^3&debugQuery=true And it's as if it doesn't weight my word cartoontv either? Or maybe the doc that contains these three words is just not weighted enough? <str name="rawquerystring">tom jerry cartoontv</str> <str name="querystring">tom jerry cartoontv</str> <str name="parsedquery">+((DisjunctionMaxQuery((owner_login:tom^10.0 | title:tom^3.0)~0.01) DisjunctionMaxQuery((owner_login:jerry^10.0 | title:jerri^3.0)~0.01) DisjunctionMaxQuery((owner_login:cartoontv^10.0 | title:cartoontv^3.0)~0.01))~2) DisjunctionMaxQuery((text:"tom jerri cartoontv"~100^0.2)~0.01) status_official:true^12.0</str> <str name="parsedquery_toString">+(((owner_login:tom^10.0 | title:tom^3.0)~0.01 (owner_login:jerry^10.0 | title:jerri^3.0)~0.01 (owner_login:cartoontv^10.0 | title:cartoontv^3.0)~0.01)~2) (text:"tom jerri cartoontv"~100^0.2)~0.01 status_official:T^12.0</str> <lst name="explain"> <str name="559170">0.59949005 = (MATCH) sum of: 0.59949005 = (MATCH) product of: 0.899235 = (MATCH) sum of: 0.37824848 = (MATCH) max plus 0.01 times others of: 0.37824848 = (MATCH) weight(title:tom^3.0 in 918085), product of: 0.077876315 = queryWeight(title:tom^3.0), product of: 3.0 = boost 7.771266 = idf(docFreq=8887, numDocs=7753783) 0.003340353 = queryNorm 4.8570414 = (MATCH) fieldWeight(title:tom in 918085), product of: 1.0 = tf(termFreq(title:tom)=1) 7.771266 = idf(docFreq=8887, numDocs=7753783) 0.625 = fieldNorm(field=title, doc=918085) 0.52098656 = (MATCH) max plus 0.01 times others of: 0.52098656 = (MATCH) weight(title:jerri^3.0 in 918085), product of: 0.09139661 = queryWeight(title:jerri^3.0), product of: 3.0 = boost 9.120454 = idf(docFreq=2305, numDocs=7753783) 0.003340353 = queryNorm 5.7002835 = (MATCH) fieldWeight(title:jerri in 918085), product of: 1.0 = tf(termFreq(title:jerri)=1) 9.120454 = idf(docFreq=2305, numDocs=7753783) 0.625 = fieldNorm(field=title, doc=918085) 0.667 = coord(2/3)</str> </lst> Erik Hatcher wrote: On Dec 10, 2008, at 9:58 AM, sunnyfr wrote: Second question: if I want to boost status_official:true^2, should I do it this way to weight the true value? Thanks. /select?fl=*&qt=dismax&q=+tom+jerry+cartoontv&qf=status_official^2.5+owner_login^10+title^3&debugQuery=true Use bq (boosting query) for boosting by status, bq=status_official:true^2, and remove it from the qf parameter. That should do the trick. Erik -- View this message in context: http://www.nabble.com/Look-for-three-words%2C-just-two-are-weighted---tp20936945p20937676.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: snappuller issue with multicore
Try using the -d option with snappuller so you can specify the path to the directory holding index data on the local machine. Doug On Dec 10, 2008, at 10:20 AM, Kashyap, Raghu wrote: Bill, Yes, I do have a scripts.conf for each core. However, all the options needed for snappuller are specified on the command line itself (-D, -S, etc.). -Raghu -Original Message- From: Bill Au [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 10, 2008 9:17 AM To: solr-user@lucene.apache.org Subject: Re: snappuller issue with multicore I noticed that you are using the same rsyncd port for both cores. Do you have a scripts.conf for each core? Bill On Tue, Dec 9, 2008 at 11:40 PM, Kashyap, Raghu [EMAIL PROTECTED] wrote: Hi, We are seeing a strange behavior with snappuller. We have 2 cores: Hotel, Location. Here are the steps we perform: 1. index hotel on master server 2. index location on master server 3. execute snapshooter for hotel core on master server 4. execute snapshooter for location core on master server 5. execute snappuller from slave machines (once for the hotel core, once for the location core). However, the hotel core snapshot is pulled into the location data dir. Here are the commands that we execute in our Ruby scripts: system("solr/multicore/hotel/bin/snappuller -P 18983 -S /solr/data -M masterServer -D /solr/data/hotel") system("solr/multicore/location/bin/snappuller -P 18983 -M masterServer -S /solr/data -D /solr/data/location") Thanks, Raghu
Re: Setting Request Handler
Inline below... Also, though, you should note that the /spellCheckCompRH that is packaged with the example is not necessarily the best way to actually use the SpellCheckComponent. It is intended to be used as a component in whatever your MAIN request handler is; it merely shows how to hook it in. On Dec 10, 2008, at 7:51 AM, Deshpande, Mukta wrote: Hi, I have a request handler in my solrconfig.xml: /spellCheckCompRH. It utilizes the search component spellcheck. When I specify the following query in a browser, I get correct spelling suggestions from the file dictionary. http://localhost:8080/solr/spellCheckCompRH/?q=SolrDocs&spellcheck.q=relevancy&spellcheck=true&fl=title,score&spellcheck.dictionary=file Now I wrote a Java program to achieve the same result. Code snippet: server = new CommonsHttpSolrServer("http://localhost:8080/solr"); ... SolrQuery query = new SolrQuery(); query.setQuery("solr"); query.setFields("*,score"); query.set("qt", "spellCheckCompRH"); Is "spellCheckCompRH" a variable? Does it equal "/spellCheckCompRH"? query.set("spellcheck", true); query.set(SpellingParams.SPELLCHECK_DICT, "file"); query.set(SpellingParams.SPELLCHECK_Q, "solt"); ... QueryResponse rsp = server.query(query); SolrDocumentList docs = rsp.getResults(); SpellCheckResponse srsp = rsp.getSpellCheckResponse(); I get documents for my query but I do not get any spelling suggestions. I think that the request handler is not getting set for the query correctly. Can someone please help? Best Regards, Mukta -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
RE: snappuller issue with multicore
Ok, I think the problem is what Bill mentioned earlier. The rsync port was the same for both cores, which is why the same snapshot was being copied for both cores. Thanks for all the help. -Raghu -Original Message- From: Kashyap, Raghu [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 10, 2008 10:27 AM To: solr-user@lucene.apache.org Subject: RE: snappuller issue with multicore Doug, That doesn't help. -Raghu -Original Message- From: Doug Steigerwald [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 10, 2008 9:35 AM To: solr-user@lucene.apache.org Subject: Re: snappuller issue with multicore Try using the -d option with snappuller so you can specify the path to the directory holding index data on the local machine. Doug On Dec 10, 2008, at 10:20 AM, Kashyap, Raghu wrote: Bill, Yes, I do have a scripts.conf for each core. However, all the options needed for snappuller are specified on the command line itself (-D, -S, etc.). -Raghu -Original Message- From: Bill Au [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 10, 2008 9:17 AM To: solr-user@lucene.apache.org Subject: Re: snappuller issue with multicore I noticed that you are using the same rsyncd port for both cores. Do you have a scripts.conf for each core? Bill On Tue, Dec 9, 2008 at 11:40 PM, Kashyap, Raghu [EMAIL PROTECTED] wrote: Hi, We are seeing a strange behavior with snappuller. We have 2 cores: Hotel, Location. Here are the steps we perform: 1. index hotel on master server 2. index location on master server 3. execute snapshooter for hotel core on master server 4. execute snapshooter for location core on master server 5. execute snappuller from slave machines (once for the hotel core, once for the location core). However, the hotel core snapshot is pulled into the location data dir. 
Here are the commands that we execute in our ruby scripts system('solr/multicore/hotel/bin/snappuller -P 18983 -S /solr/data -M masterServer -D /solr/data/hotel') system('solr/multicore/location/bin/snappuller -P 18983 -M masterServer -S /solr/data -D /solr/data/location') Thanks, Raghu
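For anyone hitting the same symptom: the resolution above boils down to giving each core its own rsyncd port, both in each core's scripts.conf and in the -P flag passed to snappuller. A sketch of the idea - the second port number (18984) is just an illustrative choice, not something from the thread:

```properties
# hotel core, e.g. solr/multicore/hotel/conf/scripts.conf
rsyncd_port=18983

# location core, e.g. solr/multicore/location/conf/scripts.conf
rsyncd_port=18984
```

With distinct ports, the two snappuller invocations (-P 18983 and -P 18984) talk to different rsync daemons and can no longer pull each other's snapshots.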
Re: Error, when i update the rich text documents such as .doc, .ppt files.
Hi, There is a ClassNotFound exception in there. Make sure you rebuild the war, completely remove the old one, and properly deploy the new one. Peek into the war and look for the class that the error below is missing to make sure the class is really there. Get the latest code for http://wiki.apache.org/solr/ExtractingRequestHandler and try that. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: RaghavPrabhu [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, December 10, 2008 7:09:17 AM Subject: Error, when i update the rich text documents such as .doc, .ppt files. Hi all, I want to index rich text documents like .doc, .xls, .ppt files. I had done the patch for updating the rich documents by following the instructions at the URL below. http://wiki.apache.org/solr/UpdateRichDocuments When I index the doc file, I'm getting the following error in the browser. HTTP ERROR: 500 lazy loading error org.apache.solr.common.SolrException: lazy loading error at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:247) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:228) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.RichDocumentRequestHandler' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:319) at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:340) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:235) ... 
21 more Caused by: java.lang.ClassNotFoundException: solr.RichDocumentRequestHandler at java.net.URLClassLoader$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:375) at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:337) at java.lang.ClassLoader.loadClassInternal(Unknown Source) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Unknown Source) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:257) ... 24 more RequestURI=/solr/update/rich Better solutions would be appreciated even more. Thanks a lot Prabhu.K -- View this message in context: http://www.nabble.com/Error%2C-when-i-update-the-rich-text-documents-such-as-.doc%2C-.ppt-files.-tp20934026p20934026.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Error, when i update the rich text documents such as .doc, .ppt files.
Hi Raghav, Recently, integration with Tika was completed for SOLR-284 and it is now committed on the trunk (but does not use the old RichDocumentHandler approach). See http://wiki.apache.org/solr/ExtractingRequestHandler for how to use and configure it. Otherwise, it looks to me like the jar file for the RichDocHandler is not in your WAR or in the Solr Home lib directory. HTH, Grant On Dec 10, 2008, at 7:09 AM, RaghavPrabhu wrote: Hi all, I want to index rich text documents like .doc, .xls, .ppt files. I had done the patch for updating the rich documents by following the instructions at the URL below. http://wiki.apache.org/solr/UpdateRichDocuments When I index the doc file, I'm getting the following error in the browser. HTTP ERROR: 500 lazy loading error org.apache.solr.common.SolrException: lazy loading error at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:247) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:228) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.RichDocumentRequestHandler' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:273) at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:319) at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:340) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:235) ... 21 more Caused by: java.lang.ClassNotFoundException: solr.RichDocumentRequestHandler at java.net.URLClassLoader$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:375) at org.mortbay.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:337) at java.lang.ClassLoader.loadClassInternal(Unknown Source) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Unknown Source) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:257) ... 24 more RequestURI=/solr/update/rich Better solutions would be appreciated even more. 
Thanks a lot Prabhu.K -- View this message in context: http://www.nabble.com/Error%2C-when-i-update-the-rich-text-documents-such-as-.doc%2C-.ppt-files.-tp20934026p20934026.html Sent from the Solr - User mailing list archive at Nabble.com. -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Dates in Solr
Hi All, I'm curious about what people have done with dates. We Require: 1. multiple granularities to query and facet on: by year, by year/month, by year/month/day 2. sortability: sort/order by date 3. time typically isn't important to us 4. some of these items don't have a day or month associated with them 5. possibly consider seasonal like publications with FALL as a date This is the bulk of what I found documented in the mailing list and wiki: * http://www.nabble.com/dates---times-td10417533.html#a10421952 * http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing * http://wiki.apache.org/solr/SimpleFacetParameters#head-068dc96b0dac1cfc7264fe85528d7df5bf391acd o http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html o http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html * any queries on those fields (typically range queries) should use either the Complete ISO 8601 Date syntax that field supports, or the DateMath? http://trac.library.utoronto.ca/projectKO/wiki/DateMath Syntax to get relative dates This is great and valuable. I would like to be able to use the existing functionality but I'm not sure how I can use the DateField to specify a year without a time (what I guess would actually be a range of time) for a document. Any ideas? Tricia
Solr Newbie question
Hi - I am a new user of the Solr tool and came across the introductory tutorial here - http://lucene.apache.org/solr/tutorial.html . I am planning to use Solr in one of my projects. I see that the tutorial mentions a REST api / interface to add documents and to query them. I would like to create the indices locally, where the web server (or pool of servers) will have access to the database directly, but use the query REST api to query for the results. I am curious how this could be done without going through the http rest api submission to add to the indices. (For the sake of simplicity - we can assume it would be just one node to store the index but multiple readers / query machines that could potentially connect to the solr web service and retrieve the query results. Also the index might be locally present on the same machine as the Solr host or at least accessible through NFS etc.) Thanks for helping out with some starting pointers regarding the same.
Re: Dates in Solr
Tricia, I think you might have missed the key nugget at the bottom of http://wiki.apache.org/jakarta-lucene/DateRangeQueries Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Tricia Williams [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, December 10, 2008 12:12:11 PM Subject: Dates in Solr Hi All, I'm curious about what people have done with dates. We Require: 1. multiple granularities to query and facet on: by year, by year/month, by year/month/day 2. sortability: sort/order by date 3. time typically isn't important to us 4. some of these items don't have a day or month associated with them 5. possibly consider seasonal like publications with FALL as a date This is the bulk of what I found documented in the mailing list and wiki: * http://www.nabble.com/dates---times-td10417533.html#a10421952 * http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing * http://wiki.apache.org/solr/SimpleFacetParameters#head-068dc96b0dac1cfc7264fe85528d7df5bf391acd o http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html o http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html * any queries on those fields (typically range queries) should use either the Complete ISO 8601 Date syntax that field supports, or the DateMath? Syntax to get relative dates This is great and valuable. I would like to be able to use the existing functionality but I'm not sure how I can use the DateField to specify a year without a time (what I guess would actually be a range of time) for a document. Any ideas? Tricia
Re: Limitations of Distributed Search ....
Hi, I have not worked with a 50 node Solr cluster, but I've worked with pure Lucene clusters of that size, with very high query and data volumes. I don't imagine a dist search involving 50 nodes will be a problem for Solr. As for handling query slave failures, I'm sure you'll want to involve a LB that can detect those, and have multiple replicas of each query node behind it for fail-over. As for the manageability, I think you'll find that management is really mostly on you - Solr doesn't provide tools for cluster / shard management. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: souravm [EMAIL PROTECTED] To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Sunday, December 7, 2008 12:40:34 AM Subject: Limitations of Distributed Search Hi, We are planning to use Solr for processing a large volume of application log files (around ~10 billion documents, 5-6 TB in size). One of the approaches we are considering is to use Distributed Search extensively. What we have in mind is distributing the log files across multiple boxes on a monthly or weekly basis - where even at the weekly level the volume can reach 200 M documents. And a search query can spread across all weeks (e.g. number of a given txn for the 1st 6 months of a year). However, we are not sure how well distributed search would scale when we use around 50-60 boxes to distribute indexed documents on a weekly basis. The specific questions I have in mind are - a) What would be the impact on performance when a query spreads over 50 boxes b) Is there any hard limit on the number of slaves which can be contacted from the master server? 
c) How much load will this type of approach create on the master server for merging data and keeping track of whether a slave is down or not d) Any other manageability issues with so many slaves If any one of you has deployed Solr in such an environment it would be great if you can share your experience. Thanks in advance. Regards, Sourav
Re: solr performance
For a similar idea, check: https://issues.apache.org/jira/browse/SOLR-906 This opens a single stream and writes all documents to that. It could easily be extended to have multiple threads draining the same Queue. On Dec 9, 2008, at 4:02 AM, Noble Paul നോബിള് नोब्ळ् wrote: I guess this is the best idea. Let us have a new BatchHttpSolrServer which can help achieve this --Noble On Thu, Dec 4, 2008 at 7:14 PM, Yonik Seeley [EMAIL PROTECTED] wrote: On Thu, Dec 4, 2008 at 8:39 AM, Mark Miller [EMAIL PROTECTED] wrote: Kick off some indexing more than once - e.g., post a folder of docs, and while that's working, post another. I've been thinking about a multi-threaded UpdateProcessor as well - that could be interesting. Not sure how that would work (unless you didn't want responses), but I've thought about it from the SolrJ side - something you could quickly add documents to and it would manage a number of threads under the covers to maximize throughput. Not sure what would be best for error handling though - perhaps just polling (allow the user to ask for failed or successful operations). -Yonik -- --Noble Paul
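The queue-draining idea discussed above (several threads emptying a shared queue of documents) can be sketched in plain Java. This is not the proposed BatchHttpSolrServer - the class and method names below are made up, and send() is a stub where a real SolrServer.add() call would go:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the pattern discussed above: callers add documents to a shared
// queue, and a fixed pool of worker threads drains it concurrently.
// send() is a stub standing in for a real SolrServer.add(doc) call.
public class BatchIndexer {
    private static final String POISON = new String("__EOF__"); // shutdown marker (unique reference)
    private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final ExecutorService pool;
    private final AtomicInteger sent = new AtomicInteger();
    private final int threads;

    public BatchIndexer(int threads) {
        this.threads = threads;
        this.pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(this::drain);
        }
    }

    public void add(String doc) {
        queue.offer(doc); // unbounded queue: offer always succeeds
    }

    private void drain() {
        try {
            while (true) {
                String doc = queue.take();
                if (doc == POISON) return; // one poison pill stops one worker
                send(doc);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stub: a real implementation would send the document to Solr here.
    private void send(String doc) {
        sent.incrementAndGet();
    }

    // Stops the workers and returns how many documents were sent.
    public int shutdown() {
        for (int i = 0; i < threads; i++) queue.offer(POISON);
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sent.get();
    }
}
```

Because the queue is FIFO, the poison pills are only dequeued after every real document, so shutdown() reliably sees all adds. Error handling (Yonik's open question) would live in send(), e.g. collecting failed documents into a second queue the caller can poll.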
Re: Dates in Solr
Hi Otis, Absolutely, I missed that nugget. I didn't think of using prefix filters/queries. This works really well with how we had already stored dates in a YYYYMMDD string. Thanks for pointing me in the right direction. Tricia Otis Gospodnetic wrote: Tricia, I think you might have missed the key nugget at the bottom of http://wiki.apache.org/jakarta-lucene/DateRangeQueries Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Tricia Williams [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, December 10, 2008 12:12:11 PM Subject: Dates in Solr Hi All, I'm curious about what people have done with dates. We Require: 1. multiple granularities to query and facet on: by year, by year/month, by year/month/day 2. sortability: sort/order by date 3. time typically isn't important to us 4. some of these items don't have a day or month associated with them 5. possibly consider seasonal like publications with FALL as a date This is the bulk of what I found documented in the mailing list and wiki: * http://www.nabble.com/dates---times-td10417533.html#a10421952 * http://wiki.apache.org/jakarta-lucene/LargeScaleDateRangeProcessing * http://wiki.apache.org/solr/SimpleFacetParameters#head-068dc96b0dac1cfc7264fe85528d7df5bf391acd o http://lucene.apache.org/solr/api/org/apache/solr/util/DateMathParser.html o http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html * any queries on those fields (typically range queries) should use either the Complete ISO 8601 Date syntax that field supports, or the DateMath? Syntax to get relative dates This is great and valuable. I would like to be able to use the existing functionality but I'm not sure how I can use the DateField to specify a year without a time (what I guess would actually be a range of time) for a document. Any ideas? Tricia
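The trick Tricia describes works because a prefix match over a sortable date string selects exactly one year, month, or day bucket - the same semantics a Lucene PrefixQuery (date:2008*, date:200812*) provides. A tiny illustration of those semantics in plain Java (class and method names are made up):

```java
import java.util.List;

// Dates stored as sortable "YYYYMMDD" strings: a prefix picks a granularity.
// "2008"     -> the whole year 2008
// "200812"   -> December 2008
// "20081210" -> one specific day
public class DatePrefix {
    static long countMatching(List<String> dates, String prefix) {
        return dates.stream().filter(d -> d.startsWith(prefix)).count();
    }
}
```

This is also why it handles requirement 4 above: an item known only to the year can be stored as just "2008"-style data and still be found by the year-level prefix.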
Multi Core - Max Core Count Recommendation
I'm trying to see if anyone has any recommendations on the maximum number of cores that should be used within Solr. Is there significant overhead to each core? Should it be 10 or less, or is 100 or 1,000 cores acceptable? Thanks, Ryan
Re: Multi Core - Max Core Count Recommendation
it depends! yes there is overhead to each core -- how much it matters will depend entirely on your setup and typical usage pattern. sorry this is not a particularly useful answer. I think the choice of how many cores will come down to your domain logic needs more than hardware. If you are able to put things into a single index and get the performance you need, it will just be easier to deal with. ryan On Dec 10, 2008, at 3:35 PM, Ryan Peterson wrote: I'm trying to see if anyone has any recommendations on the maximum number of cores that should be used within Solr. Is there significant overhead to each core? Should it be 10 or less, or is 100 or 1,000 cores acceptable. Thanks, Ryan
Re: multiValued multiValued fields
: I want to index a field with an array of arrays, is that possible in Solr? Not out of the box ... you can implement custom FieldTypes that store any data you want using a byte[], but you'd still need to do some tricks with your FieldType to get the ResponseWriter to write it out in a meaningful way. The simplest approach would be to just encode the multiple values into a String in some way (comma separated, or something) -Hoss
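Hoss's "simplest approach" - flattening each inner array into one delimited string, so the multiValued field holds one string per inner array - can be sketched like this (the class name is illustrative, and note there is no escaping: a real encoding would need to handle delimiters appearing in the data):

```java
import java.util.Arrays;
import java.util.List;

// Each inner array becomes a single comma-joined value of the multiValued
// field; clients split it back apart when reading the response.
public class NestedValues {
    static String encode(List<String> inner) {
        return String.join(",", inner);
    }

    static List<String> decode(String stored) {
        return Arrays.asList(stored.split(","));
    }
}
```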
Re: Multi Core - Max Core Count Recommendation
We are considering a migration to SOLR from a home grown Lucene solution. Currently we have 27,000 separate Lucene indexes that are separated based on business logic. Collectively the indexes are about 1.5 terabytes in size. We have some very small indexes and some that are quite large (up to 15GB). My hesitation about grouping all this data across say 4 SOLR instances is that each individual index will still be about 400GB in size. How big is too big for a single Lucene index? Each SOLR instance will be on a dual/dual core xeon box with 6 SAS 15k drives in Raid 5 config and 16GB of RAM. If a 400GB instance is too much, I figured I could reduce the size of each individual index further by using multiple CORES, but again how many would depend on what size index is too big. Any suggestions would be greatly appreciated, thank you for your time. -Ryan From: Ryan McKinley [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, December 10, 2008 1:26:07 PM Subject: Re: Multi Core - Max Core Count Recommendation it depends! yes there is overhead to each core -- how much it matters will depend entirely on your setup and typical usage pattern. sorry this is not a particularly useful answer. I think the choice of how many cores will come down to your domain logic needs more than hardware. If you are able to put things into a single index and get the performance you need, it will just be easier to deal with. ryan On Dec 10, 2008, at 3:35 PM, Ryan Peterson wrote: I'm trying to see if anyone has any recommendations on the maximum number of cores that should be used within Solr. Is there significant overhead to each core? Should it be 10 or less, or is 100 or 1,000 cores acceptable. Thanks, Ryan
RE: Dealing with field values as key/value pairs
: This is really cool. U... How does it integrate with the Data Import : Handler? my DIH knowledge is extremely limited, but i'm guessing approach #1 is trivial (there is an easy way to concat DB values to build up solr field values, right?); approach #2 would probably be possible using multiple root entities (assuming multiple root entities means what i think it means) : I've taken two approaches in the past... : : 1) encode the id and the label in the field value; facet on it; require : clients to know how to decode. This works really well for simple things : where the id=label mappings don't ever change, and are easy to encode : (ie 01234:Chris Hostetter). This is a horrible approach when id=label : mappings do change with any frequency. : : 2) have a separate type of metadata document, one per thing that you are : faceting on, containing fields for id and the label (and probably a doc_type : field so you can tell it apart from your main docs). then once you've done : your main query and gotten the results back faceted on id, you can query : for those ids to get the corresponding labels. this works really well if the : labels ever change (just reindex the corresponding metadata document) and : has the added bonus that you can store additional metadata in each of those : docs, and in many use cases for presenting an initial browse interface, : you can sometimes get away with a cheap search for all metadata docs (or all : metadata docs meeting a certain : criteria) instead of an expensive facet query across all of your main : documents. -Hoss
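Approach #1 above leans on clients knowing how to decode the facet value. Since a label like "Chris Hostetter" can contain arbitrary characters, splitting on only the first ':' is the safe decode. A minimal sketch (the class name is made up):

```java
// Decode a facet value of the form "id:label", e.g. "01234:Chris Hostetter".
// Only the first ':' separates id from label, so labels may contain ':'.
public class FacetValue {
    final String id;
    final String label;

    FacetValue(String encoded) {
        int sep = encoded.indexOf(':');
        this.id = encoded.substring(0, sep);
        this.label = encoded.substring(sep + 1);
    }
}
```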
Sum of Fields and Record Count
Hi, I am a new solr user. I have an application in which I would like to show the results, but one result may be part of a larger set of results. So for example result #1 might also have 10 other results that are part of the same data set. Hopefully this makes sense. What I would like to find out is if there is a way within Solr to show the result that matched the query, and then to also show that this result is part of a collection of 10 items. I have thought about doing it using some sort of external process that runs and does multiple queries: get the list of items and then query against each item. But those don't seem elegant. So I would like to find out if there is a way to do it within Solr that is a little more elegant, and hopefully without having to write additional code. Thank you in advance for the help. -John
SolrConfig.xml Replication
I am curious as to whether there is a solution to be able to replicate solrconfig.xml with the 1.4 replication. The obvious problem is that the master would replicate the solrconfig turning all slaves into masters with its config. I have also tried on a whim to configure the master and slave on the master so that the slave points to the same server but that seems to break the replication completely. Please let me know if anybody has any ideas -Jeff
Re: Sum of Fields and Record Count
Hi John, What is your process for determining that #1 is part of the other result set? My gut says this is a faceting problem, i.e. #1 has a field containing its category that is also shared by the 10 other results, and that all you need to do is facet on the category field. The other thing that comes to mind is More Like This: http://wiki.apache.org/solr/MoreLikeThis -Grant On Dec 10, 2008, at 6:16 PM, John Martyniak wrote: Hi, I am a new solr user. I have an application that I would like to show the results but one result may be part of a larger set of results. So for example result #1 might also have 10 other results that are part of the same data set. Hopefully this makes sense. What I would like to find out is if there is a way within Solr to show the result that matched the query, and then to also show that this result is part of a collection of 10 items. I have thought about doing it using some sort of external process that runs, and with doing multiple queries, so get the list of items and then query against each item. But those don't seem elegant. So I would like to find out if there is a way to do it within Solr that is a little more elegant, and hopefully without having to write additional code. Thank you in advance for the help. -John -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
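In miniature, faceting on a group field gives you a per-value document count alongside the search results - which is the "part of a collection of 10 items" number John is after. Solr computes this server-side when you ask for facet=true&facet.field=<your group field> (the field name is whatever your schema uses); the counting itself amounts to:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// What facet counts are: for each distinct value of the group field,
// the number of documents carrying that value.
public class FacetCount {
    static Map<String, Integer> counts(List<String> groupValues) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : groupValues) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }
}
```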
Re: Sum of Fields and Record Count
Grant, Basically I have created a text field that holds the grouping value. All of the records in a group have the same value in this text field. This is accomplished with some pre-processing when I capture the data, before it is submitted to the index. -John On Dec 10, 2008, at 8:46 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: Hi John, What is your process for determining that #1 is part of the other result set? My gut says this is a faceting problem, i.e. #1 has a field containing its category that is also shared by the 10 other results, and that all you need to do is facet on the category field. The other thing that comes to mind is More Like This: http://wiki.apache.org/solr/MoreLikeThis -Grant On Dec 10, 2008, at 6:16 PM, John Martyniak wrote: Hi, I am a new solr user. I have an application that I would like to show the results but one result may be part of a larger set of results. So for example result #1 might also have 10 other results that are part of the same data set. Hopefully this makes sense. What I would like to find out is if there is a way within Solr to show the result that matched the query, and then to also show that this result is part of a collection of 10 items. I have thought about doing it using some sort of external process that runs, and with doing multiple queries, so get the list of items and then query against each item. But those don't seem elegant. So I would like to find out if there is a way to do it within Solr that is a little more elegant, and hopefully without having to write additional code. Thank you in advance for the help. -John -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Re: Sum of Fields and Record Count
Grant, For the More Like This approach that would show the grouped results once you have clicked on the item (so basically making another query), would it show a count of the More Like This results? Something like cxxc and a collection of 10 other items. -John On Dec 10, 2008, at 8:46 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: Hi John, What is your process for determining that #1 is part of the other result set? My gut says this is a faceting problem, i.e. #1 has a field containing its category that is also shared by the 10 other results, and that all you need to do is facet on the category field. The other thing that comes to mind is More Like This: http://wiki.apache.org/solr/MoreLikeThis -Grant On Dec 10, 2008, at 6:16 PM, John Martyniak wrote: Hi, I am a new solr user. I have an application that I would like to show the results but one result may be part of a larger set of results. So for example result #1 might also have 10 other results that are part of the same data set. Hopefully this makes sense. What I would like to find out is if there is a way within Solr to show the result that matched the query, and then to also show that this result is part of a collection of 10 items. I have thought about doing it using some sort of external process that runs, and with doing multiple queries, so get the list of items and then query against each item. But those don't seem elegant. So I would like to find out if there is a way to do it within Solr that is a little more elegant, and hopefully without having to write additional code. Thank you in advance for the help. -John -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Ordinal Field value and exact value for date.
Hi All, I am trying to use the ord() function query on created_date. I am concerned about the warning on ord's behaviour, that it depends on the entry's position in the index rather than on the created_date value itself. Will all entries created initially with different created_date values have the same or nearly the same ordinal value? If yes, then how does the age calculation for a document work? Should index creation have knowledge of the creation date of a document while adding it to the index? Thanks amit
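For what it's worth, my understanding of ord() is that it reflects a value's position among the *sorted* indexed terms of the field, not document insertion order - so for a date format whose string form sorts chronologically, ordinals do track the dates themselves. The real caveat in the warning is that ordinals shift whenever the set of indexed values changes, so they are only comparable within one snapshot of the index. A sketch of the sorted-position idea (0-based here; the actual base is an implementation detail):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// ord-like value: the position of a term in the sorted set of all
// indexed terms for the field, independent of insertion order.
public class Ord {
    static int ord(List<String> indexedValues, String value) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(indexedValues));
        return sorted.indexOf(value);
    }
}
```

Note how adding an earlier date to the index shifts the ordinal of every later date, which is why ord-based boosts must be recomputed per index snapshot.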
RE: Multi Core - Max Core Count Recommendation
1) Our limit is: how big a file do we want to copy around? We switched to multiple indexes because of the logistics of replicating/backing up giant Lucene index files. 2) Searching takes a little memory, sorting takes a lot of memory, and faceting eats like a black hole. There is an unwritten wiki page of practical experiences. -Original Message- From: Ryan Peterson [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 10, 2008 2:20 PM To: solr-user@lucene.apache.org Subject: Re: Multi Core - Max Core Count Recommendation We are considering a migration to SOLR from a home grown Lucene solution. Currently we have 27,000 separate Lucene indexes that are separated based on business logic. Collectively the indexes are about 1.5 terabytes in size. We have some very small indexes and some that are quite large (up to 15GB). My hesitation about grouping all this data across say 4 SOLR instances is that each individual index will still be about 400GB in size. How big is too big for a single Lucene index? Each SOLR instance will be on a dual/dual core xeon box with 6 SAS 15k drives in Raid 5 config and 16GB of RAM. If a 400GB instance is too much, I figured I could reduce the size of each individual index further by using multiple CORES, but again how many would depend on what size index is too big. Any suggestions would be greatly appreciated, thank you for your time. -Ryan From: Ryan McKinley [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, December 10, 2008 1:26:07 PM Subject: Re: Multi Core - Max Core Count Recommendation it depends! yes there is overhead to each core -- how much it matters will depend entirely on your setup and typical usage pattern. sorry this is not a particularly useful answer. I think the choice of how many cores will come down to your domain logic needs more than hardware. If you are able to put things into a single index and get the performance you need, it will just be easier to deal with. 
ryan

On Dec 10, 2008, at 3:35 PM, Ryan Peterson wrote: I'm trying to see if anyone has any recommendations on the maximum number of cores that should be used within Solr. Is there significant overhead to each core? Should it be 10 or less, or is 100 or 1,000 cores acceptable? Thanks, Ryan
Re: Sum of Fields and Record Count
Hi John, This sounds a lot like the field collapsing functionality that a few people are working on in SOLR-236: https://issues.apache.org/jira/browse/SOLR-236 Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: John Martyniak [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, December 10, 2008 6:16:21 PM Subject: Sum of Fields and Record Count

Hi, I am a new Solr user. I have an application in which I would like to show the results, but one result may be part of a larger set of results. So for example, result #1 might also have 10 other results that are part of the same data set. Hopefully this makes sense. What I would like to find out is whether there is a way within Solr to show the result that matched the query, and then to also show that this result is part of a collection of 10 items. I have thought about doing it with some sort of external process that runs multiple queries: get the list of items and then query against each item. But that doesn't seem elegant. So I would like to find out if there is a way to do it within Solr that is a little more elegant, and hopefully without having to write additional code. Thank you in advance for the help. -John
Re: SolrConfig.xml Replication
Jeff, Are you using the Solr 1.3 replication scripts? If so, I think it would be pretty simple to: 1) put all additional files to replicate to slaves in a specific location (or use a special naming scheme) on the master 2) write another script that uses scp or rsync to look for those additional files and copy them 3) run this new script whenever snappuller + snapinstaller run: snappuller, snapinstaller, my-file-copying-script. It's not a part of Solr, but it's trivial to add. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Jeff Newburn [EMAIL PROTECTED] To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Wednesday, December 10, 2008 7:00:30 PM Subject: SolrConfig.xml Replication

I am curious as to whether there is a solution for replicating solrconfig.xml with the 1.4 replication. The obvious problem is that the master would replicate its solrconfig, turning all slaves into masters with its config. I have also tried, on a whim, to configure both the master and slave on the master so that the slave points to the same server, but that seems to break the replication completely. Please let me know if anybody has any ideas. -Jeff
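Step 2 could be sketched roughly as follows; this uses Python's shutil locally in place of scp/rsync purely for illustration, and the ".replicate" suffix naming scheme is an invented convention, not anything Solr defines:

```python
import shutil
from pathlib import Path


def copy_extra_files(staging_dir: str, dest_dir: str, suffix: str = ".replicate") -> list:
    """Copy files marked with a special naming scheme (here an assumed
    '.replicate' suffix) from the master's staging area into the slave's
    conf directory, dropping the marker suffix on the way."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in Path(staging_dir).glob(f"*{suffix}"):
        target = dest / src.name[: -len(suffix)]
        shutil.copy2(src, target)  # preserve timestamps like rsync -a would
        copied.append(target.name)
    return sorted(copied)
```

A script like this would simply be invoked right after snappuller + snapinstaller finish, with scp/rsync substituted for the local copy when master and slave are separate hosts.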
ExtractingRequestHandler and XmlUpdateHandler
Hey folks, I'm looking at implementing ExtractingRequestHandler in the Apache_Solr_PHP library, and I'm wondering what we can do about adding metadata. I saw the docs, which suggest you use different post headers to pass field values along with ext.literal. Is there any way to use the XmlUpdateHandler instead, along with a document? I'm not sure how this would work; perhaps it would require 2 trips, or perhaps the XML would be in the post content and the file in something else? The thing is, we would need to refactor the class pretty heavily in this case when indexing RichDocs, and we were hoping to avoid it. Thanks, Jacob -- +1 510 277-0891 (o) +91 33 7458 (m) web: http://pajamadesign.com Skype: pajamadesign Yahoo: jacobsingh AIM: jacobsingh gTalk: [EMAIL PROTECTED]
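For comparison, the ext.literal mechanism the docs describe boils down to extra request parameters on the extract URL. A rough sketch of building such a URL (the /update/extract path and ext.literal.* prefix follow the Solr 1.4-era wiki, the field names are made up, and the binary file itself would go in the multipart POST body, omitted here):

```python
from urllib.parse import urlencode


def extract_url(base: str, literals: dict) -> str:
    """Build an ExtractingRequestHandler URL passing metadata as
    ext.literal.* parameters; the rich document goes in the POST body."""
    params = {f"ext.literal.{field}": value for field, value in literals.items()}
    return base + "/update/extract?" + urlencode(sorted(params.items()))


url = extract_url("http://localhost:8983/solr",
                  {"id": "doc1", "author": "Jacob Singh"})
```

The client library would then POST the file bytes to this URL, so no XML message is involved; that is why a single-trip XmlUpdateHandler variant would need a different request shape.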
Re: Sum of Fields and Record Count
Otis, Thanks for the information. It looks like the field collapsing is similar to what I am looking for. But is that in the current release? Is it stable? Is there any way to do it in Solr 1.3? -John

On Dec 10, 2008, at 9:59 PM, Otis Gospodnetic wrote: Hi John, This sounds a lot like the field collapsing functionality that a few people are working on in SOLR-236: https://issues.apache.org/jira/browse/SOLR-236 Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: John Martyniak [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, December 10, 2008 6:16:21 PM Subject: Sum of Fields and Record Count

Hi, I am a new Solr user. I have an application in which I would like to show the results, but one result may be part of a larger set of results. So for example, result #1 might also have 10 other results that are part of the same data set. Hopefully this makes sense. What I would like to find out is whether there is a way within Solr to show the result that matched the query, and then to also show that this result is part of a collection of 10 items. I have thought about doing it with some sort of external process that runs multiple queries: get the list of items and then query against each item. But that doesn't seem elegant. So I would like to find out if there is a way to do it within Solr that is a little more elegant, and hopefully without having to write additional code. Thank you in advance for the help. -John
RE: Setting Request Handler
Hi Grant, Thanks for the help. So now I can have multiple components configured as last-components of the standard request handler. Best Regards, Mukta

-Original Message- From: Grant Ingersoll [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 10, 2008 9:25 PM To: solr-user@lucene.apache.org Subject: Re: Setting Request Handler

Inline below... Also, though, you should note that the /spellCheckCompRH that is packaged with the example is not necessarily the best way to actually use the SpellCheckComponent. It is intended to be used as a component in whatever your MAIN request handler is; it merely shows how to hook it in.

On Dec 10, 2008, at 7:51 AM, Deshpande, Mukta wrote: Hi, I have a request handler in my solrconfig.xml: /spellCheckCompRH. It utilizes the search component spellcheck. When I specify the following query in a browser, I get correct spelling suggestions from the file dictionary:

http://localhost:8080/solr/spellCheckCompRH/?q=SolrDocs&spellcheck.q=relevancy&spellcheck=true&fl=title,score&spellcheck.dictionary=file

Now I write a Java program to achieve the same result. Code snippet:

. .
server = new CommonsHttpSolrServer("http://localhost:8080/solr");
. .
SolrQuery query = new SolrQuery();
query.setQuery("solr");
query.setFields("*,score");
query.set("qt", "spellCheckCompRH");

Is "spellCheckCompRH" a variable? Does it equal "/spellCheckCompRH"?

query.set("spellcheck", true);
query.set(SpellingParams.SPELLCHECK_DICT, "file");
query.set(SpellingParams.SPELLCHECK_Q, "solt");
. .
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();
SpellCheckResponse srsp = rsp.getSpellCheckResponse();

I get documents for my query but I do not get any spelling suggestions. I think that the request handler is not getting set for the query correctly. Can someone please help? Best Regards, Mukta

-- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
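Grant's hint is worth spelling out: the qt value must match the handler name exactly as registered in solrconfig.xml, leading slash included. A rough Python sketch of the parameter set the working browser URL sends (handler, dictionary, and field names are taken from the thread; this only builds the query string and does not talk to Solr):

```python
from urllib.parse import urlencode


def spellcheck_params(qt: str, q: str, spell_q: str, dictionary: str) -> str:
    """Assemble the query string equivalent to the working browser request.
    Note that qt carries the leading slash when the handler is registered
    as '/spellCheckCompRH' in solrconfig.xml."""
    return urlencode([
        ("q", q),
        ("qt", qt),  # e.g. "/spellCheckCompRH", not "spellCheckCompRH"
        ("spellcheck", "true"),
        ("spellcheck.q", spell_q),
        ("spellcheck.dictionary", dictionary),
        ("fl", "title,score"),
    ])


qs = spellcheck_params("/spellCheckCompRH", "SolrDocs", "relevancy", "file")
```

In the SolrJ code that would mean query.set("qt", "/spellCheckCompRH"), assuming the handler is declared with the slash.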
Re: Dealing with field values as key/value pairs
On Thu, Dec 11, 2008 at 4:41 AM, Chris Hostetter [EMAIL PROTECTED] wrote:

: This is really cool. U... How does it integrate with the Data Import
: Handler?

my DIH knowledge is extremely limited, but i'm guessing approach #1 is trivial (there is an easy way to concat DB values to build up solr field values right?);

Yes, TemplateTransformer can help you here.

approach #2 would probably be possible using multiple root entities (assuming multiple root entities means what i think it means)

Yes, multiple root entities can do the trick (with a separate doctype).

: I've taken two approaches in the past...
:
: 1) encode the id and the label in the field value; facet on it; require
: clients to know how to decode. This works really well for simple things
: where the id=label mappings don't ever change, and are easy to encode
: (ie 01234:Chris Hostetter). This is a horrible approach when id=label
: mappings do change with any frequency.
:
: 2) have a separate type of metadata document, one per thing that you are
: faceting on, containing fields for the id and the label (and probably a doc_type
: field so you can tell it apart from your main docs). Then once you've done
: your main query and gotten the results back faceted on id, you can query
: for those ids to get the corresponding labels. This works really well if the
: labels ever change (just reindex the corresponding metadata document) and
: has the added bonus that you can store additional metadata in each of those
: docs, and in many use cases for presenting an initial browse interface,
: you can sometimes get away with a cheap search for all metadata docs (or all
: metadata docs meeting a certain criteria) instead of an expensive facet
: query across all of your main documents.

-Hoss

-- --Noble Paul
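Approach #1 can be sketched in a few lines; the ':' separator and zero-padded id follow the "01234:Chris Hostetter" example above, and splitting on the first ':' only is an assumption about how labels containing the separator would survive:

```python
def encode_facet_value(doc_id: str, label: str) -> str:
    """Pack the stable id and the display label into one indexed value."""
    return f"{doc_id}:{label}"


def decode_facet_value(value: str) -> tuple:
    """Split on the first ':' only, so labels containing ':' stay intact."""
    doc_id, _, label = value.partition(":")
    return doc_id, label
```

The client facets on the packed value, then decodes each facet constraint back into an id (for filtering) and a label (for display).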
Re: SolrConfig.xml Replication
This is a known issue and I was planning to take it up soon. https://issues.apache.org/jira/browse/SOLR-821

On Thu, Dec 11, 2008 at 5:30 AM, Jeff Newburn [EMAIL PROTECTED] wrote: I am curious as to whether there is a solution for replicating solrconfig.xml with the 1.4 replication. The obvious problem is that the master would replicate its solrconfig, turning all slaves into masters with its config. I have also tried, on a whim, to configure both the master and slave on the master so that the slave points to the same server, but that seems to break the replication completely. Please let me know if anybody has any ideas. -Jeff -- --Noble Paul
Re: Solr Newbie question
On Wed, Dec 10, 2008 at 11:00 PM, Rakesh Sinha [EMAIL PROTECTED] wrote: Hi - I am a new user of Solr and came across the introductory tutorial here - http://lucene.apache.org/solr/tutorial.html . I am planning to use Solr in one of my projects. I see that the tutorial mentions a REST API / interface to add documents and to query the same. I would like to create the indices locally, where the web server (or pool of servers) will have access to the database directly, but use the query REST API to query for the results.

If your data resides in a DB, consider using DIH. http://wiki.apache.org/solr/DataImportHandler

I am curious how this could be possible without using the HTTP REST API to add to the indices. (For the sake of simplicity, we can assume there would be just one node to store the index but multiple readers / query machines that could potentially connect to the Solr web service and retrieve the query results. Also, the index might be locally present on the same machine as the Solr host, or at least accessible through NFS etc.)

I guess you are thinking of using a master/slave setup. See http://wiki.apache.org/solr/CollectionDistribution or http://wiki.apache.org/solr/SolrReplication

Thanks for helping out with some starting pointers regarding the same. -- --Noble Paul
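Along the lines of the DIH wiki example, a minimal data-config.xml might look like this; the driver class, connection URL, credentials, and table/column names are all placeholders for your own database:

```xml
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="dbuser" password="dbpass"/>
  <document>
    <entity name="item" query="select id, name from item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```

With this wired into solrconfig.xml, a full import is triggered over HTTP, so the indexing box reads the DB directly and query boxes only ever hit the search API.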
jboss and solr
I am trying to configure JBoss with Solr. As stated in the wiki docs I copied the solr.war, but there is no webapps folder currently present in JBoss. So should I create one manually and paste the war file there? I tried configuring Solr with Tomcat as well. I pasted the war file in Tomcat's webapps folder. Now when I set the system property solr.solr.home, it raises a class not found exception. Can anyone help me with that?

DISCLAIMER == This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
Re: Sum of Fields and Record Count
Hi John, It's not in the current release, but chances are it will make it into 1.4. You can try one of the recent patches and apply it to your Solr 1.3 sources. Check the list archives for more discussion; this field collapsing was just discussed again today/yesterday. markmail.org is a good one. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: John Martyniak [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, December 10, 2008 10:51:57 PM Subject: Re: Sum of Fields and Record Count

Otis, Thanks for the information. It looks like the field collapsing is similar to what I am looking for. But is that in the current release? Is it stable? Is there any way to do it in Solr 1.3? -John

On Dec 10, 2008, at 9:59 PM, Otis Gospodnetic wrote: Hi John, This sounds a lot like the field collapsing functionality that a few people are working on in SOLR-236: https://issues.apache.org/jira/browse/SOLR-236 Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: John Martyniak To: solr-user@lucene.apache.org Sent: Wednesday, December 10, 2008 6:16:21 PM Subject: Sum of Fields and Record Count

Hi, I am a new Solr user. I have an application in which I would like to show the results, but one result may be part of a larger set of results. So for example, result #1 might also have 10 other results that are part of the same data set. Hopefully this makes sense. What I would like to find out is whether there is a way within Solr to show the result that matched the query, and then to also show that this result is part of a collection of 10 items. I have thought about doing it with some sort of external process that runs multiple queries: get the list of items and then query against each item. But that doesn't seem elegant. So I would like to find out if there is a way to do it within Solr that is a little more elegant, and hopefully without having to write additional code. 
Thank you in advance for the help. -John
Re: jboss and solr
On Thu, Dec 11, 2008 at 11:21 AM, Neha Bhardwaj [EMAIL PROTECTED] wrote: I am trying to configure JBoss with Solr. As stated in the wiki docs I copied the solr.war, but there is no webapps folder currently present in JBoss. So should I create one manually and paste the war file there?

For JBoss, war files are deployed to this location: $JBOSS_HOME/server/default/deploy Please look up resources on the net for more information on running applications in JBoss.

I tried configuring Solr with Tomcat as well. I pasted the war file in Tomcat's webapps folder. Now when I set the system property solr.solr.home, it raises a class not found exception.

Probably something is missing in the environment settings. One way to get Solr running in Tomcat is to start the Tomcat server from the directory where the Solr home is present. E.g. if the Solr home is at /home/users/test-solr/solr, then start the Tomcat server from the /home/users/test-solr directory. This assumes that you have $TOMCAT_HOME/bin in your PATH env variable.

Can anyone help me with that?

-- Regards, Akshay Ukey.
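For the Tomcat case, a common alternative to the system property is a context fragment (e.g. $CATALINA_HOME/conf/Catalina/localhost/solr.xml) that points at the war and sets solr/home via JNDI; the paths below are assumptions for illustration:

```xml
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <!-- solr/home tells the webapp where schema.xml and solrconfig.xml live -->
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home" override="true"/>
</Context>
```

This avoids having to pass -Dsolr.solr.home on the Tomcat command line and keeps the setting per-webapp.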
minimum match issue with dismax
Hi, does anyone know how to make sure minimum match in dismax is working? I change the values and try doing 'solrCtl restart indexname' but I don't see it taking effect. Does anybody have an idea on this? Thank you, Vinay
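For reference, mm normally lives in the dismax handler's defaults in solrconfig.xml and can also be overridden per request with an mm= parameter, which is an easy way to verify it is being picked up. A sketch (the handler name and mm expression are just examples; depending on your Solr version the class may be solr.DisMaxRequestHandler rather than solr.SearchHandler with defType):

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- up to 2 clauses: all required; beyond that, 25% may be optional -->
    <str name="mm">2&lt;-25%</str>
  </lst>
</requestHandler>
```

Appending &mm=100% (or similar) to a query URL and watching the result count change confirms the setting is live without any restart.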
Newbie Question on boosting
I read many articles on boosting, but I am still not so clear on it. Can anyone explain the following questions with examples? 1) Can you give an example of field-level boosting and document-level boosting, and the difference between the two? 2) If we set the boost at the field level (index time), should the query contain that particular field? For example, if we set the boost for the title field, should we create the term query for the title field? Also, based on your experience, can you explain why you need boosting? Thanks, Ayyanar. A -- View this message in context: http://www.nabble.com/Nwebie-Question-on-boosting-tp20950286p20950286.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Newbie Question on boosting
On Thu, Dec 11, 2008 at 6:49 AM, ayyanar [EMAIL PROTECTED] wrote: 1) Can you give an example of field-level boosting and document-level boosting, and the difference between the two?

Field-level boosting is used when one field is considered more or less important than another. For example, you may want the title field of a document to be considered more important, so that if a term appears in the title this is considered more significant than if it appears in the body. On the other hand, document-level boosting is about when a document is more or less important than another. For example, an FAQ is often considered a very important page and, as such, may be required to appear higher in results than it otherwise would.

2) If we set the boost at the field level (index time), should the query contain that particular field? For example, if we set the boost for the title field, should we create the term query for the title field?

Yes, if you want it to make any difference. Rob
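To make question 2 concrete on the index-time side: both kinds of boost are set in the XML update message, roughly like this (the boost values and field contents are arbitrary examples):

```xml
<add>
  <!-- document-level boost: the whole doc scores higher -->
  <doc boost="2.0">
    <!-- field-level boost: matches on title count for more -->
    <field name="title" boost="3.0">FAQ: Getting Started</field>
    <field name="body">Answers to common beginner questions.</field>
  </doc>
</add>
```

And to Rob's point: the field-level boost only influences scoring when the query actually searches that field, whereas the document-level boost affects any query that matches the document.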