Solr scraping: Nutch and other alternatives.
Hello everyone. I've been thinking about a way to retrieve information from a domain (for example, http://www.ign.com) to process and index. My idea is to use Solr as the searcher. I'm familiar with Apache Nutch and I know that the latest version has a gateway to Solr to retrieve and index information with it. I tried it and it worked fine, but it's a little bit complex to develop plugins to process info and index it into a desired new field. Perhaps one of you has tried another (and better) alternative for mining web information. What is your recommendation? Can you give me any scraping suggestions? Thank you very much. Luis Cappa.
Re: feeding while solr is running ?
Hello Alireza, thank you for your reply. I will read the solr tutorial ;-) Cheers Loren -- View this message in context: http://lucene.472066.n3.nabble.com/feeding-while-solr-is-running-tp3428500p3430478.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: feeding while solr is running ?
Hello Robert, also many thanks to you for the LINKS and the short explanation. ;-) *hug* cheers Loren -- View this message in context: http://lucene.472066.n3.nabble.com/feeding-while-solr-is-running-tp3428500p3430483.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Controlling the order of partial matches based on the position
this link was on the mailing list recently: http://www.lucidimagination.com/search/document/dfa18d52e7e8197c/getting_answers_starting_with_a_requested_string_first#b18e9f922c1e4149 On 18 October 2011 00:59, aronitin aro_ni...@yahoo.com wrote: Guys, It's been almost a week but there are no replies to the question that I posted. If it's a small problem and already answered somewhere, please point me to that post. Otherwise please suggest any pointers to handle the requirement mentioned in the question. Nitin -- View this message in context: http://lucene.472066.n3.nabble.com/Controlling-the-order-of-partial-matches-based-on-the-position-tp3413867p3429823.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr scraping: Nutch and other alternatives.
Hi Luis, Have you tried the copyField function with custom analyzers and tokenizers? bye, Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2011/10/18 Luis Cappa Banda luisca...@gmail.com Hello everyone. I've been thinking about a way to retrieve information from a domain (for example, http://www.ign.com) to process and index. My idea is to use Solr as a searcher. I'm familiarized with Apache Nutch and I know that the latest version has a gateway to Solr to retrieve and index information with it. I tried it and it worked fine, but it's a little bit complex to develop plugins to process info and index it in a new field desired. Perhaps one of you have tried another (and better) alternative to data mine web information. Which is your recommendation? Can you give me any scraping suggestion? Thank you very much. Luis Cappa.
Re: Controlling the order of partial matches based on the position
Hi, I would use a custom function query that uses termPositions to calculate the order of the values in the field to accomplish your requirements. Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2011/10/18 aronitin aro_ni...@yahoo.com Guys, It's been almost a week but there are no replies to the question that I posted. If its a small problem and already answered somewhere, please point me to that post. Otherwise please suggest any pointer to handle the requirement mentioned in the question, Nitin -- View this message in context: http://lucene.472066.n3.nabble.com/Controlling-the-order-of-partial-matches-based-on-the-position-tp3413867p3429823.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr scraping: Nutch and other alternatives.
I'm a bit biased but I would certainly use Nutch, as it seems to be the right tool for the job. Developing custom plugins is actually easier than you might think. Solr, with its extracting request handler, can only help in a very limited way. Hello everyone. I've been thinking about a way to retrieve information from a domain (for example, http://www.ign.com) to process and index. My idea is to use Solr as a searcher. I'm familiarized with Apache Nutch and I know that the latest version has a gateway to Solr to retrieve and index information with it. I tried it and it worked fine, but it's a little bit complex to develop plugins to process info and index it in a new field desired. Perhaps one of you have tried another (and better) alternative to data mine web information. Which is your recommendation? Can you give me any scraping suggestion? Thank you very much. Luis Cappa.
Re: Question about near query order
"analyze term"~2 vs "term analyze"~2: in my case, the two queries return different result sets. Isn't that so in your case? Hmm, you are right. I tested with a trunk instance using the lucene query parser; the result sets were different. If I am not wrong, they were the same at some version. I can suggest two different solutions. One is to use the SurroundQueryParser, where you have explicit control over ordered versus unordered matching, but it has its own limitations. See below: http://wiki.apache.org/solr/SurroundQueryParser The second one: there is an example in the Lucene in Action book about how to override QueryParser and replace PhraseQuery with SpanNearQuery. SpanNearQuery also has a boolean parameter, inOrder; you can use that example code. If you pass false to its constructor you will obtain an unordered phrase query. But this second option assumes that you are using the lucene query parser: defType=lucene
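For the surround option, a brief illustration of its proximity operators (syntax as described on the SurroundQueryParser wiki page linked above; term1/term2 are placeholders, and the parser must actually be wired into your Solr for these to work):

```
term1 3w term2       -- ordered: term1 before term2, within 3 positions
term1 3n term2       -- unordered near, same distance
3W(term1, term2)     -- prefix form of the ordered operator
```

The W/N distinction is exactly the ordered-versus-unordered control discussed above, without writing any custom query parser code.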
Re: Solr scraping: Nutch and other alternatives.
Hi Luis, just an opinion (worked with Nutch intensively, 2005-2008). Web crawling is a bitch, and Nutch won't make it any easier. Some problems you'll find along the way: 1. Spidering tunnels/traps 2. Duplicate and near-duplicate content removal 3. GET parameter explosion in dynamic pages 4. Compromises between breadth and depth of crawl (you only have that much time, and every site has its unique link geometry). Nutch has its own set of tools (urlfilters, depth control...) to cope with each problem, but sometimes you solve, say, 3, and 4 comes back again. My advice would be to use some popular search engines as a way to mine the web (you can always ask for all the pages indexed in a domain). They have done this job, and done it nicely. In fact, due to their ranking algorithms (based on link geometry), a 'popular' page will always be indexed, and to me, that's a good circumstance (i.e., you can always claim that with your own web crawler you've covered more URLs for a specific site, but what's the value if the extra URLs are *not that important*?). If I'm absolutely forced to crawl a site, I use plain old 'curl' or 'wget'. Open source, tunable via a vast array of parameters and 'black boxes'. I do not see any justification in deploying 'the nutch monster' just to crawl some web portion already crawled by popular search engines. On the 'scraping' / xhtml mining front, the 'mechanize' library (python, perl, ruby, whatever flavour) and 'Beautiful Soup' for python have always fed my hunger for web scraping. Good luck :D On Tue, Oct 18, 2011 at 9:16 AM, Marco Martinez mmarti...@paradigmatecnologico.com wrote: Hi Luis, Have you tried the copyField function with custom analyzers and tokenizers? bye, Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2011/10/18 Luis Cappa Banda luisca...@gmail.com Hello everyone. 
I've been thinking about a way to retrieve information from a domain (for example, http://www.ign.com) to process and index. My idea is to use Solr as a searcher. I'm familiarized with Apache Nutch and I know that the latest version has a gateway to Solr to retrieve and index information with it. I tried it and it worked fine, but it's a little bit complex to develop plugins to process info and index it in a new field desired. Perhaps one of you have tried another (and better) alternative to data mine web information. Which is your recommendation? Can you give me any scraping suggestion? Thank you very much. Luis Cappa. -- Whether it's science, technology, personal experience, true love, astrology, or gut feelings, each of us has confidence in something that we will never fully comprehend. --Roy H. Williams
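If a full crawler really is overkill, even the Python standard library is enough for small scraping experiments. A minimal sketch of the kind of link extraction Beautiful Soup or mechanize would do for you (no third-party dependencies; `LinkExtractor` is just an illustrative name, and for real work the libraries mentioned above are far more robust against broken HTML):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# In practice you would feed it a page fetched with urllib;
# here an inline sample document stands in for the fetched HTML.
sample = ('<html><body><a href="/news">News</a> '
          '<a href="http://www.ign.com/reviews">Reviews</a></body></html>')
parser = LinkExtractor()
parser.feed(sample)
print(parser.links)  # ['/news', 'http://www.ign.com/reviews']
```

From there, each extracted URL can be fetched, cleaned, and posted to Solr as a document.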
Re: upgrading 1.4 to 3.x
Well, I did a little digging on the web... so the problem is also described here: https://issues.apache.org/bugzilla/show_bug.cgi?id=40719 Basically there were no details in the tomcat logs (maybe in some other logs, but I don't know). I ran into the same problem while implementing something else... Anyway, I hope this will be helpful for anyone who gets the same error. Thank you to all who helped me with the issue - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/upgrading-1-4-to-3-x-tp3415044p3430748.html Sent from the Solr - User mailing list archive at Nabble.com.
How to change default operator in velocity?
In the solr schema the defaultOperator value is OR, but when I use browse (http://localhost:8983/solr/browse) for searching, AND is the default operator, and that config in solr does not affect velocity. How can I change the velocity template engine's default operator? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-change-default-operator-in-velocity-tp3430871p3430871.html Sent from the Solr - User mailing list archive at Nabble.com.
How to retrieve multiple documents using one unique field?
I have four different documents in a single xml file (to be indexed). I don't want to inject the unique field into each and every document; when I search with the unique field, all four documents should come back in the result. I.e., can a common unique field be applied to all the documents? My xml format:

<add>
  <doc><field>...</field></doc>
  <doc><field>...</field></doc>
  <doc><field>...</field></doc>
  <doc><field>...</field></doc>
  <commonunique><id>123</id></commonunique>
</add>

If I search for 123, all four documents should come back. Is it possible? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-retreive-multiple-documents-using-one-unique-field-tp3430931p3430931.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: help with phrase query
I think you can use pf2 and pf3 in your requestHandler. Best regards, Elisabeth 2011/10/16 Vijay Ramachandran vijay...@gmail.com Hello. I have an application where I try to match longer queries (sentences) to short documents (search phrases). Typically, the documents are 3-5 terms in length. I am facing a problem where phrase match in the indicated phrase fields via pf doesn't seem to match in most cases, and I am stumped. Please help!

For instance, when my query is "should I buy a house now while the rates are low. We filed BR 2 yrs ago. Rent now, w/ some sch loan debt" I expect the document "buy a house" to match much higher than "house loan rates". However, the latter is the document which always matches higher. I tried to do this the following way (solr 3.1):
1. Score phrase matches high
2. Score single word matches lower
3. Use dismax with a mm of 1, and very high boost for exact phrase match.

I used the s text definition in the schema for the single words, and the following for the phrase:

<fieldType name="shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="false"/>
  </analyzer>
</fieldType>

and my schema fields look like this:

<field name="kw_stopped" type="text_en" indexed="true" omitNorms="True"/>
<!-- keywords almost as is - to provide truer match for full phrases -->
<field name="kw_phrases" type="shingle" indexed="true" omitNorms="True"/>

This is my search handler config:

<requestHandler name="edismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.1</float>
    <str name="fl">kpid,advid,campaign,keywords</str>
    <str name="mm">1</str>
    <str name="qf">kw_stopped^1.0</str>
    <str name="pf">kw_phrases^50.0</str>
    <int name="ps">3</int>
    <int name="qs">3</int>
    <str name="q.alt">*:*</str>
    <!-- example highlighter config, enable per-query with hl=true -->
    <str name="hl.fl">keywords</str>
    <!-- for this field, we want no fragmenting, just highlighting -->
    <str name="f.name.hl.fragsize">0</str>
    <!-- instructs Solr to return the field itself if no query terms are found -->
    <str name="f.name.hl.alternateField">title</str>
    <str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
  </lst>
</requestHandler>

These are the match score debugQuery explanations:

8.480054E-4 = (MATCH) sum of:
  8.480054E-4 = (MATCH) product of:
    0.0031093531 = (MATCH) sum of:
      0.0015556295 = (MATCH) weight(kw_stopped:hous in 1812), product of:
        2.8209004E-4 = queryWeight(kw_stopped:hous), product of:
          5.514656 = idf(docFreq=25, maxDocs=2375)
          5.1152787E-5 = queryNorm
        5.514656 = (MATCH) fieldWeight(kw_stopped:hous in 1812), product of:
          1.0 = tf(termFreq(kw_stopped:hous)=1)
          5.514656 = idf(docFreq=25, maxDocs=2375)
          1.0 = fieldNorm(field=kw_stopped, doc=1812)
      8.192911E-4 = (MATCH) weight(kw_stopped:rate in 1812), product of:
        2.0471694E-4 = queryWeight(kw_stopped:rate), product of:
          4.002068 = idf(docFreq=117, maxDocs=2375)
          5.1152787E-5 = queryNorm
        4.002068 = (MATCH) fieldWeight(kw_stopped:rate in 1812), product of:
          1.0 = tf(termFreq(kw_stopped:rate)=1)
          4.002068 = idf(docFreq=117, maxDocs=2375)
          1.0 = fieldNorm(field=kw_stopped, doc=1812)
      7.344327E-4 = (MATCH) weight(kw_stopped:loan in 1812), product of:
        1.9382538E-4 = queryWeight(kw_stopped:loan), product of:
          3.7891462 = idf(docFreq=145, maxDocs=2375)
          5.1152787E-5 = queryNorm
        3.7891462 = (MATCH) fieldWeight(kw_stopped:loan in 1812), product of:
          1.0 = tf(termFreq(kw_stopped:loan)=1)
          3.7891462 = idf(docFreq=145, maxDocs=2375)
          1.0 = fieldNorm(field=kw_stopped, doc=1812)
    0.27272728 = coord(3/11)

for "house loan rates" vs

8.480054E-4 = (MATCH) sum of: 8.480054E-4 =
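The shingling step in the kw_phrases field can be approximated in a few lines, which is handy for checking what tokens that field will actually contain for a given document. A simplified sketch (real Solr analysis also applies the word-delimiter and protected-words filters first; this only mimics lowercasing plus solr.ShingleFilterFactory):

```python
def shingles(text, max_size=3, output_unigrams=False):
    """Emit word n-grams up to max_size, roughly like solr.ShingleFilterFactory."""
    words = text.lower().split()
    start = 1 if output_unigrams else 2  # outputUnigrams=false skips single words
    return [" ".join(words[i:i + n])
            for n in range(start, max_size + 1)
            for i in range(len(words) - n + 1)]

print(shingles("buy a house"))  # ['buy a', 'a house', 'buy a house']
```

With outputUnigrams="false", a three-word document like "buy a house" indexes only multi-word shingles, which is why single query terms such as "house" cannot match kw_phrases on their own.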
IndexBasedSpellChecker on multiple fields
Hi all guys, I need to configure the IndexBasedSpellChecker to use more than just one field as a spelling dictionary; is that possible to achieve? In the meanwhile I configured two spellcheckers and let users switch from one checker to another via params on the GET request, but it looks like people are not particularly happy about it... The main problem is that the fields I need to spellcheck contain different information; I mean the intersection between the two sets could be empty. Many thanks in advance, all the best! Simo http://people.apache.org/~simonetripodi/ http://simonetripodi.livejournal.com/ http://twitter.com/simonetripodi http://www.99soft.org/
Re: Instructions for Multiple Server Webapps Configuring with JNDI
On 10/14/2011 2:44 PM, Chris Hostetter wrote: : modified the solr/home accordingly. I have an empty directory under : tomcat/webapps named after the solr home directory in the context fragment. if that empty directory has the same base name as your context fragment (ie: tomcat/webapps/solr0 and solr0.xml) that may give you problems ... the entire point of using context fragment files is to define webapps independently of a simple directory based hierarchy in tomcat/webapps ... if you have a directory there with the same name you create a conflict -- which webapp should it use, the empty one, or the one specified by your context file? Looks like that was the problem; once I removed the ./webapps/solr0 directory and started tomcat back up, it was recreated correctly. : I expected to fire up tomcat and have it unpack the war file contents into the : solr home directory specified in the context fragment, but it's empty, as is : the webapps directory. that's not what the solr/home env variable is for at all. tomcat will put the unpacked war wherever it needs/wants to (in theory it could just load it in memory) ... the point of the solr/home env variable is for you to tell the solr.war where to find the configuration files for this context. Sorry, my mistake. I wasn't referring to solr/home, I was referring literally to the new solr home under tomcat - in this instance ./webapps/solr0. One more question: is there a particular advantage of multiple solr instances vs. multiple solr cores? Thanks.
Re: How to change default operator in velocity?
Hi, The reason why AND is default with /browse is that it uses the dismax query parser, which does not currently respect defaultOperator. If you want an OR-like behaviour, try adding at the end of the url: &mm=0 (which means minimum number of terms that should match = 0), e.g. http://localhost:8983/solr/browse?q=samsung+maxtor&mm=0 For more about mm, see http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29 NB: In trunk (4.0), even dismax will respect the defaultOperator from schema. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 18. okt. 2011, at 12:36, hadi wrote: in solr schema the defaultOperator value is OR but when i use browse(http://localhost:8983/solr/browse)for searching AND is a defaultOperator,and that config in solr is not affect on velocity how can i change the velocity template engine default operators? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-change-default-operator-in-velocity-tp3430871p3430871.html Sent from the Solr - User mailing list archive at Nabble.com.
solr fullpledged
Hi everybody. I just downloaded the solr application, modified the config files as per my requirements, and successfully got results. I also developed a sample client application using javascript and used my solr url there to retrieve the results. Everything is fine. Now I would like to develop the same application as a full-fledged one: I mean I don't want to put the risk on the user; I need to develop an application with a good UI, take the config file inputs from the user's UI, and store them into the solr xml files using Java (jsp, spring). Is there any way? Please give me suggestions. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-fullpledged-tp3431187p3431187.html Sent from the Solr - User mailing list archive at Nabble.com.
Find Documents with field = maxValue
Hi, It might be a naive question. Assume we have a list of Documents, each Document contains the information of a person, and there is a numeric field named 'age'. How can we find those Documents whose age field is max(age), in one query? So far I've found that function queries don't support aggregate functions, but how about nested queries? Thanks -- Alireza Salimi Java EE Developer
Re: How to change default operator in velocity?
Thanks for your reply. I deleted the dismax config from solrconfig.xml and it works; are there any side effects? On 10/18/11, Jan Høydahl / Cominvent [via Lucene] ml-node+s472066n3431189...@n3.nabble.com wrote: Hi, The reason why AND is default with /browse is that it uses the dismax query parser, which does not currently respect defaultOperator. If you want an OR like behaviour, try to add at the end of the url: mm=0 (which means minumum number of terms that should match=0), e.g. http://localhost:8983/solr/browse?q=samsung+maxtormm=0 For more about mm, see http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29 NB: In trunk (4.0), even dismax will respect the defaultOperator from schema. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 18. okt. 2011, at 12:36, hadi wrote: in solr schema the defaultOperator value is OR but when i use browse(http://localhost:8983/solr/browse)for searching AND is a defaultOperator,and that config in solr is not affect on velocity how can i change the velocity template engine default operators? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-change-default-operator-in-velocity-tp3430871p3431294.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: millions of records problem
Getting a solid-state drive might help -- View this message in context: http://lucene.472066.n3.nabble.com/millions-of-records-problem-tp3427796p3431309.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Find Documents with field = maxValue
--- On Tue, 10/18/11, Alireza Salimi alireza.sal...@gmail.com wrote: From: Alireza Salimi alireza.sal...@gmail.com Subject: Find Documents with field = maxValue To: solr-user@lucene.apache.org Date: Tuesday, October 18, 2011, 4:10 PM Hi, It might be a naive question. Assume we have a list of Document, each Document contains the information of a person, there is a numeric field named 'age', how can we find those Documents whose *age* field is *max(age) *in one query. Maybe http://wiki.apache.org/solr/StatsComponent? Or sort by age? q=*:*&start=0&rows=1&sort=age desc
performance jetty (jetty.xml)
Hi, I just changed my solr installation from 1.4 to 3.4, and I noticed that the jetty configuration file (jetty.xml) also changed: the default thread number is higher, the threadpool is higher, and other default values are higher. Is that normal? What values do you think are correct for me? I have a dedicated machine with 2 solr instances inside; my machine has 8GB of RAM and 8 CPUs, and I do about 200,000 - 250,000 calls to solr a day... can someone help me? - Thread numbers (min, max and low) - corepool size and maximum pool size
Re: Find Documents with field = maxValue
Hi Ahmet, Thanks for your reply, but I want ALL documents with age = max_age. On Tue, Oct 18, 2011 at 9:59 AM, Ahmet Arslan iori...@yahoo.com wrote: --- On Tue, 10/18/11, Alireza Salimi alireza.sal...@gmail.com wrote: From: Alireza Salimi alireza.sal...@gmail.com Subject: Find Documents with field = maxValue To: solr-user@lucene.apache.org Date: Tuesday, October 18, 2011, 4:10 PM Hi, It might be a naive question. Assume we have a list of Document, each Document contains the information of a person, there is a numeric field named 'age', how can we find those Documents whose *age* field is *max(age) *in one query. Maybe http://wiki.apache.org/solr/StatsComponent? Or sort by age? q=*:*&start=0&rows=1&sort=age desc -- Alireza Salimi Java EE Developer
Re: performance jetty (jetty.xml)
Can't you use some profilers to find out about your new performance? I'm new to Solr, but I think 200,000 req/day is not that many. On Tue, Oct 18, 2011 at 10:03 AM, Gastone Penzo gastone.pe...@gmail.comwrote: Hi, i just change my solr installation from 1.4 to 3.4.. i can notice that also jetty configuration file (jetty.xml) is changed. default threads number is higher, theadpool is higher and other default value are higher. is it normal?? what number of these value do you seems are correct for me? i have a dedicated machine with 2 solr istances inside my machine has 8gb of ram and 8 cpu.. i do like 200.000 - 250.000 calls to solr a day... someone can help me?? - Theads number (min,max and low) - corepool size and maximum poolsize * * -- Alireza Salimi Java EE Developer
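For orientation, the knobs being asked about live in the thread pool section of jetty.xml. A hedged example of that section (class name and values depend on the Jetty version shipped with your Solr; the numbers below are illustrative placeholders, not recommendations — at roughly 3 requests/second the shipped defaults are normally fine):

```xml
<!-- jetty.xml: request thread pool (Jetty 6 as bundled with the Solr example) -->
<Set name="ThreadPool">
  <New class="org.mortbay.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="maxThreads">200</Set>
    <Set name="lowThreads">20</Set>
  </New>
</Set>
```

Raising maxThreads only helps if requests are actually queueing; profiling (as suggested above) is the way to find out.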
RE: Find Documents with field = maxValue
I don't know anything about your environment, so maybe this doesn't make sense, but maybe you can check your source system (database or whatnot) to get the max_age, then search for that max_age in your Solr index. It's not as elegant, but may be a lot easier. To reduce the risk of interacting with potentially stale data, you may want to change your = to >= or whatever is appropriate. Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 Software Engineer II | Element K | www.elementk.com -Original Message- From: Alireza Salimi [mailto:alireza.sal...@gmail.com] Sent: Tuesday, October 18, 2011 10:15 AM To: solr-user@lucene.apache.org Subject: Re: Find Documents with field = maxValue Hi Ahmet, Thanks for your reply, but I want ALL documents with age = max_age. On Tue, Oct 18, 2011 at 9:59 AM, Ahmet Arslan iori...@yahoo.com wrote: --- On Tue, 10/18/11, Alireza Salimi alireza.sal...@gmail.com wrote: From: Alireza Salimi alireza.sal...@gmail.com Subject: Find Documents with field = maxValue To: solr-user@lucene.apache.org Date: Tuesday, October 18, 2011, 4:10 PM Hi, It might be a naive question. Assume we have a list of Document, each Document contains the information of a person, there is a numeric field named 'age', how can we find those Documents whose *age* field is *max(age) *in one query. Maybe http://wiki.apache.org/solr/StatsComponent? Or sort by age? q=*:*&start=0&rows=1&sort=age desc -- Alireza Salimi Java EE Developer
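Staying entirely inside Solr, the usual pattern for "all documents at the max value" is two requests: one sorted query to learn max(age), then a filter on that value. A sketch of the client-side logic, run here against an in-memory stand-in for the index so it is self-contained (field names and values are illustrative):

```python
# Stand-in for documents in the index; in reality these come from Solr responses.
docs = [
    {"id": "1", "name": "Ann", "age": 41},
    {"id": "2", "name": "Bob", "age": 67},
    {"id": "3", "name": "Cal", "age": 67},
]

# Request 1: q=*:*&rows=1&sort=age desc  -> read max(age) off the single top hit
max_age = sorted(docs, key=lambda d: d["age"], reverse=True)[0]["age"]

# Request 2: q=*:*&fq=age:<max_age>      -> ALL documents at that age
winners = [d for d in docs if d["age"] == max_age]
print([d["id"] for d in winners])  # every document tied at the maximum age
```

The window between the two requests is where the staleness risk mentioned above comes in: if a commit lands in between, the second query may see a higher age than the first reported.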
solr/lucene and its database (a silly question)
Hello experts, I have just a silly question regarding Solr/Lucene, please. Where is the imported data stored? In Lucene or Solr? Here is a picture of the architecture: http://3.bp.blogspot.com/-rTZPN3sm9e0/TjAdqciXHgI/Cs0/N_W_iSAI8cY/s1600/solr_arch.jpg I mean when importing the data to Lucene: as far as I understand, the data goes through some processes (document processing), then finally it is stored as an index (XML structure???) in the Lucene engine? Is this correct? If yes, what kind of database does Lucene use? Or how is the data stored in Lucene? Thank you for your answer. :-) Cheers Loren -- View this message in context: http://lucene.472066.n3.nabble.com/solr-lucene-and-its-database-a-silly-question-tp3431436p3431436.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr/lucene and its database (a silly question)
In here: http://wiki.apache.org/solr/SolrConfigXml#dataDir_parameter On Tue, Oct 18, 2011 at 10:38 AM, lorenlai loren...@yahoo.com wrote: Hello expert, I have just a silly question regarding to Solr/Lucene, pls. Where are the importing data stored ? In Lucene or Solr ? Here is a picture of the architecture. http://3.bp.blogspot.com/-rTZPN3sm9e0/TjAdqciXHgI/Cs0/N_W_iSAI8cY/s1600/solr_arch.jpg I mean when importing the data to Lucene. As for my understanding, the data will be gone through some processes (document processing), then finally it will store as Index (XML structure???) in Lucene engine ? Is this correct ? If yes, what kind of Database does Lucene use ? Or how are the data stored in lucene ? Thank you for your answer. :-) Cheers Loren -- View this message in context: http://lucene.472066.n3.nabble.com/solr-lucene-and-its-database-a-silly-question-tp3431436p3431436.html Sent from the Solr - User mailing list archive at Nabble.com. -- Alireza Salimi Java EE Developer
Re: Question about near query order
Thank you for your kind reply. Is it possible only with defType=lucene in your second suggestion? I'm using ComplexPhraseQueryParser, so my defType is complexphrase. -- View this message in context: http://lucene.472066.n3.nabble.com/Question-about-near-query-order-tp3427312p3431465.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr/lucene and its database (a silly question)
SOLR stores all data in the directory you specify in solrconfig.xml in dataDir setting. SOLR uses Lucene to store all the data in one or more proprietary binary files called segment files. As a SOLR user typically you should not be too concerned with binary index structure. You can see details here (some details may be out of date): http://lucene.apache.org/java/2_3_2/fileformats.html Bob On Oct 18, 2011, at 10:38 AM, lorenlai wrote: Hello expert, I have just a silly question regarding to Solr/Lucene, pls. Where are the importing data stored ? In Lucene or Solr ? Here is a picture of the architecture. http://3.bp.blogspot.com/-rTZPN3sm9e0/TjAdqciXHgI/Cs0/N_W_iSAI8cY/s1600/solr_arch.jpg I mean when importing the data to Lucene. As for my understanding, the data will be gone through some processes (document processing), then finally it will store as Index (XML structure???) in Lucene engine ? Is this correct ? If yes, what kind of Database does Lucene use ? Or how are the data stored in lucene ? Thank you for your answer. :-) Cheers Loren -- View this message in context: http://lucene.472066.n3.nabble.com/solr-lucene-and-its-database-a-silly-question-tp3431436p3431436.html Sent from the Solr - User mailing list archive at Nabble.com.
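For completeness, the setting Bob refers to lives in solrconfig.xml; the path shown here is only an example of the syntax, not a recommendation:

```xml
<!-- solrconfig.xml: where Solr (via Lucene) writes the binary segment files.
     The property syntax falls back to the given default if solr.data.dir is unset. -->
<dataDir>${solr.data.dir:./solr/data}</dataDir>
```

Everything under that directory is Lucene's proprietary binary index format — there is no relational database and no XML storage involved.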
RE: IndexBasedSpellChecker on multiple fields
Simone, You can set up a master dictionary but with a few caveats. What you'll need to do is copyField all of the fields you want to include in your master dictionary into one field, and base your IndexBasedSpellChecker dictionary on that. In addition, I would recommend you use the collate feature and set spellcheck.maxCollationTries to something greater than zero (5-10 is usually good). Otherwise, you probably will get a lot of ridiculous suggestions from it trying to correct words from one field with values from another. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate for more information. There is still a big problem with this approach, however. Unless you set onlyMorePopular=true, Solr will never suggest a correction for a word that exists in the dictionary. By creating a huge master dictionary, you will be increasing the chances that Solr will assume your users' misspelled words are in fact correct. One way to work around this is, instead of blindly using copyField, to hand-pick a subset of your terms for the master field on which you base your dictionary. Another workaround is to use onlyMorePopular, although this has its own problems. See the discussion for SOLR-2585 (https://issues.apache.org/jira/browse/SOLR-2585), which aims to solve these problems. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: simone.trip...@gmail.com [mailto:simone.trip...@gmail.com] On Behalf Of Simone Tripodi Sent: Tuesday, October 18, 2011 7:06 AM To: solr-user@lucene.apache.org Subject: IndexBasedSpellChecker on multiple fields Hi all guys, I need to configure the IndexBasedSpellChecker that uses more than just one field as a spelling dictionary, is it possible to achieve? In the meanwhile I configured two spellcheckers and let users switch from a checkeer to another via params on GET request, but looks like people are not particularly happy about it... 
The main problem is that fields I need to speel contain different informations, I mean the intersection between the two sets could be empty. Many thanks in advance, all the best! Simo http://people.apache.org/~simonetripodi/ http://simonetripodi.livejournal.com/ http://twitter.com/simonetripodi http://www.99soft.org/
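Put together, James' suggestion looks roughly like the following in schema.xml and solrconfig.xml. A hedged sketch only: the field and dictionary names (spell, title, body) are made up for illustration, and the field type should be a lightly-analyzed text type suited to spellchecking:

```xml
<!-- schema.xml: funnel the source fields into one master dictionary field -->
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="spell"/>
<copyField source="body" dest="spell"/>

<!-- solrconfig.xml: one IndexBasedSpellChecker built over the combined field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>
```

At query time, the collation safeguard is enabled with request parameters such as spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=5, so suggested combinations are re-run against the index before being returned.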
Re: Question about near query order
Is it possible only with defType=lucene in your second suggestion? I'm using ComplexPhraseQueryParser, so my defType is complexphrase. Oh, then life is easy. Just setting the inOrder parameter to false in solrconfig.xml should do the trick:

<queryParser name="complexphrase" class="org.apache.solr.search.ComplexPhraseQParserPlugin">
  <bool name="inOrder">false</bool>
</queryParser>
Re: How to change default operator in velocity?
Rather than deleting the dismax config, I would recommend adding a new entry inside your /browse request handler config's <lst name="defaults"> tag:

<str name="mm">0</str>

This will go OR mode, and you will still benefit from all the advantages that DisMax gives you for weighted search across different fields. See http://wiki.apache.org/solr/DisMaxQParserPlugin to learn more about DisMax. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 18. okt. 2011, at 15:56, hadi wrote: Thanks for your reply. I deleted the dismax config from solrconfig.xml and it works; are there any side effects? On 10/18/11, Jan Høydahl / Cominvent [via Lucene] ml-node+s472066n3431189...@n3.nabble.com wrote: Hi, The reason why AND is default with /browse is that it uses the dismax query parser, which does not currently respect defaultOperator. If you want an OR-like behaviour, try to add at the end of the url: mm=0 (which means minimum number of terms that should match = 0), e.g. http://localhost:8983/solr/browse?q=samsung+maxtor&mm=0 For more about mm, see http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29 NB: In trunk (4.0), even dismax will respect the defaultOperator from schema. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 18. okt. 2011, at 12:36, hadi wrote: In the solr schema the defaultOperator value is OR, but when I use browse (http://localhost:8983/solr/browse) for searching, AND is the default operator, and that config in solr does not affect velocity. How can I change the velocity template engine's default operator? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-change-default-operator-in-velocity-tp3430871p3430871.html Sent from the Solr - User mailing list archive at Nabble.com. 
Re: Query with star returns double type values equal 0
Hi iorixxx, I am using lucene. On Monday, October 17, 2011 5:58:31 PM, iorixxx [via Lucene] wrote: I am experiencing an unexpected behavior using solr 3.4.0. If my query includes a star, all the properties of type 'long' or 'LatLon' have 0 as value (ex: select/?start=0&q=way*&rows=10&version=2). Though the same request without stars returns correct values (ex: select/?start=0&q=way&rows=10&version=2). Does anyone have an idea? Please keep in mind that wildcard queries are not analyzed. What query parser are you using? lucene, dismax, edismax?
Access Document Score in Custom Function Query (ValueSource)
Hi, I use the following 2 components in ranking documents: Normal Query : myField^2 Custom Function Query (ValueSource): myFunc() In this value source I compute another score for every document using some features. I want to access the score of the query myField^2 (for a given document) in this same value source. Ideas? Thanks Sid
Term Frequency - tf() ?
I've revised the tf() function to always return 1, regardless of the number of terms it finds. However, I run into a problem when stemmed words and root words appear together. These documents get a higher boost than documents with just the root. For example: "woman walking fast" gets tf(woman) = 1. "woman walking fast women walking fast" gets tf(woman) = 1 and tf(women) = 1, resulting in a higher score than just "woman". Is there a way to always return 1 for tf(), regardless of stemmed words or synonyms? Thanks, Hung
Dismax boost + payload boost
Is it possible to combine dismax boost (query time) and payload boost (index time)? I've done something very similar to this post http://sujitpal.blogspot.com/2011/01/payloads-with-solr.html but it seems that query time boosts get ignored.
Re: Find Documents with field = maxValue
Hi, Are you just looking for: age:<target age> This will return all documents/records where the age field is equal to the target age. But maybe you want age:[0 TO <target age>] This will include people aged from 0 to the target age. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Alireza Salimi alireza.sal...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, October 18, 2011 10:15 AM Subject: Re: Find Documents with field = maxValue Hi Ahmet, Thanks for your reply, but I want ALL documents with age = max_age. On Tue, Oct 18, 2011 at 9:59 AM, Ahmet Arslan iori...@yahoo.com wrote: --- On Tue, 10/18/11, Alireza Salimi alireza.sal...@gmail.com wrote: From: Alireza Salimi alireza.sal...@gmail.com Subject: Find Documents with field = maxValue To: solr-user@lucene.apache.org Date: Tuesday, October 18, 2011, 4:10 PM Hi, It might be a naive question. Assume we have a list of Documents, each Document contains the information of a person, and there is a numeric field named 'age'. How can we find those Documents whose *age* field is *max(age)* in one query? Maybe http://wiki.apache.org/solr/StatsComponent? Or sort by age? q=*:*&start=0&rows=1&sort=age desc -- Alireza Salimi Java EE Developer
Re: performance jetty (jetty.xml)
Gastone, Those numbers are probably OK. Let us know if you have any actual problems with Solr 3.4. Oh, and use the solr-user mailing list instead, please. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Gastone Penzo gastone.pe...@gmail.com To: solr-user@lucene.apache.org; d...@lucene.apache.org Sent: Tuesday, October 18, 2011 10:03 AM Subject: performance jetty (jetty.xml) Hi, I just changed my Solr installation from 1.4 to 3.4, and I noticed that the Jetty configuration file (jetty.xml) changed too: the default thread number is higher, the thread pool is larger, and other default values are higher. Is this normal? What values would be correct for my setup? I have a dedicated machine with 2 Solr instances; the machine has 8 GB of RAM and 8 CPUs, and I make about 200,000-250,000 calls to Solr a day. Can someone help me? - Thread number (min, max and low) - core pool size and maximum pool size
Re: Find Documents with field = maxValue
Hi Alireza, Would this work? Sort the results by age desc, then loop through the results as long as age == age[0]. -sujit On Tue, 2011-10-18 at 15:23 -0700, Otis Gospodnetic wrote: Hi, Are you just looking for: age:<target age> This will return all documents/records where the age field is equal to the target age. But maybe you want age:[0 TO <target age>] This will include people aged from 0 to the target age. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Alireza Salimi alireza.sal...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, October 18, 2011 10:15 AM Subject: Re: Find Documents with field = maxValue Hi Ahmet, Thanks for your reply, but I want ALL documents with age = max_age. On Tue, Oct 18, 2011 at 9:59 AM, Ahmet Arslan iori...@yahoo.com wrote: --- On Tue, 10/18/11, Alireza Salimi alireza.sal...@gmail.com wrote: From: Alireza Salimi alireza.sal...@gmail.com Subject: Find Documents with field = maxValue To: solr-user@lucene.apache.org Date: Tuesday, October 18, 2011, 4:10 PM Hi, It might be a naive question. Assume we have a list of Documents, each Document contains the information of a person, and there is a numeric field named 'age'. How can we find those Documents whose *age* field is *max(age)* in one query? Maybe http://wiki.apache.org/solr/StatsComponent? Or sort by age? q=*:*&start=0&rows=1&sort=age desc -- Alireza Salimi Java EE Developer
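Sujit's loop idea can be sketched client-side in plain Java (a hypothetical helper, not SolrJ code; it assumes the hits already arrive sorted by age descending, e.g. via sort=age desc):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MaxAgeFilter {

    // Keep only the leading documents whose age equals the maximum,
    // given ages already sorted in descending order.
    static List<Integer> maxAgeDocs(List<Integer> agesDesc) {
        List<Integer> out = new ArrayList<>();
        for (int age : agesDesc) {
            if (!out.isEmpty() && age != out.get(0)) {
                break; // past the max-age block; stop scanning
            }
            out.add(age);
        }
        return out;
    }

    public static void main(String[] args) {
        // All documents tied for the max age (42) are kept.
        System.out.println(maxAgeDocs(Arrays.asList(42, 42, 40, 31))); // prints [42, 42]
    }
}
```

The same effect in a single query would need a first request to discover max(age) (rows=1, sort=age desc) followed by a filter query age:<max>, so this client-side scan saves one round trip at the cost of fetching extra rows.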
Re: How to retrieve multiple documents using one unique field?
This won't work. But you could add all 4 docs with the same 123 value in their id fields; just comment out the uniqueKey field. Don't ask me what will or will not happen when you later try updating a document with id:123... Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: kiran.bodigam kiran.bodi...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, October 18, 2011 7:14 AM Subject: How to retrieve multiple documents using one unique field? I have four different documents in a single xml file (to be indexed), and I don't want to inject the unique field for each and every document. When I search with the unique field, all four documents should come back in the result, i.e. can a common unique field be applied to all the documents? My xml format:

<add>
  <doc><field></field></doc>
  <doc><field></field></doc>
  <doc><field></field></doc>
  <doc><field></field></doc>
  <commonunique><id>123</id></commonunique>
</add>

If I search for 123, all four documents should come back. Is this possible?
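For reference, the no-uniqueKey route Otis mentions is a one-line change in schema.xml (a sketch; as he notes, what happens to updates by id afterwards is an open question):

```xml
<!-- Commenting out uniqueKey allows several <doc>s to share the same id value.
     Overwrite-by-id semantics are lost, so any update must be handled
     manually, e.g. delete-by-query followed by a re-add. -->
<!-- <uniqueKey>id</uniqueKey> -->
```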
Re: OS Cache - Solr
Maybe your Solr Document cache is big and that's consuming a big part of that JVM heap? If you want to be able to run with a smaller heap, consider making your caches smaller. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Sujatha Arun suja.a...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, October 18, 2011 12:53 AM Subject: Re: OS Cache - Solr Hello Jan, Thanks for your response and clarification. We are monitoring the JVM cache utilization, and we are currently using about 18 GB of the 20 GB assigned to the JVM, our total index size being about 14 GB. Regards Sujatha On Tue, Oct 18, 2011 at 1:19 AM, Jan Høydahl jan@cominvent.com wrote: Hi Sujatha, Are you sure you need 20Gb for Tomcat? Have you profiled using JConsole or similar? Try with 15Gb and see how it goes. The reason why this is beneficial is that you WANT your OS to have available memory for disk caching. If you have 17Gb free after starting Solr, your OS will be able to cache all index files in memory and you get very high search performance. With your current settings, there is only 12Gb free for both caching the index and for your MySql activities. Chances are that when you backup MySql, the cached part of your Solr index gets flushed from disk caches and needs to be re-cached later. How memory stats are interpreted varies between OSes, and seeing 163Mb free may simply mean that your OS has used most RAM for various caches and paging, but will flush it once an application asks for more memory. Have you seen http://wiki.apache.org/solr/SolrPerformanceFactors ? You should also slim down your index maximally by setting stored=false and indexed=false wherever possible. I would also upgrade to a more current Solr version. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 17. okt. 2011, at 19:51, Sujatha Arun wrote: Hello, I am trying to understand the OS cache utilization of Solr. Our server hosts several Solr instances. The total combined index size of all instances is about 14 GB, and the maximum single index is about 2.5 GB. Our server has a quad processor with 32 GB RAM, of which 20 GB has been assigned to the JVM. We are running Solr 1.3 on Tomcat 5.5 and Java 1.6. Our current statistics indicate that Solr uses 18-19 GB of the 20 GB RAM assigned to the JVM. However, the free physical memory seems to remain constant, as below: Free physical memory = 163 Mb, Total physical memory = 32,232 Mb. The server also serves as a backup server for MySQL, where the application DB is backed up and restored. During this activity we see queries that take even 10+ minutes to execute, but otherwise the maximum query time is less than 1-2 secs. The free physical memory seems to be constant. Why is this constant, and how will memory be shared between the MySQL backup and Solr while the backup is happening? How much free physical memory should be available to the OS given our stats? Any pointers would be helpful. Regards Sujatha
score based on unique words matching???
Here's my problem: field1 (text) - subject. q=david bowie changes. Problem: if a record mentions david bowie a lot, it beats out something more relevant (more unique matches)... A. (now appearing: david bowie at the cineplex 7pm; david bowie goes on stage, then mr. bowie will sign autographs) B. song: david bowie - changes. (A) ends up more relevant because of the frequency or number of words in it... not cool. I want the number of unique words matching to trump density/weight. Thanks, I'm a newbie. -Craig
Hit search-lucene.com a little harder
Hello folks, Do you ever use http://search-lucene.com (SL) or http://search-hadoop.com (SH)? If you do, I'd like to ask you for a small favour: We are at Lucene Eurocon in Barcelona and we are about to show the Search Analytics [1] and Performance Monitoring [2] tools/services we've built and that we use on these two sites. We would like to show the audience various pretty graphs and would love those graphs to be a little less sparse. :) So if you use SL and/or SH, please feel free to use them a little extra now, if you feel like helping. [1] http://sematext.com/search-analytics/index.html [2] http://sematext.com/spm/solr-performance-monitoring/index.html I think we'll open up both of the above services to the public tomorrow (and 100% free for an undetermined length of time), but if you don't have time to sign up and set it up for yourself, yet are interested in reports, graphs, etc., let me know and we'll put together a blog post or something and include interesting things in it. Thanks, Otis
changing base URLs in indexes
Hi, I am getting ready to index a recent copy of Wikipedia's pages-articles dump. I have two servers, foo and bar. On foo.com/mediawiki I have a Mediawiki install serving up the pages. On bar.com/solr I have my solr install. I have the pages-articles.xml file from Wikipedia and the solr instructions at http://wiki.apache.org/solr/DataImportHandler#Example:_Indexing_wikipedia. It looks pretty straightforward but I have a couple of preparatory questions. If I index the pages-articles.xml on bar.com/solr, they will then be pointing to the relative links on solr.com/mediawiki, which don't exist, right? So is there a way to tell solr that the base url for a bunch of index records is different than what it thinks they are? Or would it be easier simply to put a solr installation on foo.com? \ FredZ
Re: Instructions for Multiple Server Webapps Configuring with JNDI
On 10/18/2011 6:59 AM, Tod wrote: One more question, is there a particular advantage of multiple solr instances vs. multiple solr cores? One way of doing multiple instances is running more than one copy of your container (tomcat/jetty/whatever). I've never tried to put more than one .war file into a container ... I have no idea how to tell each one where its solr home is. It may be possible, but I've never tried. Either way, you'd end up with overhead because a certain amount of memory is required just to get each copy of Solr started. There is some additional flexibility with multiple containers - they can be easily stopped and started independently at the OS level. With cores, there isn't as much overhead because there's only one application running, handling multiple indexes. There is some ability to load/unload each index independently with CoreAdmin, but it's not controllable at the OS level. In a well designed full system that includes software and hardware redundancy, being unable to independently stop/start an index isn't much of a worry. Thanks, Shawn
Re: Question about near query order
Thanks a ton iorixxx. Jason.
Re: changing base URLs in indexes
Is this a crawler indexing the pages? If so, I would point it to whatever you need. If, for some reason, you cannot, you can modify the host/domain in your index using pattern char filters, or maybe the stored (returned) values using a custom update processor. Hi, I am getting ready to index a recent copy of Wikipedia's pages-articles dump. I have two servers, foo and bar. On foo.com/mediawiki I have a Mediawiki install serving up the pages. On bar.com/solr I have my solr install. I have the pages-articles.xml file from Wikipedia and the solr instructions at http://wiki.apache.org/solr/DataImportHandler#Example:_Indexing_wikipedia. It looks pretty straightforward but I have a couple of preparatory questions. If I index the pages-articles.xml on bar.com/solr, they will then be pointing to the relative links on solr.com/mediawiki, which don't exist, right? So is there a way to tell solr that the base url for a bunch of index records is different than what it thinks they are? Or would it be easier simply to put a solr installation on foo.com? \ FredZ
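A sketch of the char-filter idea for the indexed terms (the field type name text_rehost and the hostnames in the pattern are illustrative; note a char filter only changes what gets analyzed, not the stored values returned to clients — fixing those is where the custom update processor would come in):

```xml
<fieldType name="text_rehost" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Hypothetical rewrite: swap the stale base URL for the live mediawiki host
         before tokenization -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="http://bar\.com/mediawiki"
                replacement="http://foo.com/mediawiki"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```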
use lucene to create index(with synonym) and solr query index
1. Use Lucene to create the index (with synonyms). 2. Configure Solr's synonym functionality. 3. Use Solr to query the Lucene index. But the results are missing the synonym words. Why, and how can I make the two work with each other? Thanks!
Re: OS Cache - Solr
Thanks, Otis. This is our Solr cache allocation. We have the same cache allocation for all our *200+ instances* on the single server. Is this too high? *Query Result Cache*: LRU Cache(maxSize=16384, initialSize=4096, autowarmCount=1024) *Document Cache*: LRU Cache(maxSize=16384, initialSize=16384) *Filter Cache*: LRU Cache(maxSize=16384, initialSize=4096, autowarmCount=4096) Regards Sujatha On Wed, Oct 19, 2011 at 4:05 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Maybe your Solr Document cache is big and that's consuming a big part of that JVM heap? If you want to be able to run with a smaller heap, consider making your caches smaller. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Sujatha Arun suja.a...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, October 18, 2011 12:53 AM Subject: Re: OS Cache - Solr Hello Jan, Thanks for your response and clarification. We are monitoring the JVM cache utilization, and we are currently using about 18 GB of the 20 GB assigned to the JVM, our total index size being about 14 GB. Regards Sujatha On Tue, Oct 18, 2011 at 1:19 AM, Jan Høydahl jan@cominvent.com wrote: Hi Sujatha, Are you sure you need 20Gb for Tomcat? Have you profiled using JConsole or similar? Try with 15Gb and see how it goes. The reason why this is beneficial is that you WANT your OS to have available memory for disk caching. If you have 17Gb free after starting Solr, your OS will be able to cache all index files in memory and you get very high search performance. With your current settings, there is only 12Gb free for both caching the index and for your MySql activities. Chances are that when you backup MySql, the cached part of your Solr index gets flushed from disk caches and needs to be re-cached later. How memory stats are interpreted varies between OSes, and seeing 163Mb free may simply mean that your OS has used most RAM for various caches and paging, but will flush it once an application asks for more memory. Have you seen http://wiki.apache.org/solr/SolrPerformanceFactors ? You should also slim down your index maximally by setting stored=false and indexed=false wherever possible. I would also upgrade to a more current Solr version. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 17. okt. 2011, at 19:51, Sujatha Arun wrote: Hello, I am trying to understand the OS cache utilization of Solr. Our server hosts several Solr instances. The total combined index size of all instances is about 14 GB, and the maximum single index is about 2.5 GB. Our server has a quad processor with 32 GB RAM, of which 20 GB has been assigned to the JVM. We are running Solr 1.3 on Tomcat 5.5 and Java 1.6. Our current statistics indicate that Solr uses 18-19 GB of the 20 GB RAM assigned to the JVM. However, the free physical memory seems to remain constant, as below: Free physical memory = 163 Mb, Total physical memory = 32,232 Mb. The server also serves as a backup server for MySQL, where the application DB is backed up and restored. During this activity we see queries that take even 10+ minutes to execute, but otherwise the maximum query time is less than 1-2 secs. The free physical memory seems to be constant. Why is this constant, and how will memory be shared between the MySQL backup and Solr while the backup is happening? How much free physical memory should be available to the OS given our stats? Any pointers would be helpful. Regards Sujatha
Re: How to change default operator in velocity?
Thanks a lot, your answer is great. On 10/18/11, Jan Høydahl / Cominvent [via Lucene] ml-node+s472066n3431940...@n3.nabble.com wrote: Rather than deleting the dismax config, I would recommend adding a new entry inside your /browse request handler config's <lst name="defaults"> tag: <str name="mm">0</str> This will go OR mode, and you will still benefit from all the advantages that DisMax gives you for weighted search across different fields. See http://wiki.apache.org/solr/DisMaxQParserPlugin to learn more about DisMax. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 18. okt. 2011, at 15:56, hadi wrote: Thanks for your reply. I deleted the dismax config from solrconfig.xml and it works; are there any side effects? On 10/18/11, Jan Høydahl / Cominvent [via Lucene] ml-node+s472066n3431189...@n3.nabble.com wrote: Hi, The reason why AND is default with /browse is that it uses the dismax query parser, which does not currently respect defaultOperator. If you want an OR-like behaviour, try to add at the end of the url: mm=0 (which means minimum number of terms that should match = 0), e.g. http://localhost:8983/solr/browse?q=samsung+maxtor&mm=0 For more about mm, see http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29 NB: In trunk (4.0), even dismax will respect the defaultOperator from schema. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 18. okt. 2011, at 12:36, hadi wrote: In the solr schema the defaultOperator value is OR, but when I use browse (http://localhost:8983/solr/browse) for searching, AND is the default operator, and that config in solr does not affect velocity. How can I change the velocity template engine's default operator? 
How to update document with solrj?
I have indexed some files that do not have any tag or description, and I want to add some fields without deleting them. How can I update or add info to my indexed files with solrj? My idea for this issue is to query the specific file, delete it, add some info, and re-index it, but I think that is not a good idea.
add thumbnail image for search result
I want to know how I can add a thumbnail image for my files when I am indexing them with solrj? Thanks