eDismax parser and the mm parameter
Hi All,
I am planning to use the eDismax query parser in Solr to boost documents that contain a phrase in their fields. The edismax parser has an mm parameter. Since the query typed by the user could be of any length (i.e. >=1), I would like to set the mm value to 1. I have the following questions regarding this parameter:
1. Is it set to 1 by default?
2. In my schema.xml the defaultOperator is set to AND. Should I set it to OR in order for the edismax parser to be effective with an mm of 1?
Thanks in advance!
Re: eDismax parser and the mm parameter
1. Yes, the default for mm is 1.
2. It depends on what you are really trying to do - you haven't told us.
Generally, mm=1 is equivalent to q.op=OR, and mm=100% is equivalent to q.op=AND. Generally, use q.op unless you really know what you are doing. Generally, the intent of mm is to set the minimum number of OR/SHOULD clauses that must match on the top level of a query.
-- Jack Krupansky
-----Original Message-----
From: S.L
Sent: Sunday, March 30, 2014 2:25 AM
To: solr-user@lucene.apache.org
Subject: eDismax parser and the mm parameter
Hi All,
I am planning to use the eDismax query parser in Solr to boost documents that contain a phrase in their fields. The edismax parser has an mm parameter. Since the query typed by the user could be of any length (i.e. >=1), I would like to set the mm value to 1. I have the following questions regarding this parameter:
1. Is it set to 1 by default?
2. In my schema.xml the defaultOperator is set to AND. Should I set it to OR in order for the edismax parser to be effective with an mm of 1?
Thanks in advance!
Re: eDismax parser and the mm parameter
Hi,
Using mm=1 with (e)dismax is not a good idea. Your users will be unhappy, because there is no coord factor with this parser. The coord factor is about this: typically, a document that contains more of the query's terms will receive a higher score than another document with fewer query terms. I suggest you use something more restrictive: 3<-1 6<80%. I think there is a new autoRelax feature in some ticket. Even better, start with mm=100% and relax the mm value until you retrieve *enough* documents. It is OK to use a default operator of OR with the default query parser, because the coord factor kicks in.
http://lucene.apache.org/core/3_0_3/api/all/org/apache/lucene/search/Similarity.html#formula_coord
https://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
Ahmet
On Sunday, March 30, 2014 12:21 PM, Jack Krupansky j...@basetechnology.com wrote:
> 1. Yes, the default for mm is 1.
> 2. It depends on what you are really trying to do - you haven't told us.
> Generally, mm=1 is equivalent to q.op=OR, and mm=100% is equivalent to q.op=AND. Generally, use q.op unless you really know what you are doing. Generally, the intent of mm is to set the minimum number of OR/SHOULD clauses that must match on the top level of a query.
> -- Jack Krupansky
Re: eDismax parser and the mm parameter
Thanks Ahmet. So if it's a single-term query like 'Ginseng', what does mm=3 do to the query? I am guessing it would be reduced to 1 automatically in this case.
----- Reply message -----
From: Ahmet Arslan iori...@yahoo.com
To: solr-user@lucene.apache.org
Subject: eDismax parser and the mm parameter
Date: Sun, Mar 30, 2014 7:52 AM
Hi,
Using mm=1 with (e)dismax is not a good idea. Your users will be unhappy, because there is no coord factor with this parser. The coord factor is about this: typically, a document that contains more of the query's terms will receive a higher score than another document with fewer query terms. I suggest you use something more restrictive: 3<-1 6<80%. I think there is a new autoRelax feature in some ticket. Even better, start with mm=100% and relax the mm value until you retrieve *enough* documents. It is OK to use a default operator of OR with the default query parser, because the coord factor kicks in.
http://lucene.apache.org/core/3_0_3/api/all/org/apache/lucene/search/Similarity.html#formula_coord
https://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
Ahmet
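On the question of what a fixed mm does to a short query: the DisMax wiki page linked in this thread documents that a computed value greater than the number of optional clauses is never used, so mm=3 on a single-term query would effectively behave like 1. The following is a minimal Python sketch of the documented mm syntax (plain integers, negative values, percentages, and conditional specs such as 3<-1 6<80%). It is an illustration based on that documentation, not Solr's actual code; the function name min_should_match is hypothetical, and it assumes no spaces around "<".

```python
def min_should_match(clause_count: int, spec: str) -> int:
    """Return how many optional clauses must match for a given mm spec.

    Illustrative only: mirrors the documented mm rules, not Solr's source.
    """
    spec = spec.strip()
    if "<" in spec:
        # Conditional spec: space-separated "N<value" parts, N ascending.
        # At or below the first bound, every clause is required.
        result = clause_count
        for part in spec.split():
            bound, value = part.split("<", 1)
            if clause_count <= int(bound):
                break
            result = min_should_match(clause_count, value)
        return result
    if spec.endswith("%"):
        calc = clause_count * int(spec[:-1]) / 100
    else:
        calc = int(spec)
    # Negative values count down from the total; fractions are truncated.
    result = clause_count + int(calc) if calc < 0 else int(calc)
    # Never more than the number of clauses, never negative.
    return max(0, min(clause_count, result))


if __name__ == "__main__":
    for terms, mm in [(1, "3"), (2, "3<-1 6<80%"), (5, "3<-1 6<80%"), (10, "3<-1 6<80%")]:
        print(terms, "term(s), mm =", mm, "->", min_should_match(terms, mm))
```

If this reading of the documentation is right, there should be no need to compute mm per query on the client side: a single conditional spec covers one-, two-, and three-term queries alike.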
Re: eDismax parser and the mm parameter
Thanks Jack! I understand the intent of the mm parameter. My question is that, since the queries being provided are not of fixed length, I do not know what mm should be. For example, 'Ginseng' and 'Siberian Ginseng' are my search terms: the first can have an mm of up to 1 and the second an mm of up to 2. Should I dynamically set mm based on the number of search terms in my query? Thanks again.
Re: eDismax parser and the mm parameter
It still depends on your objective - which you haven't told us yet. Show us some use cases and detail what your expectations are for each use case. The edismax phrase boosting is probably a lot more useful than messing around with mm. Take a look at pf, pf2, and pf3. See:
http://wiki.apache.org/solr/ExtendedDisMax
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
The focus on mm may indeed be a classic XY Problem - a premature focus on a solution without detailing the problem.
-- Jack Krupansky
-----Original Message-----
From: S.L
Sent: Sunday, March 30, 2014 11:18 AM
To: solr-user@lucene.apache.org
Subject: Re: eDismax parser and the mm parameter
Thanks Jack! I understand the intent of the mm parameter. My question is that, since the queries being provided are not of fixed length, I do not know what mm should be. For example, 'Ginseng' and 'Siberian Ginseng' are my search terms: the first can have an mm of up to 1 and the second an mm of up to 2. Should I dynamically set mm based on the number of search terms in my query? Thanks again.
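To make the phrase-boosting suggestion concrete, here is a small Python sketch that assembles an edismax request using pf, pf2, and pf3. The parameter names (defType, q, qf, pf, pf2, pf3, mm) are the documented edismax ones; the field names title and body, the boost values, and the helper name edismax_params are assumptions made up for illustration and would need to be replaced with values from your own schema.

```python
from urllib.parse import urlencode


def edismax_params(user_query: str) -> dict:
    """Build request parameters for an edismax query with phrase boosts.

    Field names and boosts below are hypothetical placeholders.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": "title^2 body",       # fields searched term by term
        "pf": "title^10 body^5",    # boost when the whole query matches as a phrase
        "pf2": "title^5 body^2",    # boost for each adjacent word pair
        "pf3": "title^7 body^3",    # boost for each adjacent word triple
        "mm": "100%",               # starting point suggested in this thread
    }


if __name__ == "__main__":
    print("/select?" + urlencode(edismax_params("Red Siberian Ginseng")))
```

With a setup along these lines, a document containing "Red Siberian Ginseng" as a phrase should score above one that merely contains the three words, whatever mm is set to.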
Re: Context-aware suggesters in Solr
Thanks Areek. So looking at the code in trunk, exposing it to Solr looks to be pretty straightforward - just extending DocumentDictionaryFactory to take a 'contextField' parameter as well, and passing that on to the DocumentDictionary constructor. I'll give it a go! Thanks again.
Alan Woodward
www.flax.co.uk
On 29 Mar 2014, at 22:29, Areek Zillur wrote:
> The context field can only be set at configuration time for the AnalyzingInfixSuggester (FYI: CONTEXTS_FIELD_NAME refers to the field in the Lucene index that is internally maintained by the suggester and does not reflect any field in the user's index). The context field can be specified and fed into the suggester using DocumentDictionary, DocumentValueSourceDictionary, etc. (support for contexts in FileDictionary is not there yet). The context-aware functionality is not yet exposed to Solr. There were attempts to make the Analyzing/FuzzySuggester context-aware (LUCENE-5350; the patch might be outdated), but it's still not in trunk (see the JIRA discussion).
> Hope that helps,
> Areek
> On Fri, Mar 28, 2014 at 3:47 AM, Alan Woodward a...@flax.co.uk wrote:
>> Hi all,
>> I have a few questions about the context-aware AnalyzingInfixSuggester:
>> - Is it possible to choose a specific field for the context at runtime (say, I want to limit suggestions by a field that I've already faceted on), or is it limited to the hardcoded CONTEXTS_FIELD_NAME?
>> - Is the context-aware functionality exposed to Solr yet?
>> - How difficult would it be to add similar functionality to the other suggesters, if, say, I only wanted to do prefix matching?
>> Thanks,
>> Alan Woodward
>> www.flax.co.uk
SolrCloud OR distributed Solr
Hello Members,
Is there any difference between distributed Solr and SolrCloud? Consider that I have products for three countries. I have indexed one country's data, and its index size is 160 GB+. Now we have the other two countries, and I am confused! My client asked me what the difference would be if we procure another Solr server and index them separately; I was thinking of SolrCloud. Can someone explain these two approaches in simple words? If there are any reading links, please share.
Thanks
Re: SolrCloud OR distributed Solr
On 30 March 2014 23:12, Priti Solanki pritiatw...@gmail.com wrote:
> Hello Members,
> Is there any difference between distributed Solr and SolrCloud?
You might be confusing the older Solr distributed search with the new SolrCloud:
* Older distributed search: https://wiki.apache.org/solr/DistributedSearch
* SolrCloud: https://cwiki.apache.org/confluence/display/solr/SolrCloud
> Consider that I have products for three countries. I have indexed one country's data, and its index size is 160 GB+. Now we have the other two countries, and I am confused! My client asked me what the difference would be if we procure another Solr server and index them separately; I was thinking of SolrCloud. Can someone explain these two approaches in simple words? If there are any reading links, please share.
With 4.0+ versions of Solr, you probably want to go for SolrCloud.
Regards,
Gora
Re: SolrCloud OR distributed Solr
Distributed Solr is simply the ability for Solr to take the incoming query and send it to multiple shards, then aggregate the response. Here a shard is a physical partition of a single logical index. The assumption is that you can't fit the entire index on a single machine and still get the performance you need, so you use N smaller parts. So there has to be some mechanism to send the request to each sub-index, assemble the response, and give it back to the client. That's distributed Solr.
Before 4.0, splitting the index up was entirely manual: _you_ decided what document went to what shard, _you_ configured Solr to know where the other shards were, and _you_ handled the situation where a node went down and you had to heal the network. But it was still using distributed search.
As of 4.0, SolrCloud happens. The differences are:
1. You can have Solr automatically distribute the docs to the right shard.
2. When a node goes down, Solr can automatically compensate (assuming more than one replica per shard).
3. When the node comes back up, Solr will automatically re-synchronize the node before (automatically) bringing it back into service.
NOTE: you can still use old-style manual sharding if you choose; it's available in 4.x.
But be careful here and draw a distinction between distributed search and federated search.
Distributed search - what we've been talking about; the underlying assumption is that the sub-indexes are all substantially similar.
Federated search - the sub-indexes (or, indeed, complete self-contained indexes) may have no relation to each other and you're somehow expected to search them all and return the results. In this case you'll probably be firing off N separate queries (one to each of N indexes) and assembling them at the app layer.
Best,
Erick
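As an illustration of the old-style (pre-SolrCloud) distributed search described in this thread, the sketch below builds a query URL that names the shards explicitly through the shards request parameter, which is how a manually sharded setup tells the receiving node where the other sub-indexes live. The host names and the helper name distributed_query_url are hypothetical; only the shards parameter itself comes from the distributed search documentation linked above.

```python
from typing import Sequence
from urllib.parse import urlencode


def distributed_query_url(coordinator: str, shards: Sequence[str], query: str) -> str:
    """Build a manual distributed-search URL.

    `coordinator` is the node that receives the request and merges results;
    `shards` lists every sub-index as host:port/path (hypothetical hosts).
    """
    params = {"q": query, "shards": ",".join(shards)}
    return f"http://{coordinator}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(distributed_query_url(
        "solr-a.example.com:8983/solr",
        ["solr-a.example.com:8983/solr", "solr-b.example.com:8983/solr"],
        "ginseng",
    ))
```

In a SolrCloud collection this list would normally not be needed, since the cluster tracks shard locations itself; a federated setup, by contrast, would issue one such request per index and merge the results in the application.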
Re: zookeeper reconnect failure
We don’t currently retry, but I don’t think it would hurt much if we did - at least briefly. If you want to file a JIRA issue, that would be the best way to get it in a future release.
-- Mark Miller
about.me/markrmiller
On March 28, 2014 at 5:40:47 PM, Michael Della Bitta (michael.della.bi...@appinions.com) wrote:
> Hi, Jessica,
> We've had a similar problem when DNS resolution of our Hadoop task nodes has failed. They tend to take a dirt nap until you fix the problem manually. Are you experiencing this in AWS as well? I'd say the two things to do are to poll the node state via HTTP using a monitoring tool so you get an immediate notification of the problem, and to install some sort of caching server like nscd if you expect to have DNS resolution failures regularly.
> Michael Della Bitta
> Applications Developer, appinions inc.
> On Fri, Mar 28, 2014 at 4:27 PM, Jessica Mallet mewmewb...@gmail.com wrote:
>> Hi,
>> First off, I'd like to give a disclaimer that this probably is a very edge case issue. However, since it happened to us, I would like to get some advice on how to best handle this failure scenario. Basically, we had some network issue where we temporarily lost connection and DNS. The zookeeper client properly triggered the watcher. However, when trying to reconnect, the following exception is thrown:
>> 2014-03-27 17:24:46,882 ERROR [main-EventThread] SolrException.java (line 121) :java.net.UnknownHostException: host name (scrubbed): Name or service not known
>>     at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
>>     at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)
>>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)
>>     at java.net.InetAddress.getAllByName0(InetAddress.java:1211)
>>     at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>>     at java.net.InetAddress.getAllByName(InetAddress.java:1063)
>>     at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
>>     at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
>>     at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
>>     at org.apache.solr.common.cloud.SolrZooKeeper.<init>(SolrZooKeeper.java:41)
>>     at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
>>     at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:147)
>>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>> I tried to look at the code and it seems that there'd be no further retries to connect to Zookeeper, and the node is basically left in a bad state and will not recover on its own. (Please correct me if I'm reading this wrong.) Thinking about it, this is probably fair, since normally you wouldn't expect retries to fix an unknown host issue--even though in our case it would have--but I'm wondering what we should do to handle this situation if it happens again in the future. Any advice is appreciated.
>> Thanks,
>> Jessica
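The "retry, at least briefly" idea from this thread can be sketched generically as a bounded retry with exponential backoff around the name lookup that failed. This is a standalone Python illustration of the pattern, not Solr or ZooKeeper code; the function name resolve_with_retry, the attempt count, and the delays are assumptions picked for the example, and the resolver and sleep arguments exist only so the behaviour can be exercised without a real DNS outage.

```python
import socket
import time


def resolve_with_retry(host, port, attempts=5, base_delay=0.5,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve host:port, retrying briefly on DNS failure.

    Waits base_delay, then 2x, 4x, ... between attempts, and re-raises
    the last socket.gaierror once the attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return resolver(host, port)
        except socket.gaierror:
            if attempt == attempts - 1:
                raise  # give up: the outage outlasted the retry window
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    # localhost should resolve without touching external DNS.
    print(resolve_with_retry("localhost", 2181))
```

A short window like this would only cover brief outages such as the one described; for longer ones, the monitoring and local DNS caching suggested above still apply.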
Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2
RAM shouldn't be a problem. I have a box with 144GB RAM, running 12 instances with a 4GB Java heap each. There are 9 instances writing to 1TB of SSD disk space. The other 3 are writing to SATA drives and have autoSoftCommit disabled.
-----Original Message-----
From: Shawn Heisey elyog...@elyograg.org
To: solr-user solr-user@lucene.apache.org
Sent: Fri, Mar 28, 2014 8:35 pm
Subject: Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2
On 3/28/2014 4:07 PM, Rishi Easwaran wrote:
> Shawn, I changed the autoSoftCommit value to 15000 (15 sec). My index size is pretty small, ~4GB, and it's running on an SSD drive with ~100 GB space on it. Now I see the warn message every 15 seconds. The caches, I think, are minimal:
> <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
> <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
> <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
> <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
> I think something is still going on. I mean, 15s on SSD drives is a long time to handle a 4GB index.
How much RAM do you have and what size is your max java heap?
https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
Thanks,
Shawn
Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2
On 3/30/2014 2:59 PM, Rishi Easwaran wrote:
> RAM shouldn't be a problem. I have a box with 144GB RAM, running 12 instances with a 4GB Java heap each. There are 9 instances writing to 1TB of SSD disk space. The other 3 are writing to SATA drives and have autoSoftCommit disabled.
This brought up more questions than it answered. I was assuming that you only had a total of 4GB of index data, but after reading this, I think my assumption may be incorrect. If you add up all the Solr index data on the SSD, how much disk space does it take?
You should not be running more than one instance of Solr per machine. One instance of Solr can run multiple indexes. Running more than one results in quite a lot of overhead, and it seems unlikely that you would need to dedicate 48GB of total RAM to the Java heap.
Thanks,
Shawn
Re: eDismax parser and the mm parameter
Jack, thanks again. I am searching Chinese medicine documents. As in the example I gave earlier, a user can search for 'Ginseng', 'Siberian Ginseng', or 'Red Siberian Ginseng'. I certainly want to use the pf parameter (which is not driven by the mm parameter); however, to give a higher score to documents that contain more of the terms, I want to use edismax. Now, if I give an mm of 3 and the search query has only one term (like 'Ginseng'), what does edismax do?
On Sun, Mar 30, 2014 at 1:21 PM, Jack Krupansky j...@basetechnology.com wrote:
> It still depends on your objective - which you haven't told us yet. Show us some use cases and detail what your expectations are for each use case. The edismax phrase boosting is probably a lot more useful than messing around with mm. Take a look at pf, pf2, and pf3. See:
> http://wiki.apache.org/solr/ExtendedDisMax
> https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser
> The focus on mm may indeed be a classic XY Problem - a premature focus on a solution without detailing the problem.
> -- Jack Krupansky
Re: eDismax parser and the mm parameter
If you use pf, pf2, and pf3 and boost appropriately, the effects of mm will be dwarfed. The general goal is to ensure that the top documents really are the best, not necessarily to limit the total document count. Focusing on the latter could be a real waste of time. It's still not clear why or how you need or want to use OR as the default operator - you still haven't given us a use case for that. To repeat: give us a full set of use cases before taking this XY Problem approach of pursuing a solution before the problem is understood.
-- Jack Krupansky
-----Original Message-----
From: S.L
Sent: Sunday, March 30, 2014 6:14 PM
To: solr-user@lucene.apache.org
Subject: Re: eDismax parser and the mm parameter
Jack, thanks again. I am searching Chinese medicine documents. As in the example I gave earlier, a user can search for 'Ginseng', 'Siberian Ginseng', or 'Red Siberian Ginseng'. I certainly want to use the pf parameter (which is not driven by the mm parameter); however, to give a higher score to documents that contain more of the terms, I want to use edismax. Now, if I give an mm of 3 and the search query has only one term (like 'Ginseng'), what does edismax do?
Re: eDismax parser and the mm parameter
Jack, I misstated the problem. I am not using OR as the default operator now (now that I think about it, it does not make sense to use the default operator OR along with the mm parameter). The reason I want to use pf and mm in conjunction comes from my understanding of the edismax parser; I have not looked into the pf2 and pf3 parameters yet. I will state my understanding below.
pf - used to boost the result score if the complete phrase matches.
mm - when less than the number of search terms, helps limit the query results to a certain number of better matches.
With that being said, would it make sense to have a dynamic mm (set to the number of search terms - 1)? I also have a question about using fuzzy search along with the eDismax parser, but I will ask that in a separate post once I have gone through that aspect of the eDismax parser. Thanks again!
On Sun, Mar 30, 2014 at 6:44 PM, Jack Krupansky j...@basetechnology.com wrote:
> If you use pf, pf2, and pf3 and boost appropriately, the effects of mm will be dwarfed. The general goal is to ensure that the top documents really are the best, not necessarily to limit the total document count. Focusing on the latter could be a real waste of time. It's still not clear why or how you need or want to use OR as the default operator - you still haven't given us a use case for that. To repeat: give us a full set of use cases before taking this XY Problem approach of pursuing a solution before the problem is understood.
> -- Jack Krupansky
Re: eDismax parser and the mm parameter
The mm parameter is really only relevant when the default operator is OR or when explicit OR operators are used.

Again: please provide your use case examples and your expectations for each use case. It really doesn't make a lot of sense to focus prematurely on a solution when you haven't clearly defined your use cases.

-- Jack Krupansky

-----Original Message-----
From: S.L
Sent: Sunday, March 30, 2014 9:13 PM
To: solr-user@lucene.apache.org
Subject: Re: eDismax parser and the mm parameter

Jack,

I misstated the problem. I am not using OR as the default operator now (now that I think about it, it does not make sense to use the default operator OR along with the mm parameter). The reason I want to use pf and mm in conjunction is my understanding of the edismax parser, which I state below. I have not looked into the pf2 and pf3 parameters yet.

pf: used to boost the result score if the complete phrase matches.
mm: a value less than the number of search terms would help limit the query results to a certain number of better matches.

With that being said, would it make sense to have a dynamic mm (set to the number of search terms minus 1)?

I also have a question about using fuzzy search along with the eDismax parser, but I will ask that in a separate post once I have gone through that aspect of the parser.

Thanks again!

On Sun, Mar 30, 2014 at 6:44 PM, Jack Krupansky j...@basetechnology.com wrote:

If you use pf, pf2, and pf3 and boost appropriately, the effects of mm will be dwarfed.

The general goal is to ensure that the top documents really are the best, not necessarily to limit the total document count. Focusing on the latter could be a real waste of time.

It's still not clear why or how you need or want to use OR as the default operator - you still haven't given us a use case for that.

To repeat: give us a full set of use cases before taking this XY Problem approach of pursuing a solution before the problem is understood.

-- Jack Krupansky

-----Original Message-----
From: S.L
Sent: Sunday, March 30, 2014 6:14 PM
To: solr-user@lucene.apache.org
Subject: Re: eDismax parser and the mm parameter

Jack, thanks again.

I am searching Chinese medicine documents. As in the example I gave earlier, a user can search for Ginseng, or Siberian Ginseng, or Red Siberian Ginseng. I certainly want to use the pf parameter (which is not driven by the mm parameter). However, to give a higher score to documents that contain more of the terms, I want to use edismax. Now, if I give an mm of 3 and the search is only one term long (like Ginseng), what does edismax do?

On Sun, Mar 30, 2014 at 1:21 PM, Jack Krupansky j...@basetechnology.com wrote:

It still depends on your objective - which you haven't told us yet. Show us some use cases and detail what your expectations are for each use case.

The edismax phrase boosting is probably a lot more useful than messing around with mm. Take a look at pf, pf2, and pf3. See:
http://wiki.apache.org/solr/ExtendedDisMax
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

The focus on mm may indeed be a classic XY Problem - a premature focus on a solution without detailing the problem.

-- Jack Krupansky

-----Original Message-----
From: S.L
Sent: Sunday, March 30, 2014 11:18 AM
To: solr-user@lucene.apache.org
Subject: Re: eDismax parser and the mm parameter

Thanks Jack!

I understand the intent of the mm parameter. My question is this: since the queries being provided do not have a fixed number of terms, I do not know what mm should be. For example, Ginseng and Siberian Ginseng are my search terms. The first one can have an mm of up to 1 and the second one can have an mm of up to 2. Should I set mm dynamically based on the number of search terms in my query?

Thanks again.

On Sun, Mar 30, 2014 at 5:20 AM, Jack Krupansky j...@basetechnology.com wrote:

1. Yes, the default for mm is 1.
2. It depends on what you are really trying to do - you haven't told us.
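The question quoted above (what happens with mm=3 when the query has a single term) can be worked through with a small sketch. It is not Solr's code. It is a rough Python model of the documented mm forms (positive integer, negative integer, percentage, and conditional "n<spec" expressions), assuming that the computed value is capped at the number of optional clauses. The function name min_should_match is made up for illustration, and the lower bound of 0 is an assumption.

```python
def min_should_match(optional_clauses: int, spec: str) -> int:
    """Approximate how an mm spec maps to a required clause count.

    Illustrative model only: covers a positive integer, a negative integer,
    a percentage, and conditional "n<spec" expressions without spaces
    around the "<".
    """
    spec = spec.strip()
    result = optional_clauses
    if "<" in spec:
        # Conditional form, e.g. "2<-1 5<80%": the spec after "<" applies
        # only when the clause count is greater than the number before it.
        for part in spec.split():
            bound, sub_spec = part.split("<", 1)
            if optional_clauses <= int(bound):
                return result
            result = min_should_match(optional_clauses, sub_spec)
        return result
    if spec.endswith("%"):
        calc = optional_clauses * int(spec[:-1]) / 100
        # The fractional part is dropped before adding or subtracting.
        result = optional_clauses + int(calc) if calc < 0 else int(calc)
    else:
        calc = int(spec)
        result = optional_clauses + calc if calc < 0 else calc
    # Never require more clauses than the query has. The floor of 0 is an
    # assumption of this sketch.
    return max(0, min(result, optional_clauses))


if __name__ == "__main__":
    for clauses, mm in [(1, "3"), (3, "1"), (3, "100%"), (3, "-1")]:
        print(clauses, mm, min_should_match(clauses, mm))
```

Under this model, mm=3 on the one-term query "Ginseng" requires just that one term, and mm=-1 expresses "all terms but one" without computing anything on the client side.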
Re: eDismax parser and the mm parameter
Thanks Jack, my use cases are as follows.

1. Search for "Ginseng": everything related to ginseng should show up.
2. Search for "White Siberian Ginseng": results containing the whole phrase should show up first, followed by results containing two words from the phrase, followed by results containing a single word from the phrase.
3. Fuzzy search for "Whte Sberia Ginsng" (please note the typos): documents containing "White Siberian Ginseng" should show up. This looks like the most complicated case of all, as Solr does not support fuzzy phrase searches. (I have no solution for this yet.)

Thanks again!
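The second use case above (whole phrase first, then two words, then one word) lines up with the pf, pf2, and pf3 suggestion made earlier in the thread. A minimal sketch of such a request follows. The endpoint, the field names title and body, and all boost values are assumptions for illustration only. The per-term fuzzy option is one possible approach to the third use case, not a fuzzy phrase search, and has not been checked against a real index.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and field names; adjust to the real core and schema.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_edismax_url(user_query: str, fuzzy: bool = False) -> str:
    """Build a select URL that matches any term and boosts phrase matches."""
    terms = user_query.split()
    if fuzzy:
        # Per-term fuzzy operator with edit distance 2. This only loosens
        # each term on its own; it is not a fuzzy phrase search.
        terms = [f"{t}~2" for t in terms]
    params = {
        "defType": "edismax",
        "q": " ".join(terms),
        "q.op": "OR",
        "mm": "1",                  # any single term is enough to match
        "qf": "title body",         # fields searched for individual terms
        "pf": "title^20 body^10",   # boost when the whole phrase appears
        "pf3": "title^15 body^8",   # boost for three-word sub-phrases
        "pf2": "title^10 body^5",   # boost for two-word sub-phrases
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_edismax_url("White Siberian Ginseng"))
    print(build_edismax_url("Whte Sberia Ginsng", fuzzy=True))
```

With mm kept loose, ordering is left to the phrase boosts rather than to filtering, which is the point made earlier in the thread about mm being dwarfed by pf, pf2, and pf3.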
how to index 20 MB plain-text xml
I have many plain-text XML files that I convert to the Solr XML format. But every time I send them to Solr, I hit an OOM exception. How can I configure Solr to ingest these big XML files? Please point me in the right direction.

Thanks,
floyd
Re: how to index 20 MB plain-text xml
Without digging too deep into why exactly this is happening, here are the general options:

0. Are you actually committing? Check the messages in the logs and see if the records show up when you expect them to.
1. Are you actually trying to feed a 20 MB file to Solr? Maybe it's the HTTP buffer that's blowing up. Try using stream.file instead (note the security warning, though): http://wiki.apache.org/solr/ContentStream
2. Split the file into smaller ones and commit each separately.
3. Set a hard auto-commit in solrconfig.xml, based on the number of documents, to flush in-memory structures to disk.
4. Switch to using DataImportHandler to pull from the XML instead of pushing it.
5. Increase the amount of memory available to Solr (the JVM -X command line flags, e.g. -Xmx).

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
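Option 2 above (splitting the file) could be sketched roughly as follows, assuming the input is a flat Solr update file of the form <add><doc>...</doc>...</add> with no nested child documents. The function name iter_doc_batches and the batch size are made up for illustration. Sending each batch to Solr is left out.

```python
import io
import xml.etree.ElementTree as ET


def iter_doc_batches(source, batch_size=500):
    """Yield <add>...</add> strings holding at most batch_size <doc> elements.

    source is a filename or a binary file object. The file is parsed
    incrementally, so the whole document never has to be built in memory.
    Assumes flat <doc> elements (no nested child documents).
    """
    batch = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "doc":
            batch.append(ET.tostring(elem, encoding="unicode"))
            elem.clear()  # drop the parsed fields once they are serialized
            if len(batch) == batch_size:
                yield "<add>" + "".join(batch) + "</add>"
                batch = []
    if batch:
        yield "<add>" + "".join(batch) + "</add>"


if __name__ == "__main__":
    sample = io.BytesIO(
        b"<add>"
        b"<doc><field name='id'>1</field></doc>"
        b"<doc><field name='id'>2</field></doc>"
        b"<doc><field name='id'>3</field></doc>"
        b"</add>"
    )
    for number, payload in enumerate(iter_doc_batches(sample, batch_size=2), 1):
        print(number, len(payload))
```

Each yielded string is a complete update message that could be posted to the update handler on its own, with a commit after each batch or a hard auto-commit on the server side, as in options 2 and 3.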