eDismax parser and the mm parameter

2014-03-30 Thread S.L
Hi All,

I am planning to use the eDismax query parser in SOLR to boost documents
whose fields contain the query phrase. The edismax parser has an mm
parameter; since the query typed by the user could be of any length
(i.e., >= 1 terms), I would like to set the mm value to 1. I have the
following questions regarding this parameter.

   1. Is it set to 1 by default?
   2. In my schema.xml the defaultOperator is set to AND; should I set it
   to OR in order for the edismax parser to be effective with an mm of 1?


Thanks in advance!


Re: eDismax parser and the mm parameter

2014-03-30 Thread Jack Krupansky

1. Yes, the default for mm is 1.

2. It depends on what you are really trying to do - you haven't told us.

Generally, mm=1 is equivalent to q.op=OR, and mm=100% is equivalent to 
q.op=AND.


Generally, use q.op unless you really know what you are doing.

Generally, the intent of mm is to set the minimum number of OR/SHOULD 
clauses that must match on the top level of a query.
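To make the mm semantics concrete, here is a small calculator sketch. It is a hedged Python approximation built from the documented mm syntax (positive and negative integers, positive and negative percentages, and space-separated `N<spec` conditionals, with the result never above the number of optional clauses and never below 1). It is not Solr's actual implementation, and details such as rounding may differ between versions.

```python
import re


def min_should_match(optional_clauses: int, spec: str) -> int:
    """Estimate how many optional (SHOULD) clauses must match for an mm spec.

    Handles the documented forms: "2", "-1", "75%", "-25%" and
    space-separated conditionals such as "3<-1 6<80%".
    """
    spec = spec.strip()
    result = optional_clauses  # default: every optional clause required
    if "<" in spec:
        # Normalize "3 < -1" to "3<-1", then apply conditions left to right;
        # the first bound that is not exceeded stops the evaluation.
        spec = re.sub(r"\s*<\s*", "<", spec)
        for part in spec.split():
            bound, sub_spec = part.split("<", 1)
            if optional_clauses <= int(bound):
                break
            result = min_should_match(optional_clauses, sub_spec)
    elif spec.endswith("%"):
        calc = optional_clauses * int(spec[:-1]) / 100
        # int() truncates toward zero; a negative percentage counts the
        # clauses that are allowed to be missing.
        result = optional_clauses + int(calc) if calc < 0 else int(calc)
    else:
        calc = int(spec)
        result = optional_clauses + calc if calc < 0 else calc
    if optional_clauses == 0:
        return 0
    # Documented rule: never more than the clause count, never less than 1.
    return max(1, min(result, optional_clauses))


for n in (1, 2, 4, 10):
    print(n, min_should_match(n, "3<-1 6<80%"))
```

Under this sketch, mm=1 requires any one clause (OR-like), mm=100% requires all of them (AND-like), and an mm larger than the clause count, such as `min_should_match(1, "3")`, is capped at the clause count.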


-- Jack Krupansky


Re: eDismax parser and the mm parameter

2014-03-30 Thread Ahmet Arslan
Hi,

Using mm=1 with (e)dismax is not a good idea. Your users will be unhappy,
because there is no coord factor with this parser.
coord is about this: typically, a document that contains more of the query's terms
will receive a higher score than another document with fewer query terms.

I suggest you use something more restrictive: 3<-1 6<80%


I think there is a new autoRelax feature in some ticket. Even better, start with
mm=100% and relax the mm value until you retrieve *enough* documents.
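The "start strict, then relax" approach can be done on the client side. A minimal sketch, assuming a hypothetical `count_hits(mm)` callback that runs the edismax query with the given mm value and returns `numFound`; the sequence of specs tried is an arbitrary assumption, and this does not model the autoRelax feature mentioned above.

```python
from typing import Callable, Sequence


def relax_mm(count_hits: Callable[[str], int], enough: int,
             specs: Sequence[str] = ("100%", "-1", "75%", "50%", "1")) -> str:
    """Return the strictest mm spec (in the order tried) with >= `enough` hits.

    `count_hits` is a hypothetical callback: run the edismax query with the
    given mm value and return numFound. `specs` must be non-empty.
    """
    for spec in specs:
        if count_hits(spec) >= enough:
            return spec
    return specs[-1]  # nothing reached the threshold: use the loosest spec
```

Each relaxation step costs an extra round trip to Solr, so in practice the list of specs would probably be kept short.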

It is OK to use a default operator of OR with the default (lucene) query parser,
because the coord factor kicks in.

http://lucene.apache.org/core/3_0_3/api/all/org/apache/lucene/search/Similarity.html#formula_coord

https://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29


Ahmet



Re: eDismax parser and the mm parameter

2014-03-30 Thread simpleliving...@gmail.com
Thanks Ahmet.

So if it's a single-term query like 'Ginseng', what does mm=3 do to the query? I
am guessing it would be reduced to 1 automatically in this case.

Sent from my HTC

Re: eDismax parser and the mm parameter

2014-03-30 Thread S.L
Thanks Jack! I understand the intent of the mm parameter. My question is that,
since the query terms being provided are not of fixed length, I do not know
what the mm should be. For example, Ginseng and Siberian Ginseng are my
search terms: the first one can have an mm of up to 1 and the second one can
have an mm of up to 2.

Should I dynamically set the mm based on the number of search terms in my
query?

Thanks again.



Re: eDismax parser and the mm parameter

2014-03-30 Thread Jack Krupansky
It still depends on your objective - which you haven't told us yet. Show us 
some use cases and detail what your expectations are for each use case.


The edismax phrase boosting is probably a lot more useful than messing 
around with mm. Take a look at pf, pf2, and pf3.
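A sketch of an edismax request that leans on phrase boosting instead of mm. Only the parameter names (`defType`, `q`, `qf`, `pf`, `pf2`, `pf3`, `ps`) come from edismax; the field names `title` and `body` and all boost values are hypothetical and would need tuning against real relevance judgments.

```python
from urllib.parse import urlencode


def edismax_params(user_query: str) -> str:
    """Build an illustrative edismax query string with phrase boosting.

    Field names ("title", "body") and boost values are hypothetical.
    """
    params = {
        "defType": "edismax",
        "q": user_query,
        "qf": "title^2 body",
        # Boost docs where the whole query appears as a phrase...
        "pf": "title^10 body^5",
        # ...and where adjacent word pairs (pf2) or triples (pf3) appear.
        "pf2": "title^4 body^2",
        "pf3": "title^6 body^3",
        "ps": "1",  # phrase slop used by pf
    }
    return urlencode(params)


print(edismax_params("Red Siberian Ginseng"))
```

The encoded string would be appended to a select URL such as `http://localhost:8983/solr/collection1/select?` (host and core name assumed).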


See:
http://wiki.apache.org/solr/ExtendedDisMax
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

The focus on mm may indeed be a classic XY Problem - a premature focus on 
a solution without detailing the problem.


-- Jack Krupansky


Re: Context-aware suggesters in Solr

2014-03-30 Thread Alan Woodward
Thanks Areek.  So looking at the code in trunk, exposing it to Solr looks to be 
pretty straightforward - just extending DocumentDictionaryFactory to take a 
'contextField' parameter as well, and passing that on to the DocumentDictionary 
constructor.  I'll give it a go!

Thanks again.

Alan Woodward
www.flax.co.uk


On 29 Mar 2014, at 22:29, Areek Zillur wrote:

 The context field can only be set at configuration-time for the
 AnalyzingInfixSuggester (FYI: CONTEXTS_FIELD_NAME refers to the field in
 Lucene index that is internally maintained by the suggester and does not
 reflect any field in user's index). The context field can be specified and
 fed into the suggester using DocumentDictionary,
 DocumentValueSourceDictionary etc, (the support for contexts in
 FileDictionary is not there yet).
 
 The context-aware functionality is not yet exposed to Solr.
 
 There were attempts to make Analyzing/FuzzySuggester
 context-aware (LUCENE-5350; the patch might be outdated), but it's still not in
 trunk (see the JIRA discussion).
 
 Hope that helps,
 
 Areek
 
 
 On Fri, Mar 28, 2014 at 3:47 AM, Alan Woodward a...@flax.co.uk wrote:
 
 Hi all,
 
 I have a few questions about the context-aware AnalyzingInfixSuggester:
 - is it possible to choose a specific field for the context at runtime
 (say, I want to limit suggestions by a field that I've already faceted on),
 or is it limited to the hardcoded CONTEXTS_FIELD_NAME?
 - is the context-aware functionality exposed to Solr yet?
 - how difficult would it be to add similar functionality to the other
 suggesters, if say I only wanted to do prefix matching?
 
 Thanks,
 
 Alan Woodward
 www.flax.co.uk
 
 
 



SolrCloud OR distributed Solr

2014-03-30 Thread Priti Solanki
Hello Member,

Is there any difference between distributed Solr and SolrCloud?

Consider that I have products from three countries. I have indexed one
country's data and its index size is 160+ GB.

Now we have the other two countries, and I am confused!

My client asks me what the difference is if we procure another Solr server
and index separately. I was thinking of SolrCloud. Can someone explain
these two approaches in simple words? If there are any reading links,
please share.

Thanks


Re: SolrCloud OR distributed Solr

2014-03-30 Thread Gora Mohanty
On 30 March 2014 23:12, Priti Solanki pritiatw...@gmail.com wrote:

 Hello Member,

 Is there any difference between distributed Solr and SolrCloud?

You might be confusing the older Solr distributed search with the new SolrCloud:
* Older distributed search: https://wiki.apache.org/solr/DistributedSearch
* SolrCloud: https://cwiki.apache.org/confluence/display/solr/SolrCloud

 Consider that I have products from three countries. I have indexed one
 country's data and its index size is 160+ GB.

 Now we have the other two countries, and I am confused!

 My client asks me what the difference is if we procure another Solr server
 and index separately. I was thinking of SolrCloud. Can someone explain
 these two approaches in simple words? If there are any reading links,
 please share.

With 4.0+ versions of Solr, you probably want to go for SolrCloud.

Regards,
Gora


Re: SolrCloud OR distributed Solr

2014-03-30 Thread Erick Erickson
Distributed solr is simply the ability for Solr to take the incoming
query and send it to multiple shards, then aggregate the response.
Here a shard is a physical partition of a single logical index. The
assumption is that you can't fit the entire index on a single machine
and still get the performance you need, so you use N smaller parts.

So, there has to be some mechanism to send the request to each
sub-index, assemble the response, and give it back to the client.
That's distributed Solr.
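The aggregation step described above can be sketched as a merge of per-shard top-N lists. This is a simplified illustration, not Solr's implementation: it assumes each shard returns `(score, id)` pairs already sorted by descending score, and that scores are comparable across shards, which is plausibly part of why distributed search expects the sub-indexes to be substantially similar.

```python
import heapq
import itertools
from typing import Iterable, List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def merge_shard_results(shard_hits: Iterable[List[Hit]], rows: int) -> List[Hit]:
    """Merge per-shard result lists into one global top-`rows` list.

    Each shard's list must already be sorted by descending score.
    """
    merged = heapq.merge(*shard_hits, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, rows))
```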

Before 4.0, splitting the index up was entirely manual: _you_ decided
which document went to which shard, _you_ configured Solr to know
where the other shards were, and _you_ handled the situation where a
node went down and you had to heal the network. But it was still
using distributed search.
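As a hypothetical example of the manual approach, documents could be routed by hashing their ids, with every shard then listed in the `shards` request parameter that the older distributed search uses (e.g. `shards=host1:8983/solr,host2:8983/solr`). The host names and the MD5-based routing below are assumptions for illustration only; changing the number of shards changes where ids map, so existing documents would need reindexing.

```python
import hashlib
from typing import List

# Hypothetical shard addresses, in the host:port/path form the "shards"
# request parameter takes (no scheme).
SHARDS: List[str] = [
    "solr1:8983/solr",
    "solr2:8983/solr",
    "solr3:8983/solr",
]


def shard_for(doc_id: str, shards: List[str] = SHARDS) -> str:
    """Pick a shard for a document by hashing its id (one possible scheme)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:4], "big") % len(shards)]


def shards_param(shards: List[str] = SHARDS) -> str:
    """Comma-separated value for the `shards` query parameter."""
    return ",".join(shards)
```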


As of 4.0, SolrCloud happens. The differences are:
1) you can have Solr automatically distribute the docs to the right shard.
2) when a node goes down, Solr can automatically compensate (assuming
   more than one replica per shard).
3) when the node comes back up, Solr will automatically re-synchronize
   the node before (automatically) bringing it back into service.

NOTE: you can still use old-style manual sharding if you choose, it's
available in 4.x

But be careful here and draw a distinction between distributed
search and federated search.
Distributed search - what we've been talking about, the underlying
assumption is that the sub-indexes are all substantially similar.

Federated search - the sub-indexes (or, indeed, complete
self-contained indexes) may have no relation to each other and you're
somehow expected to search them all and return the results. In this
case you'll probably be firing off N separate queries (one to each of
N indexes) and assembling them at the app layer.

Best,
Erick


Re: zookeeper reconnect failure

2014-03-30 Thread Mark Miller
We don’t currently retry, but I don’t think it would hurt much if we did - at 
least briefly.

If you want to file a JIRA issue, that would be the best way to get it in a 
future release.
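To illustrate the kind of brief, bounded retry being discussed, here is a Python sketch that resolves a host name with exponential backoff. It is not Solr's or ZooKeeper's code (the real reconnect path is Java, in `DefaultConnectionStrategy` per the stack trace quoted below). The attempt count and delays are arbitrary assumptions, and the resolver and sleep functions are injectable so the logic can be exercised without a network.

```python
import socket
import time
from typing import Any, Callable, List


def resolve_with_retry(
    host: str,
    port: int,
    attempts: int = 5,
    base_delay: float = 0.5,
    resolver: Callable[[str, int], List[Any]] = socket.getaddrinfo,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Resolve `host`, retrying DNS failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return resolver(host, port)
        except socket.gaierror:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return []  # only reached when attempts < 1
```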

-- 
Mark Miller
about.me/markrmiller

On March 28, 2014 at 5:40:47 PM, Michael Della Bitta 
(michael.della.bi...@appinions.com) wrote:

Hi, Jessica,  

We've had a similar problem when DNS resolution of our Hadoop task nodes  
has failed. They tend to take a dirt nap until you fix the problem  
manually. Are you experiencing this in AWS as well?  

I'd say the two things to do are to poll the node state via HTTP using a  
monitoring tool so you get an immediate notification of the problem, and to  
install some sort of caching server like nscd if you expect to have DNS  
resolution failures regularly.  
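A minimal Python sketch of such an HTTP poll. It assumes the core exposes Solr's ping handler at `<base>/<core>/admin/ping`, as the stock example solrconfig.xml does, and that with `wt=json` the response carries a `status` field. The host and core names are hypothetical. A ping may not reveal every failure mode (a lost ZooKeeper session, for instance), so treat this as a liveness check only.

```python
import json
import urllib.request
from typing import Any, Callable, Dict


def fetch_json(url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """GET `url` and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def node_is_healthy(base_url: str, core: str,
                    fetch: Callable[[str], Dict[str, Any]] = fetch_json) -> bool:
    """Return True if the core's ping handler reports status OK."""
    url = f"{base_url}/{core}/admin/ping?wt=json"
    try:
        return fetch(url).get("status") == "OK"
    except Exception:
        # Connection errors, timeouts and malformed JSON all count as unhealthy.
        return False
```

A monitoring job could call `node_is_healthy("http://solr1:8983/solr", "collection1")` on a schedule and alert when it returns False.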



Michael Della Bitta
Applications Developer
o: +1 646 532 3062
appinions inc.
The Science of Influence Marketing
18 East 41st Street
New York, NY 10017
t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
w: appinions.com http://www.appinions.com/


On Fri, Mar 28, 2014 at 4:27 PM, Jessica Mallet mewmewb...@gmail.com wrote:

 Hi,  
  
 First off, I'd like to give a disclaimer that this probably is a very edge  
 case issue. However, since it happened to us, I would like to get some  
 advice on how to best handle this failure scenario.  
  
 Basically, we had a network issue where we temporarily lost connectivity
 and DNS. The ZooKeeper client properly triggered the watcher. However, when
 trying to reconnect, the following exception was thrown:
  
 2014-03-27 17:24:46,882 ERROR [main-EventThread] SolrException.java (line 121) :java.net.UnknownHostException: host name (scrubbed): Name or service not known
 at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
 at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:866)
 at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1258)
 at java.net.InetAddress.getAllByName0(InetAddress.java:1211)
 at java.net.InetAddress.getAllByName(InetAddress.java:1127)
 at java.net.InetAddress.getAllByName(InetAddress.java:1063)
 at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
 at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
 at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
 at org.apache.solr.common.cloud.SolrZooKeeper.<init>(SolrZooKeeper.java:41)
 at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:53)
 at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:147)
 at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
  
 I tried to look at the code and it seems that there'd be no further retries  
 to connect to Zookeeper, and the node is basically left in a bad state and  
 will not recover on its own. (Please correct me if I'm reading this wrong.)  
 Thinking about it, this is probably fair, since normally you wouldn't  
 expect retries to fix an unknown host issue--even though in our case it  
 would have--but I'm wondering what we should do to handle this situation if  
 it happens again in the future.  
  
 Any advice is appreciated.  
  
 Thanks,  
 Jessica  
  


Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2014-03-30 Thread Rishi Easwaran
RAM shouldn't be a problem.
I have a box with 144GB RAM, running 12 instances with a 4GB Java heap each.
Nine instances write to 1TB of SSD disk space.
The other 3 write to SATA drives and have autoSoftCommit disabled.

-Original Message-
From: Shawn Heisey elyog...@elyograg.org
To: solr-user solr-user@lucene.apache.org
Sent: Fri, Mar 28, 2014 8:35 pm
Subject: Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2


On 3/28/2014 4:07 PM, Rishi Easwaran wrote:

 Shawn,

 I changed the autoSoftCommit value to 15000 (15 sec).
 My index size is pretty small (~4GB) and it's running on an SSD drive with
 ~100 GB of space on it.
 Now I see the warning message every 15 seconds.

 The caches, I think, are minimal:

 <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
 <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
 <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
 <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

 I still think something is going on. I mean, 15s on SSD drives is a long
 time to handle a 4GB index.

How much RAM do you have and what size is your max java heap?

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM

Thanks,
Shawn

Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2014-03-30 Thread Shawn Heisey
On 3/30/2014 2:59 PM, Rishi Easwaran wrote:
 RAM shouldn't be a problem.
 I have a box with 144GB RAM, running 12 instances with 4GB Java heap each.
 There are 9 instances writing to 1TB of SSD disk space.
 The other 3 are writing to SATA drives, and have autoSoftCommit disabled.

This brought up more questions than it answered.  I was assuming that
you only had a total of 4GB of index data, but after reading this, I
think my assumption may be incorrect.  If you add up all the Solr index
data on the SSD, how much disk space does it take?

You should not be running more than one instance of Solr per machine.
One instance of Solr can run multiple indexes.  Running more than one
results in quite a lot of overhead, and it seems unlikely that you would
need to dedicate 48GB of total RAM to the Java heap.
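The heap figures above work out as simple arithmetic with Rishi's numbers (144GB RAM, 12 instances, 4GB heap each). This rough sketch ignores JVM non-heap memory and the OS itself, so the remainder is only an upper bound on what is left for the OS disk cache.

```python
def ram_budget(total_ram_gb: float, instances: int, heap_gb: float) -> dict:
    """Split physical RAM into total Java heap and the remainder."""
    heap_total = instances * heap_gb
    return {
        "heap_total_gb": heap_total,
        # Upper bound for the OS disk cache: ignores JVM overhead and the OS itself.
        "left_for_os_gb": total_ram_gb - heap_total,
    }


print(ram_budget(144, 12, 4))
```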

Thanks,
Shawn



Re: eDismax parser and the mm parameter

2014-03-30 Thread S.L
Jack, thanks again.

I am searching Chinese medicine documents. As in the example I gave earlier,
a user can search for Ginseng, Siberian Ginseng, or Red Siberian Ginseng.
I certainly want to use the pf parameter (which is not driven by the mm
parameter); however, to give a higher score to documents that have more of
the terms, I want to use edismax. Now, if I give an mm of 3 and the search
term has only one word (like Ginseng), what does edismax do?



Re: eDismax parser and the mm parameter

2014-03-30 Thread Jack Krupansky
If you use pf, pf2, and pf3 and boost appropriately, the effects of mm will 
be dwarfed.


The general goal is to assure that the top documents really are the best, 
not to necessarily limit the total document count. Focusing on the latter 
could be a real waste of time.


It's still not clear why or how you need or want to use OR as the default 
operator - you still haven't given us a use case for that.


To repeat: Give us a full set of use cases before taking this XY Problem 
approach of pursuing a solution before the problem is understood.


-- Jack Krupansky


Re: eDismax parser and the mm parameter

2014-03-30 Thread S.L
Jack,

I misstated the problem: I am not using OR as the default operator now
(now that I think about it, it does not make sense to use the default
operator OR along with the mm parameter). The reason I want to use pf and
mm in conjunction is my understanding of the edismax parser, and
I have not looked into the pf2 and pf3 parameters yet.

I will state my understanding below.

pf - used to boost the result score if the complete phrase matches.
mm (less than the search term length) - would help limit the query results
to a certain number of better matches.

With that being said, would it make sense to have a dynamic mm (set to the
number of terms in the search query minus 1)?
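The "number of terms minus one" idea can be computed on the client, as in this minimal sketch. `dynamic_mm` is hypothetical, and splitting on whitespace is an assumption: Solr counts clauses after query parsing and analysis (stopwords, synonyms, and so on), so the numbers may differ. For comparison, the documented mm syntax also accepts a negative integer such as `mm=-1`, meaning all but one of the optional clauses must match, which leaves the counting to Solr.

```python
def dynamic_mm(user_query: str) -> int:
    """Hypothetical client-side rule: require all terms but one, at least 1."""
    terms = user_query.split()  # naive whitespace split, not Solr's analysis
    return max(1, len(terms) - 1)


for q in ("Ginseng", "Siberian Ginseng", "Red Siberian Ginseng"):
    print(q, "-> mm =", dynamic_mm(q))
```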

I also have a question about using fuzzy search along with the eDismax
parser, but I will ask that in a separate post once I go through that aspect
of the eDismax parser.

Thanks again !





On Sun, Mar 30, 2014 at 6:44 PM, Jack Krupansky j...@basetechnology.comwrote:

 If you use pf, pf2, and pf3 and boost appropriately, the effects of mm
 will be dwarfed.

 The general goal is to assure that the top documents really are the best,
 not to necessarily limit the total document count. Focusing on the latter
 could be a real waste of time.

 It's still not clear why or how you need or want to use OR as the default
 operator - you still haven't given us a use case for that.

 To repeat: Give us a full set of use cases before taking this XY Problem
 approach of pursuing a solution before the problem is understood.

 -- Jack Krupansky

 -Original Message- From: S.L
 Sent: Sunday, March 30, 2014 6:14 PM
 To: solr-user@lucene.apache.org
 Subject: Re: eDismax parser and the mm parameter

 Jacks Thanks Again,

 I am searching  Chinese medicine  documents , as the example I gave earlier
 a user can search for Ginseng or Siberian Ginseng or Red Siberian Ginseng
 , I certainly want to use pf parameter (which is not driven by mm
 parameter) , however for giving higher score to documents that have more of
 the terms I want to use edismax now if I give a mm of 3 and the search term
 is of only length 1 (like Ginseng) what does edisMax do ?


 On Sun, Mar 30, 2014 at 1:21 PM, Jack Krupansky j...@basetechnology.com
 wrote:

  It still depends on your objective - which you haven't told us yet. Show
 us some use cases and detail what your expectations are for each use case.

 The edismax phrase boosting is probably a lot more useful than messing
 around with mm. Take a look at pf, pf2, and pf3.

 See:
 http://wiki.apache.org/solr/ExtendedDisMax
 https://cwiki.apache.org/confluence/display/solr/The+
 Extended+DisMax+Query+Parser

 The focus on mm may indeed be a classic XY Problem - a premature focus
 on a solution without detailing the problem.

 -- Jack Krupansky

 -Original Message- From: S.L
 Sent: Sunday, March 30, 2014 11:18 AM
 To: solr-user@lucene.apache.org
 Subject: Re: eDismax parser and the mm parameter

 Thanks Jack! I understand the intent of mm parameter, my question is that
 since the query terms being provided are not of fixed length I do not know
 what the mm should like for example Ginseng,Siberian Ginseng are my
 search terms. The first one can have an mm upto 1 and the second one can
 have an mm of upto 2 .

 Should I dynamically set the mm based on the number of search terms in my
 query ?

 Thanks again.


 On Sun, Mar 30, 2014 at 5:20 AM, Jack Krupansky j...@basetechnology.com
 wrote:

  1. Yes, the default for mm is 1.


 2. It depends on what you are really trying to do - you haven't told us.

 Generally, mm=1 is equivalent to q.op=OR, and mm=100% is equivalent to
 q.op=AND.

 Generally, use q.op unless you really know what you are doing.

 Generally, the intent of mm is to set the minimum number of OR/SHOULD
 clauses that must match on the top level of a query.
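To make the clause counting concrete, here is a simplified Python model of the mm syntax described in the Solr reference guide (plain and negative integers, percentages, negative percentages, and conditional specs such as 3<90%). It is a sketch for reasoning about the numbers, not Solr's code; in particular it does not cap the result at the clause count, so what happens with mm=3 on a one-word query should be checked against the Solr version in use.

```python
import re


def min_should_match(clause_count: int, spec: str) -> int:
    """Approximate how many optional (SHOULD) clauses an mm spec requires.

    Simplified model of the documented mm syntax: integers, negative
    integers, percentages, negative percentages, and "N<spec" conditions.
    """
    spec = spec.strip()
    if "<" in spec:
        # Conditional form, e.g. "3<90%" or "2<-25% 9<-3".
        spec = re.sub(r"\s*<\s*", "<", spec)
        result = clause_count  # at or below the first bound, every clause is required
        for part in spec.split():
            bound, sub_spec = part.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, sub_spec)
        return result
    if spec.endswith("%"):
        calc = clause_count * int(spec[:-1]) / 100
        # int() truncates toward zero, so the magnitude is rounded down.
        result = clause_count + int(calc) if calc < 0 else int(calc)
    else:
        calc = int(spec)
        result = clause_count + calc if calc < 0 else calc
    # Deliberately not capped at clause_count (see the note above).
    return max(result, 0)


print(min_should_match(3, "1"))     # 1 (any one term, like OR)
print(min_should_match(3, "100%"))  # 3 (every term, like AND)
```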

 -- Jack Krupansky

 -Original Message- From: S.L
 Sent: Sunday, March 30, 2014 2:25 AM
 To: solr-user@lucene.apache.org
 Subject: eDismax parser and the mm parameter

 Hi All,

 I am planning to use the eDismax query parser in Solr to boost
 documents that contain a phrase in their fields. There is an mm
 parameter in the edismax parser; since the query typed by the user
 could be of any length (i.e. >=1), I would like to set the mm value to 1.
 I have the following questions regarding this parameter.

   1. Is it set to 1 by default?
   2. In my schema.xml the defaultOperator is set to AND. Should I set it
   to OR in order for the edismax parser to be effective with an mm of 1?


 Thanks in advance!







Re: eDismax parser and the mm parameter

2014-03-30 Thread Jack Krupansky
The mm parameter is really only relevant when the default operator is OR or 
explicit OR operators are used.


Again: Please provide your use case examples and your expectations for each 
use case. It really doesn't make a lot of sense to prematurely focus on a 
solution when you haven't clearly defined your use cases.


-- Jack Krupansky

-Original Message- 
From: S.L

Sent: Sunday, March 30, 2014 9:13 PM
To: solr-user@lucene.apache.org
Subject: Re: eDismax parser and the mm parameter

Jack,

I mis-stated the problem. I am not using OR as the default operator
now (now that I think about it, it does not make sense to use the default
operator OR along with the mm parameter). The reason I want to use pf and
mm in conjunction is my understanding of the edismax parser;
I have not looked into the pf2 and pf3 parameters yet.

I will state my understanding below.

pf - is used to boost the result score if the complete phrase matches.
mm (less than the search term length) - would help limit the query results to a
certain number of better matches.

With that being said, would it make sense to have a dynamic mm (set to the
number of search terms - 1)?

I also have a question about using fuzzy search along with the eDismax
parser, but I will ask that in a separate post once I go through that aspect
of the eDismax parser.

Thanks again !





On Sun, Mar 30, 2014 at 6:44 PM, Jack Krupansky j...@basetechnology.com
wrote:



If you use pf, pf2, and pf3 and boost appropriately, the effects of mm
will be dwarfed.
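As a rough illustration of combining pf, pf2 and pf3 with boosts, here is a sketch of an edismax request built with Python's standard library. The field names title and body, all boost values, and the localhost URL and core name are hypothetical; the boosts would need tuning against real data.

```python
from urllib.parse import urlencode

# Hypothetical field names and boost values; tune them against your own schema and data.
params = {
    "defType": "edismax",
    "q": "White Siberian Ginseng",
    "qf": "title^2 body",       # fields the individual terms are matched against
    "pf": "title^50 body^20",   # boost when the whole query matches as a phrase
    "pf3": "title^20 body^10",  # boost per 3-word phrase match (the whole phrase here)
    "pf2": "title^10 body^5",   # boost per 2-word phrase match
    "mm": "1",                  # at least one term must match
}

# Hypothetical host and core name.
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```

The intent of this design is that mm only decides which documents are eligible, while the larger phrase boosts decide which of them rank first.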

The general goal is to assure that the top documents really are the best,
not to necessarily limit the total document count. Focusing on the latter
could be a real waste of time.

It's still not clear why or how you need or want to use OR as the default
operator - you still haven't given us a use case for that.

To repeat: Give us a full set of use cases before taking this XY Problem
approach of pursuing a solution before the problem is understood.

-- Jack Krupansky


Re: eDismax parser and the mm parameter

2014-03-30 Thread S.L
Thanks Jack, my use cases are as follows.

   1. Search for Ginseng: everything related to ginseng should show up.
   2. Search for White Siberian Ginseng: results with the whole phrase
   show up first, followed by results with 2 words from the phrase, followed
   by results with a single word from the phrase.
   3. Fuzzy search for Whte Sberia Ginsng (please note the typos here):
   documents with White Siberian Ginseng should show up. This looks like the
   most complicated of all, as Solr does not support fuzzy phrase searches (I
   have no solution for this yet).
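For the typo case, Lucene query syntax allows a per-term fuzzy match (term~N, where N is an edit distance of at most 2). A minimal sketch, assuming whitespace tokenization and lowercasing, that appends ~2 to each term and checks with a plain Levenshtein distance that each misspelling is within two edits of the intended word. This builds fuzzy terms, not a fuzzy phrase, so word order is not enforced; the helper names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_query(user_query: str, max_edits: int = 2) -> str:
    """Append Lucene's ~N fuzzy operator to each whitespace-separated term."""
    return " ".join(f"{term}~{max_edits}" for term in user_query.split())


typed, intended = "Whte Sberia Ginsng", "White Siberian Ginseng"
print(fuzzy_query(typed))  # Whte~2 Sberia~2 Ginsng~2
for t, w in zip(typed.split(), intended.split()):
    print(t, "->", w, levenshtein(t.lower(), w.lower()))
```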

Thanks again!


On Sun, Mar 30, 2014 at 11:21 PM, Jack Krupansky j...@basetechnology.com wrote:

 The mm parameter is really only relevant when the default operator is OR
 or explicit OR operators are used.

 Again: Please provide your use case examples and your expectations for
 each use case. It really doesn't make a lot of sense to prematurely focus
 on a solution when you haven't clearly defined your use cases.

 -- Jack Krupansky


how to index 20 MB plain-text xml

2014-03-30 Thread Floyd Wu
I have many plain-text XML files that I convert into Solr's XML format.
But every time I send them to Solr, I hit an OOM exception.
How can I configure Solr to handle these big XML files?
Please point me in the right direction. Thanks

floyd


Re: how to index 20 MB plain-text xml

2014-03-30 Thread Alexandre Rafalovitch
Without digging too deep into why exactly this is happening, here are
the general options:

0. Are you actually committing? Check the messages in the logs and see
if the records show up when you expect them to.
1. Are you actually trying to feed a 20 MB file to Solr? Maybe it's the HTTP
buffer that's blowing up? Try using stream.file instead (note the
security warning, though): http://wiki.apache.org/solr/ContentStream
2. Split the file into smaller ones and commit each separately.
3. Set a hard auto-commit in solrconfig.xml based on the number of documents
to flush in-memory structures to disk.
4. Switch to using DataImportHandler to pull from XML instead of pushing.
5. Increase the amount of memory available to Solr (the JVM -X command-line flags, e.g. -Xmx).
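Splitting (option 2) can be done without loading the whole file into memory. A minimal Python sketch, assuming the file is already in Solr's <add><doc>...</doc></add> update format, that streams it with ElementTree.iterparse and yields smaller <add> payloads; each payload could then be posted to the /update handler separately. The function name and batch size are hypothetical, and attributes on the outer <add> (such as commitWithin) are not carried over.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator


def split_solr_add_xml(source, docs_per_batch: int = 1000) -> Iterator[bytes]:
    """Stream a Solr <add> file and yield smaller <add> payloads.

    Only one batch of top-level <doc> elements is kept in memory at a time.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the outer <add> element
    batch = ET.Element("add")
    depth = 0
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # Only direct children of <add> are documents; nested <doc>s stay inside their parent.
        if depth == 0 and elem.tag == "doc":
            root.remove(elem)  # detach from the parsed tree so it can be freed
            batch.append(elem)
            if len(batch) >= docs_per_batch:
                yield ET.tostring(batch, encoding="utf-8")
                batch = ET.Element("add")
    if len(batch):
        yield ET.tostring(batch, encoding="utf-8")


sample = b"<add>" + b"".join(
    b'<doc><field name="id">%d</field></doc>' % i for i in range(5)
) + b"</add>"
batches = list(split_solr_add_xml(io.BytesIO(sample), docs_per_batch=2))
print(len(batches))  # 3
```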

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Mon, Mar 31, 2014 at 12:00 PM, Floyd Wu floyd...@gmail.com wrote:
 I have many plain text xml that I transfer to form of solr xml format.
 But every time I send them to solr, I hit OOM exception.
 How to configure solr to eat these big xml?
 Please guide me a way. Thanks

 floyd