Re: [Migration Solr4 to Solr5] Collection reload error

2016-03-10 Thread Dmitry Kan
Thanks Shawn,

Missed the openSearcher=false setting.

So another thing to check is whether there are ever concurrent
commitWithin calls to the same shard.
On 10 March 2016 at 4:39 PM, "Shawn Heisey" 
wrote:

> On 3/10/2016 3:05 AM, Dmitry Kan wrote:
> > The only thing that I spot is that you use both auto-commit with 900 sec
> > frequency AND commitWithin. Solr is smart enough to skip empty commits.
> But
> > if auto-commit kicks in during the doc add / delete, there will be at
> least
> > two commits ongoing. Could you change your Full recovery case to commit
> > eventually from the client code? Then you won't need the autoCommit
> section.
>
> The autoCommit config has openSearcher=false, so I wouldn't touch it.
> With version 4.0 and later, autoCommit with openSearcher=false should be
> part of *every* config.  The commitWithin parameter *will* open a new
> searcher, so it has no conflict with an autoCommit config using
> openSearcher=false.
>
> Thanks,
> Shawn
>
>
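
For reference, here is a minimal solrconfig.xml sketch of the autoCommit section
being discussed (the 60-second maxTime is illustrative; Shawn suggests anywhere
from one to five minutes elsewhere in this digest):

<autoCommit>
  <!-- flush pending documents to disk periodically -->
  <maxTime>60000</maxTime>            <!-- milliseconds -->
  <!-- persist the data but do not open a new searcher -->
  <openSearcher>false</openSearcher>
</autoCommit>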


timeAllowed

2016-03-10 Thread Anil
HI,

Is timeAllowed the max threshold of QTime, or of the overall time? Please clarify.

Thanks,
Anil


Re: Multiple custom Similarity implementations

2016-03-10 Thread Parvesh Garg
Hi Ahmet,

Thanks for the pointer. I have similar thoughts on the subject. The risk
assumptions are based on not testing your stuff before taking it in. That
risk is still valid with similarity configuration. And sometimes, it may
not be possible to use multiple similarities (custom or otherwise). But
overall, it seems like a nice feature to have.



Parvesh Garg,
Head of Engineering

http://www.zettata.com

On Thu, Mar 10, 2016 at 3:05 PM, Ahmet Arslan 
wrote:

> Hi Parvesh,
>
> Please see the similar discussion :
> http://search-lucene.com/m/eHNlijx91I7etm1
>
> Ahmet
>
>
> On Thursday, March 10, 2016 6:57 AM, Parvesh Garg 
> wrote:
>
>
>
> Thanks Markus. We will look at other options. May I ask what might be the
> reasons for never supporting this?
>
>
> Parvesh Garg,
>
> http://www.zettata.com
>
>
> On Tue, Mar 8, 2016 at 8:59 PM, Markus Jelsma 
> wrote:
>
> > Hello, you can not change similarities per request, and this is likely
> > never going to be supported for good reasons. You need multiple cores, or
> > multiple fields with different similarity defined in the same core.
> > Markus
> >
> > -Original message-
> > > From:Parvesh Garg 
> > > Sent: Tuesday 8th March 2016 5:36
> > > To: solr-user@lucene.apache.org
> > > Subject: Multiple custom Similarity implementations
> > >
> > > Hi,
> > >
> > > We have a requirement where we want to run an A/B test over multiple
> > > Similarity implementations. Is it possible to define multiple
> similarity
> > > tags in the schema.xml file and choose one using a URL parameter? We are
> > using
> > > solr 4.7
> > >
> > > Currently, we are planning to have different cores with different
> > > similarity configured and split traffic based on core names. This is
> > > leading to index duplication and unnecessary resource usage.
> > >
> > > Any help is highly appreciated.
> > >
> > > Parvesh Garg,
> > >
> > > http://www.zettata.com
> > >
> >
>


Using group.ngroups during query search

2016-03-10 Thread Zheng Lin Edwin Yeo
Hi,

I would like to check: will using result grouping with group.ngroups
(which includes the number of groups that have matched the query) in
the search affect Solr's performance? I found that search speed has
slowed down quite significantly after I added the group.ngroups
parameter.

I require the number of groups that have matched the query.
Besides this, is there another way I can retrieve that value?

I have more than 10 million documents, with an index size of more than
500GB, and I'm using Solr 5.4.0.
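
A hedged alternative for getting the group count without group.ngroups, assuming
the group field is named group_field (hypothetical): on Solr 5.x the JSON Facet
API's unique() aggregation counts distinct values of a field across the matching
docs, e.g.:

/select?q=your_query&rows=0&json.facet={ngroups:"unique(group_field)"}

Note that on a multi-shard collection unique() may return an estimate rather than
an exact count, so whether this is acceptable depends on the use case.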

Regards,
Edwin


Re: Load pre-built index to Solr

2016-03-10 Thread Erick Erickson
bq: is there a better way to load a pre-built index quickly like
before

In a word (well, two) "Collection Aliasing". You have two collections
and an alias.
So your search URL stays constant, say 'aliasedcollection'. Then you index to
collectionA and point aliasedcollection to it. That's your live system.

Now you index to collectionB, and when it's done just point
aliasedcollection to collectionB.

You can still address collectionA and collectionB explicitly. So the
indexing process becomes
> point the alias to collectionA
> delete all docs on collectionB
> index to collectionB
> point the alias to collectionB

Repeat, switching A for B the next time you index.

See the Collections API CREATEALIAS command.
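
A sketch of the alias flip via the Collections API, with the names from the
example above:

/admin/collections?action=CREATEALIAS&name=aliasedcollection&collections=collectionB

Re-issuing CREATEALIAS with the same alias name simply repoints it, so the switch
is atomic from the point of view of clients querying 'aliasedcollection'.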

Best,
Erick

On Thu, Mar 10, 2016 at 3:00 PM, praneethvarma
 wrote:
> I'm building an index on HDFS using the MapReduceIndexerTool which I'd later
> like to load into my Solr cores with minimal delay. With Solr 4.4, I was
> able to switch out the underlying index directory of a core (I don't need to
> keep any of the existing index) and reload the core, and it worked fine. I'm
> upgrading to Solr 4.10.3, which behaves a little differently. Upon reload, it
> deletes all the index files that are not referenced by the SegmentInfo that
> was in memory (which would not know about the new index files). I end up
> with a clean index directory after reload. To work around this, I'm creating
> a new core with a datadirectory that already has the index I built for the
> same shard and then unloading the original core hoping for this new core to
> become the leader. But the problem here is that the new core gets stuck in
> the recovering state and cannot join the leader election since its state is
> "recovering". However, after one hour, I think (from the logs) is updating
> the status of these cores to "down" and they are brought back up. Then the
> core registers itself as a leader.
>
> Firstly, I'm trying to force a leader election (including this recovering
> core).
>
> Secondly, I'm very curious as to what happens every 1 hour (it is
> probably a timeout). I just want to understand.
>
> Thirdly, is there a better way to load a pre-built index quickly like
> before?
>
> Can anyone help me find answers to above questions?
>
> Thanks in advance
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Load-pre-built-index-to-Solr-tp4263162.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Timeout error during commit

2016-03-10 Thread Erick Erickson
Probably more than you want to know about commits, hard and soft:
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Thu, Mar 10, 2016 at 3:40 PM, Shawn Heisey  wrote:
> On 3/10/2016 4:06 PM, Steven White wrote:
>> Last question on this topic (maybe): wouldn't a commit at the very end take
>> too long with 1 billion items?  Wouldn't a commit every, let's say, 10,000
>> items be more efficient?
>
> The behavior that I have witnessed suggests that commit speed on a
> well-tuned index depends more on the autowarm config than anything
> else.  The total size of the index might make a difference, but I
> suspect that the slow commit times I've seen on large shards are just
> from the autowarming -- each warming query takes longer if the index is
> large.
>
> If you have the autoCommit config I recommended, the "last" commit
> should be very fast, because those auto commits will flush data to disk
> as you index, and the final manual commit should only need to deal with
> data that has not yet been flushed.
>
> More info than you wanted (TL;DR):  Even if you don't do the autoCommit,
> you'll find that indexing tons of data without any commit at all *will*
> cause older segments to be flushed to disk ... but the transaction logs
> won't be rotated, and that's a whole separate problem.
>
> Thanks,
> Shawn
>


Re: Timeout error during commit

2016-03-10 Thread Shawn Heisey
On 3/10/2016 4:06 PM, Steven White wrote:
> Last question on this topic (maybe): wouldn't a commit at the very end take
> too long with 1 billion items?  Wouldn't a commit every, let's say, 10,000
> items be more efficient?

The behavior that I have witnessed suggests that commit speed on a
well-tuned index depends more on the autowarm config than anything
else.  The total size of the index might make a difference, but I
suspect that the slow commit times I've seen on large shards are just
from the autowarming -- each warming query takes longer if the index is
large.
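
For context, autowarming is configured per cache in the <query> section of
solrconfig.xml; a minimal sketch with illustrative sizes:

<queryResultCache class="solr.LRUCache"
                  size="512" initialSize="512"
                  autowarmCount="32"/>  <!-- replay the top 32 entries into each new searcher -->
<filterCache class="solr.LRUCache"
             size="512" initialSize="512"
             autowarmCount="32"/>

The larger the autowarmCount values, the longer every commit that opens a new
searcher will take.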

If you have the autoCommit config I recommended, the "last" commit
should be very fast, because those auto commits will flush data to disk
as you index, and the final manual commit should only need to deal with
data that has not yet been flushed.

More info than you wanted (TL;DR):  Even if you don't do the autoCommit,
you'll find that indexing tons of data without any commit at all *will*
cause older segments to be flushed to disk ... but the transaction logs
won't be rotated, and that's a whole separate problem.

Thanks,
Shawn



Solr Stats or Analytic Query Help

2016-03-10 Thread Gopal Patwa
I am trying to write a query to get the below stats for orders data:

Find the total order count, sum(Quantity) and sum(Cost) for a specified date
range with a gap of 1 day. For example, if the date range is 10 days, then get
these results for every one of the 10 days.

Solr Version : 5.2.1

Example Order Solr Doc



<doc>
  <date name="ORDER_DATE">2014-09-30T18:56:17Z</date>
  <int name="QUANTITY">2</int>
  <float name="COST">233.00</float>
</doc>



Here is a query example using a range facet to get the count, but I am not sure
how to get sum(QUANTITY) and sum(COST) in the same query. Is there another way to
get this data, from a single query or multiple queries? A single query would be
better.

solr/orders/select?q=*%3A*&wt=xml&indent=true&facet=true&facet.range=ORDER_DATE&f.ORDER_DATE.facet.range.start=NOW/DAY-1DAYS&f.ORDER_DATE.facet.range.end=NOW/DAY%2B10DAYS&f.ORDER_DATE.facet.range.gap=%2B1DAY


Response:

<lst name="facet_counts">
  <lst name="facet_ranges">
    <lst name="ORDER_DATE">
      <lst name="counts">
        <int name="2016-03-09T00:00:00Z">1289</int>
        <int name="2016-03-10T00:00:00Z">295</int>
        <int name="2016-03-11T00:00:00Z">0</int>
        <int name="2016-03-12T00:00:00Z">0</int>
        <int name="2016-03-13T00:00:00Z">0</int>
        <int name="2016-03-14T00:00:00Z">0</int>
        <int name="2016-03-15T00:00:00Z">0</int>
        <int name="2016-03-16T00:00:00Z">0</int>
        <int name="2016-03-17T00:00:00Z">0</int>
        <int name="2016-03-18T00:00:00Z">0</int>
        <int name="2016-03-19T00:00:00Z">0</int>
      </lst>
      <str name="gap">+1DAY</str>
      <date name="start">2016-03-09T00:00:00Z</date>
      <date name="end">2016-03-20T00:00:00Z</date>
    </lst>
  </lst>
</lst>
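
To get sum(QUANTITY) and sum(COST) in the same request, one hedged option on Solr
5.2+ is the (then still experimental) JSON Facet API, with the field names taken
from the example doc above:

/solr/orders/select?q=*:*&rows=0&json.facet={
  per_day : {
    type  : range,
    field : ORDER_DATE,
    start : "NOW/DAY-1DAYS",
    end   : "NOW/DAY+10DAYS",
    gap   : "+1DAY",
    facet : {
      total_qty  : "sum(QUANTITY)",
      total_cost : "sum(COST)"
    }
  }
}

Each range bucket carries its own document count, so the per-day order count comes
for free alongside the two sums.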










Re: Solr on AIX

2016-03-10 Thread Shawn Heisey
On 3/10/2016 9:07 AM, Stephane Bouchard wrote:
Hi, the company where I work is planning to migrate to AIX. Has anyone had
any issues running Solr 5 on AIX?

Solr's start scripts were designed for the OS tools found on recent
versions of free operating systems like Linux, FreeBSD, etc.  When the
system tools are proprietary, which is typical of systems like AIX,
there's a chance that Solr's scripts will not work right.

Another possible problem is that AIX most likely includes IBM's Java. 
IBM's Java is strongly discouraged -- that JVM has bugs that show up
when running Lucene/Solr. 

https://wiki.apache.org/lucene-java/JavaBugs#IBM_J9_Bugs

One user trying to run Solr on AIX had some problems that were
documented in Jira.  These problems were due to differences between
IBM's Java and the other major implementations -- Oracle and OpenJDK. 
You definitely should install Oracle or OpenJDK and set the JAVA_HOME
variable.  I would recommend the latest 1.8 version.

https://issues.apache.org/jira/browse/SOLR-7924
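
A sketch of pointing Solr at a specific JVM; bin/solr sources the include script
bin/solr.in.sh at startup, and the install path below is hypothetical:

# in bin/solr.in.sh
JAVA_HOME=/opt/jdk1.8.0_74      # Oracle or OpenJDK 1.8 install
# SOLR_JAVA_HOME, if set, takes precedence over JAVA_HOME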

Thanks,
Shawn



Re: Timeout error during commit

2016-03-10 Thread Steven White
Got it.

Last question on this topic (maybe): wouldn't a commit at the very end take
too long with 1 billion items?  Wouldn't a commit every, let's say, 10,000
items be more efficient?

Steve

On Thu, Mar 10, 2016 at 5:44 PM, Shawn Heisey  wrote:

> On 3/10/2016 3:29 PM, Steven White wrote:
> > Thank you for your insights, Shawn; they are always valuable.
> >
> > Question, if I wait to the very end to issue a commit, wouldn't that
> mean I
> > could lose everything if there was an OOM or some other server issue?  I
> > don't have any commit setting set in my solrconfig.xml.
>
> This should not be a worry.  The transaction log should keep everything
> safe.
>
> As I said before, no matter what your intentions with commits are, you
> do want to have autoCommit with openSearcher set to false and a
> reasonably long maxTime.  I recommend one minute or five minutes, but
> you will see 15 seconds commonly recommended.  I use the longer time
> because I don't want Solr to be spending a lot of time doing commits.  A
> commit that doesn't open a new searcher is pretty quick, but it still
> requires CPU/memory/IO resources.
>
> Thanks,
> Shawn
>
>


Load pre-built index to Solr

2016-03-10 Thread praneethvarma
I'm building an index on HDFS using the MapReduceIndexerTool which I'd later
like to load into my Solr cores with minimal delay. With Solr 4.4, I was
able to switch out the underlying index directory of a core (I don't need to
keep any of the existing index) and reload the core, and it worked fine. I'm
upgrading to Solr 4.10.3, which behaves a little differently. Upon reload, it
deletes all the index files that are not referenced by the SegmentInfo that
was in memory (which would not know about the new index files). I end up
with a clean index directory after reload. To work around this, I'm creating
a new core with a datadirectory that already has the index I built for the
same shard and then unloading the original core hoping for this new core to
become the leader. But the problem here is that the new core gets stuck in
the recovering state and cannot join the leader election since its state is
"recovering". However, after one hour, I think (from the logs) is updating
the status of these cores to "down" and they are brought back up. Then the
core registers itself as a leader. 

Firstly, I'm trying to force a leader election (including this recovering
core).

Secondly, I'm very curious as to what happens every 1 hour (it is
probably a timeout). I just want to understand.

Thirdly, is there a better way to load a pre-built index quickly like
before? 

Can anyone help me find answers to above questions?

Thanks in advance



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Load-pre-built-index-to-Solr-tp4263162.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Timeout error during commit

2016-03-10 Thread Shawn Heisey
On 3/10/2016 3:29 PM, Steven White wrote:
> Thank you for your insights, Shawn; they are always valuable.
>
> Question, if I wait to the very end to issue a commit, wouldn't that mean I
> could lose everything if there was an OOM or some other server issue?  I
> don't have any commit setting set in my solrconfig.xml.

This should not be a worry.  The transaction log should keep everything
safe.

As I said before, no matter what your intentions with commits are, you
do want to have autoCommit with openSearcher set to false and a
reasonably long maxTime.  I recommend one minute or five minutes, but
you will see 15 seconds commonly recommended.  I use the longer time
because I don't want Solr to be spending a lot of time doing commits.  A
commit that doesn't open a new searcher is pretty quick, but it still
requires CPU/memory/IO resources.

Thanks,
Shawn



Foot, Inch: Stripping Out Special Characters: DisMax: WhitespaceTokenizer vs. Keyword Tokenizer

2016-03-10 Thread Fuad Efendi
Hello,


I finally got it to work: searching for 5’ 3” (5 feet 3 inches).

It is strange to me that if I use WhitespaceTokenizer in the field's query-time
analyzer, then it receives only 5 and 3, with the special characters removed.

It is also strange that eDisMax does not strip out an odd number of quotes.

But it works fine with KeywordTokenizer.

Any idea why? Thanks,


-- 
Fuad Efendi
http://www.tokenizer.ca
Data Mining, Vertical Search

Re: Timeout error during commit

2016-03-10 Thread Steven White
Thank you for your insights, Shawn; they are always valuable.

Question, if I wait to the very end to issue a commit, wouldn't that mean I
could lose everything if there was an OOM or some other server issue?  I
don't have any commit setting set in my solrconfig.xml.

Steve

On Wed, Mar 9, 2016 at 8:32 PM, Shawn Heisey  wrote:

> On 3/9/2016 6:10 PM, Steven White wrote:
> > I'm indexing about 1 billion records (each are small Solr doc, no more
> than
> > 20 bytes each).  The logic is basically as follows:
> >
> > while (data-of-1-billion) {
> > read-1000-items from DB
> > at-100-items send 100 items to Solr: i.e.:
> > solrConnection.add(docs);
> > }
> > solrConnection.commit()
> >
> > I'm seeing the following expection from SolrJ:
> >
> > org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> > waiting response from server at: http://localhost:8983/solr/test_data
> 
> > Which tells me it took Solr a bit over 5 sec. to complete the commit.
> >
> > Now when I created the Solr connection, I used 5 seconds like so:
> >
> > solrClient.setConnectionTimeout(5000);
> > solrClient.setSoTimeout(5000);
> >
> > Two questions:
> >
> > 1) Is the time out error because of my use of 5000?
> > 2) Should I be calling "solrConnection.commit()" every now and than
> inside
> > the loop?
>
> Yes, this problem is happening because you set the SoTimeout value to 5
> seconds.  This is an inactivity timeout on the TCP socket.  It's not
> clear whether the problem happened on the commit operation or on the add
> operation -- it could be either.
>
> Your SoTimeout value should either remain unset, or should be set to
> something *significantly* longer than you ever expect the request to
> take.  I would suggest something between five and fifteen minutes.  I
> use fifteen minutes.  This is long enough that it should only be reached
> if there's a real problem, but short enough that my build program will
> not hang indefinitely, and will have an opportunity to send me email to
> tell me there's a problem.
>
> I would suggest that you don't do *any* commits until the end of the
> loop -- after all one billion docs have been indexed.  If you want to do
> them in your loop, set up something that will do them far less
> frequently, perhaps every 100 times through the loop.  You could include
> a commitWithin parameter on the add request instead of sending actual
> commits, which I would recommend you set to a fairly large value.  I
> would use at least five minutes, but never less than one minute.
> Alternately, you could configure autoSoftCommit in your solrconfig.xml
> file.  I would recommend a maxTime value on that config of at least five
> minutes.
>
> Also, consider increasing your batch size to something larger than 100
> or 1000.  Use 10,000 or more.  With 20 byte documents, you could send a
> LOT of documents in each batch without worrying too much about memory.
>
> Regardless of what else you do with commits, if you're running at least
> Solr 4.0, your solrconfig.xml file should include an autoCommit section
> configured with openSearcher set to false and a maxTime between one and
> five minutes.
>
> By now, I hope you've seen a recommendation to read this blog post:
>
>
> http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Thanks,
> Shawn
>
>
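
A SolrJ sketch of the pattern Shawn describes -- larger batches plus a commitWithin
parameter instead of explicit commits. The URL, field names, and the 10,000 batch
size are illustrative, not values from the original thread:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  public static void main(String[] args) throws Exception {
    SolrClient client = new HttpSolrClient("http://localhost:8983/solr/test_data");
    int commitWithinMs = 5 * 60 * 1000;            // ask Solr to commit within 5 minutes
    List<SolrInputDocument> batch = new ArrayList<>();
    for (long i = 0; i < 1_000_000_000L; i++) {    // stand-in for the DB read loop
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Long.toString(i));
      batch.add(doc);
      if (batch.size() == 10_000) {                // much larger batches than 100
        client.add(batch, commitWithinMs);         // no explicit commit per batch
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      client.add(batch, commitWithinMs);
    }
    client.commit();                               // one final hard commit at the very end
    client.close();
  }
}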


Re: NoSuchFileException errors common on version 5.5.0

2016-03-10 Thread Shawn Heisey
On 3/10/2016 12:18 PM, Shawn Heisey wrote:
> I pulled down branch_5_5 and installed a 5.5.1 snapshot.  Had to edit
> lucene/version.properties to get it to be 5.5.1.  I also had to edit the
> SolrIdentifierValidator class to allow hyphens, since I have them in
> some of my core names.  The NoSuchFileException errors are gone now.

Spoke too soon.

The log message did change a little bit.  Now it's only one log entry on
LukeRequestHandler instead of two separate log entries, and it's a WARN
instead of ERROR.

2016-03-10 14:35:00.038 WARN  (qtp1012570586-11405) [   x:spark3live]
org.apache.solr.handler.admin.LukeRequestHandler Error getting file
length for [segments_c5t]
java.nio.file.NoSuchFileException:
/index/solr5/data/data/spark3_0/index/segments_c5t
at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at
sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
at
sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
at java.nio.file.Files.readAttributes(Files.java:1737)
at java.nio.file.Files.size(Files.java:2332)
at
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:210)

Something else to note:  It wasn't 5.5.0 that I had installed, it was
5.5.0-SNAPSHOT -- I installed it some time before 5.5.0 was released. 
Looks like I did the install of that version on January 29th.

Thanks,
Shawn



Re: Query behavior.

2016-03-10 Thread Jack Krupansky
We probably need a Jira to investigate whether this really is an explicitly
intentional feature change, or whether it really is a bug. And if it truly
was intentional, how people can work around the change to get the desired,
pre-5.5 behavior. Personally, I always thought it was a mistake that q.op
and mm were so tightly linked in Solr even though they are independent in
Lucene.

In short, I think people want to be able to set the default behavior for
individual terms (MUST vs. SHOULD) if explicit operators are not used, and
that OR is an explicit operator. And that mm should control only how many
SHOULD terms are required (Lucene MinShouldMatch.)


-- Jack Krupansky

On Thu, Mar 10, 2016 at 3:41 AM, Modassar Ather 
wrote:

> Thanks Shawn for pointing to the jira issue. I was not sure whether it is
> expected behavior or a bug, or whether there could have been a way to get the
> desired result.
>
> Best,
> Modassar
>
> On Thu, Mar 10, 2016 at 11:32 AM, Shawn Heisey 
> wrote:
>
> > On 3/9/2016 10:55 PM, Shawn Heisey wrote:
> > > The ~2 syntax, when not attached to a phrase query (quotes) is the way
> > > you express a fuzzy query. If it's attached to a query in quotes, then
> > > it is a proximity query. I'm not sure whether it means something
> > > different when it's attached to a query clause in parentheses, someone
> > > with more knowledge will need to comment.
> > 
> > > https://issues.apache.org/jira/browse/SOLR-8812
> >
> > After I read SOLR-8812 more closely, it seems that the ~2 syntax with
> > parentheses is the way that the effective mm value is expressed for a
> > particular query clause in the parsed query.  I've learned something new
> > today.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: ngrams with position

2016-03-10 Thread Jack Krupansky
I suspect that what you really want is analogous to PF2/PF3, but based on
the ngram terms that come out of query token analysis rather than using
pairs/triples of source terms before analysis that are then analyzed as
phrases so that all of the ngrams for a PF2/PF3 phrase must be in order
rather potentially shuffled.

Also, phrase query is an implicit AND while you may really want more of a
SpanOr query where the terms are ORed but must be within a close proximity.

-- Jack Krupansky

On Thu, Mar 10, 2016 at 6:31 AM, Alessandro Benedetti  wrote:

> The reason pf2 and pf3 seem like not a good solution to me is that the
> edismax query parser calculates those grams on top of word shingles.
> So it takes the input query and produces the shingles based on the
> whitespace separator.
>
> i.e. if you search :
> "white tiger jumping"
>  and pf2 configured on field1,
> you are going to end up searching in field1 :
> "white tiger", "tiger jumping" .
> This is really useful in full-text search oriented to phrases and partial
> phrase matches.
> But it has nothing to do with the analysis type associated at query time at
> this moment.
> First the query parser tokenisation is used to build the grams, and then
> the query-time analysis is applied.
> This is according to my recollection;
> I will double check in the code and let you know.
>
> Cheers
>
>
> On 10 March 2016 at 11:02, elisabeth benoit 
> wrote:
>
> > That's the use case, yes. Find Amsterdam with Asmtreadm.
> >
> > And yes, we're only doing approximative search if we get 0 results.
> >
> > I don't quite get why pf2/pf3 is not a good solution.
> >
> > We're actually testing a solution close to phonetic. Some kind of word
> > reduction.
> >
> > Thanks for the suggestion (and the link), this makes me think maybe
> > phonetic is the good solution.
> >
> > Thanks for your help,
> > Elisabeth
> >
> > 2016-03-10 11:32 GMT+01:00 Alessandro Benedetti :
> >
> > >  If I followed your use case is:
> > >
> > > I type Asmtreadm and I want document matching Amsterdam ( even if the
> > edit
> > > distance is greater than 2) .
> > > First of all is something I hope you do only if you get 0 results, if
> not
> > > the overhead can be great and you are going to lose a lot of precision
> > > causing confusion in the customer.
> > >
> > > Pf2 and Pf3 is ngram of white space separated tokens, to make partial
> > > phrase query to affect the scoring.
> > > Not a good fit for your problem.
> > >
> > > More than grams, have you considered using some sort of phonetic
> > matching ?
> > > Could this help :
> > > https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching
> > >
> > > Cheers
> > >
> > > On 10 March 2016 at 08:47, elisabeth benoit  >
> > > wrote:
> > >
> > > > I am trying to do approximative search with solr. We've tried fuzzy
> > > search,
> > > > and spellcheck search, it's working ok but edit distance is limited
> > (to 2
> > > > for DirectSolrSpellChecker in solr 4.10.1). With fuzzy operator,
> we've
> > > had
> > > > performance issues, and I don't think you can have an edit distance
> > more
> > > > than 2.
> > > >
> > > > What we used to do with a database was more efficient: storing
> trigrams
> > > > with position, and then searching around that position (not
> precisely
> > at
> > > > that position, since it's approximative search)
> > > >
> > > > The position is to avoid, for a trigram like ams (amsterdam), getting
> answers
> > > > where the same trigram is for instance at the end of the word. I
> would
> > > like
> > > > answers with the same relative position between trigrams to score
> > higher.
> > > > Maybe using edismax's pf2 and pf3 is a way to do this. I don't see
> any
> > > > other way. Please tell me if you do.
> > > >
> > > > From your answer, I get that position is stored, but I don't
> > understand
> > > > how I can preserve relative order between trigrams, apart from using
> > pf2
> > > > pf3.
> > > >
> > > > Best regards,
> > > > Elisabeth
> > > >
> > > > 2016-03-10 0:02 GMT+01:00 Alessandro Benedetti <
> abenede...@apache.org
> > >:
> > > >
> > > > > if you store the positions for your tokens ( and it is by default
> if
> > > you
> > > > > don't omit them), you have the relative position in the index. [1]
> > > > > I attach a blog post of mine, describing a little bit more in
> details
> > > the
> > > > > lucene internals.
> > > > >
> > > > > Apart from that, can you explain the problem you are trying to
> solve
> > ?
> > > > > The high level user experience ?
> > > > > What kind of search/autocompletion/relevancy tuning are you trying
> to
> > > > > achieve ?
> > > > > Maybe we can help better if we start from the problem :)
> > > > >
> > > > > Cheers
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> http://alexbenedetti.blogspot.co.uk/2015/07/exploring-solr-internals-lucene.html
> > > > >
> > > > > On 9 March 2016 

Re: Query result cache not getting inserted for query lasting > 5secs

2016-03-10 Thread Erick Erickson
Let's see the query. If you do anything with dates using NOW (without rounding),
the queries are actually not the same, since NOW resolves to the current
millisecond and so changes on every request.
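
For example (the field name is hypothetical), these two filters look equivalent to
a user, but only the second produces a repeatable cache key:

fq=timestamp:[NOW-7DAYS TO NOW]            <- NOW is the current millisecond; a new cache entry every request
fq=timestamp:[NOW/DAY-7DAYS TO NOW/DAY]    <- rounded to the day; identical text for 24 hours, so the cache can hit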

Best,
Erick

On Thu, Mar 10, 2016 at 12:18 AM, Murali TV  wrote:
> Hi,
>
> I have a query that takes about 5secs to complete. The result count is
> about 250 million, and row size is about 25.
> The problem is that this query result is not getting loaded to the query
> cache, so it takes ~5secs every time it's issued.  I also confirmed this by
> looking at the cache stats page in the admin console. The query cache size
> and inserts counts don't increase. The cache lookup counts increment but
> cache hits count stays the same for every instance of the query.
>
> If I modify the query such that the query returns in about 4.5 secs, this
>  second query does get cached and subsequent calls return in milliseconds.
> The cache stats page shows the cache size and insert counts increase by 1
> for the first query, and cache hits count increases for subsequent queries.
>
> What's the reason for the first query not getting cached when the request
> takes >5 secs? Is there any configuration around this?
>
> Thanks.
> Murali


Re: Clarification on +, and in edismax parser

2016-03-10 Thread Erick Erickson
Here's a _very_ useful explanation of why the query syntax isn't pure Boolean:
https://lucidworks.com/blog/2011/12/28/why-not-and-or-and-not/

Best,
Erick

On Thu, Mar 10, 2016 at 12:30 AM, Anil  wrote:
> Thank you Dikshant.
>
> On 10 March 2016 at 13:26, Dikshant Shahi  wrote:
>
>> Hi,
>>
>> No, + and "and" don't work the same. Even "and" and "AND" can behave
>> differently (it's configurable) in edismax.
>>
>> When you put a + before a term, you specify that it's mandatory. Hence,
>> "+google +india" will get you the same result as "google AND india".
>>
>> Best Regards,
>> *Dikshant Shahi*
>>
>>
>>
>> On Thu, Mar 10, 2016 at 12:59 PM, Anil  wrote:
>>
>> > "google"+"india" and "india"+"google" are returning different results. Any
>> help
>> > would be appreciated.
>> >
>> > Thanks,
>> > Anil
>> >
>> >
>> > On 10 March 2016 at 11:47, Anil  wrote:
>> >
>> > > HI,
>> > >
>> > > I am using edismax query parser for my solr search.
>> > >
>> > > I believe '+' and 'and' should work similarly.
>> > >
>> > > ex : "google"+"india", "google" and "india" should return the same
>> > > number of results.
>> > >
>> > > Correct me if I am wrong. Thanks.
>> > >
>> > > Regards,
>> > > Anil
>> > >
>> > >
>> > >
>> >
>>
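
To make the operator behavior concrete, a few illustrative forms with the default
q.op=OR (the terms are just examples):

q=google india        -> two SHOULD clauses; either term may match
q=+google +india      -> both terms mandatory (MUST clauses)
q=google AND india    -> parses to the same MUST clauses as +google +india
q="google"+"india"    -> the + binds only to the term that follows it

That last line may explain the differing results reported above for
"google"+"india" versus "india"+"google": in each case only the second term is
mandatory, so the two queries are not equivalent.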


Re: Very slow updates

2016-03-10 Thread Erick Erickson
This really doesn't have much information to go on.

Have you reviewed: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed?

What is "slow"? How are you updating? Are you batching updates? Are
you committing often?

Details matter.

Best,
Erick

On Thu, Mar 10, 2016 at 2:41 AM, michael solomon  wrote:
> Hi,
> I have a collection with one shard in SolrCloud (for development, before
> scaling), and when I try to update new documents it takes about 20 sec
> for 12 MB of data.
> What is wrong with my config?
>
> VM RAM - 28gb
> JVM-Memory - 10gb
>
> What else can I do?
>
> Thanks,
> Michael


Re: NoSuchFileException errors common on version 5.5.0

2016-03-10 Thread Shawn Heisey
On 3/10/2016 10:09 AM, Kevin Risden wrote:
> This sounds related to SOLR-8587 and there is a fix in SOLR-8793 that isn't
> out in a release since it was fixed after 5.5 went out.

Thanks for that info.

I pulled down branch_5_5 and installed a 5.5.1 snapshot.  Had to edit
lucene/version.properties to get it to be 5.5.1.  I also had to edit the
SolrIdentifierValidator class to allow hyphens, since I have them in
some of my core names.  The NoSuchFileException errors are gone now.

Thanks,
Shawn



Re: Query on Highlights

2016-03-10 Thread Anil
I have tested large documents with larger values of hl.maxAnalyzedChars;
I can see highlights now. Thanks.


On 10 March 2016 at 22:29, Anil  wrote:

> HI,
>
> I have indexed large files (around 10 MB) in a text field with stored and
> indexed set to true.
> Searching for text against the field returns records, but highlights are
> empty for a few documents.
>
> Is it because of the default hl.maxAnalyzedChars?
>
> Please let me know if you need any additional information. Thanks.
>
> Regards,
> Anil
>
>
>


Re: NoSuchFileException errors common on version 5.5.0

2016-03-10 Thread Kevin Risden
This sounds related to SOLR-8587 and there is a fix in SOLR-8793 that isn't
out in a release since it was fixed after 5.5 went out.

Kevin Risden
Hadoop Tech Lead | Avalon Consulting, LLC 
M: 732 213 8417



On Thu, Mar 10, 2016 at 11:02 AM, Shawn Heisey  wrote:

> I have a dev system running 5.5.0.  I am seeing a lot of
> NoSuchFileException errors (for segments_XXX filenames).
>
> Here's a log excerpt:
>
> 2016-03-10 09:52:00.054 INFO  (qtp1012570586-821) [   x:inclive]
> org.apache.solr.core.SolrCore.Request [inclive]  webapp=/solr
> path=/admin/luke
> params={qt=/admin/luke&show=schema&wt=javabin&version=2} status=500 QTime=1
> 2016-03-10 09:52:00.055 ERROR (qtp1012570586-821) [   x:inclive]
> org.apache.solr.servlet.HttpSolrCall
> null:java.nio.file.NoSuchFileException:
> /index/solr5/data/data/inc_0/index/segments_ias
> at
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at
>
> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
> at
>
> sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
> at
>
> sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
> at java.nio.file.Files.readAttributes(Files.java:1737)
> at java.nio.file.Files.size(Files.java:2332)
> at
> org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:209)
> 
>
> I did not include the full stacktrace, only up to the first Lucene/Solr
> class.
>
> Most of the error logs are preceded by a request to the /admin/luke
> handler, like you see above, but there are also entries where a failed
> request is not logged right before the error.  My index maintenance
> program calls /admin/luke to programmatically determine the uniqueKey
> for the index.
>
> These errors do not seem to actually interfere with Solr operation, but
> they do concern me.
>
> Thanks,
> Shawn
>
>


NoSuchFileException errors common on version 5.5.0

2016-03-10 Thread Shawn Heisey
I have a dev system running 5.5.0.  I am seeing a lot of
NoSuchFileException errors (for segments_XXX filenames).

Here's a log excerpt:

2016-03-10 09:52:00.054 INFO  (qtp1012570586-821) [   x:inclive]
org.apache.solr.core.SolrCore.Request [inclive]  webapp=/solr
path=/admin/luke
params={qt=/admin/luke&show=schema&wt=javabin&version=2} status=500 QTime=1
2016-03-10 09:52:00.055 ERROR (qtp1012570586-821) [   x:inclive]
org.apache.solr.servlet.HttpSolrCall
null:java.nio.file.NoSuchFileException:
/index/solr5/data/data/inc_0/index/segments_ias
at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at
sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
at
sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
at java.nio.file.Files.readAttributes(Files.java:1737)
at java.nio.file.Files.size(Files.java:2332)
at
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:209)


I did not include the full stacktrace, only up to the first Lucene/Solr
class.

Most of the error logs are preceded by a request to the /admin/luke
handler, like you see above, but there are also entries where a failed
request is not logged right before the error.  My index maintenance
program calls /admin/luke to programmatically determine the uniqueKey
for the index.

These errors do not seem to actually interfere with Solr operation, but
they do concern me.

Thanks,
Shawn



Query on Highlights

2016-03-10 Thread Anil
HI,

I have indexed large files (around 10 MB) in a text field with stored and
indexed set to true.
Searching for text against the field returns records, but highlights are
empty for a few documents.

Is it because of the default hl.maxAnalyzedChars?

Please let me know if you need any additional information. Thanks.

Regards,
Anil


Solr debug 'explain' values differ from the Solr score

2016-03-10 Thread Rick Sullivan
Hi,
I'm seeing behavior in Solr 5.5.0 where the top-level values I see in the debug 
response don't always correspond with the scores Solr assigns to the matched 
documents.

For example, here is the top-level debug information for two documents matched 
by a query:
114628: { description: "sum of:", details: Array[2], match: true, value: 20.542768 }
357547: { description: "sum of:", details: Array[2], match: true, value: 26.517654 }

But they have scores:

114628: 20.542767
357547: 13.258826
I expect the second document to be the most relevant for my query, and the 
debug values seem to agree. However, in the final score I receive, that 
document's score has been adjusted down.
The relevant debug response information can be found here: 
http://apaste.info/mju
Does anyone have an idea why the Solr score may differ from the debug value?
Thanks,
-Rick

Solr on AIX

2016-03-10 Thread Stephane Bouchard
Hi, the company where I work is planning to migrate to AIX. Does anyone had
any issues running Solr 5 on AIX?

Thanks
SB


Facets of nested docs when parent docs are grouped

2016-03-10 Thread Jhon Smith
Is it a bug or by design: if I group docs with the option "group.facet=true", then
facet counts are grouped and "represent" groups, including the fact that no
count can be larger than the number of groups.
But when the docs have nested docs and I additionally fetch nested-doc facets
with the option "child.facet.field=some_field", then those facets are not grouped
(they still relate to parent docs, but not to "superparent" groups).

Example:
4 parent documents, each with 2 nested documents; the parent documents will be
grouped into 2 groups with 2 documents each.


<delete><query>*:*</query></delete>

<add>
  <doc>
    <field name="group_id_s">1</field>
    <field name="id">11</field>
    <field name="doc_type_s">parentDoc</field>
    <field name="color_s">RED</field>
    <doc>
      <field name="id">111</field>
      <field name="doc_type_s">nestedDoc</field>
      <field name="size_s">L</field>
    </doc>
    <doc>
      <field name="id">112</field>
      <field name="doc_type_s">nestedDoc</field>
      <field name="size_s">L</field>
    </doc>
  </doc>
  <doc>
    <field name="group_id_s">1</field>
    <field name="id">12</field>
    <field name="doc_type_s">parentDoc</field>
    <field name="color_s">RED</field>
    <doc>
      <field name="id">121</field>
      <field name="doc_type_s">nestedDoc</field>
      <field name="size_s">L</field>
    </doc>
    <doc>
      <field name="id">122</field>
      <field name="doc_type_s">nestedDoc</field>
      <field name="size_s">XL</field>
    </doc>
  </doc>
  <doc>
    <field name="group_id_s">2</field>
    <field name="id">21</field>
    <field name="doc_type_s">parentDoc</field>
    <field name="color_s">RED</field>
    <doc>
      <field name="id">211</field>
      <field name="doc_type_s">nestedDoc</field>
      <field name="size_s">L</field>
    </doc>
    <doc>
      <field name="id">212</field>
      <field name="doc_type_s">nestedDoc</field>
      <field name="size_s">XXL</field>
    </doc>
  </doc>
  <doc>
    <field name="group_id_s">2</field>
    <field name="id">22</field>
    <field name="doc_type_s">parentDoc</field>
    <field name="color_s">BLUE</field>
    <doc>
      <field name="id">221</field>
      <field name="doc_type_s">nestedDoc</field>
      <field name="size_s">XL</field>
    </doc>
    <doc>
      <field name="id">222</field>
      <field name="doc_type_s">nestedDoc</field>
      <field name="size_s">XXL</field>
    </doc>
  </doc>
</add>
Requests:
1. /bjqfacet?q={!parent
which=doc_type_s:parentDoc}doc_type_s:nestedDoc&child.facet.field=size_s

Returns 4 parent documents, and the nested facets reflect it: no count is larger
than 4, and for example the size L count is only 3, since for the first parent doc
it appeared in both nested docs, so for that parent doc it counts as 1.


<lst name="size_s">
  <int name="L">3</int>
  <int name="XL">2</int>
  <int name="XXL">2</int>
</lst>




2. /bjqfacet?q={!parent
which=doc_type_s:parentDoc}doc_type_s:nestedDoc&child.facet.field=size_s&facet=true&facet.field=color_s

Adding color facet:


<lst name="color_s">
  <int name="RED">3</int>
  <int name="BLUE">1</int>
</lst>

<lst name="size_s">
  <int name="L">3</int>
  <int name="XL">2</int>
  <int name="XXL">2</int>
</lst>




3. /bjqfacet?q={!parent
which=doc_type_s:parentDoc}doc_type_s:nestedDoc&child.facet.field=size_s&group=true&group.field=group_id_s&group.facet=true&facet=true&facet.field=color_s

Adding grouping. Returns 2 groups.
The color facets are grouped here, reflecting the groups. In a similar way, in the
1st request the size facets changed to reflect parent documents, not nested
documents. But here, after grouping, the size facet counts do not change: they
still reflect parent documents, not groups (for example, the size L count = 3 >
number of groups = 2).


<lst name="color_s">
  <int name="RED">2</int>
  <int name="BLUE">1</int>
</lst>

<lst name="size_s">
  <int name="L">3</int>
  <int name="XL">2</int>
  <int name="XXL">2</int>
</lst>


So is it a bug or by design? If a bug, when could it be fixed?
And how could it be fixed locally? I guess that when grouping happens, info about
nested-doc facets is already known, and hence they could be grouped in a similar
way to the usual parent-doc fields?


Re: [Migration Solr4 to Solr5] Collection reload error

2016-03-10 Thread Shawn Heisey
On 3/10/2016 3:05 AM, Dmitry Kan wrote:
> The only thing that I spot is that you use both auto-commit with 900 sec
> frequency AND commitWithin. Solr is smart enough to skip empty commits. But
> if auto-commit kicks in during the doc add / delete, there will be at least
> two commits ongoing. Could you change your Full recovery case to commit
> eventually from the client code? Then you won't need the autoCommit section.

The autoCommit config has openSearcher=false, so I wouldn't touch it. 
With version 4.0 and later, autoCommit with openSearcher=false should be
part of *every* config.  The commitWithin parameter *will* open a new
searcher, so it has no conflict with an autoCommit config using
openSearcher=false.

Thanks,
Shawn



Re: how to force rescan of core.properties file in solr

2016-03-10 Thread Shawn Heisey
On 3/10/2016 3:00 AM, Gian Maria Ricci - aka Alkampfer wrote:
> but this change in core.properties is not available until I restart
> the service and Solr does core autodiscovery. Issuing a Core RELOAD
> does not work.
>
>  
>
> How I can force solr to reload core.properties when I change it?
>

Through your experiments, you have confirmed something that I
suspected:  The core.properties file is only read when Solr first starts
up -- during core discovery.  I think it would probably be a very major
effort to change this, but there may be a much easier way for the
project to allow properties that can change on reload.

From what I can read in the code, not even the filename in the
"properties" element in the core.properties file is re-checked on core
reload.  The reload action simply re-uses the CoreDescriptor object,
which is where these things are held.

Unless there's another properties file that I'm not aware of that *does*
get checked when a core gets re-loaded, I think you've got an excellent
use case for an "Improvement" issue in Jira.

Here's the change that I think Solr needs:  When a core is reloaded, all
property definitions that originated in the file referenced by the
"properties" property should be dropped and re-read.

Thanks,
Shawn



Re: ngrams with position

2016-03-10 Thread elisabeth benoit
Oh yeah, now that you say it, you're right: pf2/pf3 will boost
proximity between words, not between ngrams.

Thanks again,
Elisabeth

2016-03-10 12:31 GMT+01:00 Alessandro Benedetti :

> The reason pf2 and pf3 seem like not a good solution to me is that the
> edismax query parser calculates those grams on top of word shingles.
> So it takes the input query and produces the shingles based on the
> whitespace separator.
>
> i.e. if you search :
> "white tiger jumping"
>  and pf2 configured on field1,
> you are going to end up searching in field1 :
> "white tiger", "tiger jumping" .
> This is really useful in full-text search oriented to phrases and partial
> phrase matches.
> But it has nothing to do with the analysis type associated at query time at
> this moment.
> First the query parser tokenisation is used to build the grams, and then
> the query-time analysis is applied.
> This is according to my recollection;
> I will double check in the code and let you know.
>
> Cheers
>
>
> On 10 March 2016 at 11:02, elisabeth benoit 
> wrote:
>
> > That's the use case, yes. Find Amsterdam with Asmtreadm.
> >
> > And yes, we're only doing approximative search if we get 0 results.
> >
> > I don't quite get why pf2/pf3 is not a good solution.
> >
> > We're actually testing a solution close to phonetic. Some kind of word
> > reduction.
> >
> > Thanks for the suggestion (and the link), this makes me think maybe
> > phonetic is the good solution.
> >
> > Thanks for your help,
> > Elisabeth
> >
> > 2016-03-10 11:32 GMT+01:00 Alessandro Benedetti :
> >
> > >  If I followed your use case is:
> > >
> > > I type Asmtreadm and I want document matching Amsterdam ( even if the
> > edit
> > > distance is greater than 2) .
> > > First of all is something I hope you do only if you get 0 results, if
> not
> > > the overhead can be great and you are going to lose a lot of precision
> > > causing confusion in the customer.
> > >
> > > Pf2 and Pf3 is ngram of white space separated tokens, to make partial
> > > phrase query to affect the scoring.
> > > Not a good fit for your problem.
> > >
> > > More than grams, have you considered using some sort of phonetic
> > matching ?
> > > Could this help :
> > > https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching
> > >
> > > Cheers
> > >
> > > On 10 March 2016 at 08:47, elisabeth benoit  >
> > > wrote:
> > >
> > > > I am trying to do approximative search with solr. We've tried fuzzy
> > > search,
> > > > and spellcheck search, it's working ok but edit distance is limited
> > (to 2
> > > > for DirectSolrSpellChecker in solr 4.10.1). With fuzzy operator,
> we've
> > > had
> > > > performance issues, and I don't think you can have an edit distance
> > more
> > > > than 2.
> > > >
> > > > What we used to do with a database was more efficient: storing
> trigrams
> > > > with position, and then searching around that position (not
> precisely
> > at
> > > > that position, since it's approximative search)
> > > >
> > > > The position is to avoid, for a trigram like ams (amsterdam), getting
> answers
> > > > where the same trigram is for instance at the end of the word. I
> would
> > > like
> > > > answers with the same relative position between trigrams to score
> > higher.
> > > > Maybe using edismax's pf2 and pf3 is a way to do this. I don't see
> any
> > > > other way. Please tell me if you do.
> > > >
> > > > From your answer, I get that position is stored, but I don't
> > understand
> > > > how I can preserve relative order between trigrams, apart from using
> > pf2
> > > > pf3.
> > > >
> > > > Best regards,
> > > > Elisabeth
> > > >
> > > > 2016-03-10 0:02 GMT+01:00 Alessandro Benedetti <
> abenede...@apache.org
> > >:
> > > >
> > > > > if you store the positions for your tokens ( and it is by default
> if
> > > you
> > > > > don't omit them), you have the relative position in the index. [1]
> > > > > I attach a blog post of mine, describing a little bit more in
> details
> > > the
> > > > > lucene internals.
> > > > >
> > > > > Apart from that, can you explain the problem you are trying to
> solve
> > ?
> > > > > The high level user experience ?
> > > > > What kind of search/autocompletion/relevancy tuning are you trying
> to
> > > > > achieve ?
> > > > > Maybe we can help better if we start from the problem :)
> > > > >
> > > > > Cheers
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> http://alexbenedetti.blogspot.co.uk/2015/07/exploring-solr-internals-lucene.html
> > > > >
> > > > > On 9 March 2016 at 15:02, elisabeth benoit <
> > elisaelisael...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello Alessandro,
> > > > > >
> > > > > > You may be right. What would you use to keep relative order
> > between,
> > > > for
> > > > > > instance, grams
> > > > > >
> > > > > > __a
> > > > > > _am
> > > > > > ams
> > > > > > mst
> > > > > > ste
> > > > > > 

Re: ngrams with position

2016-03-10 Thread Alessandro Benedetti
The reason pf2 and pf3 seem like not a good solution to me is that the
edismax query parser calculates those grams on top of word shingles.
So it takes the input query and produces the shingles based on the
whitespace separator.

i.e. if you search :
"white tiger jumping"
 and pf2 configured on field1,
you are going to end up searching in field1 :
"white tiger", "tiger jumping" .
This is really useful in full-text search oriented to phrases and partial
phrase matches.
But it has nothing to do with the analysis type associated at query time at
this moment.
First the query parser tokenisation is used to build the grams, and then
the query-time analysis is applied.
This is according to my recollection;
I will double check in the code and let you know.
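
To illustrate with the example above, a hedged edismax request (the field name is
from the sketch above):

q=white tiger jumping&defType=edismax&qf=field1&pf2=field1

Solr adds the implicit phrase clauses "white tiger" and "tiger jumping" against
field1 as boosting clauses, which is why pf2 rewards adjacency of whole words
rather than the ordering of character ngrams.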

Cheers


On 10 March 2016 at 11:02, elisabeth benoit 
wrote:

> That's the use case, yes. Find Amsterdam with Asmtreadm.
>
> And yes, we're only doing approximative search if we get 0 results.
>
> I don't quite get why pf2/pf3 is not a good solution.
>
> We're actually testing a solution close to phonetic. Some kind of word
> reduction.
>
> Thanks for the suggestion (and the link), this makes me think maybe
> phonetic is the good solution.
>
> Thanks for your help,
> Elisabeth
>
> 2016-03-10 11:32 GMT+01:00 Alessandro Benedetti :
>
> > If I followed, your use case is:
> >
> > I type Asmtreadm and I want document matching Amsterdam ( even if the
> edit
> > distance is greater than 2) .
> > First of all, this is something I hope you do only if you get 0 results; if not,
> > the overhead can be great and you are going to lose a lot of precision
> > causing confusion in the customer.
> >
> > Pf2 and pf3 are ngrams of whitespace-separated tokens, used to make partial
> > phrase queries affect the scoring.
> > Not a good fit for your problem.
> >
> > More than grams, have you considered using some sort of phonetic
> matching ?
> > Could this help :
> > https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching
> >
> > Cheers
> >
> > On 10 March 2016 at 08:47, elisabeth benoit 
> > wrote:
> >
> > > I am trying to do approximative search with solr. We've tried fuzzy
> > search,
> > > and spellcheck search, it's working ok but edit distance is limited
> (to 2
> > > for DirectSolrSpellChecker in solr 4.10.1). With fuzzy operator, we've
> > had
> > > performance issues, and I don't think you can have an edit distance
> more
> > > than 2.
> > >
> > > What we used to do with a database was more efficient: storing trigrams
> > > with position, and then searching around that position (not precisely
> at
> > > that position, since it's approximative search)
> > >
> > > The position is to avoid, for a trigram like ams (amsterdam), getting answers
> > > where the same trigram is for instance at the end of the word. I would
> > like
> > > answers with the same relative position between trigrams to score
> higher.
> > > Maybe using edismax's pf2 and pf3 is a way to do this. I don't see any
> > > other way. Please tell me if you do.
> > >
> > > From your answer, I get that position is stored, but I don't
> understand
> > > how I can preserve relative order between trigrams, apart from using
> pf2
> > > pf3.
> > >
> > > Best regards,
> > > Elisabeth
> > >
> > > 2016-03-10 0:02 GMT+01:00 Alessandro Benedetti  >:
> > >
> > > > if you store the positions for your tokens ( and it is by default if
> > you
> > > > don't omit them), you have the relative position in the index. [1]
> > > > I attach a blog post of mine, describing a little bit more in details
> > the
> > > > lucene internals.
> > > >
> > > > Apart from that, can you explain the problem you are trying to solve
> ?
> > > > The high level user experience ?
> > > > What kind of search/autocompletion/relevancy tuning are you trying to
> > > > achieve ?
> > > > Maybe we can help better if we start from the problem :)
> > > >
> > > > Cheers
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> http://alexbenedetti.blogspot.co.uk/2015/07/exploring-solr-internals-lucene.html
> > > >
> > > > On 9 March 2016 at 15:02, elisabeth benoit <
> elisaelisael...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello Alessandro,
> > > > >
> > > > > You may be right. What would you use to keep relative order
> between,
> > > for
> > > > > instance, grams
> > > > >
> > > > > __a
> > > > > _am
> > > > > ams
> > > > > mst
> > > > > ste
> > > > > ter
> > > > > erd
> > > > > rda
> > > > > dam
> > > > > am_
> > > > >
> > > > > of amsterdam? pf2 and pf3? That's all I can think about. Please let
> > me
> > > > know
> > > > > if you have more insights.
> > > > >
> > > > > Best regards,
> > > > > Elisabeth
> > > > >
> > > > > 2016-03-08 17:46 GMT+01:00 Alessandro Benedetti <
> > abenede...@apache.org
> > > >:
> > > > >
> > > > > > Elizabeth,
> > > > > out of curiosity, could we know what you are trying to solve
> with
> > > that
> > > > > > complex way of 

Re: ngrams with position

2016-03-10 Thread elisabeth benoit
That's the use case, yes. Find Amsterdam with Asmtreadm.

And yes, we're only doing approximative search if we get 0 results.

I don't quite get why pf2/pf3 is not a good solution.

We're actually testing a solution close to phonetic. Some kind of word
reduction.

Thanks for the suggestion (and the link), this makes me think maybe
phonetic is the good solution.

Thanks for your help,
Elisabeth

2016-03-10 11:32 GMT+01:00 Alessandro Benedetti :

> If I followed, your use case is:
>
> I type Asmtreadm and I want documents matching Amsterdam (even if the edit
> distance is greater than 2).
> First of all, this is something I hope you do only if you get 0 results; if not,
> the overhead can be great and you are going to lose a lot of precision,
> causing confusion for the customer.
>
> Pf2 and pf3 are ngrams of whitespace-separated tokens, used to make partial
> phrase queries affect the scoring.
> Not a good fit for your problem.
>
> More than grams, have you considered using some sort of phonetic matching?
> Could this help :
> https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching
>
> Cheers
>
> On 10 March 2016 at 08:47, elisabeth benoit 
> wrote:
>
> > I am trying to do approximative search with solr. We've tried fuzzy
> search,
> > and spellcheck search, it's working ok but edit distance is limited (to 2
> > for DirectSolrSpellChecker in solr 4.10.1). With fuzzy operator, we've
> had
> > performance issues, and I don't think you can have an edit distance more
> > than 2.
> >
> > What we used to do with a database was more efficient: storing trigrams
> > with position, and then searching around that position (not precisely at
> > that position, since it's approximative search)
> >
> > The position is to avoid, for a trigram like ams (amsterdam), getting answers
> > where the same trigram is for instance at the end of the word. I would
> like
> > answers with the same relative position between trigrams to score higher.
> > Maybe using edismax's pf2 and pf3 is a way to do this. I don't see any
> > other way. Please tell me if you do.
> >
> > From your answer, I get that position is stored, but I don't understand
> > how I can preserve relative order between trigrams, apart from using pf2
> > pf3.
> >
> > Best regards,
> > Elisabeth
> >
> > 2016-03-10 0:02 GMT+01:00 Alessandro Benedetti :
> >
> > > if you store the positions for your tokens ( and it is by default if
> you
> > > don't omit them), you have the relative position in the index. [1]
> > > I attach a blog post of mine, describing a little bit more in details
> the
> > > lucene internals.
> > >
> > > Apart from that, can you explain the problem you are trying to solve ?
> > > The high level user experience ?
> > > What kind of search/autocompletion/relevancy tuning are you trying to
> > > achieve ?
> > > Maybe we can help better if we start from the problem :)
> > >
> > > Cheers
> > >
> > > [1]
> > >
> > >
> >
> http://alexbenedetti.blogspot.co.uk/2015/07/exploring-solr-internals-lucene.html
> > >
> > > On 9 March 2016 at 15:02, elisabeth benoit 
> > > wrote:
> > >
> > > > Hello Alessandro,
> > > >
> > > > You may be right. What would you use to keep relative order between,
> > for
> > > > instance, grams
> > > >
> > > > __a
> > > > _am
> > > > ams
> > > > mst
> > > > ste
> > > > ter
> > > > erd
> > > > rda
> > > > dam
> > > > am_
> > > >
> > > > of amsterdam? pf2 and pf3? That's all I can think about. Please let
> me
> > > know
> > > > if you have more insights.
> > > >
> > > > Best regards,
> > > > Elisabeth
> > > >
> > > > 2016-03-08 17:46 GMT+01:00 Alessandro Benedetti <
> abenede...@apache.org
> > >:
> > > >
> > > > > Elizabeth,
> > > > > out of curiosity, could we know what you are trying to solve with
> > that
> > > > > complex way of tokenisation ?
> > > > > Solr is really good in storing positions along with token, so I am
> > > > curious
> > > > > to know why you are mixing things up.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On 8 March 2016 at 10:08, elisabeth benoit <
> > elisaelisael...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks for your answer Emir,
> > > > > >
> > > > > > I'll check that out.
> > > > > >
> > > > > > Best regards,
> > > > > > Elisabeth
> > > > > >
> > > > > > 2016-03-08 10:24 GMT+01:00 Emir Arnautovic <
> > > > emir.arnauto...@sematext.com
> > > > > >:
> > > > > >
> > > > > > > Hi Elisabeth,
> > > > > > > I don't think there is such token filter, so you would have to
> > > create
> > > > > > your
> > > > > > > own token filter that takes token and emits ngram token of
> > specific
> > > > > > length.
> > > > > > > It should not be too hard to create such filter - you can take
> a
> > > look
> > > > > how
> > > > > > > nagram filter is coded - yours should be simpler than that.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Emir
> > > > > > >
> > > > > > >
> > > > > > > On 08.03.2016 08:52, 

Very slow updates

2016-03-10 Thread michael solomon
Hi,
I have a collection with one shard in SolrCloud (for development, before
scaling), and when I try to update new documents it takes about 20 sec
for 12 MB of data.
What is wrong with my config?

VM RAM - 28gb
JVM-Memory - 10gb

What else can I do?

Thanks,
Michael


Re: ngrams with position

2016-03-10 Thread Alessandro Benedetti
If I follow, your use case is:

I type Asmtreadm and I want documents matching Amsterdam (even if the edit
distance is greater than 2).
First of all, this is something I hope you do only if you get 0 results;
otherwise the overhead can be great and you are going to lose a lot of
precision, causing confusion for the customer.

pf2 and pf3 build phrase queries from pairs and triples of
whitespace-separated tokens, so that partial phrase matches affect the
scoring.
Not a good fit for your problem.

More than grams, have you considered using some sort of phonetic matching?
Could this help:
https://cwiki.apache.org/confluence/display/solr/Phonetic+Matching
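
A field type along these lines might be a starting point (a minimal sketch
using the stock PhoneticFilterFactory; the names are illustrative, not from
your schema):

  <fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- tokenize and lowercase, then index phonetic codes -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- inject="true" keeps the original token alongside its phonetic code -->
      <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    </analyzer>
  </fieldType>

Tokens that sound alike get the same code, so a heavy misspelling can still
match with no edit-distance limit.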

Cheers

On 10 March 2016 at 08:47, elisabeth benoit 
wrote:

> I am trying to do approximate search with Solr. We've tried fuzzy search
> and spellcheck search; they work OK but the edit distance is limited (to 2
> for DirectSolrSpellChecker in Solr 4.10.1). With the fuzzy operator we've
> had performance issues, and I don't think you can have an edit distance of
> more than 2.
>
> What we used to do with a database was more efficient: storing trigrams
> with their position, and then searching around that position (not
> precisely at that position, since it's approximate search).
>
> The position is there to avoid, for a trigram like ams (amsterdam),
> getting answers where the same trigram is, for instance, at the end of the
> word. I would like answers with the same relative position between
> trigrams to score higher. Maybe using edismax's pf2 and pf3 is a way to do
> this. I don't see any other way. Please tell me if you do.
>
> From your answer, I get that position is stored, but I don't understand
> how I can preserve relative order between trigrams, apart from using pf2
> and pf3.
>
> Best regards,
> Elisabeth
>

Re: [Migration Solr4 to Solr5] Collection reload error

2016-03-10 Thread Dmitry Kan
Hi,

The only thing that I spot is that you use both auto-commit with 900 sec
frequency AND commitWithin. Solr is smart enough to skip empty commits. But
if auto-commit kicks in during the doc add / delete, there will be at least
two commits ongoing. Could you change your Full recovery case to commit
eventually from the client code? Then you won't need the autoCommit section.

On Mon, Mar 7, 2016 at 1:11 PM, Gerald Reinhart 
wrote:

>
> Hi,
>
>  To give you some context, we are migrating from Solr 4 to Solr 5;
> the client code and the configuration haven't changed, but now we are
> facing this problem. We have already checked the commit behaviour
> configuration and it seems good.
>
> Here it is:
>
> Server side, we have 2 collections (main and temp, with blue and green
> aliases):
>
>    solrconfig.xml:
>
>    <config>
>      <updateHandler class="solr.DirectUpdateHandler2">
>        (...)
>        <autoCommit>
>          <maxTime>900000</maxTime>
>          <openSearcher>false</openSearcher>
>        </autoCommit>
>        (...)
>      </updateHandler>
>    </config>
>
> Client side, we have 2 different modes:
>
> 1 - Full recovery:
>
> - Delete all documents from the temp collection
>   solrClient.deleteByQuery("*:*")
>
> - Add all new documents in the temp collection (can be more
> than 5 million),
>   solrClient.add(doc, -1) // commitWithinMs == -1
>
> -  Commit when all documents are added
>   solrClient.commit(false, false) // waitFlush == false,
> waitSearcher == false
>
> -  Swap blue and green using "create alias" command
>
> -  Reload the temp collection to clean the cache. It is
> at this point that we have the issue.
>
> 2 - Incremental:
>
> -  Add or delete documents from the main collection
>    solrClient.add(doc, 1800000)        // commitWithin == 30 min
>    solrClient.deleteById(id, 1800000)  // commitWithin == 30 min
>
> Maybe you will spot something obviously wrong?
>
> Thanks
>
> Gérald and Elodie
>
>
>
>
> On 03/04/2016 12:41 PM, Dmitry Kan wrote:
>
>> Hi,
>>
>> Check the autoCommit and autoSoftCommit nodes in the solrconfig.xml.
>> Set them to reasonable values. The idea is that if you commit too often,
>> searchers will be warmed up and thrown away. If at any point in time you
>> get overlapping commits, there will be several searchers sitting on the
>> deck.
>>
>> Dmitry
>>
>> On Mon, Feb 29, 2016 at 4:20 PM, Gerald Reinhart <
>> gerald.reinh...@kelkoo.com
>>
>>> wrote:
>>> Hi,
>>>
>>> We are facing an issue during a migration from Solr4 to Solr5.
>>>
>>> Given
>>> - migration from solr 4.10.4 to 5.4.1
>>> - 2 collections
>>> - cloud with one leader and several replicas
>>> - in solrconfig.xml: maxWarmingSearchers=1
>>> - no code change
>>>
>>> When collection reload using /admin/collections using solrj
>>>
>>> Then
>>>
>>> 2016-02-29 13:42:49,011 [http-8080-3] INFO
>>> org.apache.solr.core.CoreContainer:reload:848  - Reloading SolrCore
>>> 'fr_blue' using configuration from collection fr_blue
>>> 2016-02-29 13:42:45,428 [http-8080-6] INFO
>>> org.apache.solr.search.SolrIndexSearcher::237  - Opening
>>> Searcher@58b65fc[fr_blue] main
>>> (...)
>>> 2016-02-29 13:42:49,077 [http-8080-3] WARN
>>> org.apache.solr.core.SolrCore:getSearcher:1762  - [fr_blue] Error
>>> opening new searcher. exceeded limit of maxWarmingSearchers=1, try again
>>> later.
>>> 2016-02-29 13:42:49,091 [http-8080-3] ERROR
>>> org.apache.solr.handler.RequestHandlerBase:log:139  -
>>> org.apache.solr.common.SolrException: Error handling 'reload' action
>>>  at
>>>
>>>
>>> org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:770)
>>>  at
>>>
>>>
>>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:230)
>>>  at
>>>
>>>
>>> org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:184)
>>>  at
>>>
>>>
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
>>>  at
>>>
>>>
>>> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:664)
>>>  at
>>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:438)
>>>  at
>>>
>>>
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:223)
>>>  at
>>>
>>>
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
>>>  at
>>>
>>>
>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>  at
>>>
>>>
>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>  at
>>>
>>>
>>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>  at
>>>
>>>
>>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>  at
>>>
>>>
>>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>>  

how to force rescan of core.properties file in solr

2016-03-10 Thread Gian Maria Ricci - aka Alkampfer
I've set up a configuration in my solrconfig.xml to manage master or slave
roles with settings in the core.properties file.

This allows me to select if the core is slave or master with a simple change
of core.properties file.
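
The handler I mean looks roughly like this (only the enable.master and
enable.slave property names are from my actual config; the rest is a
generic sketch):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="enable">${enable.master:false}</str>
      <str name="replicateAfter">commit</str>
    </lst>
    <lst name="slave">
      <str name="enable">${enable.slave:false}</str>
      <str name="masterUrl">http://master.mysolr..xxx:8983/solr/corename</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>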

 

I've set up a DNS entry for master.mysolr..xxx, which allows me to point
all of the slaves at that entry. If I want to change the master (e.g., the
master is down) I can change DNS for master.mysolr..xxx to point at one
of the slaves, then change that slave's core.properties to

enable.master=true
enable.slave=false

but this change in core.properties is not picked up until I restart the
service and Solr re-runs core autodiscovery. Issuing a core RELOAD does not
work.

How can I force Solr to reload core.properties when I change it?

 

I can UNLOAD the core and then re-create it passing properties, but I'd
prefer to:

1)  Change the core.properties file

2)  Make the core reload and pick up the new values

Any clues?
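
For reference, the unload/re-create workaround I mentioned looks roughly
like this via the Core Admin API (host and core name are hypothetical):

  http://localhost:8983/solr/admin/cores?action=UNLOAD&core=mycore
  http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&instanceDir=mycore&property.enable.master=true&property.enable.slave=false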

 

Thanks in advance.

--
Gian Maria Ricci
Cell: +39 320 0136949



Re: Multiple custom Similarity implementations

2016-03-10 Thread Ahmet Arslan
Hi Parvesh,

Please see the similar discussion: http://search-lucene.com/m/eHNlijx91I7etm1

Ahmet


On Thursday, March 10, 2016 6:57 AM, Parvesh Garg  wrote:



Thanks Markus. We will look at other options. May I ask what can be the
reasons for not supporting this ever?


Parvesh Garg,

http://www.zettata.com


On Tue, Mar 8, 2016 at 8:59 PM, Markus Jelsma 
wrote:

> Hello, you can not change similarities per request, and this is likely
> never going to be supported for good reasons. You need multiple cores, or
> multiple fields with different similarity defined in the same core.
> Markus
>
> -Original message-
> > From:Parvesh Garg 
> > Sent: Tuesday 8th March 2016 5:36
> > To: solr-user@lucene.apache.org
> > Subject: Multiple custom Similarity implementations
> >
> > Hi,
> >
> > We have a requirement where we want to run an A/B test over multiple
> > Similarity implementations. Is it possible to define multiple similarity
> > tags in schema.xml file and chose one using the URL parameter? We are
> using
> > solr 4.7
> >
> > Currently, we are planning to have different cores with different
> > similarity configured and split traffic based on core names. This is
> > leading to index duplication and unnecessary resource usage.
> >
> > Any help is highly appreciated.
> >
> > Parvesh Garg,
> >
> > http://www.zettata.com
> >
>
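
For reference, the "multiple fields with different similarity" option that
Markus mentions looks roughly like this in schema.xml (a sketch only; it
assumes the global similarity is solr.SchemaSimilarityFactory so that
per-field-type settings take effect, and the names are made up):

  <similarity class="solr.SchemaSimilarityFactory"/>

  <fieldType name="text_bm25" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <!-- fields of this type score with BM25 instead of the default similarity -->
    <similarity class="solr.BM25SimilarityFactory"/>
  </fieldType>

Index the same content into one field per similarity under test, then pick
the field per A/B bucket (e.g., via edismax's qf).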


Re: ngrams with position

2016-03-10 Thread elisabeth benoit
I am trying to do approximate search with Solr. We've tried fuzzy search
and spellcheck search; they work OK but the edit distance is limited (to 2
for DirectSolrSpellChecker in Solr 4.10.1). With the fuzzy operator we've
had performance issues, and I don't think you can have an edit distance of
more than 2.

What we used to do with a database was more efficient: storing trigrams
with their position, and then searching around that position (not precisely
at that position, since it's approximate search).

The position is there to avoid, for a trigram like ams (amsterdam), getting
answers where the same trigram is, for instance, at the end of the word. I
would like answers with the same relative position between trigrams to
score higher. Maybe using edismax's pf2 and pf3 is a way to do this. I
don't see any other way. Please tell me if you do.

From your answer, I get that position is stored, but I don't understand
how I can preserve relative order between trigrams, apart from using pf2
and pf3.
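
For reference, wiring in pf2/pf3 would look something like this (just a
sketch: name_trigrams is a made-up field holding the grams, and whether
this really rewards relative gram order is exactly my open question):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">name_trigrams</str>
      <str name="pf2">name_trigrams^5</str>
      <str name="pf3">name_trigrams^10</str>
    </lst>
  </requestHandler>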

Best regards,
Elisabeth

2016-03-10 0:02 GMT+01:00 Alessandro Benedetti :

> if you store the positions for your tokens (and they are stored by default
> if you don't omit them), you have the relative position in the index. [1]
> I attach a blog post of mine describing the Lucene internals in a little
> more detail.
>
> Apart from that, can you explain the problem you are trying to solve?
> The high-level user experience?
> What kind of search/autocompletion/relevancy tuning are you trying to
> achieve?
> Maybe we can help better if we start from the problem :)
>
> Cheers
>
> [1]
>
> http://alexbenedetti.blogspot.co.uk/2015/07/exploring-solr-internals-lucene.html
>
> On 9 March 2016 at 15:02, elisabeth benoit 
> wrote:
>
> > Hello Alessandro,
> >
> > You may be right. What would you use to keep relative order between, for
> > instance, grams
> >
> > __a
> > _am
> > ams
> > mst
> > ste
> > ter
> > erd
> > rda
> > dam
> > am_
> >
> > of amsterdam? pf2 and pf3? That's all I can think about. Please let me
> know
> > if you have more insights.
> >
> > Best regards,
> > Elisabeth
> >
> > 2016-03-08 17:46 GMT+01:00 Alessandro Benedetti :
> >
> > > Elisabeth,
> > > out of curiosity, could we know what you are trying to solve with that
> > > complex way of tokenisation?
> > > Solr is really good in storing positions along with token, so I am
> > curious
> > > to know why your are mixing the things up.
> > >
> > > Cheers
> > >
> > > On 8 March 2016 at 10:08, elisabeth benoit 
> > > wrote:
> > >
> > > > Thanks for your answer Emir,
> > > >
> > > > I'll check that out.
> > > >
> > > > Best regards,
> > > > Elisabeth
> > > >
> > > > 2016-03-08 10:24 GMT+01:00 Emir Arnautovic <
> > emir.arnauto...@sematext.com
> > > >:
> > > >
> > > > > Hi Elisabeth,
> > > > > I don't think there is such a token filter, so you would have to
> > > > > create your own token filter that takes a token and emits ngram
> > > > > tokens of a specific length. It should not be too hard to create
> > > > > such a filter - you can take a look at how the ngram filter is
> > > > > coded - yours should be simpler than that.
> > > > >
> > > > > Regards,
> > > > > Emir
> > > > >
> > > > >
> > > > > On 08.03.2016 08:52, elisabeth benoit wrote:
> > > > >
> > > > >> Hello,
> > > > >>
> > > > >> I'm using Solr 4.10.1. I'd like to index words with ngrams of
> > > > >> fixed length, with a position at the end.
> > > > >>
> > > > >> For instance, with fixed length 3, Amsterdam would be something
> > > > >> like:
> > > > >>
> > > > >>
> > > > >> a0 (two spaces added at the beginning)
> > > > >> am1
> > > > >> ams2
> > > > >> mst3
> > > > >> ste4
> > > > >> ter5
> > > > >> erd6
> > > > >> rda7
> > > > >> dam8
> > > > >> am9 (one more space at the end)
> > > > >>
> > > > >> The number at the end being the position.
> > > > >>
> > > > >> Does anyone have a clue how to achieve this?
> > > > >>
> > > > >> Best regards,
> > > > >> Elisabeth
> > > > >>
> > > > >>
> > > > > --
> > > > > Monitoring * Alerting * Anomaly Detection * Centralized Log
> > Management
> > > > > Solr & Elasticsearch Support * http://sematext.com/
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > --
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> > >
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Query behavior.

2016-03-10 Thread Modassar Ather
Thanks Shawn for pointing to the Jira issue. I was not sure whether it was
expected behavior or a bug, or whether there was a way to get the desired
result.

Best,
Modassar

On Thu, Mar 10, 2016 at 11:32 AM, Shawn Heisey  wrote:

> On 3/9/2016 10:55 PM, Shawn Heisey wrote:
> > The ~2 syntax, when not attached to a phrase query (quotes), is the way
> > you express a fuzzy query. If it's attached to a query in quotes, then
> > it is a proximity query. I'm not sure whether it means something
> > different when it's attached to a query clause in parentheses; someone
> > with more knowledge will need to comment.
> 
> > https://issues.apache.org/jira/browse/SOLR-8812
>
> After I read SOLR-8812 more closely, it seems that the ~2 syntax with
> parentheses is the way that the effective mm value is expressed for a
> particular query clause in the parsed query.  I've learned something new
> today.
>
> Thanks,
> Shawn
>
>
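
For anyone else landing here, the three places the ~N syntax shows up
(sketched examples, not queries from this thread):

  roam~2           fuzzy term query: terms within edit distance 2 of "roam"
  "apache solr"~2  phrase proximity: the two words within 2 positions of
                   each other
  (a b c)~2        in debugQuery's parsed output: a BooleanQuery whose
                   minimum-should-match is 2, i.e. the effective mm for
                   that clause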


Re: Clarification on +, and in edismax parser

2016-03-10 Thread Anil
Thank you Dikshant.

On 10 March 2016 at 13:26, Dikshant Shahi  wrote:

> Hi,
>
> No, + and "and" do not work the same way. Even "and" and "AND" behave
> differently (it is configurable) in edismax.
>
> When you put a + before a term, you specify that it's mandatory. Hence,
> "+google +india" will get you the same result as "google AND india".
>
> Best Regards,
> *Dikshant Shahi*
>
>
>
> On Thu, Mar 10, 2016 at 12:59 PM, Anil  wrote:
>
> > "google"+"india" and "india"+"google" are returning different results.
> > Any help would be appreciated.
> >
> > Thanks,
> > Anil
> >
> >
> > On 10 March 2016 at 11:47, Anil  wrote:
> >
> > > HI,
> > >
> > > I am using edismax query parser for my solr search.
> > >
> > > I believe '+' and 'and' should work similarly.
> > >
> > > e.g.: "google"+"india" and "google" and "india" should return the
> > > same number of results.
> > >
> > > Correct me if I am wrong. Thanks.
> > >
> > > Regards,
> > > Anil
> > >
> > >
> > >
> >
>
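
A quick sketch of the distinction, with made-up queries:

  q=+google +india     both terms are mandatory (+ marks each as required)
  q=google AND india   equivalent: AND makes both clauses required
  q=google india       with the default q.op=OR, either term may match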


Query result cache not getting inserted for query lasting > 5secs

2016-03-10 Thread Murali TV
Hi,

I have a query that takes about 5 secs to complete. The result count is
about 250 million, and the row size is about 25.
The problem is that this query's result is not getting loaded into the
query result cache, so it takes ~5 secs every time it's issued. I also
confirmed this by looking at the cache stats page in the admin console: the
query cache size and insert counts don't increase, and the cache lookup
count increments while the cache hit count stays the same for every
instance of the query.

If I modify the query such that it returns in about 4.5 secs, this second
query does get cached and subsequent calls return in milliseconds. The
cache stats page shows the cache size and insert counts increase by 1 for
the first query, and the cache hit count increases for subsequent queries.

What's the reason for the first query not getting cached when the request
takes >5 secs? Is there any configuration around this?
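
For reference, the settings I know of for this cache live in solrconfig.xml
(generic values below, not our installation's), and none of them looks like
a time threshold:

  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="0"/>

  <!-- maximum number of documents to cache for any entry in this cache -->
  <queryResultMaxDocsCached>200</queryResultMaxDocsCached>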

Thanks.
Murali