Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread Erick Erickson
David:

Some of this still matters even with 7.5+. Prior to 7.5, you could easily have 
50% of your index consist of deleted docs. With 7.5, this ceiling is reduced. 
expungeDeletes will reduce the size to no more than 10% while still respecting 
the default max segment size of 5G. Optimizing and specifying maxSegments was 
getting you what you wanted, but more as a side effect, ya’ got lucky ;)….

You can set a bunch of parameters explicitly for TieredMergePolicy; some of the 
juicy ones are below (a config sketch follows the list):

- maxMergedSegmentMB (default 5000): results in fewer segments but doesn’t 
materially affect the ratio of deleted docs.

- forceMergeDeletesPctAllowed (used by expungeDeletes, default 10%)

- deletesPctAllowed (used during “regular” merging, i.e. not optimize or 
expungeDeletes): the target ceiling for the % of deleted docs allowed in 
the index. It cannot be set below 20%.
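
For reference, here is a minimal solrconfig.xml sketch of how these could be set 
explicitly. This is an illustration only: it assumes Solr 7.5+ with the stock 
TieredMergePolicyFactory, and the values are just examples (deletesPctAllowed has 
a floor of 20):

    <indexConfig>
      <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
        <!-- cap on the size of a merged segment, in MB -->
        <double name="maxMergedSegmentMB">5000</double>
        <!-- segments with more than this % deleted docs are rewritten by expungeDeletes -->
        <double name="forceMergeDeletesPctAllowed">10.0</double>
        <!-- target ceiling for % deleted docs during regular merging -->
        <double name="deletesPctAllowed">25.0</double>
      </mergePolicyFactory>
    </indexConfig>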


It’s a balance between I/O and wasted space. The reason deletesPctAllowed is 
not allowed to go below 20% is that it’s too easy to shoot yourself in the 
foot. Setting it to 5%, for instance, would send I/O (and CPU) through the 
roof; merging is an expensive operation. And you can get something similar by 
doing an expungeDeletes once rather than rewriting segments all the time….
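
If it helps, a small SolrJ sketch of issuing that kind of one-off expungeDeletes 
commit is below; the base URL and collection name are placeholders, not anything 
from this thread:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
    import org.apache.solr.client.solrj.request.UpdateRequest;

    public class ExpungeDeletesExample {
        public static void main(String[] args) throws Exception {
            try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                UpdateRequest req = new UpdateRequest();
                // rewrite only segments above the forceMergeDeletesPctAllowed threshold
                req.setParam("expungeDeletes", "true");
                // commit, waiting for flush and for a new searcher
                req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
                req.process(client, "mycollection");
            }
        }
    }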

Ditto with the default value for forceMergeDeletesPctAllowed. Setting it to 1%, 
for instance, is doing a LOT of work for little gain.

Best,
Erick


> On Jun 7, 2019, at 2:44 PM, David Santamauro  
> wrote:
> 
> I use the same algorithm and for me, initialMaxSegments is always the number 
> of segments currently in the index (seen, e.g, in the SOLR admin UI). 
> finalMaxSegments depends on what kind of updates have happened. If I know 
> that "older" documents are untouched, then I'll usually use -60% or even 
> -70%, depending on the initialMaxSegments. I have a few cores that I'll even 
> go all the way down to 1.
> 
> If you are going to attempt this, I'd suggest to test with a small reduction, 
> say 10 segments, and monitor the index size and difference between maxDoc and 
> numDocs. I've shaved ~ 1T off of an index optimizing from 75 down to 30 
> segments (7T index total) and removed a significant % of deleted documents in 
> the process. YMMV ...
> 
> If you are using a version of SOLR >=7.5 (see LUCENE-7976), this might all be 
> moot.
> 
> //
> 
> 
> On 6/7/19, 2:29 PM, "jena"  wrote:
> 
>Thanks @Michael Joyner, how did you decide on initialMaxSegments of 256? Or is
>it some random number I can use for my case? Can you guide me how to
>decide the initial & final max segments?
> 
> 
>Michael Joyner wrote
>> That is the way we do it here - also helps a lot with not needing x2 or 
>> x3 disk space to handle the merge:
>> 
>> public void solrOptimize() {
>> int initialMaxSegments = 256;
>> int finalMaxSegments = 4;
>> if (isShowSegmentCounter()) {
>> log.info("Optimizing ...");
>> }
>> try (SolrClient solrServerInstance = getSolrClientInstance()) {
>> for (int segments = initialMaxSegments; segments >= 
>> finalMaxSegments; segments--) {
>> if (isShowSegmentCounter()) {
>> System.out.println("Optimizing to a max of " + 
>> segments + " segments.");
>> }
>> try {
>> solrServerInstance.optimize(true, true, segments);
>> } catch (RemoteSolrException | SolrServerException | 
>> IOException e) {
>> log.severe(e.getMessage());
>> }
>> }
>> } catch (IOException e) {
>> throw new RuntimeException(e);
>> }
>> }
>> 
>> On 6/7/19 4:56 AM, Nicolas Franck wrote:
>>> In that case, hard optimisation like that is out of the question.
>>> Resort to automatic merge policies, specifying a maximum
>>> number of segments. Solr is designed with multiple segments
>>> in mind. Hard optimisation seems not worth the trouble.
>>> 
>>> The problem is this: the fewer segments you specify for
>>> an optimisation, the longer it will take, because it has to read
>>> all of the segments to be merged, and redo the sorting. And a cluster
>>> has a lot of housekeeping on top of it.
>>> 
>>> If you really want to issue an optimisation, then you can
>>> also do it in steps (max segments parameter)
>>> 
>>> 10 -> 9 -> 8 -> 7 .. -> 1
>>> 
>>> that way less segments need to be merged in one go.
>>> 
>>> testing your index will show you what a good maximum
>>> amount of segments is for your index.
>>> 
 On 7 Jun 2019, at 07:27, jena <
> 
>> sthita2010@
> 
>> > wrote:
 
 Hello guys,
 
 We have 4 Solr (version 4.4) instances in our production environment, which
 are linked/associated with ZooKeeper for replication. We do heavy delete &
 add operations. We have around 26 million records and the index size is
 around 70GB. We serve 100k+ requests per day.
 
 
 Becau

Re: Solr Heap Usage

2019-06-07 Thread Greg Harris
+1 for Eclipse MAT. YourKit is another option. Heap dumps are invaluable
but a pain. If you’re just interested in overall heap and GC analysis I use
GCViewer, which is usually all you need to know. I do heap dumps when
there are large deviations from expectations and it is not obvious why.
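
For illustration, a common way to grab a heap dump for MAT/YourKit and to peek at 
GC activity with standard JDK tools, assuming you know the Solr process id; the 
PID and file path below are placeholders, not anything from this thread:

    # capture a live-object heap dump from the running Solr JVM
    jmap -dump:live,format=b,file=/tmp/solr-heap.hprof 12345

    # heap-generation occupancy and GC counts/times, sampled every 1000 ms
    jstat -gcutil 12345 1000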

Greg

On Fri, Jun 7, 2019 at 11:30 AM John Davis 
wrote:

> What would be the best way to understand where heap is being used?
>
> On Tue, Jun 4, 2019 at 9:31 PM Greg Harris  wrote:
>
> > Just a couple of points I’d make here. I did some testing a while back in
> > which if no commit is made, (hard or soft) there are internal memory
> > structures holding tlogs and it will continue to get worse the more docs
> > that come in. I don’t know if that’s changed in further versions. I’d
> > recommend doing commits with some amount of frequency in indexing heavy
> > apps, otherwise you are likely to have heap issues. I personally would
> > advocate for some of the points already made. There are too many
> variables
> > going on here and ways to modify stuff to make sizing decisions and think
> > you’re doing anything other than a pure guess if you don’t test and
> > monitor. I’d advocate for a process in which testing is done regularly to
> > figure out questions like number of shards/replicas, heap size, memory
> etc.
> > Hard data, good process and regular testing will trump guesswork every
> time
> >
> > Greg
> >
> > On Tue, Jun 4, 2019 at 9:22 AM John Davis 
> > wrote:
> >
> > > You might want to test with softcommit of hours vs 5m for heavy
> indexing
> > +
> > > light query -- even though there is internal memory structure overhead
> > for
> > > no soft commits, in our testing a 5m soft commit (via commitWithin) has
> > > resulted in a very very large heap usage which I suspect is because of
> > > other overhead associated with it.
> > >
> > > On Tue, Jun 4, 2019 at 8:03 AM Erick Erickson  >
> > > wrote:
> > >
> > > > I need to update that, didn’t understand the bits about retaining
> > > internal
> > > > memory structures at the time.
> > > >
> > > > > On Jun 4, 2019, at 2:10 AM, John Davis 
> > > > wrote:
> > > > >
> > > > > Erick - These conflict, what's changed?
> > > > >
> > > > > So if I were going to recommend settings, they’d be something like
> > > this:
> > > > > Do a hard commit with openSearcher=false every 60 seconds.
> > > > > Do a soft commit every 5 minutes.
> > > > >
> > > > > vs
> > > > >
> > > > > Index-heavy, Query-light
> > > > > Set your soft commit interval quite long, up to the maximum latency
> > you
> > > > can
> > > > > stand for documents to be visible. This could be just a couple of
> > > minutes
> > > > > or much longer. Maybe even hours with the capability of issuing a
> > hard
> > > > > commit (openSearcher=true) or soft commit on demand.
> > > > >
> > > >
> > >
> >
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Jun 2, 2019 at 8:58 PM Erick Erickson <
> > erickerick...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > >>> I've looked through SolrJ, DIH and others -- is the bottomline
> > > > >>> across all of them to "batch updates" and not commit as long as
> > > > possible?
> > > > >>
> > > > >> Of course it’s more complicated than that ;)….
> > > > >>
> > > > >> But to start, yes, I urge you to batch. Here’s some stats:
> > > > >> https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/
> > > > >>
> > > > >> Note that at about 100 docs/batch you hit diminishing returns.
> > > > _However_,
> > > > >> that test was run on a single shard collection, so if you have 10
> > > shards
> > > > >> you’d
> > > > >> have to send 1,000 docs/batch. I wouldn’t sweat that number much,
> > just
> > > > >> don’t
> > > > >> send one at a time. And there are the usual gotchas if your
> > documents
> > > > are
> > > > >> 1M .vs. 1K.
> > > > >>
> > > > >> About committing. No, don’t hold off as long as possible. When you
> > > > commit,
> > > > >> segments are merged. _However_, the default 100M internal buffer
> > size
> > > > means
> > > > >> that segments are written anyway even if you don’t hit a commit
> > point
> > > > when
> > > > >> you have 100M of index data, and merges happen anyway. So you
> won’t
> > > save
> > > > >> anything on merging by holding off commits.
> > > > >> And you’ll incur penalties. Here’s more than you want to know
> about
> > > > >> commits:
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> > > > >>
> > > > >> But some key take-aways… If for some reason Solr abnormally
> > > > >> terminates, the accumulated documents since the last hard
> > > > >> commit are replayed. So say you don’t commit for an hour of
> > > > >> furious indexing and someone does a “kill -9”. When you restart
> > > > >> Solr it’ll try to re-index all the docs for the last hour. Hard
> > >

NPE in DelegationTokenHttpSolrClient

2019-06-07 Thread aaront250
Hi,

I am receiving an NPE when trying to index into a Solr collection.

I am initializing the HttpSolrClient like this:

HttpSolrClient client = new HttpSolrClient.Builder()
.withKerberosDelegationToken(token)
.withHttpClient(httpClient)
.withBaseSolrUrl(baseUrl)
.build();

Retrieving the token using the function provided in the documentation.

    private static String getDelegationToken(final String renewer, final String user,
            HttpSolrClient solrClient) throws Exception {
        DelegationTokenRequest.Get get = new DelegationTokenRequest.Get(renewer) {
            @Override
            public SolrParams getParams() {
                ModifiableSolrParams params = new ModifiableSolrParams(super.getParams());
                params.set("user", user);
                return params;
            }
        };
        DelegationTokenResponse.Get getResponse = get.process(solrClient);
        return getResponse.getDelegationToken();
    }


I am able to getDocumentById successfully using the token but unable to
add/delete into the collection. When I remove the ".withKerberosDelegationToken"
line and use a keytab file, it indexes correctly. Does anyone know why I am
running into the NPE?

The NPE seems to be coming from this line.

   SolrParams params = request.getParams();
->if (params.getParams("delegation") != null) {




java.lang.NullPointerException
at
org.apache.solr.client.solrj.impl.DelegationTokenHttpSolrClient.createMethod(DelegationTokenHttpSolrClient.java:93)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:242)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:178)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
at testAdd.main(testAdd.java:111)



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread David Santamauro
I use the same algorithm and for me, initialMaxSegments is always the number of 
segments currently in the index (seen, e.g, in the SOLR admin UI). 
finalMaxSegments depends on what kind of updates have happened. If I know that 
"older" documents are untouched, then I'll usually use -60% or even -70%, 
depending on the initialMaxSegments. I have a few cores that I'll even go all 
the way down to 1.

If you are going to attempt this, I'd suggest to test with a small reduction, 
say 10 segments, and monitor the index size and difference between maxDoc and 
numDocs. I've shaved ~ 1T off of an index optimizing from 75 down to 30 
segments (7T index total) and removed a significant % of deleted documents in 
the process. YMMV ...

If you are using a version of SOLR >=7.5 (see LUCENE-7976), this might all be 
moot.

//


On 6/7/19, 2:29 PM, "jena"  wrote:

Thanks @Michael Joyner, how did you decide on initialMaxSegments of 256? Or is
it some random number I can use for my case? Can you guide me how to
decide the initial & final max segments?

 
Michael Joyner wrote
> That is the way we do it here - also helps a lot with not needing x2 or 
> x3 disk space to handle the merge:
> 
> public void solrOptimize() {
>  int initialMaxSegments = 256;
>  int finalMaxSegments = 4;
>  if (isShowSegmentCounter()) {
>  log.info("Optimizing ...");
>  }
>  try (SolrClient solrServerInstance = getSolrClientInstance()) {
>  for (int segments = initialMaxSegments; segments >= 
> finalMaxSegments; segments--) {
>  if (isShowSegmentCounter()) {
>  System.out.println("Optimizing to a max of " + 
> segments + " segments.");
>  }
>  try {
>  solrServerInstance.optimize(true, true, segments);
>  } catch (RemoteSolrException | SolrServerException | 
> IOException e) {
>  log.severe(e.getMessage());
>  }
>  }
>  } catch (IOException e) {
>  throw new RuntimeException(e);
>  }
>  }
> 
> On 6/7/19 4:56 AM, Nicolas Franck wrote:
>> In that case, hard optimisation like that is out of the question.
>> Resort to automatic merge policies, specifying a maximum
>> number of segments. Solr is designed with multiple segments
>> in mind. Hard optimisation seems not worth the trouble.
>>
>> The problem is this: the fewer segments you specify for
>> an optimisation, the longer it will take, because it has to read
>> all of the segments to be merged, and redo the sorting. And a cluster
>> has a lot of housekeeping on top of it.
>>
>> If you really want to issue an optimisation, then you can
>> also do it in steps (max segments parameter)
>>
>> 10 -> 9 -> 8 -> 7 .. -> 1
>>
>> that way less segments need to be merged in one go.
>>
>> testing your index will show you what a good maximum
>> amount of segments is for your index.
>>
>>> On 7 Jun 2019, at 07:27, jena <

> sthita2010@

> > wrote:
>>>
>>> Hello guys,
>>>
>>> We have 4 Solr (version 4.4) instances in our production environment, which
>>> are linked/associated with ZooKeeper for replication. We do heavy delete &
>>> add operations. We have around 26 million records and the index size is
>>> around 70GB. We serve 100k+ requests per day.
>>>
>>>
>>> Because of the heavy indexing & deletion, we optimise the Solr instances every
>>> day; because of that our Solr cloud is getting unstable, every Solr instance
>>> goes into recovery mode, and our search is getting affected & very slow
>>> because of that. Optimisation takes around 1hr 30 minutes.
>>> We are not able to fix this issue, please help.
>>>
>>> Thanks & Regards
>>>
>>>
>>>
>>> --
>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Configure mutual TLS 1.2 to secure SOLR

2019-06-07 Thread Jörn Franke
(On the server side, AFAIK, only TLS 1.2 is possible anyway.)

> Am 07.06.2019 um 21:42 schrieb Jörn Franke :
> 
> Configure SSL according to the reference guide.
> 
> Then start each Solr node with the option -Dhttps.protocols=TLSv1.2
> 
>> Am 07.06.2019 um 17:02 schrieb Paul :
>> 
>> Hi,
>> 
>> Can someone please outline how to use mutual TLS 1.2 with SOLR. Or, point me
>> at docs/tutorials/other where I can read up further on this (version
>> currently onsite is SOLR 7.6).
>> 
>> Thanks
>> Paul
>> 
>> 
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


NullPointerException in QueryComponent.unmarshalSortValues

2019-06-07 Thread Hendrik Haddorp

Hi,

I'm doing a simple *:* search on an empty multi sharded collection using
Solr 7.6 and am getting this exception:

NullPointerException
    at
org.apache.solr.handler.component.QueryComponent.unmarshalSortValues(QueryComponent.java:1034)
    at
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:885)
    at
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:585)
    at
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:564)
    at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:426)
    at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)

This is the same exception as reported in
https://issues.apache.org/jira/browse/SOLR-12060 and likely also
https://issues.apache.org/jira/browse/SOLR-11643. Sometimes I can do
multiple requests and some work and some fail. And this test is done on
a single node that for testing uses a collection with four shards.

I looked a bit into the code:
https://github.com/apache/lucene-solr/blob/branch_7_6/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
It seems like
https://github.com/apache/lucene-solr/blob/branch_7_6/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L884
returns null.
This should mean that
https://github.com/apache/lucene-solr/blob/branch_7_6/solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java#L482
did not get invoked, which would happen if FIELD_SORT_VALUES is not set
to true. And indeed, if I add fsv=true to my query (a small SolrJ sketch
follows the questions below), the NPE does not show up. So there are a few questions:
1) what is fsv=true about?
2) why do I need to set it?
3) why don't I get the NPE all the time?
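
For illustration, a minimal SolrJ sketch of that fsv workaround; the base URL and 
collection name are placeholders:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FsvWorkaround {
        public static void main(String[] args) throws Exception {
            try (SolrClient solrClient = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                SolrQuery q = new SolrQuery("*:*");
                // "field sort values" flag, normally set internally for distributed requests
                q.set("fsv", true);
                QueryResponse rsp = solrClient.query("mycollection", q);
                System.out.println("numFound: " + rsp.getResults().getNumFound());
            }
        }
    }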

Earlier it looked as if the problem only showed up if I enabled the
suggester or spellcheck component. But after having done tons of tests
things are not that consistent.

thanks,
Hendrik


RE: Solr Heap Usage

2019-06-07 Thread Markus Jelsma
Hello,

We use VisualVM for making observations. But use Eclipse MAT for in-depth 
analysis, usually only when there is a suspected memory leak.

Regards,
Markus

 
 
-Original message-
> From:John Davis 
> Sent: Friday 7th June 2019 20:30
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Heap Usage
> 
> What would be the best way to understand where heap is being used?
> 
> On Tue, Jun 4, 2019 at 9:31 PM Greg Harris  wrote:
> 
> > Just a couple of points I’d make here. I did some testing a while back in
> > which if no commit is made, (hard or soft) there are internal memory
> > structures holding tlogs and it will continue to get worse the more docs
> > that come in. I don’t know if that’s changed in further versions. I’d
> > recommend doing commits with some amount of frequency in indexing heavy
> > apps, otherwise you are likely to have heap issues. I personally would
> > advocate for some of the points already made. There are too many variables
> > going on here and ways to modify stuff to make sizing decisions and think
> > you’re doing anything other than a pure guess if you don’t test and
> > monitor. I’d advocate for a process in which testing is done regularly to
> > figure out questions like number of shards/replicas, heap size, memory etc.
> > Hard data, good process and regular testing will trump guesswork every time
> >
> > Greg
> >
> > On Tue, Jun 4, 2019 at 9:22 AM John Davis 
> > wrote:
> >
> > > You might want to test with softcommit of hours vs 5m for heavy indexing
> > +
> > > light query -- even though there is internal memory structure overhead
> > for
> > > no soft commits, in our testing a 5m soft commit (via commitWithin) has
> > > resulted in a very very large heap usage which I suspect is because of
> > > other overhead associated with it.
> > >
> > > On Tue, Jun 4, 2019 at 8:03 AM Erick Erickson 
> > > wrote:
> > >
> > > > I need to update that, didn’t understand the bits about retaining
> > > internal
> > > > memory structures at the time.
> > > >
> > > > > On Jun 4, 2019, at 2:10 AM, John Davis 
> > > > wrote:
> > > > >
> > > > > Erick - These conflict, what's changed?
> > > > >
> > > > > So if I were going to recommend settings, they’d be something like
> > > this:
> > > > > Do a hard commit with openSearcher=false every 60 seconds.
> > > > > Do a soft commit every 5 minutes.
> > > > >
> > > > > vs
> > > > >
> > > > > Index-heavy, Query-light
> > > > > Set your soft commit interval quite long, up to the maximum latency
> > you
> > > > can
> > > > > stand for documents to be visible. This could be just a couple of
> > > minutes
> > > > > or much longer. Maybe even hours with the capability of issuing a
> > hard
> > > > > commit (openSearcher=true) or soft commit on demand.
> > > > >
> > > >
> > >
> > https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Jun 2, 2019 at 8:58 PM Erick Erickson <
> > erickerick...@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > >>> I've looked through SolrJ, DIH and others -- is the bottomline
> > > > >>> across all of them to "batch updates" and not commit as long as
> > > > possible?
> > > > >>
> > > > >> Of course it’s more complicated than that ;)….
> > > > >>
> > > > >> But to start, yes, I urge you to batch. Here’s some stats:
> > > > >> https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/
> > > > >>
> > > > >> Note that at about 100 docs/batch you hit diminishing returns.
> > > > _However_,
> > > > >> that test was run on a single shard collection, so if you have 10
> > > shards
> > > > >> you’d
> > > > >> have to send 1,000 docs/batch. I wouldn’t sweat that number much,
> > just
> > > > >> don’t
> > > > >> send one at a time. And there are the usual gotchas if your
> > documents
> > > > are
> > > > >> 1M .vs. 1K.
> > > > >>
> > > > >> About committing. No, don’t hold off as long as possible. When you
> > > > commit,
> > > > >> segments are merged. _However_, the default 100M internal buffer
> > size
> > > > means
> > > > >> that segments are written anyway even if you don’t hit a commit
> > point
> > > > when
> > > > >> you have 100M of index data, and merges happen anyway. So you won’t
> > > save
> > > > >> anything on merging by holding off commits.
> > > > >> And you’ll incur penalties. Here’s more than you want to know about
> > > > >> commits:
> > > > >>
> > > > >>
> > > >
> > >
> > https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> > > > >>
> > > > >> But some key take-aways… If for some reason Solr abnormally
> > > > >> terminates, the accumulated documents since the last hard
> > > > >> commit are replayed. So say you don’t commit for an hour of
> > > > >> furious indexing and someone does a “kill -9”. When you restart
> > > > >> Solr it’ll try to re-index all the docs for the last hour. Hard
> > > commits
> > > > >> with openSearcher=false aren’t all

Re: Configure mutual TLS 1.2 to secure SOLR

2019-06-07 Thread Jörn Franke
Configure SSL according to the reference guide.

Then start each Solr node with the option -Dhttps.protocols=TLSv1.2
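
As a concrete illustration, assuming the standard bin/solr start script and its 
solr.in.sh include file, the option can be added like this:

    # restrict outgoing TLS connections to TLSv1.2 (in solr.in.sh)
    SOLR_OPTS="$SOLR_OPTS -Dhttps.protocols=TLSv1.2"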

> Am 07.06.2019 um 17:02 schrieb Paul :
> 
> Hi,
> 
> Can someone please outline how to use mutual TLS 1.2 with SOLR. Or, point me
> at docs/tutorials/other where I can read up further on this (version
> currently onsite is SOLR 7.6).
> 
> Thanks
> Paul
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr Heap Usage

2019-06-07 Thread John Davis
What would be the best way to understand where heap is being used?

On Tue, Jun 4, 2019 at 9:31 PM Greg Harris  wrote:

> Just a couple of points I’d make here. I did some testing a while back in
> which if no commit is made, (hard or soft) there are internal memory
> structures holding tlogs and it will continue to get worse the more docs
> that come in. I don’t know if that’s changed in further versions. I’d
> recommend doing commits with some amount of frequency in indexing heavy
> apps, otherwise you are likely to have heap issues. I personally would
> advocate for some of the points already made. There are too many variables
> going on here and ways to modify stuff to make sizing decisions and think
> you’re doing anything other than a pure guess if you don’t test and
> monitor. I’d advocate for a process in which testing is done regularly to
> figure out questions like number of shards/replicas, heap size, memory etc.
> Hard data, good process and regular testing will trump guesswork every time
>
> Greg
>
> On Tue, Jun 4, 2019 at 9:22 AM John Davis 
> wrote:
>
> > You might want to test with softcommit of hours vs 5m for heavy indexing
> +
> > light query -- even though there is internal memory structure overhead
> for
> > no soft commits, in our testing a 5m soft commit (via commitWithin) has
> > resulted in a very very large heap usage which I suspect is because of
> > other overhead associated with it.
> >
> > On Tue, Jun 4, 2019 at 8:03 AM Erick Erickson 
> > wrote:
> >
> > > I need to update that, didn’t understand the bits about retaining
> > internal
> > > memory structures at the time.
> > >
> > > > On Jun 4, 2019, at 2:10 AM, John Davis 
> > > wrote:
> > > >
> > > > Erick - These conflict, what's changed?
> > > >
> > > > So if I were going to recommend settings, they’d be something like
> > this:
> > > > Do a hard commit with openSearcher=false every 60 seconds.
> > > > Do a soft commit every 5 minutes.
> > > >
> > > > vs
> > > >
> > > > Index-heavy, Query-light
> > > > Set your soft commit interval quite long, up to the maximum latency
> you
> > > can
> > > > stand for documents to be visible. This could be just a couple of
> > minutes
> > > > or much longer. Maybe even hours with the capability of issuing a
> hard
> > > > commit (openSearcher=true) or soft commit on demand.
> > > >
> > >
> >
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> > > >
> > > >
> > > >
> > > >
> > > > On Sun, Jun 2, 2019 at 8:58 PM Erick Erickson <
> erickerick...@gmail.com
> > >
> > > > wrote:
> > > >
> > > >>> I've looked through SolrJ, DIH and others -- is the bottomline
> > > >>> across all of them to "batch updates" and not commit as long as
> > > possible?
> > > >>
> > > >> Of course it’s more complicated than that ;)….
> > > >>
> > > >> But to start, yes, I urge you to batch. Here’s some stats:
> > > >> https://lucidworks.com/2015/10/05/really-batch-updates-solr-2/
> > > >>
> > > >> Note that at about 100 docs/batch you hit diminishing returns.
> > > _However_,
> > > >> that test was run on a single shard collection, so if you have 10
> > shards
> > > >> you’d
> > > >> have to send 1,000 docs/batch. I wouldn’t sweat that number much,
> just
> > > >> don’t
> > > >> send one at a time. And there are the usual gotchas if your
> documents
> > > are
> > > >> 1M .vs. 1K.
> > > >>
> > > >> About committing. No, don’t hold off as long as possible. When you
> > > commit,
> > > >> segments are merged. _However_, the default 100M internal buffer
> size
> > > means
> > > >> that segments are written anyway even if you don’t hit a commit
> point
> > > when
> > > >> you have 100M of index data, and merges happen anyway. So you won’t
> > save
> > > >> anything on merging by holding off commits.
> > > >> And you’ll incur penalties. Here’s more than you want to know about
> > > >> commits:
> > > >>
> > > >>
> > >
> >
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> > > >>
> > > >> But some key take-aways… If for some reason Solr abnormally
> > > >> terminates, the accumulated documents since the last hard
> > > >> commit are replayed. So say you don’t commit for an hour of
> > > >> furious indexing and someone does a “kill -9”. When you restart
> > > >> Solr it’ll try to re-index all the docs for the last hour. Hard
> > commits
> > > >> with openSearcher=false aren’t all that expensive. I usually set
> mine
> > > >> for a minute and forget about it.
> > > >>
> > > >> Transaction logs hold a window, _not_ the entire set of operations
> > > >> since time began. When you do a hard commit, the current tlog is
> > > >> closed and a new one opened and ones that are “too old” are deleted.
> > If
> > > >> you never commit you have a huge transaction log to no good purpose.
> > > >>
> > > >> Also, while indexing, in order to accommodate “Real Time Get”, all
> > > >> the docs indexed since the last searcher was opened have 

Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread jena
Thanks @Michael Joyner, how did you decide on initialMaxSegments of 256? Or is it
some random number I can use for my case? Can you guide me how to
decide the initial & final max segments?

 
Michael Joyner wrote
> That is the way we do it here - also helps a lot with not needing x2 or 
> x3 disk space to handle the merge:
> 
> public void solrOptimize() {
>          int initialMaxSegments = 256;
>          int finalMaxSegments = 4;
>          if (isShowSegmentCounter()) {
>              log.info("Optimizing ...");
>          }
>          try (SolrClient solrServerInstance = getSolrClientInstance()) {
>              for (int segments = initialMaxSegments; segments >= 
> finalMaxSegments; segments--) {
>                  if (isShowSegmentCounter()) {
>                      System.out.println("Optimizing to a max of " + 
> segments + " segments.");
>                  }
>                  try {
>                      solrServerInstance.optimize(true, true, segments);
>                  } catch (RemoteSolrException | SolrServerException | 
> IOException e) {
>                      log.severe(e.getMessage());
>                  }
>              }
>          } catch (IOException e) {
>              throw new RuntimeException(e);
>          }
>      }
> 
> On 6/7/19 4:56 AM, Nicolas Franck wrote:
>> In that case, hard optimisation like that is out of the question.
>> Resort to automatic merge policies, specifying a maximum
>> number of segments. Solr is designed with multiple segments
>> in mind. Hard optimisation seems not worth the trouble.
>>
>> The problem is this: the fewer segments you specify for
>> an optimisation, the longer it will take, because it has to read
>> all of the segments to be merged, and redo the sorting. And a cluster
>> has a lot of housekeeping on top of it.
>>
>> If you really want to issue an optimisation, then you can
>> also do it in steps (max segments parameter)
>>
>> 10 -> 9 -> 8 -> 7 .. -> 1
>>
>> that way less segments need to be merged in one go.
>>
>> testing your index will show you what a good maximum
>> amount of segments is for your index.
>>
>>> On 7 Jun 2019, at 07:27, jena <

> sthita2010@

> > wrote:
>>>
>>> Hello guys,
>>>
>>> We have 4 Solr (version 4.4) instances in our production environment, which
>>> are linked/associated with ZooKeeper for replication. We do heavy delete &
>>> add operations. We have around 26 million records and the index size is
>>> around 70GB. We serve 100k+ requests per day.
>>>
>>>
>>> Because of the heavy indexing & deletion, we optimise the Solr instances every
>>> day; because of that our Solr cloud is getting unstable, every Solr instance
>>> goes into recovery mode, and our search is getting affected & very slow
>>> because of that. Optimisation takes around 1hr 30 minutes.
>>> We are not able to fix this issue, please help.
>>>
>>> Thanks & Regards
>>>
>>>
>>>
>>> --
>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Configure mutual TLS 1.2 to secure SOLR

2019-06-07 Thread Christopher Schultz
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Paul,

On 6/7/19 11:02, Paul wrote:
> Can someone please outline how to use mutual TLS 1.2 with SOLR. Or,
> point me at docs/tutorials/other where I can read up further on
> this (version currently onsite is SOLR 7.6).

Here's a copy/paste from our internal guide for how to do this. YMMV.

Enjoy!

[...]

5. Configure Solr for TLS
   Create a server key and certificate:
   $ sudo mkdir /etc/solr
   $ sudo keytool -genkey -keyalg RSA -sigalg SHA256withRSA -keysize 4096 -validity 730 \
  -alias 'solr-ssl' -keystore /etc/solr/solr.p12 -storetype PKCS12 \
  -ext san=dns:localhost,ip:192.168.10.20
 Use the following information for the certificate:
 First and Last name: 192.168.10.20 (or "localhost", or your
IP address)
 Org unit:  CHADIS Solr (Prod) (or dev)
 Everything else should be obvious

   Now, export the public key from the keystore.

   $ sudo /usr/local/java-8/bin/keytool -list -rfc -keystore /etc/solr/solr.p12 \
  -storetype PKCS12 -alias solr-ssl

   Copy that certificate and paste it into this command's stdin:

   $ sudo keytool -importcert -keystore /etc/solr/solr-server.p12 \
  -storetype PKCS12 -alias 'solr-ssl'

   Now, fix the ownership and permissions on these files:

   $ sudo chown root:solr /etc/solr/solr.p12 /etc/solr/solr-server.p12
   $ sudo chmod 0640 /etc/solr/solr.p12

   Edit the file /etc/default/solr.in.sh

   Set the following settings:

   SOLR_SSL_KEY_STORE=/etc/solr/solr.p12
   SOLR_SSL_KEY_STORE_TYPE=PKCS12
   SOLR_SSL_KEY_STORE_PASSWORD=whatever

   # You MUST set the trust store for some reason.
   SOLR_SSL_TRUST_STORE=/etc/solr/solr-server.p12
   SOLR_SSL_TRUST_STORE_TYPE=PKCS12
   SOLR_SSL_TRUST_STORE_PASSWORD=whatever

6. Configure Solr to Require Client TLS Certificates

  On each client, create a client key and certificate:

  $ keytool -genkey -keyalg EC -sigalg SHA256withECDSA \
-validity 730 -alias 'solr-client-ssl' \
-keystore /etc/solr/solr-client.p12 -storetype PKCS12

  Now dump the certificate for the next step:

  $ keytool -exportcert -keystore /etc/solr/solr-client.p12 -storetype PKCS12 \
    -alias 'solr-client-ssl' -rfc

  Don't forget that you might want to generate your own client certificate
  to use from your own web browser if you want to be able to connect to the
  server's dashboard.

  Use the output of that command on each client to put the cert(s)
into this
  trust store on the server:

  $ sudo keytool -importcert -keystore /etc/solr/solr-trusted-clients.p12 \
    -storetype PKCS12 -alias '[client key alias]'

  Then, export the server's certificate and put IT into the
trusted-clients
  trust store, because command-line tools will use the server's own key
to
  contact itself.

  $ keytool -exportcert -keystore /etc/solr/solr-server.p12 -storetype PKCS12 \
    -alias 'solr-ssl'

  $ sudo keytool -importcert -keystore /etc/solr/solr-trusted-clients.p12 \
    -storetype PKCS12 -alias 'solr-server'

  Now, set the proper file ownership and permissions:

  $ sudo chown root:solr /etc/solr/solr-trusted-clients.p12
  $ sudo chmod 0640 /etc/solr/solr-trusted-clients.p12

Edit /etc/default/solr.in.sh and add the following entries:

  # NOTE: Some of these are changing from "basic TLS" configuration.
  SOLR_SSL_NEED_CLIENT_AUTH=true
  SOLR_SSL_TRUST_STORE=/etc/solr/solr-trusted-clients.p12
  SOLR_SSL_TRUST_STORE_TYPE=PKCS12
  SOLR_SSL_TRUST_STORE_PASSWORD=whatever
  SOLR_SSL_CLIENT_TRUST_STORE=/etc/solr/solr-server.p12
  SOLR_SSL_CLIENT_TRUST_STORE_TYPE=PKCS12
  SOLR_SSL_CLIENT_TRUST_STORE_PASSWORD=whatever
  SOLR_SSL_CLIENT_KEY_STORE=/etc/solr/solr-client.p12
  SOLR_SSL_CLIENT_KEY_STORE_TYPE=PKCS12
  SOLR_SSL_CLIENT_KEY_STORE_PASSWORD=whatever

Summary of Files in /etc/solr
-----------------------------

solr.p12  Server keystore. Contains server key and certificate.
  Used by server to identify itself to clients.
  Should exist on Solr server.

solr-server.p12   Client trust store. Contains server's certificate.
  Used by clients to identify and trust the server.
  Should exist on Solr clients.

solr-client.p12   Client keystore. Contains client key and certificate.
  Used by clients to identify themselves to the server.
  Should exist on Solr clients when TLS client certs
are used.

solr-trusted-clients.p12
  Server trust store. Contains trusted client
certificates.
  Used by server to trust clients.
  Should exist on Solr servers when TLS client certs
are used.

[...]

Loading Data into a Core (Index)
--------------------------------
If you have installed Solr as a service using TLS, you will need to do
some
additional work to call Solr's "post" program. First, ensure you have
patched
bin/post according to the installation instructions above. Then

Re: searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
Disregard my previous response.  When I reindexed, something went wrong and
so my Lucene database was empty, which explains the immediate results and 0
results.  I reindexed again (properly) and all is working fine now.  Thanks
for the help.
Mark

On Fri, Jun 7, 2019 at 10:40 AM Erick Erickson 
wrote:

> Yeah, it can be opaque…
>
> My first guess is that you may not have a field “posttime” defined in your
> schema and/or documents. For searching it needs “indexed=true” and for
> faceting/grouping/sorting it should have “docValues=true”. That’s what your
> original facet query was telling you, the field isn’t there. Switching to
> an “fq” clause is consistent with there being no “posttime” field since
> Solr is fine with  docs that don’t have a  particular field. So by
> specifying a date range, any doc without a “posttime” field will be omitted
> from the results.
>
> Or it  just is spelled differently ;)
>
> Some things that might help:
>
> 1> Go to the admin UI and select cores>>your_core, then look at the
> “schema” link. There’s a drop-down that lets you select fields that are
> actually in your index and see  some of the values. My bet: “posttime”
> isn’t in the list. If so, you need to add it and re-index the docs  with a
> posttime field. If there is a “posttime”, select it and look at the upper
> right to see how it’s defined. There are two rows, one for what the schema
> thinks the definition is and one for what is actually in the Lucene  index.
>
> 2> add &debug=query to your queries, and run them from the admin UI.
> That’ll give you a _lot_ quicker turn-around as well as some good info
> about how  the query was actually executed.
>
> Best,
> Erick
>
> > On Jun 7, 2019, at 7:23 AM, Mark Fenbers - NOAA Federal
>  wrote:
> >
> > So, instead of addDateRangeFacet(), I used:
> > query.setParam("fq", "posttime:[2010-01-01T00:00:00Z TO
> > 2015-01-01T00:00:00Z]");
> >
> > I didn't get any errors, but the query returned immediately with 0
> > results.  Without this contraint, it searches 13,000 records and takes 1
> to
> > 2 minutes and returns 356 records.  So something is not quite right, and
> > I'm too new at this to understand where I went wrong.
> > Mark
> >
> > On Fri, Jun 7, 2019 at 9:52 AM Andrea Gazzarini 
> > wrote:
> >
> >> Hi Mark, you are using a "range facet" which is a "query-shape" feature,
> >> it doesn't have any constraint on the results (i.e. it doesn't filter at
> >> all).
> >> You need to add a filter query [1] with a date range clause (e.g.
> >> fq=field:[<start_date> TO <end_date or *>]).
> >>
> >> Best,
> >> Andrea
> >>
> >> [1]
> >>
> >>
> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter
> >> [2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
> >>
> >> On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:
> >>> Hello!
> >>>
> >>> I have a search setup and it works fine.  I search a text field called
> >>> "logtext" in a database table.  My Java code is like this:
> >>>
> >>> SolrQuery query - new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> query.setParam("df", "logtext");
> >>>
> >>> Then I execute the search... and it works just great.  But now I want
> to
> >>> add a constraint to only search for the "searchWord" within a certain
> >> range
> >>> of time -- given timestamps in the column called "posttime".  So, I
> added
> >>> the code in bold below:
> >>>
> >>> SolrQuery query - new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> *query.setFacet(true);*
> >>> *query.addDateRangeFacet("posttime", new
> Date(System.currentTimeMillis()
> >> -
> >>> 1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY");
> >> /*
> >>> from 1 year ago to present) */*
> >>> query.setParam("df", "logtext");
> >>>
> >>> But this gives me a complaint: *undefined field: "posttime"* so I
> clearly
> >>> do not understand the arguments needed to addDateRangeFacet().  Can
> >> someone
> >>> help me determine the proper code for doing what I want?
> >>>
> >>> Further, I am puzzled about the "gap" argument [last one in
> >>> addDateRangeFacet()].  What does this do?  I used +1DAY, but I really
> >> have
> >>> no idea the purpose of this.  I haven't found any documentation that
> >>> explains this well.
> >>>
> >>> Mark
> >>>
> >>
> >>
>
>


Re: searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
I added "posttime" to the schema first thing this morning, but your message
reminded me that I needed to re-index the table, which I did.  My schema
entry:



But my SQL contains "SELECT posttime as id" and so I tried both "posttime"
and "id" in my setParam() function, namely,
query.setParam("fq", "id:[2007-01-01T00:00:00Z TO 2010-01-01T00:00:00Z]");

So, whether I use "id" (string) or "posttime" (date), my results are an
immediate return of zero results.
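
For what it's worth, a minimal SolrJ sketch of the kind of date-range filter being 
attempted here; the core URL, search word, field name and dates are placeholders 
and assume "posttime" is indexed as a date field:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class DateRangeFilterExample {
        public static void main(String[] args) throws Exception {
            try (SolrClient solrClient = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
                SolrQuery query = new SolrQuery("flood");   // placeholder search word
                query.setParam("df", "logtext");
                // date ranges need full ISO-8601 timestamps on an indexed date field
                query.addFilterQuery("posttime:[2007-01-01T00:00:00Z TO 2010-01-01T00:00:00Z]");
                QueryResponse rsp = solrClient.query(query);
                System.out.println("hits: " + rsp.getResults().getNumFound());
            }
        }
    }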

I did look in the admin interface and *did* see posttime listed as one of
the index items.  The two rows (Index Analyzer and Query Analyzer) show the
same thing: org.apache.solr.schema.FieldType$DefaultAnalyzer, though I'm
not certain of the implications of this.

I have not attempted your &debug=query suggestion just yet...
Mark

On Fri, Jun 7, 2019 at 10:40 AM Erick Erickson 
wrote:

> Yeah, it can be opaque…
>
> My first guess is that you may not have a field “posttime” defined in your
> schema and/or documents. For searching it needs “indexed=true” and for
> faceting/grouping/sorting it should have “docValues=true”. That’s what your
> original facet query was telling you, the field isn’t there. Switching to
> an “fq” clause is consistent with there being no “posttime” field since
> Solr is fine with  docs that don’t have a  particular field. So by
> specifying a date range, any doc without a “posttime” field will be omitted
> from the results.
>
> Or it  just is spelled differently ;)
>
> Some things that might help:
>
> 1> Go to the admin UI and select cores>>your_core, then look at the
> “schema” link. There’s a drop-down that lets you select fields that are
> actually in your index and see  some of the values. My bet: “posttime”
> isn’t in the list. If so, you need to add it and re-index the docs  with a
> posttime field. If there is a “posttime”, select it and look at the upper
> right to see how it’s defined. There are two rows, one for what the schema
> thinks the definition is and one for what is actually in the Lucene  index.
>
> 2> add &debug=query to your queries, and run them from the admin UI.
> That’ll give you a _lot_ quicker turn-around as well as some good info
> about how  the query was actually executed.
>
> Best,
> Erick
>
> > On Jun 7, 2019, at 7:23 AM, Mark Fenbers - NOAA Federal
>  wrote:
> >
> > So, instead of addDateRangeFacet(), I used:
> > query.setParam("fq", "posttime:[2010-01-01T00:00:00Z TO
> > 2015-01-01T00:00:00Z]");
> >
> > I didn't get any errors, but the query returned immediately with 0
> > results.  Without this contraint, it searches 13,000 records and takes 1
> to
> > 2 minutes and returns 356 records.  So something is not quite right, and
> > I'm too new at this to understand where I went wrong.
> > Mark
> >
> > On Fri, Jun 7, 2019 at 9:52 AM Andrea Gazzarini 
> > wrote:
> >
> >> Hi Mark, you are using a "range facet" which is a "query-shape" feature,
> >> it doesn't have any constraint on the results (i.e. it doesn't filter at
> >> all).
> >> You need to add a filter query [1] with a date range clause (e.g.
> >> fq=field:[<start_date> TO <end_date or *>]).
> >>
> >> Best,
> >> Andrea
> >>
> >> [1]
> >>
> >>
> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter
> >> [2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
> >>
> >> On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:
> >>> Hello!
> >>>
> >>> I have a search setup and it works fine.  I search a text field called
> >>> "logtext" in a database table.  My Java code is like this:
> >>>
> >>> SolrQuery query - new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> query.setParam("df", "logtext");
> >>>
> >>> Then I execute the search... and it works just great.  But now I want
> to
> >>> add a constraint to only search for the "searchWord" within a certain
> >> range
> >>> of time -- given timestamps in the column called "posttime".  So, I
> added
> >>> the code in bold below:
> >>>
> >>> SolrQuery query - new SolrQuery();
> >>> query.setQuery(searchWord);
> >>> *query.setFacet(true);*
> >>> *query.addDateRangeFacet("posttime", new
> Date(System.currentTimeMillis()
> >> -
> >>> 1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY");
> >> /*
> >>> from 1 year ago to present) */*
> >>> query.setParam("df", "logtext");
> >>>
> >>> But this gives me a complaint: *undefined field: "posttime"* so I
> clearly
> >>> do not understand the arguments needed to addDateRangeFacet().  Can
> >> someone
> >>> help me determine the proper code for doing what I want?
> >>>
> >>> Further, I am puzzled about the "gap" argument [last one in
> >>> addDateRangeFacet()].  What does this do?  I used +1DAY, but I really
> >> have
> >>> no idea the purpose of this.  I haven't found any documentation that
> >>> explains this well.
> >>>
> >>> Mark
> >>>
> >>
> >>
>
>


Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread Erick Erickson



> On Jun 7, 2019, at 7:53 AM, David Santamauro  
> wrote:
> 
> So is this new optimize maxSegments / commit expungeDeletes behavior in 7.5? 
> My experience, and I watch my optimize process very closely, is that 
> using maxSegments does not touch every segment with a deleted document. 
> expungeDeletes merges all segments that have deleted documents that have been 
> touched with said commit.
> 

Which part? 

The  different thing about 7.5 is that an optimize that doesn’t specify 
maxSegments will remove all deleted docs from an index without creating massive 
segments. Prior to 7.5 a simple optimize would create a single segment by 
default, no matter how large.

If, after the end of an optimize on a quiescent index, you see a difference 
between maxDoc and numDocs (or  deletedDocs  > 0) for a core, then that’s 
entirely unexpected  for any version of Solr.  NOTE: If you are actively 
indexing while optimizing you may see deleted docs in your index after optimize 
since optimize works on the segments it sees when the operation starts….
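
One quick way to check those numbers per core is the Luke request handler; the 
host and core name below are placeholders:

    # reports numDocs, maxDoc and deletedDocs for the core
    curl 'http://localhost:8983/solr/mycore/admin/luke?numTerms=0&wt=json'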

ExpungeDeletes has always, IIUC, defaulted to only merging segments  with > 10% 
deleted docs.

Best,
Erick

> After reading LUCENE-7976, it seems this is, indeed, new behavior.
> 
> 
> On 6/7/19, 10:31 AM, "Erick Erickson"  wrote:
> 
>Optimizing guarantees that there will be _no_ deleted documents in an 
> index when done. If a segment has even one deleted document, it’s merged, no 
> matter what you specify for maxSegments. 
> 
>Segments are write-once, so to remove deleted data from a segment it must 
> be at least rewritten into a new segment, whether or not it’s merged with 
> another segment on optimize.
> 
>expungeDeletes  does _not_ merge every segment that has deleted documents. 
> It merges segments that have > 10% (the default) deleted documents. If your 
> index happens to have all segments with > 10% deleted docs, then it will, 
> indeed, merge all of them.
> 
>In your example, if you look closely you should find that all segments 
> that had any deleted documents were written (merged) to new segments. I’d 
> expect that segments with _no_ deleted documents might mostly be left alone. 
> And two of the segments were chosen to merge together.
> 
>See LUCENE-7976 for a long discussion of how this changed starting  with 
> SOLR 7.5.
> 
>Best,
>Erick
> 
>> On Jun 7, 2019, at 7:07 AM, David Santamauro  
>> wrote:
>> 
>> Erick, on 6.0.1, optimize with maxSegments only merges down to the specified 
>> number. E.g., given an index with 75 segments, optimize with maxSegments=74 
>> will only merge 2 segments leaving 74 segments. It will choose a segment to 
>> merge that has deleted documents, but does not merge every segment with 
>> deleted documents.
>> 
>> I think you are thinking about the expungeDeletes parameter on the commit 
>> request. That will merge every segment that has a deleted document.
>> 
>> 
>> On 6/7/19, 10:00 AM, "Erick Erickson"  wrote:
>> 
>>   This isn’t quite right. Solr will rewrite _all_ segments that have _any_ 
>> deleted documents in them when optimizing, even one. Given your description, 
>> I’d guess that all your segments will have deleted documents, so even if you 
>> do specify maxSegments on the optimize command, the entire index will be 
>> rewritten.
>> 
>>   You’re in a bind, see: 
>> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/.
>>  You have this one massive segment and it will _not_ be merged until it’s 
>> almost all deleted documents, see the link above for a fuller explanation.
>> 
>>   Prior to Solr 7.5 you don’t have many options except to re-index and _not_ 
>> optimize. So if possible I’d reindex from scratch into a new collection and 
>> do not optimize. Or restructure your process such that you can optimize in a 
>> quiet period when little indexing is going on.
>> 
>>   Best,
>>   Erick
>> 
>>> On Jun 7, 2019, at 2:51 AM, jena  wrote:
>>> 
>>> Thanks @Nicolas Franck for the reply, I don't see any segment info for the 4.4
>>> version. Is there any API I can use to get my segment information? I will try
>>> to use maxSegments and see if it can help us during optimization.
>>> 
>>> 
>>> 
>>> --
>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>> 
>> 
> 
> 



Re: Query takes a long time Solr 6.1.0

2019-06-07 Thread Shawn Heisey

On 6/6/2019 5:45 AM, vishal patel wrote:

One server(256GB RAM) has two below Solr instance and other application also
1) shards1 (80GB heap ,790GB Storage, 449GB Indexed data)
2) replica of shard2 (80GB heap, 895GB Storage, 337GB Indexed data)

The second server(256GB RAM and 1 TB storage) has two below Solr instance and 
other application also
1) shards2 (80GB heap, 790GB Storage, 338GB Indexed data)
2) replica of shard1 (80GB heap, 895GB Storage, 448GB Indexed data)


An 80GB heap is ENORMOUS.  And you have two of those per server.  Do you 
*know* that you need a heap that large?  You only have 50 million 
documents total; two instances that each have 80GB seem completely 
unnecessary.  I would think that one instance with a much smaller heap 
would handle just about anything you could throw at 50 million documents.


With 160GB taken by heaps, you're leaving less than 100GB of memory to 
cache over 700GB of index.  This is not going to work well, especially 
if your index doesn't have many fields that are stored.  It will cause a 
lot of disk I/O.



Both server memory and disk usage:
https://drive.google.com/drive/folders/11GoZy8C0i-qUGH-ranPD8PCoPWCxeS-5


Unless you have changed the DirectoryFactory to something that's not 
default, your process listing does not reflect over 700GB of index data. 
 If you have changed the DirectoryFactory, then I would strongly 
recommend removing that part of your config and letting Solr use its 
default.



Note: Average 40GB heap used normally in each Solr instance. when replica gets 
down at that time disk IO are high and also GC pause time above 15 seconds. We 
can not identify the exact issue of replica recovery OR down from logs. due to 
the GC pause? OR due to disk IO high? OR due to time-consuming query? OR due to 
heavy indexing?


With an 80GB heap, I'm not really surprised you're seeing GC pauses 
above 15 seconds.  I have seen pauses that long with a heap that's only 8GB.


GC pauses lasting that long will cause problems with SolrCloud.  Nodes 
going into recovery is common.


Thanks,
Shawn


Re: Re: Query takes a long time Solr 6.1.0

2019-06-07 Thread David Hastings
There isn't anything wrong aside from your query being poorly thought out.

On Fri, Jun 7, 2019 at 11:04 AM vishal patel 
wrote:

> Is anyone looking at my issue??
>
> Get Outlook for Android
>
> 
> From: vishal patel
> Sent: Thursday, June 6, 2019 5:15:15 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query takes a long time Solr 6.1.0
>
> Thanks for your reply.
>
> > How much index data is on one server with 256GB of memory?  What is the
> > max heap size on the Solr instance?  Is there only one Solr instance?
>
> One server(256GB RAM) has two below Solr instance and other application
> also
> 1) shards1 (80GB heap ,790GB Storage, 449GB Indexed data)
> 2) replica of shard2 (80GB heap, 895GB Storage, 337GB Indexed data)
>
> The second server(256GB RAM and 1 TB storage) has two below Solr instance
> and other application also
> 1) shards2 (80GB heap, 790GB Storage, 338GB Indexed data)
> 2) replica of shard1 (80GB heap, 895GB Storage, 448GB Indexed data)
>
> Both server memory and disk usage:
> https://drive.google.com/drive/folders/11GoZy8C0i-qUGH-ranPD8PCoPWCxeS-5
>
> Note: Average 40GB heap used normally in each Solr instance. when replica
> gets down at that time disk IO are high and also GC pause time above 15
> seconds. We can not identify the exact issue of replica recovery OR down
> from logs. due to the GC pause? OR due to disk IO high? OR due to
> time-consuming query? OR due to heavy indexing?
>
> Regards,
> Vishal
> 
> From: Shawn Heisey 
> Sent: Wednesday, June 5, 2019 7:10 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Query takes a long time Solr 6.1.0
>
> On 6/5/2019 7:08 AM, vishal patel wrote:
> > I have attached RAR file but not attached properly. Again attached txt
> file.
> >
> > For 2 shards and 2 replicas, we have 2 servers and each has 256 GB ram
> > and 1 TB storage. One shard and another shard replica in one server.
>
> You got lucky.  Even text files usually don't make it to the list --
> yours did this time.  Use a file sharing website in the future.
>
> That is a massive query.  The primary reason that Lucene defaults to a
> maxBooleanClauses value of 1024, which you are definitely exceeding
> here, is that queries with that many clauses tend to be slow and consume
> massive levels of resources.  It might not be possible to improve the
> query speed very much here if you cannot reduce the size of the query.
>
> Your query doesn't look like it is simple enough to replace with the
> terms query parser, which has better performance than a boolean query
> with thousands of "OR" clauses.
>
> How much index data is on one server with 256GB of memory?  What is the
> max heap size on the Solr instance?  Is there only one Solr instance?
>
> The screenshot mentioned here will most likely relay all the info I am
> looking for.  Be sure the sort is correct:
>
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue
>
> You will not be able to successfully attach the screenshot to a message.
>   That will require a file sharing website.
>
> Thanks,
> Shawn
>


Fwd: Re: Query takes a long time Solr 6.1.0

2019-06-07 Thread vishal patel
Is anyone looking at my issue??

Get Outlook for Android


From: vishal patel
Sent: Thursday, June 6, 2019 5:15:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Query takes a long time Solr 6.1.0

Thanks for your reply.

> How much index data is on one server with 256GB of memory?  What is the
> max heap size on the Solr instance?  Is there only one Solr instance?

One server(256GB RAM) has two below Solr instance and other application also
1) shards1 (80GB heap ,790GB Storage, 449GB Indexed data)
2) replica of shard2 (80GB heap, 895GB Storage, 337GB Indexed data)

The second server(256GB RAM and 1 TB storage) has two below Solr instance and 
other application also
1) shards2 (80GB heap, 790GB Storage, 338GB Indexed data)
2) replica of shard1 (80GB heap, 895GB Storage, 448GB Indexed data)

Both server memory and disk usage:
https://drive.google.com/drive/folders/11GoZy8C0i-qUGH-ranPD8PCoPWCxeS-5

Note: on average, each Solr instance normally uses about 40GB of heap. When a replica 
goes down, disk IO is high and GC pause times exceed 15 seconds. From the logs we 
cannot identify the exact cause of the replica going down or recovering: is it the 
GC pauses, the high disk IO, a time-consuming query, or heavy indexing?

Regards,
Vishal

From: Shawn Heisey 
Sent: Wednesday, June 5, 2019 7:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Query takes a long time Solr 6.1.0

On 6/5/2019 7:08 AM, vishal patel wrote:
> I have attached RAR file but not attached properly. Again attached txt file.
>
> For 2 shards and 2 replicas, we have 2 servers and each has 256 GB ram
> and 1 TB storage. One shard and another shard replica in one server.

You got lucky.  Even text files usually don't make it to the list --
yours did this time.  Use a file sharing website in the future.

That is a massive query.  The primary reason that Lucene defaults to a
maxBooleanClauses value of 1024, which you are definitely exceeding
here, is that queries with that many clauses tend to be slow and consume
massive levels of resources.  It might not be possible to improve the
query speed very much here if you cannot reduce the size of the query.

Your query doesn't look like it is simple enough to replace with the
terms query parser, which has better performance than a boolean query
with thousands of "OR" clauses.

How much index data is on one server with 256GB of memory?  What is the
max heap size on the Solr instance?  Is there only one Solr instance?

The screenshot mentioned here will most likely relay all the info I am
looking for.  Be sure the sort is correct:

https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

You will not be able to successfully attach the screenshot to a message.
  That will require a file sharing website.

Thanks,
Shawn


Configure mutual TLS 1.2 to secure SOLR

2019-06-07 Thread Paul
Hi,

Can someone please outline how to use mutual TLS 1.2 with SOLR. Or, point me
at docs/tutorials/other where I can read up further on this (version
currently onsite is SOLR 7.6).

Thanks
Paul



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Custom cache for Solr Cloud mode

2019-06-07 Thread Shawn Heisey

On 6/7/2019 8:49 AM, Erick Erickson wrote:

Yes. ZooKeeper has a “blob store”. See the Blob Store API in the ref guide.

Minor nit. You will be creating  a jar file, and configuring your collection to 
be able to find the new jar file.  Then you _upload_ both to ZooKeeper and 
reload your collection. The rest should be automatic, Solr  should look for the 
 jar file in ZooKeeper and pull it down locally to each node as necessary.


I thought the blob store was the .system collection (ultimately a Lucene 
index) ... hadn't ever heard of uploading a jar to ZK.  The 
documentation seems to concur.


https://lucene.apache.org/solr/guide/7_5/blob-store-api.html

Thanks,
Shawn


Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread David Santamauro
So is this new optimize maxSegments / commit expungeDeletes behavior in 7.5? My 
experience, and I watch my optimize process very closely, is that using 
maxSegments does not touch every segment with a deleted document. 
expungeDeletes merges all segments containing deleted documents that have been 
touched by said commit.

After reading LUCENE-7976, it seems this is, indeed, new behavior.


On 6/7/19, 10:31 AM, "Erick Erickson"  wrote:

Optimizing guarantees that there will be _no_ deleted documents in an index 
when done. If a segment has even one deleted document, it’s merged, no matter 
what you specify for maxSegments. 

Segments are write-once, so to remove deleted data from a segment it must 
be at least rewritten into a new segment, whether or not it’s merged with 
another segment on optimize.

expungeDeletes  does _not_ merge every segment that has deleted documents. 
It merges segments that have > 10% (the default) deleted documents. If your 
index happens to have all segments with > 10% deleted docs, then it will, 
indeed, merge all of them.

In your example, if you look closely you should find that all segments that 
had any deleted documents were written (merged) to new segments. I’d expect 
that segments with _no_ deleted documents might mostly be left alone. And two 
of the segments were chosen to merge together.

See LUCENE-7976 for a long discussion of how this changed starting  with 
SOLR 7.5.

Best,
Erick

> On Jun 7, 2019, at 7:07 AM, David Santamauro  
wrote:
> 
> Erick, on 6.0.1, optimize with maxSegments only merges down to the 
specified number. E.g., given an index with 75 segments, optimize with 
maxSegments=74 will only merge 2 segments leaving 74 segments. It will choose a 
segment to merge that has deleted documents, but does not merge every segment 
with deleted documents.
> 
> I think you are thinking about the expungeDeletes parameter on the commit 
request. That will merge every segment that has a deleted document.
> 
> 
> On 6/7/19, 10:00 AM, "Erick Erickson"  wrote:
> 
>This isn’t quite right. Solr will rewrite _all_ segments that have 
_any_ deleted documents in them when optimizing, even one. Given your 
description, I’d guess that all your segments will have deleted documents, so 
even if you do specify maxSegments on the optimize command, the entire index 
will be rewritten.
> 
>You’re in a bind, see: 
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/.
 You have this one massive segment and it will _not_ be merged until it’s 
almost all deleted documents, see the link above for a fuller explanation.
> 
>Prior to Solr 7.5 you don’t have many options except to re-index and 
_not_ optimize. So if possible I’d reindex from scratch into a new collection 
and do not optimize. Or restructure your process such that you can optimize in 
a quiet period when little indexing is going on.
> 
>Best,
>Erick
> 
>> On Jun 7, 2019, at 2:51 AM, jena  wrote:
>> 
>> Thanks @Nicolas Franck for reply, i don't see any any segment info for 
4.4
>> version. Is there any API i can use to get my segment information ? Will 
try
>> to use maxSegments and see if it can help us during optimization.
>> 
>> 
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 
> 




Re: Custom cache for Solr Cloud mode

2019-06-07 Thread Erick Erickson
Yes. ZooKeeper has a “blob store”. See the Blob Store API in the ref guide.

Minor nit. You will be creating  a jar file, and configuring your collection to 
be able to find the new jar file.  Then you _upload_ both to ZooKeeper and 
reload your collection. The rest should be automatic, Solr  should look for the 
 jar file in ZooKeeper and pull it down locally to each node as necessary.

Best,
Erick

> On Jun 6, 2019, at 11:06 PM, abhishek  wrote:
> 
> 
> Thanks for the response.
> 
> Eric, 
> Are you suggesting to download this file from zookeeper, and upload it after
> changing ? 
> 
> Mikhail,
> Thanks. I will try solrCore.SolrConfg.userCacheConfigs option.
> Any idea why, CoreContainer->getCores() would be returning empty list for me
> ?
> 
> (CoreAdminRequest.setAction(CoreAdminAction.STATUS);
> CoreAdminRequest.process(solrClient); -> gives me list of cores correctly)
> 
> -Abhishek
> 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: searching only within a date range

2019-06-07 Thread Erick Erickson
Yeah, it can be opaque…

My first guess is that you may not have a field “posttime” defined in your 
schema and/or documents. For searching it needs “indexed=true” and for 
faceting/grouping/sorting it should have “docValues=true”. That’s what your 
original facet query was telling you, the field isn’t there. Switching to an 
“fq” clause is consistent with there being no “posttime” field since Solr is 
fine with  docs that don’t have a  particular field. So by specifying a date 
range, any doc without a “posttime” field will be omitted from the results.

Or it  just is spelled differently ;)

Some things that might help:

1> Go to the admin UI and select cores>>your_core, then look at the “schema” 
link. There’s a drop-down that lets you select fields that are actually in your 
index and see  some of the values. My bet: “posttime” isn’t in the list. If so, 
you need to add it and re-index the docs  with a posttime field. If there is a 
“posttime”, select it and look at the upper right to see how it’s defined. 
There are two rows, one for what the schema thinks the definition is and one 
for what is actually in the Lucene  index.

2> add &debug=query to your queries, and run them from the admin UI. That’ll 
give you a _lot_ quicker turn-around as well as some good info about how  the 
query was actually executed.
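
For illustration, the same check can be done straight from SolrJ by adding the debug parameter to the query. This is only a sketch; the field, term and SolrClient variable are placeholders, not code from this thread:

    // assumes an existing SolrClient pointed at the core in question
    void debugQuery(SolrClient client) throws SolrServerException, IOException {
        SolrQuery q = new SolrQuery("logtext:flood");
        q.setParam("debug", "query");            // same effect as adding &debug=query in the admin UI
        QueryResponse rsp = client.query(q);
        System.out.println(rsp.getDebugMap());   // parsed query, rewritten query, timing, etc.
    }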

Best,
Erick

> On Jun 7, 2019, at 7:23 AM, Mark Fenbers - NOAA Federal 
>  wrote:
> 
> So, instead of addDateRangeFacet(), I used:
> query.setParam("fq", "posttime:[2010-01-01T00:00:00Z TO
> 2015-01-01T00:00:00Z]");
> 
> I didn't get any errors, but the query returned immediately with 0
> results.  Without this contraint, it searches 13,000 records and takes 1 to
> 2 minutes and returns 356 records.  So something is not quite right, and
> I'm too new at this to understand where I went wrong.
> Mark
> 
> On Fri, Jun 7, 2019 at 9:52 AM Andrea Gazzarini 
> wrote:
> 
>> Hi Mark, you are using a "range facet" which is a "query-shape" feature,
>> it doesn't have any constraint on the results (i.e. it doesn't filter at
>> all).
>> You need to add a filter query [1] with a date range clause (e.g.
>> fq=field:[ TO > or *>]).
>> 
>> Best,
>> Andrea
>> 
>> [1]
>> 
>> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter
>> [2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
>> 
>> On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:
>>> Hello!
>>> 
>>> I have a search setup and it works fine.  I search a text field called
>>> "logtext" in a database table.  My Java code is like this:
>>> 
>>> SolrQuery query - new SolrQuery();
>>> query.setQuery(searchWord);
>>> query.setParam("df", "logtext");
>>> 
>>> Then I execute the search... and it works just great.  But now I want to
>>> add a constraint to only search for the "searchWord" within a certain
>> range
>>> of time -- given timestamps in the column called "posttime".  So, I added
>>> the code in bold below:
>>> 
>>> SolrQuery query - new SolrQuery();
>>> query.setQuery(searchWord);
>>> *query.setFacet(true);*
>>> *query.addDateRangeFacet("posttime", new Date(System.currentTimeMillis()
>> -
>>> 1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY");
>> /*
>>> from 1 year ago to present) */*
>>> query.setParam("df", "logtext");
>>> 
>>> But this gives me a complaint: *undefined field: "posttime"* so I clearly
>>> do not understand the arguments needed to addDateRangeFacet().  Can
>> someone
>>> help me determine the proper code for doing what I want?
>>> 
>>> Further, I am puzzled about the "gap" argument [last one in
>>> addDateRangeFacet()].  What does this do?  I used +1DAY, but I really
>> have
>>> no idea the purpose of this.  I haven't found any documentation that
>>> explains this well.
>>> 
>>> Mark
>>> 
>> 
>> 



Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread Erick Erickson
Optimizing guarantees that there will be _no_ deleted documents in an index 
when done. If a segment has even one deleted document, it’s merged, no matter 
what you specify for maxSegments. 

Segments are write-once, so to remove deleted data from a segment it must be at 
least rewritten into a new segment, whether or not it’s merged with another 
segment on optimize.

expungeDeletes  does _not_ merge every segment that has deleted documents. It 
merges segments that have > 10% (the default) deleted documents. If your index 
happens to have all segments with > 10% deleted docs, then it will, indeed, 
merge all of them.
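
For reference, expungeDeletes is just a parameter on a commit request. A minimal SolrJ sketch, not code from this thread (the collection name is a placeholder), equivalent to hitting .../update?commit=true&expungeDeletes=true:

    // sends a commit with expungeDeletes=true to the named collection
    void expungeDeletes(SolrClient client) throws SolrServerException, IOException {
        UpdateRequest req = new UpdateRequest();
        req.setParam("commit", "true");
        req.setParam("expungeDeletes", "true");
        req.process(client, "my_collection");
    }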

In your example, if you look closely you should find that all segments that had 
any deleted documents were written (merged) to new segments. I’d expect that 
segments with _no_ deleted documents might mostly be left alone. And two of the 
segments were chosen to merge together.

See LUCENE-7976 for a long discussion of how this changed starting  with SOLR 
7.5.

Best,
Erick

> On Jun 7, 2019, at 7:07 AM, David Santamauro  
> wrote:
> 
> Erick, on 6.0.1, optimize with maxSegments only merges down to the specified 
> number. E.g., given an index with 75 segments, optimize with maxSegments=74 
> will only merge 2 segments leaving 74 segments. It will choose a segment to 
> merge that has deleted documents, but does not merge every segment with 
> deleted documents.
> 
> I think you are thinking about the expungeDeletes parameter on the commit 
> request. That will merge every segment that has a deleted document.
> 
> 
> On 6/7/19, 10:00 AM, "Erick Erickson"  wrote:
> 
>This isn’t quite right. Solr will rewrite _all_ segments that have _any_ 
> deleted documents in them when optimizing, even one. Given your description, 
> I’d guess that all your segments will have deleted documents, so even if you 
> do specify maxSegments on the optimize command, the entire index will be 
> rewritten.
> 
>You’re in a bind, see: 
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/.
>  You have this one massive segment and it will _not_ be merged until it’s 
> almost all deleted documents, see the link above for a fuller explanation.
> 
>Prior to Solr 7.5 you don’t have many options except to re-index and _not_ 
> optimize. So if possible I’d reindex from scratch into a new collection and 
> do not optimize. Or restructure your process such that you can optimize in a 
> quiet period when little indexing is going on.
> 
>Best,
>Erick
> 
>> On Jun 7, 2019, at 2:51 AM, jena  wrote:
>> 
>> Thanks @Nicolas Franck for reply, i don't see any any segment info for 4.4
>> version. Is there any API i can use to get my segment information ? Will try
>> to use maxSegments and see if it can help us during optimization.
>> 
>> 
>> 
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> 
> 



Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread jena
Thanks @Erick for the suggestions. That sounds bad. Yes, your assumptions
are right: we have a lot of deleted & added documents as well.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
So, instead of addDateRangeFacet(), I used:
query.setParam("fq", "posttime:[2010-01-01T00:00:00Z TO
2015-01-01T00:00:00Z]");

I didn't get any errors, but the query returned immediately with 0
results.  Without this constraint, it searches 13,000 records and takes 1 to
2 minutes and returns 356 records.  So something is not quite right, and
I'm too new at this to understand where I went wrong.
Mark

On Fri, Jun 7, 2019 at 9:52 AM Andrea Gazzarini 
wrote:

> Hi Mark, you are using a "range facet" which is a "query-shape" feature,
> it doesn't have any constraint on the results (i.e. it doesn't filter at
> all).
> You need to add a filter query [1] with a date range clause (e.g.
> fq=field:[ TO  or *>]).
>
> Best,
> Andrea
>
> [1]
>
> https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter
> [2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
>
> On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:
> > Hello!
> >
> > I have a search setup and it works fine.  I search a text field called
> > "logtext" in a database table.  My Java code is like this:
> >
> > SolrQuery query - new SolrQuery();
> > query.setQuery(searchWord);
> > query.setParam("df", "logtext");
> >
> > Then I execute the search... and it works just great.  But now I want to
> > add a constraint to only search for the "searchWord" within a certain
> range
> > of time -- given timestamps in the column called "posttime".  So, I added
> > the code in bold below:
> >
> > SolrQuery query - new SolrQuery();
> > query.setQuery(searchWord);
> > *query.setFacet(true);*
> > *query.addDateRangeFacet("posttime", new Date(System.currentTimeMillis()
> -
> > 1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY");
> /*
> > from 1 year ago to present) */*
> > query.setParam("df", "logtext");
> >
> > But this gives me a complaint: *undefined field: "posttime"* so I clearly
> > do not understand the arguments needed to addDateRangeFacet().  Can
> someone
> > help me determine the proper code for doing what I want?
> >
> > Further, I am puzzled about the "gap" argument [last one in
> > addDateRangeFacet()].  What does this do?  I used +1DAY, but I really
> have
> > no idea the purpose of this.  I haven't found any documentation that
> > explains this well.
> >
> > Mark
> >
>
>


Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread jena
Thanks Shawn for the suggestions. Interesting to know that deleteByQuery has an
impact; we will try to change it as you have suggested. Thanks



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr cloud setup

2019-06-07 Thread Erick Erickson
First of all, do not shard unless necessary to handle your QPS requirements. 
Sharding adds overhead and has some functionality limitations. How to define 
“necessary”? Load test a single shard (or even stand-alone with a single core) 
until it falls over. See: 
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
 for an outline of the process.

“handle your QPS rate” is a bit tricky. What I’m talking about there is the 
ability to 
1> index at an adequate speed
2> get queries back with acceptable latency.

Let’s say you test and can get 20 queries per second on a single shard, but 
need 100 QPS. Then add 4 more _replicas_ (not shards) to that single-sharded 
system for a total of 5 replicas x 1 shard.
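
For what it's worth, that "5 replicas x 1 shard" layout maps directly onto the Collections API. A hedged SolrJ sketch (collection and configset names are placeholders):

    // creates a collection with 1 shard and 5 replicas per shard
    void createOneShardFiveReplicas(CloudSolrClient client) throws SolrServerException, IOException {
        CollectionAdminRequest.Create create =
            CollectionAdminRequest.createCollection("my_collection", "my_configset", 1, 5);
        create.process(client);
    }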

My general expectation (and YMMV) is around 50M docs/shard. I’ve seen 300M docs on 
a single shard and as few as 10M, so the range is very wide depending on your 
particular needs. Given your index size, you’re in the range where sharding becomes 
desirable, but you have to test first.

Finally, note that there’s quite a jump going from 1 replica (leader only) to 2 in 
terms of indexing: the leader has to forward the docs to the follower, and that shows 
up. In very heavy indexing scenarios I’ve seen this matter; if it does in your 
situation, consider TLOG or PULL replica types.

Best,
Erick

> On Jun 7, 2019, at 1:53 AM, Emir Arnautović  
> wrote:
> 
> Hi Abhishek,
> Here is a nice blog post about migrating to SolrCloud: 
> https://sematext.com/blog/solr-master-slave-solrcloud-migration/ 
> 
> 
> Re number of shards - there is no definite answer - it depends on your 
> indexing/search latency requirements. Only tests can tell. Here are some 
> thought on how to perform tests: 
> https://www.od-bits.com/2018/01/solrelasticsearch-capacity-planning.html 
> 
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 7 Jun 2019, at 09:05, Midas A  wrote:
>> 
>> Hi ,
>> 
>> Currently we are in master slave architechture we want to move in solr
>> cloud architechture .
>> how i should decide shard number in solr cloud ?
>> 
>> My current solr in version 6 and index size is 300 GB.
>> 
>> 
>> 
>> Regards,
>> Abhishek Tiwari
> 



Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread David Santamauro

/clarification/ ... expungeDeletes will merge every segment *touched by the 
current commit* that has a deleted document.


On 6/7/19, 10:07 AM, "David Santamauro"  wrote:

Erick, on 6.0.1, optimize with maxSegments only merges down to the 
specified number. E.g., given an index with 75 segments, optimize with 
maxSegments=74 will only merge 2 segments leaving 74 segments. It will choose a 
segment to merge that has deleted documents, but does not merge every segment 
with deleted documents.

I think you are thinking about the expungeDeletes parameter on the commit 
request. That will merge every segment that has a deleted document.


On 6/7/19, 10:00 AM, "Erick Erickson"  wrote:

This isn’t quite right. Solr will rewrite _all_ segments that have 
_any_ deleted documents in them when optimizing, even one. Given your 
description, I’d guess that all your segments will have deleted documents, so 
even if you do specify maxSegments on the optimize command, the entire index 
will be rewritten.

You’re in a bind, see: 
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/.
 You have this one massive segment and it will _not_ be merged until it’s 
almost all deleted documents, see the link above for a fuller explanation.

Prior to Solr 7.5 you don’t have many options except to re-index and 
_not_ optimize. So if possible I’d reindex from scratch into a new collection 
and do not optimize. Or restructure your process such that you can optimize in 
a quiet period when little indexing is going on.

Best,
Erick

> On Jun 7, 2019, at 2:51 AM, jena  wrote:
> 
> Thanks @Nicolas Franck for reply, i don't see any any segment info 
for 4.4
> version. Is there any API i can use to get my segment information ? 
Will try
> to use maxSegments and see if it can help us during optimization.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html





Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread David Santamauro
Erick, on 6.0.1, optimize with maxSegments only merges down to the specified 
number. E.g., given an index with 75 segments, optimize with maxSegments=74 
will only merge 2 segments leaving 74 segments. It will choose a segment to 
merge that has deleted documents, but does not merge every segment with deleted 
documents.

I think you are thinking about the expungeDeletes parameter on the commit 
request. That will merge every segment that has a deleted document.


On 6/7/19, 10:00 AM, "Erick Erickson"  wrote:

This isn’t quite right. Solr will rewrite _all_ segments that have _any_ 
deleted documents in them when optimizing, even one. Given your description, 
I’d guess that all your segments will have deleted documents, so even if you do 
specify maxSegments on the optimize command, the entire index will be rewritten.

You’re in a bind, see: 
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/.
 You have this one massive segment and it will _not_ be merged until it’s 
almost all deleted documents, see the link above for a fuller explanation.

Prior to Solr 7.5 you don’t have many options except to re-index and _not_ 
optimize. So if possible I’d reindex from scratch into a new collection and do 
not optimize. Or restructure your process such that you can optimize in a quiet 
period when little indexing is going on.

Best,
Erick

> On Jun 7, 2019, at 2:51 AM, jena  wrote:
> 
> Thanks @Nicolas Franck for reply, i don't see any any segment info for 4.4
> version. Is there any API i can use to get my segment information ? Will 
try
> to use maxSegments and see if it can help us during optimization.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread Michael Joyner
That is the way we do it here - also helps a lot with not needing x2 or 
x3 disk space to handle the merge:


public void solrOptimize() {
    int initialMaxSegments = 256;
    int finalMaxSegments = 4;
    if (isShowSegmentCounter()) {
        log.info("Optimizing ...");
    }
    try (SolrClient solrServerInstance = getSolrClientInstance()) {
        for (int segments = initialMaxSegments; segments >= finalMaxSegments; segments--) {
            if (isShowSegmentCounter()) {
                System.out.println("Optimizing to a max of " + segments + " segments.");
            }
            try {
                solrServerInstance.optimize(true, true, segments);
            } catch (RemoteSolrException | SolrServerException | IOException e) {
                log.severe(e.getMessage());
            }
        }
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

On 6/7/19 4:56 AM, Nicolas Franck wrote:

In that case, hard optimisation like that is out of the question.
Resort to automatic merge policies, specifying a maximum
number of segments. Solr is designed with multiple segments
in mind; hard optimisation hardly seems worth the trouble.

The problem is this: the fewer segments you specify during
an optimisation, the longer it will take, because it has to read
all of the segments to be merged and redo the sorting. And a cluster
has a lot of housekeeping on top of that.

If you really want to issue an optimisation, you can
also do it in steps (the maxSegments parameter):

10 -> 9 -> 8 -> 7 .. -> 1

That way fewer segments need to be merged in one go.

Testing your index will show you what a good maximum
number of segments is for your index.


On 7 Jun 2019, at 07:27, jena  wrote:

Hello guys,

We have 4 Solr (version 4.4) instances in our production environment, which are
linked/associated with ZooKeeper for replication. We do heavy delete & add
operations. We have around 26 million records and the index size is around
70GB. We serve 100k+ requests per day.


Because of the heavy indexing & deletion, we optimise the Solr instances every day.
Because of that, our Solr cloud becomes unstable: every Solr instance goes into
recovery mode and our search is affected and becomes very slow.
Optimisation takes around 1hr 30minutes.
We are not able to fix this issue, please help.

Thanks & Regards



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html




Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread Erick Erickson
This isn’t quite right. Solr will rewrite _all_ segments that have _any_ 
deleted documents in them when optimizing, even one. Given your description, 
I’d guess that all your segments will have deleted documents, so even if you do 
specify maxSegments on the optimize command, the entire index will be rewritten.

You’re in a bind, see: 
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/.
 You have this one massive segment and it will _not_ be merged until it’s 
almost all deleted documents, see the link above for a fuller explanation.

Prior to Solr 7.5 you don’t have many options except to re-index and _not_ 
optimize. So if possible I’d reindex from scratch into a new collection and do 
not optimize. Or restructure your process such that you can optimize in a quiet 
period when little indexing is going on.

Best,
Erick

> On Jun 7, 2019, at 2:51 AM, jena  wrote:
> 
> Thanks @Nicolas Franck for reply, i don't see any any segment info for 4.4
> version. Is there any API i can use to get my segment information ? Will try
> to use maxSegments and see if it can help us during optimization.
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: searching only within a date range

2019-06-07 Thread Andrea Gazzarini
Hi Mark, you are using a "range facet" which is a "query-shape" feature, 
it doesn't have any constraint on the results (i.e. it doesn't filter at 
all).
You need to add a filter query [1] with a date range clause (e.g. 
fq=field:[<start date> TO <end date or *>]).


Best,
Andrea

[1] 
https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-Thefq_FilterQuery_Parameter

[2] https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
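
In SolrJ terms that boils down to one extra call on the query object. A sketch based on the snippet in your mail, assuming the date field is really named "posttime" and is indexed:

    SolrQuery query = new SolrQuery(searchWord);
    query.setParam("df", "logtext");
    // a filter, not a facet: restricts results to the last year (example range, date math syntax)
    query.addFilterQuery("posttime:[NOW-1YEAR TO NOW]");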

On 07/06/2019 14:02, Mark Fenbers - NOAA Federal wrote:

Hello!

I have a search setup and it works fine.  I search a text field called
"logtext" in a database table.  My Java code is like this:

SolrQuery query = new SolrQuery();
query.setQuery(searchWord);
query.setParam("df", "logtext");

Then I execute the search... and it works just great.  But now I want to
add a constraint to only search for the "searchWord" within a certain range
of time -- given timestamps in the column called "posttime".  So, I added
the code in bold below:

SolrQuery query = new SolrQuery();
query.setQuery(searchWord);
*query.setFacet(true);*
*query.addDateRangeFacet("posttime", new Date(System.currentTimeMillis() -
1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY"); /*
from 1 year ago to present) */*
query.setParam("df", "logtext");

But this gives me a complaint: *undefined field: "posttime"* so I clearly
do not understand the arguments needed to addDateRangeFacet().  Can someone
help me determine the proper code for doing what I want?

Further, I am puzzled about the "gap" argument [last one in
addDateRangeFacet()].  What does this do?  I used +1DAY, but I really have
no idea the purpose of this.  I haven't found any documentation that
explains this well.

Mark





Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread Shawn Heisey

On 6/6/2019 11:27 PM, jena wrote:

Because of the heavy indexing & deletion, we optimise the Solr instances every day.
Because of that, our Solr cloud becomes unstable: every Solr instance goes into
recovery mode and our search is affected and becomes very slow.
Optimisation takes around 1hr 30minutes.


Ordinarily, optimizing would just be a transparent operation and even 
though it's slow, wouldn't be something that would interfere with index 
operation.


But if you add deleteByQuery to the mix, then you WILL have problems. 
These problems can occur even if you don't optimize -- because sometimes 
the normal segment merges will take a very long time like an optimize, 
and the same interference between deleteByQuery and segment merging will 
happen.


The fix for that is to stop doing deleteByQuery.  Replace it with a two 
step operation where you first do the query to get ID values, and then 
do deleteById.  That kind of delete will not have any bad interaction 
with segment merging.
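
A minimal SolrJ sketch of that two-step pattern (collection name, query and batch size are placeholders, not code from this thread):

    // 1) query for the IDs of the docs to remove, 2) delete them by ID
    void deleteMatching(SolrClient client, String collection, String deleteQuery)
            throws SolrServerException, IOException {
        SolrQuery q = new SolrQuery(deleteQuery);
        q.setFields("id");
        q.setRows(1000);                          // page through in batches for large result sets
        List<String> ids = new ArrayList<>();
        for (SolrDocument doc : client.query(collection, q).getResults()) {
            ids.add((String) doc.getFieldValue("id"));
        }
        if (!ids.isEmpty()) {
            client.deleteById(collection, ids);   // deleteById does not fight with ongoing merges
            client.commit(collection);
        }
    }

For very large deletes you would repeat this (or use cursorMark) until the query returns no more hits, but the idea is the same.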


Thanks,
Shawn


searching only within a date range

2019-06-07 Thread Mark Fenbers - NOAA Federal
Hello!

I have a search setup and it works fine.  I search a text field called
"logtext" in a database table.  My Java code is like this:

SolrQuery query = new SolrQuery();
query.setQuery(searchWord);
query.setParam("df", "logtext");

Then I execute the search... and it works just great.  But now I want to
add a constraint to only search for the "searchWord" within a certain range
of time -- given timestamps in the column called "posttime".  So, I added
the code in bold below:

SolrQuery query = new SolrQuery();
query.setQuery(searchWord);
*query.setFacet(true);*
*query.addDateRangeFacet("posttime", new Date(System.currentTimeMillis() -
1000L * 86400L * 365L), new Date(System.currentTimeMillis()), "+1DAY"); /*
from 1 year ago to present) */*
query.setParam("df", "logtext");

But this gives me a complaint: *undefined field: "posttime"* so I clearly
do not understand the arguments needed to addDateRangeFacet().  Can someone
help me determine the proper code for doing what I want?

Further, I am puzzled about the "gap" argument [last one in
addDateRangeFacet()].  What does this do?  I used +1DAY, but I really have
no idea the purpose of this.  I haven't found any documentation that
explains this well.

Mark


Issues with calculating metrics and sorting on a float field in a stream

2019-06-07 Thread Oleksandr Chornyi
Hi guys!

I bumped into a couple of issues when trying to sort a stream or calculate
metrics on a Float field which contains values without the decimal part
(e.g 1.0, 0.0, etc.).

1. Issues with sorting. Consider this expression:

> sort(
> list(
>tuple(a=val(1.0)),
>tuple(a=val(2.0)),
>tuple(a=val(3.0))
> ),
> by="a desc"
> )

It executes sort just fine and returns

> "docs": [
>   {"a": 3},
>   {"a": 2},
>   {"a": 1}
> ]

The only minor issue at this point is that float numbers changed their
original type to integers. However, I'll get back to this later.
Now let's do a simple calculation over the same stream and try to sort it:

> sort(
> select(
> list(
>tuple(a=val(1.0)),
>tuple(a=val(2.0)),
>tuple(a=val(3.0))
> ),
> div(a, 2) as a
> ),
> by="a desc"
> )

This expression returns "EXCEPTION": "java.lang.Long cannot be cast to
java.lang.Double". This happens because of the div() function which returns
different data types for different tuples. If you execute just the select
expression:

> select(
> list(
> tuple(a=val(1.0)),
> tuple(a=val(2.0)),
> tuple(a=val(3.0))
> ),
> div(a, 2) as a
> )

It will return tuples where field "a" will have mixed Long and Double data
types:

> "docs": [
>   {"a": 0.5},
>   {"a": 1},
>   {"a": 1.5}
> ]

This is why sort stumbles upon it.
I think that the root cause of this issue lies in the
RecursiveEvaluator#normalizeOutputType method, which returns a Long if a
BigDecimal value has zero scale:

} else if(value instanceof BigDecimal){
>   BigDecimal bd = (BigDecimal)value;
>   if(bd.signum() == 0 || bd.scale() <= 0 || bd.stripTrailingZeros().scale() 
> <= 0){
> try{
>   return bd.longValueExact();
> } catch(ArithmeticException e){
>   // value was too big for a long, so use a double which can handle 
> scientific notation
> }
>   }
>   return bd.doubleValue();
> }

I consider this to be a major bug because even when your source stream
contains only Float/Double values, applying any arithmetic operation might
result in a value without a decimal part, which will be converted to a Long
and break sorting. Can you confirm that this is a bug, so that I can create
a ticket?

2. The fact that Streaming Expressions engine heavily relies on the
assumption that a stream will contain numeric values of the same type leads
to subtle issues with calculating metrics. Consider this expression:

> rollup(
> list(
>tuple(a=val(1.1), g=1),
>tuple(a=val(2), g=1),
>tuple(a=val(3.1), g=1)
> ),
> over="g",
> min(a),
> max(a),
> sum(a),
> avg(a)
> )

(I showed earlier how you can get a stream of mixed types) It returns:

> {
>   "max(a)": 2,
>   "avg(a)": 0.,
>   "min(a)": 2,
>   "sum(a)": 2,
>   "g": "1"
> }

As you can see the results are wrong for all metrics. All metrics
considered only Long values from the source stream. In my case, it was
value '2'.
This happens because the implementation of all metrics holds separate
containers for Long and Double values. For example MaxMetric#getValue:

public Number getValue() {
>   if(longMax == Long.MIN_VALUE) {
> return doubleMax;
>   } else {
> return longMax;
>   }
> }

If a stream contained at least one Long among Doubles, the value of the
longMax container would be returned. I consider this a severe design flaw
and would like to get your perspective on this. Should I file a bug, or am I
missing something? Can I expect that this will be fixed at some point?

My ENV: solr-impl 7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan -
2019-02-23 02:39:07

Thank you in advance!
-- 
Best Regards,
Alex Chornyi


Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread jena
Thanks @Nicolas Franck for the reply. I don't see any segment info for version 4.4.
Is there any API I can use to get my segment information? I will try
to use maxSegments and see if it can help us during optimization.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr slave core corrupted and not replicating.

2019-06-07 Thread varma mahesh
Hi team,

Please help us with the issue mentioned above.
If this is not the right place to ask, please direct us to the correct
team.

Thanks & Regards,
Y Mahesh Varma

On Thu, 6 Jun, 2019, 1:21 AM varma mahesh,  wrote:

> ++solr-user@lucene.apache.org
>
> On Thu 6 Jun, 2019, 1:19 AM varma mahesh,  wrote:
>
>> Hi Team,
>>
>>
>> What happens to Sitecore - Solr query handling when a core is corrupted
>> in Solr slave in a Master - slave setup?
>>
>> Our Sitecore site's solr search engine is a master-slave setup. One of
>> the cores of the Slave is corrupted and is not available at all in Slave.
>>
>> It is not being replicated from the Master either (we expected index replication
>> to do this, but the core is completely missing on the Slave). As we read in the
>> index replication documentation, all queries are handled by the Slave part of the
>> setup.
>>
>> What happens to the queries that are handled by this core that is missing
>> in slave?
>>
>> Will they be taken over by Master?
>>
>> Please help me as I can find no info about this anywhere else. For info
>> the core that is missing is of Sitecore
>> analytics index.
>>
>> The error that Solr slave showing us for Analytics core is:
>>
>> org.apache.solr.common.SolrException: Error opening new searcher
>>   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:815)
>>   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:658)
>>   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:637)
>>   at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:381)
>>   at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:375)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:148)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>>   at java.lang.Thread.run(Thread.java:748)
>>
>> Can you please help us understand why this happened? We could not find any info
>> in the logs that leads us to this error.
>>
>>
>>


RE: query parsed in different ways in two identical solr instances

2019-06-07 Thread Danilo Tomasoni
Any thoughts on that difference in the Solr parsing? Is it correct that the 
first looks like an AND while the second looks like an OR?
Thank you

Danilo Tomasoni

Fondazione The Microsoft Research - University of Trento Centre for 
Computational and Systems Biology (COSBI)
Piazza Manifattura 1,  38068 Rovereto (TN), Italy
tomas...@cosbi.eu
http://www.cosbi.eu

As for the European General Data Protection Regulation 2016/679 on the 
protection of natural persons with regard to the processing of personal data, 
we inform you that all the data we possess are object of treatment in the 
respect of the normative provided for by the cited GDPR.
It is your right to be informed on which of your data are used and how; you may 
ask for their correction, cancellation or you may oppose to their use by 
written request sent by recorded delivery to The Microsoft Research – 
University of Trento Centre for Computational and Systems Biology Scarl, Piazza 
Manifattura 1, 38068 Rovereto (TN), Italy.
P Please don't print this e-mail unless you really need to


From: Danilo Tomasoni [tomas...@cosbi.eu]
Sent: 06 June 2019 16:21
To: solr-user@lucene.apache.org
Subject: RE: query parsed in different ways in two identical solr instances

The two collections are not identical: many overlapping documents, but with some 
different field names (test also has extra fields that solr1 didn't have).
Actually we have 42.000.000 docs in solr1 and 40.000.000 in solr-test, but I 
think this shouldn't be relevant because the query is basically like

id=x AND mesh=list of phrase queries

where the second part of the and is handled through a nested query (_query_ 
magic keyword).

I expect that a query like this one would return 1 document (x) or 0 documents.

The thing that puzzles me is that on solr1 the engine is returning 1 document 
(x)
while on test the engine is returning 68.000 documents..
If you look at my first e-mail you will notice that in the correct engine the 
parsed query is like

+(+(...) +(...))

That is correct for an AND

while in the test engine the query is parsed like

+((...) (...))

which is more like an OR...


Danilo Tomasoni

Fondazione The Microsoft Research - University of Trento Centre for 
Computational and Systems Biology (COSBI)
Piazza Manifattura 1,  38068 Rovereto (TN), Italy
tomas...@cosbi.eu
http://www.cosbi.eu



From: Alexandre Rafalovitch [arafa...@gmail.com]
Sent: 06 June 2019 15:53
To: solr-user
Subject: Re: query parsed in different ways in two identical solr instances

Those two queries look the same after sorting the parameters, yet the
results are clearly different. That means the difference is deeper.

1) Have you checked that both collections have the same amount of
documents (e.g. mismatched final commit). Does basic "query=*:*"
return the same counts in the same initial order?
2) Are you absolutely sure you are comparing 7.3.0 with 7.3.1? There
was SOLR-11501 that may be relevant, but it was fixed in 7.2:
https://issues.apache.org/jira/browse/SOLR-11501

Regards,
   Alex.

Are you absolutely sure that your instances are 7.3.0 and 7.3.1?

On Thu, 6 Jun 2019 at 09:26, Danilo Tomasoni  wrote:
>
> Hello, and thank you for your answer.
> Attached you will find the two logs for the working solr1 server, and the 
> non-working solr-test server.
>
>
> Danilo Tomasoni
>
>
> Fondazione The Microsoft Research - University of Trento Centre for 
> Computational and Systems Biology (COSBI)
> Piazza Manifattura 1,  38068 Rovereto (TN), Italy
> tomas...@cosbi.eu
> http://www.cosbi.eu
>

Re: Urgent help on solr optimisation issue !!

2019-06-07 Thread Nicolas Franck
In that case, hard optimisation like that is out of the question.
Resort to automatic merge policies, specifying a maximum
number of segments. Solr is designed with multiple segments
in mind; hard optimisation hardly seems worth the trouble.

The problem is this: the fewer segments you specify during
an optimisation, the longer it will take, because it has to read
all of the segments to be merged and redo the sorting. And a cluster
has a lot of housekeeping on top of that.

If you really want to issue an optimisation, you can
also do it in steps (the maxSegments parameter):

10 -> 9 -> 8 -> 7 .. -> 1

That way fewer segments need to be merged in one go.

Testing your index will show you what a good maximum
number of segments is for your index.
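
In SolrJ that step-down is just repeated calls to the three-argument optimize, as in the loop posted elsewhere in this thread; the essential call is a sketch like:

    // waitFlush=true, waitSearcher=true, merge down to at most 9 segments this pass
    client.optimize(true, true, 9);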

> On 7 Jun 2019, at 07:27, jena  wrote:
> 
> Hello guys,
> 
> We have 4 solr(version 4.4) instance on production environment, which are
> linked/associated with zookeeper for replication. We do heavy deleted & add
> operations. We have around 26million records and the index size is around
> 70GB. We serve 100k+ requests per day.
> 
> 
> Because of heavy indexing & deletion, we optimise solr instance everyday,
> because of that our solr cloud getting unstable , every solr instance go on
> recovery mode & our search is getting affected & very slow because of that.
> Optimisation takes around 1hr 30minutes. 
> We are not able fix this issue, please help.
> 
> Thanks & Regards
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Solr cloud setup

2019-06-07 Thread Emir Arnautović
Hi Abhishek,
Here is a nice blog post about migrating to SolrCloud: 
https://sematext.com/blog/solr-master-slave-solrcloud-migration/ 


Re number of shards - there is no definite answer - it depends on your 
indexing/search latency requirements. Only tests can tell. Here are some 
thoughts on how to perform tests: 
https://www.od-bits.com/2018/01/solrelasticsearch-capacity-planning.html 


HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 7 Jun 2019, at 09:05, Midas A  wrote:
> 
> Hi ,
> 
> Currently we are in master slave architechture we want to move in solr
> cloud architechture .
> how i should decide shard number in solr cloud ?
> 
> My current solr in version 6 and index size is 300 GB.
> 
> 
> 
> Regards,
> Abhishek Tiwari



Urgent help on solr optimisation issue !!

2019-06-07 Thread jena
Hello guys,

We have 4 Solr (version 4.4) instances in our production environment, which are
linked/associated with ZooKeeper for replication. We do heavy delete & add
operations. We have around 26 million records and the index size is around
70GB. We serve 100k+ requests per day.


Because of the heavy indexing & deletion, we optimise the Solr instances every day.
Because of that, our Solr cloud becomes unstable: every Solr instance goes into
recovery mode and our search is affected and becomes very slow.
Optimisation takes around 1hr 30minutes.
We are not able to fix this issue, please help.

Thanks & Regards



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr cloud setup

2019-06-07 Thread Midas A
Hi,

Currently we are on a master-slave architecture and we want to move to a SolrCloud
architecture.
How should I decide the number of shards in SolrCloud?

My current Solr is version 6 and the index size is 300 GB.



Regards,
Abhishek Tiwari