Re: Shard replica labels in Solr Admin graph?

2018-02-28 Thread Shawn Heisey

On 2/28/2018 5:42 PM, Scott Prentice wrote:
We initially tested our Solr Cloud implementation on a single VM with 
3 Solr servers and 3 Zookeeper servers. Once that seemed good, we 
moved to 3 VMs with 1 Solr/Zookeeper on each. That's all looking good, 
but in the Solr Admin > Cloud > Graph, all of my shard replicas are listed on 
"127.0.1.1". With the single-VM setup the port number was listed, so you could 
tell which "server" each replica was on.


Is there some way to get the shard replicas to list with the actual 
IPs of the server that the replica is on, rather than 127.0.1.1?


That is not going to work if those are separate machines.

There are two ways to fix this.

One is to figure out why Java is choosing a loopback address when it 
attempts to detect the machine's hostname.  I'm almost certain that 
/etc/hosts is set up incorrectly.  In my opinion, a typical /etc/hosts 
file should have two IPv4 lines: one defining localhost as 127.0.0.1, 
and another mapping the machine's actual IP address to both the fully 
qualified domain name and the short hostname. An example:


127.0.0.1   localhost
192.168.1.200   smeagol.REDACTED.com    smeagol

The machine's hostname should not be found on any line that does not 
have a real IP address on it.


The other way to solve the problem is to specify the "host" system 
property to override Java's detection of the machine address/hostname.  
You can either add a commandline option to set the property, or add it 
to solr.xml.  Note that if your solr.xml file lives in zookeeper, you can't 
use the solr.xml approach, because every machine would then share the same 
host definition, and that won't work.
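For example, either of these should work (a sketch -- the IP address is a 
placeholder, and the exact option names should be checked against the 
reference guide linked below):

    # command line / solr.in.sh: set the system property directly
    SOLR_OPTS="$SOLR_OPTS -Dhost=192.168.1.200"

    <!-- or in solr.xml, inside the <solrcloud> element -->
    <str name="host">${host:192.168.1.200}</str>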


https://lucene.apache.org/solr/guide/6_6/format-of-solr-xml.html#the-code-solrcloud-code-element

Thanks,
Shawn



Re: Configuration of SOLR Cluster

2018-02-28 Thread Shawn Heisey

On 2/28/2018 6:54 AM, James Keeney wrote:

I did notice one thing in the logs:

2018-02-28 13:21:58,932 [myid:1] - INFO 
[/172.31.86.130:3888:QuorumCnxManager$Listener@743] - *Received 
connection request /172.31.73.122:34804 *




When the restarted node attempts to reconnect with the ensemble it 
looks like it does so on a random port. Could it be that nodes in the 
ensemble are rejecting the new request to rejoin because they are not 
listening on that port? And why is it not requesting on 3888:2888? 
This is confusing to me.


That appears to be the source port, which is generally going to be a very 
high and semi-unpredictable port.  That's normal TCP operation.


I have attached a ZK log and a SOLR log. You can watch the whole 
progression in the ZK log as it goes from happy to disconnected to 
trying to reconnect to part of the ensemble when the other nodes are 
restarted. Seems like ZK holds onto a state based on the original 
ensemble interactions and that state prevents the node from rejoining 
the ensemble. The state is then lost with the restart which allows the 
members to re-establish connection and form the new ensemble.


What timestamps correspond to the actions you took?

Lots and lots of connections refused.  Unless there's something 
preventing network access, I would only expect connections to be refused 
if the software isn't running or isn't listening on the destination port.
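A quick way to check (a sketch, assuming the standard ZK client port 2181 and 
the default quorum/election ports -- adjust host and ports for your setup):

    # does the ZK client port answer?  A healthy server replies "imok".
    echo ruok | nc zkhost 2181

    # are the quorum/election ports reachable at all?
    nc -zv zkhost 2888
    nc -zv zkhost 3888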


Which ZK is that log from?  The one that you shut down to begin testing, 
or one of the others?  I see some very large time gaps in the log:


=
2018-02-26 18:28:18,066 [myid:1] - INFO 
[LearnerHandler-/172.31.73.122:57652:LearnerHandler@535] - Received 
NEWLEADER-ACK message from 3
2018-02-26 18:56:19,656 [myid:1] - INFO [SyncThread:1:FileTxnLog@203] - 
Creating new log file: log.40001

=

=
2018-02-26 18:56:26,286 [myid:1] - WARN 
[SendWorker:3:QuorumCnxManager$SendWorker@954] - Send worker leaving thread
2018-02-26 19:34:38,103 [myid:] - INFO [main:QuorumPeerConfig@134] - 
Reading configuration from: /opt/zookeeper/current/bin/../conf/zoo.cfg

=

The first gap is nearly half an hour, the second is more than half an hour.

What happens after the second gap appears to be a program startup.  The 
things logged at 18:56:nn *might* be program shutdown, but the log 
doesn't explicitly say so.  If it is a shutdown, then the program was 
not running for quite a while.


I would definitely take this problem to the ZK mailing list.  The 
server-side problems don't involve Solr at all.  You are having problems 
with Solr, but they are completely within the ZK client code.  Likely 
both problems have the same root cause, so I'd start with the 
server-side issues.


Solr 6.6.2 contains ZK version 3.4.10.  Not the latest, but close.

Thanks,
Shawn



Shard replica labels in Solr Admin graph?

2018-02-28 Thread Scott Prentice
We initially tested our Solr Cloud implementation on a single VM with 3 
Solr servers and 3 Zookeeper servers. Once that seemed good, we moved to 
3 VMs with 1 Solr/Zookeeper on each. That's all looking good, but in the 
Solr Admin > Cloud > Graph, all of my shard replicas are listed on "127.0.1.1". 
With the single-VM setup the port number was listed, so you could tell which 
"server" each replica was on.


Is there some way to get the shard replicas to list with the actual IPs 
of the server that the replica is on, rather than 127.0.1.1?


Thanks!
...scott



RE: Gentle reminder RE: Object not fetched because its identifier appears to be already in processing

2018-02-28 Thread YELESWARAPU, VENKATA BHAN
Information Classification: Limited Access

Thank you for your reply, Shawn. I'm not part of that user list, so I haven't 
received any emails so far. 
Could you please subscribe me (vyeleswar...@statestreet.com) or let me know the 
process?
Also I would greatly appreciate if you could forward any responses received for 
this issue.

To answer your question, we see these messages in the Solr log file. The Solr 
search option is visible on the UI, but when we search for text it says "No 
results found". 
The index files are not getting generated/created. We have the index job 
scheduled to run every minute, and the Solr log file is filled with the message below:
"Object not fetched because its identifier appears to be already in 
processing". 

These are the Solr & Lucene versions:
solr-spec: 4.3.1
solr-impl: 4.3.1 1491148 - shalinmangar - 2013-06-09 12:15:33
lucene-spec: 4.3.1
lucene-impl: 4.3.1 1491148 - shalinmangar - 2013-06-09 12:07:58

The Solr master and slave configuration is working fine and I'm able to access the 
URLs.
All we are trying to do is make the search function work on the UI. Please let me 
know if you need any more details.

P.S: Kindly keep me in Cc until I'm added to the user list.

Thank you,
Dutt

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Tuesday, February 27, 2018 10:09 PM
To: solr-user@lucene.apache.org
Cc: YELESWARAPU, VENKATA BHAN 
Subject: Re: Gentle reminder RE: Object not fetched because its identifier 
appears to be already in processing

On 2/27/2018 7:08 AM, YELESWARAPU, VENKATA BHAN wrote:
> While indexing job is running we are seeing the below message for all the 
> objects.
>
> Object not fetched because its identifier appears to be already in 
> processing

This time, I am going to include you as a CC on the message.  This is not 
normally something that I do, because posting to the list normally requires 
subscribing to the list, so you should be getting all replies from the list.

I'm pretty sure that I already replied once asking for information, but I never 
got a response.

Another thing I said in my last reply:  The text of the error message you have 
provided (in the subject and in the text I quoted above) is not in the Solr or 
Lucene codebase.  So right away we know that it wasn't generated by Solr.  It 
may have been generated by the other piece of software *inside* Solr, but 
without the rest of the error information, we have no way of knowing what 
actually happened.  Solr errors tend to be dozens of lines long, with most of 
the output being a Java stacktrace.  And in order to make sense of the 
stacktrace, we must have the Solr version.

In addition to the details Cassandra mentioned, there's one bit that will be 
critical:

Where *exactly* did you see this error?  Was it in the Solr admin UI, the Solr 
logfile, the logging output from your indexing program, or somewhere else?

Thanks,
Shawn



Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-28 Thread Shawn Heisey

On 2/28/2018 2:53 AM, 苗海泉 wrote:

Thanks for your detailed advice. The monitoring product you are talking about
is good, but our Solr system runs on a private network and it seems we cannot
use it at all, and there is no single downloadable application for analyzing
specific GC logs.


For analyzing GC logs, the GCViewer app is useful.  With some practice 
(learning to disable irrelevant information) you can pinpoint problems.  
It also compiles statistics about GC intervals, which can be very 
helpful.  It is an executable jar.


https://github.com/chewiebug/GCViewer
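Since it is an executable jar, running it is just a matter of (the jar 
filename here is a placeholder for whichever release you download):

    java -jar gcviewer-1.3x.jar /path/to/solr_gc.log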

But I have found an even easier tool for general use:

http://gceasy.io/

I still find value in GCViewer, but most of the time the information I'm 
after is provided by gceasy, and it's a lot easier to decipher.


A possible disadvantage of gceasy: it's an online tool, so you have to 
copy the log off the disconnected network to a machine with Internet 
access.  I don't anticipate any sort of privacy problems with them -- 
logs that you upload are not kept very long, and GC logs don't contain 
anything sensitive anyway.


Thanks,
Shawn



Re: Defining Document Transformers in Solr Configuration

2018-02-28 Thread Alexandre Rafalovitch
I am not sure I fully understood the desired transformation, but
perhaps something from https://people.apache.org/~hossman/rev2016/
would help.

I am specifically thinking of:
*) f.person.qf example
*) ${people} example

Regards,
   Alex.

On 27 February 2018 at 15:20, simon  wrote:
> We do quite complex data pulls from a Solr index for subsequent analytics,
> currently using a home-grown Python API. Queries might include  a handful
> of pseudofields which this API rewrites to an aliased field invoking a
> Document Transformer in the 'fl' parameter list.
>
> For example 'numcites' is transformed to
>
> 'fl= ,numcites:[subquery]=pmid={!terms
> f=md_c_pmid v=$row.pmid}=10=q',...'
>
> What I'd ideally like to be able to do would be have this transformation
> defined in Solr configuration so that it's not tied  to one particular
> external API -  defining a macro, if you will, so that you could supply
> 'fl='a,b,c,%numcites%,...' in the request and have Solr do the expansion.
>
> Is there some way to do this that I've overlooked ? if not, I think it
> would be a useful new feature.
>
>
> -Simon


Re: Defining Document Transformers in Solr Configuration

2018-02-28 Thread simon
Thanks Mikhail:

I considered that, but not all queries would request that field, and there
are in fact a couple more similar DocTransformer-generated aliased fields
which we can optionally request, so it's not a general enough solution.

-Simon
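For anyone following along, here is a minimal sketch of the kind of handler 
Mikhail describes (the handler name is made up, and the subquery parameters 
are adapted from the query earlier in the thread):

    <requestHandler name="/citequery" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="fl">*,numcites:[subquery]</str>
        <str name="numcites.q">{!terms f=md_c_pmid v=$row.pmid}</str>
        <str name="numcites.rows">10</str>
      </lst>
    </requestHandler>

Defaults only apply when the request doesn't supply that parameter itself, 
which is exactly why this isn't general enough once clients start varying the 
fl list.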

On Wed, Feb 28, 2018 at 1:18 AM, Mikhail Khludnev  wrote:

> Hello, Simon.
>
> You can define a search handler where have 
> numcites:[subquery]=pmid={!terms
> f=md_c_pmid v=$row.pmid}=10=q
> 
> or something like that.
>
> On Tue, Feb 27, 2018 at 11:20 PM, simon  wrote:
>
> > We do quite complex data pulls from a Solr index for subsequent
> analytics,
> > currently using a home-grown Python API. Queries might include  a handful
> > of pseudofields which this API rewrites to an aliased field invoking a
> > Document Transformer in the 'fl' parameter list.
> >
> > For example 'numcites' is transformed to
> >
> > 'fl= ,numcites:[subquery]=pmid={!terms
> > f=md_c_pmid v=$row.pmid}=10=q',...'
> >
> > What I'd ideally like to be able to do would be have this transformation
> > defined in Solr configuration so that it's not tied  to one particular
> > external API -  defining a macro, if you will, so that you could supply
> > 'fl='a,b,c,%numcites%,...' in the request and have Solr do the expansion.
> >
> > Is there some way to do this that I've overlooked ? if not, I think it
> > would be a useful new feature.
> >
> >
> > -Simon
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-28 Thread Emir Arnautović
If you are only after visualising GC, there are several tools that you can 
download, or to which you can upload logs, to visualise them. If you would like to 
monitor the whole host/Solr/JVM, Sematext’s SPM also comes in an on-premises version, where you 
install and host your own monitoring infrastructure: 
https://sematext.com/spm/#on-premises 

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 28 Feb 2018, at 10:53, 苗海泉  wrote:
> 
> Thanks for your detailed advice, the monitor product you are talking about
> is good, but our solr system is running on a private network and seems to
> be unusable at all, with no single downloadable application for analyzing
> specific gc logs.
> 
> 2018-02-28 16:57 GMT+08:00 Emir Arnautović:
> 
>> Hi,
>> I would start with following:
>> 1. have dedicated nodes for ZK ensemble - those do not have to be powerful
>> nodes (maybe 2-4 cores and 8GB RAM)
>> 2. reduce heap size to value below margin where JVM can use compressed
>> oops - 31GB should be safe size
>> 3. shard collection to all nodes
>> 4. increase rollover interval to 2h so you keep shard size/number as it is
>> today.
>> 5. experiment with slightly larger rollover intervals (e.g. 3h) if query
>> latency is still acceptable. That will result in less shards that are
>> slightly larger.
>> 
>> In any case monitor your cluster to see how changes affect it. Not sure
>> what you currently use for monitoring, but manual scanning of GC logs is
>> not fun. You can check out our monitoring tool if you don’t have one or if
>> it does not give you enough visibility: https://sematext.com/spm/
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 28 Feb 2018, at 02:42, 苗海泉  wrote:
>>> 
>>> Thank you, I read under the memory footprint, I set 75% recovery, memory
>>> occupancy at about 76%, the other we zookeeper not on a dedicated server,
>>> perhaps because of this cause instability.
>>> 
>>> What else do you recommend for me to check?
>>> 
>>> 2018-02-27 22:37 GMT+08:00 Emir Arnautović:
>>> 
 This does not show much: only that your heap is around 75% (24-25GB). I
 was thinking that you should compare metrics (heap/GC as well) when
>> running
 on without issues and when running with issues and see if something can
>> be
 concluded.
 About instability: Do you run ZK on dedicated nodes?
 
 Emir
 --
 Monitoring - Log Management - Alerting - Anomaly Detection
 Solr & Elasticsearch Consulting Support Training - http://sematext.com/
 
 
 
> On 27 Feb 2018, at 14:43, 苗海泉  wrote:
> 
> Thank you, we were 49 shard 49 nodes, but later found that in this
>> case,
> often disconnect between solr and zookeepr, zookeeper too many nodes
 caused
> solr instability, so reduced to 25 A follow-up performance can not keep
 up
> also need to increase back.
> 
> Very slow when solr and zookeeper not found any errors, just build the
> index slow, automatic commit inside the log display is slow, but the
>> main
> reason may not lie in the commit place.
> 
> I am sorry, I do not know how to look at the utilization of java heap,
> through the gc log, gc time is not long, I posted the log:
> 
> 
> {Heap before GC invocations=1144021 (full 72):
> garbage-first heap   total 33554432K, used 26982419K
>> [0x7f147800,
> 0x7f1478808000, 0x7f1c7800)
> region size 8192K, 204 young (1671168K), 26 survivors (212992K)
> Metaspace   used 41184K, capacity 41752K, committed 67072K,
>> reserved
> 67584K
> 2018-02-27T21:43:01.793+0800: 4668016.044: [GC pause (G1 Evacuation
 Pause)
> (young)
> Desired survivor size 109051904 bytes, new threshold 1 (max 15)
> - age   1:  113878760 bytes,  113878760 total
> - age   2:   21264744 bytes,  135143504 total
> - age   3:   17020096 bytes,  152163600 total
> - age   4:   26870864 bytes,  179034464 total
> , 0.0579794 secs]
> [Parallel Time: 46.9 ms, GC Workers: 18]
>[GC Worker Start (ms): Min: 4668016046.1, Avg: 4668016046.3, Max:
> 4668016046.4, Diff: 0.3]
>[Ext Root Scanning (ms): Min: 2.4, Avg: 6.5, Max: 46.3, Diff: 43.9,
> Sum: 116.9]
>[Update RS (ms): Min: 0.0, Avg: 3.4, Max: 6.0, Diff: 6.0, Sum:
>> 62.0]
>   [Processed Buffers: Min: 0, Avg: 6.3, Max: 16, Diff: 16, Sum:
 113]
>[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5]
>[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-28 Thread 苗海泉
Thanks for your detailed advice. The monitoring product you are talking about
is good, but our Solr system runs on a private network and it seems we cannot
use it at all, and there is no single downloadable application for analyzing
specific GC logs.

2018-02-28 16:57 GMT+08:00 Emir Arnautović :

> Hi,
> I would start with following:
> 1. have dedicated nodes for ZK ensemble - those do not have to be powerful
> nodes (maybe 2-4 cores and 8GB RAM)
> 2. reduce heap size to value below margin where JVM can use compressed
> oops - 31GB should be safe size
> 3. shard collection to all nodes
> 4. increase rollover interval to 2h so you keep shard size/number as it is
> today.
> 5. experiment with slightly larger rollover intervals (e.g. 3h) if query
> latency is still acceptable. That will result in less shards that are
> slightly larger.
>
> In any case monitor your cluster to see how changes affect it. Not sure
> what you currently use for monitoring, but manual scanning of GC logs is
> not fun. You can check out our monitoring tool if you don’t have one or if
> it does not give you enough visibility: https://sematext.com/spm/
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 28 Feb 2018, at 02:42, 苗海泉  wrote:
> >
> > Thank you, I read under the memory footprint, I set 75% recovery, memory
> > occupancy at about 76%, the other we zookeeper not on a dedicated server,
> > perhaps because of this cause instability.
> >
> > What else do you recommend for me to check?
> >
> > 2018-02-27 22:37 GMT+08:00 Emir Arnautović:
> >
> >> This does not show much: only that your heap is around 75% (24-25GB). I
> >> was thinking that you should compare metrics (heap/GC as well) when
> running
> >> on without issues and when running with issues and see if something can
> be
> >> concluded.
> >> About instability: Do you run ZK on dedicated nodes?
> >>
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 27 Feb 2018, at 14:43, 苗海泉  wrote:
> >>>
> >>> Thank you, we were 49 shard 49 nodes, but later found that in this
> case,
> >>> often disconnect between solr and zookeepr, zookeeper too many nodes
> >> caused
> >>> solr instability, so reduced to 25 A follow-up performance can not keep
> >> up
> >>> also need to increase back.
> >>>
> >>> Very slow when solr and zookeeper not found any errors, just build the
> >>> index slow, automatic commit inside the log display is slow, but the
> main
> >>> reason may not lie in the commit place.
> >>>
> >>> I am sorry, I do not know how to look at the utilization of java heap,
> >>> through the gc log, gc time is not long, I posted the log:
> >>>
> >>>
> >>> {Heap before GC invocations=1144021 (full 72):
> >>> garbage-first heap   total 33554432K, used 26982419K
> [0x7f147800,
> >>> 0x7f1478808000, 0x7f1c7800)
> >>> region size 8192K, 204 young (1671168K), 26 survivors (212992K)
> >>> Metaspace   used 41184K, capacity 41752K, committed 67072K,
> reserved
> >>> 67584K
> >>> 2018-02-27T21:43:01.793+0800: 4668016.044: [GC pause (G1 Evacuation
> >> Pause)
> >>> (young)
> >>> Desired survivor size 109051904 bytes, new threshold 1 (max 15)
> >>> - age   1:  113878760 bytes,  113878760 total
> >>> - age   2:   21264744 bytes,  135143504 total
> >>> - age   3:   17020096 bytes,  152163600 total
> >>> - age   4:   26870864 bytes,  179034464 total
> >>> , 0.0579794 secs]
> >>>  [Parallel Time: 46.9 ms, GC Workers: 18]
> >>> [GC Worker Start (ms): Min: 4668016046.1, Avg: 4668016046.3, Max:
> >>> 4668016046.4, Diff: 0.3]
> >>> [Ext Root Scanning (ms): Min: 2.4, Avg: 6.5, Max: 46.3, Diff: 43.9,
> >>> Sum: 116.9]
> >>> [Update RS (ms): Min: 0.0, Avg: 3.4, Max: 6.0, Diff: 6.0, Sum:
> 62.0]
> >>>[Processed Buffers: Min: 0, Avg: 6.3, Max: 16, Diff: 16, Sum:
> >> 113]
> >>> [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5]
> >>> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
> >>> Sum: 0.0]
> >>> [Object Copy (ms): Min: 0.1, Avg: 23.8, Max: 25.5, Diff: 25.5, Sum:
> >>> 428.1]
> >>> [Termination (ms): Min: 0.0, Avg: 12.7, Max: 13.5, Diff: 13.5, Sum:
> >>> 228.9]
> >>>[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum:
> >> 18]
> >>> [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.4, Diff: 0.4,
> Sum:
> >>> 1.2]
> >>> [GC Worker Total (ms): Min: 46.4, Avg: 46.6, Max: 46.7, Diff: 0.3,
> >>> Sum: 838.0]
> >>> [GC Worker End (ms): Min: 4668016092.8, Avg: 4668016092.8, Max:
> >>> 4668016092.8, Diff: 0.0]
> >>>  [Code Root Fixup: 0.2 ms]
> >>>  [Code Root Purge: 0.0 ms]
> >>>  [Clear CT: 0.3 ms]
> >>>  [Other: 10.7 

Authentication and distributed search in 7.2.1

2018-02-28 Thread Peter Sturge
Hi,
In 7.2.1 there's the authentication module and associated security.json
file, which works well for single cores. (Note: standalone mode, no
SolrCloud)
It doesn't appear to work with distributed searches, including multi-shard
local searches, e.g.:
  shards=localhost:8983/solr/core1,localhost:8983/solr/core2

Even when shards is just a single core (shards=localhost:8983/solr/core1),
if the base search goes to a different core (e.g.
http://localhost:8983/solr/somecore/select?shards=localhost:8983/solr/core1..),
no error and no results are returned: status=0, numFound=0.
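For reference, the failing request looks roughly like this (the core names are 
from the example above; the user and password are placeholders, assuming the 
Basic authentication plugin):

    curl -u solruser:password \
      "http://localhost:8983/solr/somecore/select?q=*:*&shards=localhost:8983/solr/core1,localhost:8983/solr/core2"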

Can anyone please confirm if Solr 7 authentication does/doesn't support
distributed/sharded searches?

Many thanks,
Peter


RE: Gentle reminder RE: Object not fetched because its identifier appears to be already in processing

2018-02-28 Thread YELESWARAPU, VENKATA BHAN
Information Classification: Limited Access

Thanks Shawn, I submitted a request to subscribe.

After restarting JBoss and redeploying the Solr war files, the index files were 
successfully created and global search is now working.
Not sure what really was the problem or what fixed it. :) We spent a few days 
on this.

Anyways, thank you very much for your support. Will get back if I have any 
questions.

Good day
Dutt

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Wednesday, February 28, 2018 1:08 PM
To: solr-user@lucene.apache.org
Cc: YELESWARAPU, VENKATA BHAN 
Subject: Re: Gentle reminder RE: Object not fetched because its identifier 
appears to be already in processing

On 2/28/2018 12:06 AM, YELESWARAPU, VENKATA BHAN wrote:
> Thank you for your reply Shawn. I'm not part of that user list so I never 
> received any emails so far.
> Could you please subscribe me (vyeleswar...@statestreet.com) or let me know 
> the process?
> Also I would greatly appreciate if you could forward any responses received 
> for this issue.

You can subscribe yourself.  That's not something I can do for you.

http://lucene.apache.org/solr/community.html#mailing-lists-irc

> To answer your question, we see these messages in the solr log file. Solr 
> search option is visible on the UI but when we search for a text, it says "No 
> results found".
> The index files are not getting generated/created. We have the index job 
> scheduled to run every min, and solr log file is filled with below messages.
> "Object not fetched because its identifier appears to be already in 
> processing".

Can you place that logfile (ideally the whole thing) somewhere and provide a 
URL for accessing it?  There are many paste websites and many file sharing 
sites that you can use to do this.  With the actual logfile, hopefully the 
problem can be found.

If I do a google search for that error message, the only thing that comes up is 
messages from you.  It doesn't appear to be something that people have 
encountered before.

> These are the Solr & lucene versions.
>  solr-spec4.3.1
>  solr-impl 4.3.1 1491148 - shalinmangar - 2013-06-09 12:15:33
>  lucene-spec4.3.1
>  lucene-impl 4.3.1 1491148 - shalinmangar - 2013-06-09 12:07:58

If we determine that there *is* a bug, it will need to be demonstrated in the 
current version (7.2.1) before it can be fixed.  There will be no more 4.x 
releases.  As you can see, the version you're running is nearly five years old.

> Solr master and slave configuration is working fine and I'm able to access 
> the urls.
> All we are trying is to make the search function work on UI. Please let me 
> know if you need any more details.

What happens if you leave the query as *:* and execute it? That is special 
syntax that matches all documents.
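For example (the core name is a placeholder):

    http://localhost:8983/solr/yourcore/select?q=*:*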

Thanks,
Shawn



Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-28 Thread Emir Arnautović
Hi,
I would start with following:
1. have dedicated nodes for ZK ensemble - those do not have to be powerful 
nodes (maybe 2-4 cores and 8GB RAM)
2. reduce the heap size to a value below the margin where the JVM can still use 
compressed oops - 31GB should be a safe size (see the quick check after this list)
3. shard collection to all nodes
4. increase rollover interval to 2h so you keep shard size/number as it is 
today.
5. experiment with slightly larger rollover intervals (e.g. 3h) if query 
latency is still acceptable. That will result in fewer shards that are slightly 
larger.
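A quick way to confirm that a given heap size still lets the JVM use compressed 
oops (a sketch; run it with the same JVM and heap setting that Solr uses):

    java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
    # "UseCompressedOops := true" in the output means compressed oops are still in effect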

In any case monitor your cluster to see how changes affect it. Not sure what 
you currently use for monitoring, but manual scanning of GC logs is not fun. 
You can check out our monitoring tool if you don’t have one or if it does not 
give you enough visibility: https://sematext.com/spm/

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 28 Feb 2018, at 02:42, 苗海泉  wrote:
> 
> Thank you, I read under the memory footprint, I set 75% recovery, memory
> occupancy at about 76%, the other we zookeeper not on a dedicated server,
> perhaps because of this cause instability.
> 
> What else do you recommend for me to check?
> 
> 2018-02-27 22:37 GMT+08:00 Emir Arnautović :
> 
>> This does not show much: only that your heap is around 75% (24-25GB). I
>> was thinking that you should compare metrics (heap/GC as well) when running
>> on without issues and when running with issues and see if something can be
>> concluded.
>> About instability: Do you run ZK on dedicated nodes?
>> 
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 27 Feb 2018, at 14:43, 苗海泉  wrote:
>>> 
>>> Thank you, we were 49 shard 49 nodes, but later found that in this case,
>>> often disconnect between solr and zookeepr, zookeeper too many nodes
>> caused
>>> solr instability, so reduced to 25 A follow-up performance can not keep
>> up
>>> also need to increase back.
>>> 
>>> Very slow when solr and zookeeper not found any errors, just build the
>>> index slow, automatic commit inside the log display is slow, but the main
>>> reason may not lie in the commit place.
>>> 
>>> I am sorry, I do not know how to look at the utilization of java heap,
>>> through the gc log, gc time is not long, I posted the log:
>>> 
>>> 
>>> {Heap before GC invocations=1144021 (full 72):
>>> garbage-first heap   total 33554432K, used 26982419K [0x7f147800,
>>> 0x7f1478808000, 0x7f1c7800)
>>> region size 8192K, 204 young (1671168K), 26 survivors (212992K)
>>> Metaspace   used 41184K, capacity 41752K, committed 67072K, reserved
>>> 67584K
>>> 2018-02-27T21:43:01.793+0800: 4668016.044: [GC pause (G1 Evacuation
>> Pause)
>>> (young)
>>> Desired survivor size 109051904 bytes, new threshold 1 (max 15)
>>> - age   1:  113878760 bytes,  113878760 total
>>> - age   2:   21264744 bytes,  135143504 total
>>> - age   3:   17020096 bytes,  152163600 total
>>> - age   4:   26870864 bytes,  179034464 total
>>> , 0.0579794 secs]
>>>  [Parallel Time: 46.9 ms, GC Workers: 18]
>>> [GC Worker Start (ms): Min: 4668016046.1, Avg: 4668016046.3, Max:
>>> 4668016046.4, Diff: 0.3]
>>> [Ext Root Scanning (ms): Min: 2.4, Avg: 6.5, Max: 46.3, Diff: 43.9,
>>> Sum: 116.9]
>>> [Update RS (ms): Min: 0.0, Avg: 3.4, Max: 6.0, Diff: 6.0, Sum: 62.0]
>>>[Processed Buffers: Min: 0, Avg: 6.3, Max: 16, Diff: 16, Sum:
>> 113]
>>> [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.5]
>>> [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
>>> Sum: 0.0]
>>> [Object Copy (ms): Min: 0.1, Avg: 23.8, Max: 25.5, Diff: 25.5, Sum:
>>> 428.1]
>>> [Termination (ms): Min: 0.0, Avg: 12.7, Max: 13.5, Diff: 13.5, Sum:
>>> 228.9]
>>>[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum:
>> 18]
>>> [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.4, Diff: 0.4, Sum:
>>> 1.2]
>>> [GC Worker Total (ms): Min: 46.4, Avg: 46.6, Max: 46.7, Diff: 0.3,
>>> Sum: 838.0]
>>> [GC Worker End (ms): Min: 4668016092.8, Avg: 4668016092.8, Max:
>>> 4668016092.8, Diff: 0.0]
>>>  [Code Root Fixup: 0.2 ms]
>>>  [Code Root Purge: 0.0 ms]
>>>  [Clear CT: 0.3 ms]
>>>  [Other: 10.7 ms]
>>> [Choose CSet: 0.0 ms]
>>> [Ref Proc: 5.9 ms]
>>> [Ref Enq: 0.2 ms]
>>> [Redirty Cards: 0.2 ms]
>>> [Humongous Register: 2.2 ms]
>>> [Humongous Reclaim: 0.4 ms]
>>> [Free CSet: 0.4 ms]
>>>  [Eden: 1424.0M(1424.0M)->0.0B(1552.0M) Survivors: 208.0M->80.0M Heap:
>>> 25.7G(32.0G)->24.3G(32.0G)]
>>> Heap after GC invocations=1144022 (full 72):
>>> garbage-first heap   total 33554432K, used 25489656K [0x7f147800,
>>> 0x7f1478808000, 0x7f1c7800)
>>> region size 8192K, 10 young