[jira] [Comment Edited] (SOLR-15210) ParallelStream should execute hashing & filtering directly in ExportWriter

2021-03-10 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298746#comment-17298746
 ] 

Andrzej Bialecki edited comment on SOLR-15210 at 3/10/21, 10:56 AM:


Attaching a patch \(!) while the repos are being transitioned.


was (Author: ab):
Attaching a patch (!) while the repos are being transitioned.

> ParallelStream should execute hashing & filtering directly in ExportWriter
> --
>
> Key: SOLR-15210
> URL: https://issues.apache.org/jira/browse/SOLR-15210
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public (Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-15210.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently ParallelStream uses {{HashQParserPlugin}} to partition the work 
> based on a hashed value of {{partitionKeys}}. Unfortunately, this filter has 
> a high initial runtime cost because it has to materialize all values of 
> {{partitionKeys}} on each worker in order to calculate their hash and decide 
> whether a particular doc belongs to the worker's partition.
> The alternative approach would be for the worker to collect and sort all 
> documents and only then filter out the ones that belong to the current 
> partition, just before they are written out by {{ExportWriter}} - at this 
> point we have to materialize the fields anyway, and we can also benefit from 
> the (minimal) BytesRef caching that the FieldWriters use. On the other hand, 
> we pay the price of sorting all documents, and we also lose the query filter 
> caching that the {{HashQParserPlugin}} uses.
> This tradeoff is not obvious but should be investigated to see if it offers 
> better performance.
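The hash-based partitioning described above can be sketched as follows. This is an illustrative stand-alone example (the class and method names are hypothetical, not Solr's actual {{HashQParserPlugin}} code), but it shows why every worker must materialize the {{partitionKeys}} values of every document: each worker computes the same hash and keeps only the documents that fall into its own partition.

```java
import java.util.List;

// Hypothetical sketch of hash-based partition filtering: every worker
// hashes the same partition-key values and keeps only "its" documents.
public class PartitionFilter {

    // Combine the hashes of all partition-key values for one document.
    static int hashKeys(List<String> keyValues) {
        int h = 0;
        for (String v : keyValues) {
            h = 31 * h + v.hashCode();
        }
        return h;
    }

    // A worker owns a document iff the key hash maps to its partition.
    static boolean ownsDoc(List<String> keyValues, int workerId, int numWorkers) {
        return Math.floorMod(hashKeys(keyValues), numWorkers) == workerId;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("user42", "2021-03-10");
        int owners = 0;
        for (int w = 0; w < 4; w++) {
            if (ownsDoc(doc, w, 4)) owners++;
        }
        System.out.println("owners=" + owners); // exactly one worker claims it
    }
}
```

Note that every worker runs `hashKeys` over every candidate document, which is the "high initial runtime cost" the description refers to.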



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15210) ParallelStream should execute hashing & filtering directly in ExportWriter

2021-03-10 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298746#comment-17298746
 ] 

Andrzej Bialecki commented on SOLR-15210:
-

Attaching a patch (!) while the repos are being transitioned.




[jira] [Updated] (SOLR-15210) ParallelStream should execute hashing & filtering directly in ExportWriter

2021-03-10 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15210:

Attachment: SOLR-15210.patch




[jira] [Updated] (SOLR-15210) ParallelStream should execute hashing & filtering directly in ExportWriter

2021-03-10 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15210:

Attachment: SOLR-15210.patch




[jira] [Updated] (SOLR-15210) ParallelStream should execute hashing & filtering directly in ExportWriter

2021-03-10 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15210:

Attachment: (was: SOLR-15210.patch)




[jira] [Created] (SOLR-15232) Add replica(s) as a part of node startup

2021-03-09 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15232:
---

 Summary: Add replica(s) as a part of node startup
 Key: SOLR-15232
 URL: https://issues.apache.org/jira/browse/SOLR-15232
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


In containerized environments it would make sense to be able to initialize a 
new node (pod) and designate it immediately to hold newly created replica(s) of 
specified collection/shard(s) once it's up and running.

Currently this is not easy to do: it requires the intervention of an external 
agent that additionally has to first check whether the node is up, all of which 
makes the process needlessly complicated.

This functionality could be as simple as adding a command-line switch to 
{{bin/solr start}}, which would cause it to invoke appropriate ADDREPLICA 
commands once it verifies the node is up.
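As a rough illustration of what such a switch could do behind the scenes, the startup hook would wait until the node reports itself live and then issue standard Collections API ADDREPLICA requests pinned to the new node. The sketch below (hypothetical class and names; the URL parameters are the documented Collections API ones) only shows the request construction, not the actual issuing or liveness check:

```java
// Illustrative sketch of an ADDREPLICA-on-startup hook: build a
// Collections API request that targets the freshly started node.
public class AddReplicaOnStartup {

    // Build a Collections API ADDREPLICA request URL for the given
    // collection/shard, pinned to the named node.
    static String addReplicaUrl(String baseUrl, String collection,
                                String shard, String nodeName) {
        return baseUrl + "/admin/collections?action=ADDREPLICA"
            + "&collection=" + collection
            + "&shard=" + shard
            + "&node=" + nodeName;
    }

    public static void main(String[] args) {
        String url = addReplicaUrl("http://localhost:8983/solr", "books",
                                   "shard1", "localhost:8983_solr");
        System.out.println(url);
    }
}
```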






[jira] [Commented] (SOLR-14749) Provide a clean API for cluster-level event processing

2021-03-09 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298146#comment-17298146
 ] 

Andrzej Bialecki commented on SOLR-14749:
-

It looks like the underlying cause of the failures was the same as in 
SOLR-15122. I added a Phaser-based mechanism for tests to monitor the changes 
in configuration in the {{ContainerPluginsRegistry}}, similar to the one used 
in SOLR-15122.

I'm leaving this issue open to see if the fix works on Jenkins (local beasting 
can't reproduce this failure).
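A Phaser-based change monitor of the kind described can be sketched as follows. This is a generic stand-alone illustration (all names are hypothetical, not the actual {{ContainerPluginsRegistry}} code): the component under test arrives at the phaser on every configuration change, and a test can deterministically await the next phase instead of sleeping.

```java
import java.util.concurrent.Phaser;

// Generic sketch of a Phaser-based mechanism for tests to observe
// asynchronous configuration changes deterministically, without sleeps.
public class ConfigChangeMonitor {
    private final Phaser phaser = new Phaser(1); // one registered party

    // Called by the component under test whenever its config changed.
    public void onConfigChanged() {
        phaser.arrive(); // advances the phase
    }

    // Called by the test BEFORE triggering a change: capture the phase.
    public int currentPhase() {
        return phaser.getPhase();
    }

    // Called by the test AFTER triggering the change: block until the
    // captured phase has advanced, i.e. the change has been applied.
    public void awaitChange(int phaseBeforeChange) {
        phaser.awaitAdvance(phaseBeforeChange);
    }

    public static void main(String[] args) {
        ConfigChangeMonitor monitor = new ConfigChangeMonitor();
        int phase = monitor.currentPhase();
        new Thread(monitor::onConfigChanged).start(); // async "config change"
        monitor.awaitChange(phase); // deterministic wait, no Thread.sleep
        System.out.println("change observed");
    }
}
```

Capturing the phase before triggering the change avoids the race where the change lands before the test starts waiting.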

> Provide a clean API for cluster-level event processing
> --
>
> Key: SOLR-14749
> URL: https://issues.apache.org/jira/browse/SOLR-14749
> Project: Solr
>  Issue Type: Improvement
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Labels: clean-api
> Fix For: master (9.0)
>
>  Time Spent: 22h
>  Remaining Estimate: 0h
>
> This is a companion issue to SOLR-14613 and it aims at providing a clean, 
> strongly typed API for the functionality formerly known as "triggers" - that 
> is, a component for generating cluster-level events corresponding to changes 
> in the cluster state, and a pluggable API for processing these events.
> The 8x triggers have been removed so this functionality is currently missing 
> in 9.0. However, this functionality is crucial for implementing the automatic 
> collection repair and re-balancing as the cluster state changes (nodes going 
> down / up, becoming overloaded / unused / decommissioned, etc).
> For this reason we need this API and a default implementation of triggers 
> that at least can perform automatic collection repair (maintaining the 
> desired replication factor in presence of live node changes).
> As before, the actual changes to the collections will be executed using 
> existing CollectionAdmin API, which in turn may use the placement plugins 
> from SOLR-14613.
> h3. Division of responsibility
>  * built-in Solr components (non-pluggable):
>  ** cluster state monitoring and event generation,
>  ** simple scheduler to periodically generate scheduled events
>  * plugins:
>  ** automatic collection repair on {{nodeLost}} events (provided by default)
>  ** re-balancing of replicas (periodic or on {{nodeAdded}} events)
>  ** reporting (eg. requesting additional node provisioning)
>  ** scheduled maintenance (eg. removing inactive shards after split)
> h3. Other considerations
> These plugins (unlike the placement plugins) need to execute on one 
> designated node in the cluster. Currently the easiest way to implement this 
> is to run them on the Overseer leader node.






[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-03-09 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17298116#comment-17298116
 ] 

Andrzej Bialecki commented on SOLR-15122:
-

[~mdrob] I think we can close this; the changes that you implemented seem to be 
working well.

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Components: Tests
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[jira] [Resolved] (SOLR-12730) Implement staggered SPLITSHARD requests in IndexSizeTrigger

2021-03-09 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-12730.
-
Fix Version/s: (was: master (9.0))
   Resolution: Fixed

This was fixed a long time ago (and subsequently removed from 9.0).

> Implement staggered SPLITSHARD requests in IndexSizeTrigger
> ---
>
> Key: SOLR-12730
> URL: https://issues.apache.org/jira/browse/SOLR-12730
> Project: Solr
>  Issue Type: Improvement
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.1
>
>
> Simulated large scale tests uncovered an interesting scenario that occurs 
> also in real clusters where {{IndexSizeTrigger}} is used for controlling the 
> maximum shard size.
> As index size grows and the number of shards grows, if document assignment is 
> more or less even then at equal intervals (on a {{log2}} scale) there will be 
> an avalanche of SPLITSHARD operations, because all shards will reach the 
> critical size at approximately the same time.
> A hundred or more split shard operations running in parallel may severely 
> affect the cluster performance.
> One possible approach to reduce the likelihood of this situation is to split 
> shards not exactly in half but rather to fudge the proportions to around 60/40 
> in a random sequence, so that the resulting sub-sub-sub…shards would reach the 
> thresholds at different times. This would require modifications to the 
> SPLITSHARD command to allow this randomization.
> Another approach would be to simply limit the maximum number of parallel 
> split shard operations. However, this would slow down the process of reaching 
> the balance (increase lag) and possibly violate other operational constraints 
> due to some shards waiting too long for the split and significantly exceeding 
> their max size.
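The randomized-proportions idea can be made concrete with a small numeric sketch (a hypothetical helper, not the actual SPLITSHARD code): pick a split ratio near 60/40, randomly flipped, so that repeated splits of equally-filled shards stop being synchronized.

```java
import java.util.Random;

// Illustrative sketch of randomized shard-split proportions: instead of
// always splitting 50/50, pick a ratio near 60/40 (randomly flipped) so
// descendant shards reach the size threshold at different times.
public class RandomizedSplit {

    // Return the fraction of documents assigned to the "left" sub-shard.
    static double splitRatio(Random rnd) {
        double ratio = 0.55 + rnd.nextDouble() * 0.10; // 0.55 .. 0.65
        return rnd.nextBoolean() ? ratio : 1.0 - ratio; // ~60/40 or ~40/60
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int i = 0; i < 3; i++) {
            double r = splitRatio(rnd);
            System.out.printf("split %d: %.0f%% / %.0f%%%n",
                              i, r * 100, (1 - r) * 100);
        }
    }
}
```

Because each split is at least 10 percentage points away from an even split, sibling sub-shards diverge in size and their own future splits spread out in time.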






[jira] [Reopened] (SOLR-14749) Provide a clean API for cluster-level event processing

2021-03-03 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reopened SOLR-14749:
-

Re-opening to fix the test failures.




[jira] [Created] (SOLR-15210) ParallelStream should execute hashing & filtering directly in ExportWriter

2021-03-02 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15210:
---

 Summary: ParallelStream should execute hashing & filtering 
directly in ExportWriter
 Key: SOLR-15210
 URL: https://issues.apache.org/jira/browse/SOLR-15210
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


Currently ParallelStream uses {{HashQParserPlugin}} to partition the work based 
on a hashed value of {{partitionKeys}}. Unfortunately, this filter has a high 
initial runtime cost because it has to materialize all values of 
{{partitionKeys}} on each worker in order to calculate their hash and decide 
whether a particular doc belongs to the worker's partition.

The alternative approach would be for the worker to collect and sort all 
documents and only then filter out the ones that belong to the current 
partition, just before they are written out by {{ExportWriter}} - at this point 
we have to materialize the fields anyway, and we can also benefit from the 
(minimal) BytesRef caching that the FieldWriters use. On the other hand, we pay 
the price of sorting all documents, and we also lose the query filter caching 
that the {{HashQParserPlugin}} uses.

This tradeoff is not obvious but should be investigated to see if it offers 
better performance.






[jira] [Created] (SOLR-15209) Make the AffinityPlacementFactory the default placement plugin

2021-03-02 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15209:
---

 Summary: Make the AffinityPlacementFactory the default placement 
plugin
 Key: SOLR-15209
 URL: https://issues.apache.org/jira/browse/SOLR-15209
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki


Currently there's a lot of code in {{Assign}} dealing with the fact that we 
still support the old "legacy" replica assignment as well as the new 
plugin-based placement strategies.

Furthermore, the "legacy" assignment is now the default even though it is 
neither robust nor optimal except in very simple and small clusters. Also, 
providing another plugin-based placement as the default implementation runs into 
a small complication: in the absence of any plugin config the code reverts to 
the "legacy" strategy.

In order to promote the adoption of the new plugin-based placements we should 
make the {{AffinityPlacementFactory}} the new default placement strategy, 
selected when the explicit configuration is missing (and then create it as a 
default configuration in {{PlacementPluginFactoryLoader}}).

I propose to re-package the "legacy" strategy as a {{PlacementPluginFactory}} 
so that it can be configured in the same way as other placement plugins.






[jira] [Resolved] (SOLR-15130) Allow per-collection replica placement node sets

2021-03-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15130.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

> Allow per-collection replica placement node sets
> 
>
> Key: SOLR-15130
> URL: https://issues.apache.org/jira/browse/SOLR-15130
> Project: Solr
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This is an extension of the existing {{replica_type}} concept in the 
> {{AffinityPlacementPlugin}}.
> Currently this concept allows users to distribute the placement of new 
> replicas by type (NRT, TLOG, PULL) if the target nodes specify the list of 
> allowed replica types that they accept. This can be easily extended to 
> support any other system property value that the node reports, and pair it 
> with any other collection property instead of replica type.
> The motivation for this is the use case where a cluster is logically divided 
> into nodes with different types of work load (eg. searching, indexing and 
> analytics). Currently it's not possible to configure the placement plugin in 
> a way that automatically puts some collections on specific node sets - 
> instead users would have to always specify the appropriate node set in every 
> CREATE / ADD / MOVE replica request.
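The generalized matching described above boils down to a simple filter: a node is an eligible placement target when the value of a designated node system property matches the value required by the collection. A stand-alone sketch (hypothetical names, not the plugin's real API):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch of generalized node-set placement: a node is an
// eligible target when its advertised property (a comma-separated list
// of accepted values, e.g. workload types) contains the required value.
public class NodeSetFilter {

    static List<String> eligibleNodes(Map<String, String> nodeProps,
                                      String requiredValue) {
        return nodeProps.entrySet().stream()
            .filter(e -> List.of(e.getValue().split(",")).contains(requiredValue))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> nodes = Map.of(
            "node1:8983_solr", "search",
            "node2:8983_solr", "indexing,analytics",
            "node3:8983_solr", "search,analytics");
        System.out.println(eligibleNodes(nodes, "search"));
        // -> [node1:8983_solr, node3:8983_solr]
    }
}
```

With a collection-level property naming the required value, every CREATE / ADD / MOVE replica request can be routed to the right node set automatically instead of the user spelling it out each time.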






[jira] [Resolved] (SOLR-15131) Use collection properties for per-collection configuration of placement plugins

2021-02-25 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15131.
-
Resolution: Won't Fix

> Use collection properties for per-collection configuration of placement 
> plugins
> ---
>
> Key: SOLR-15131
> URL: https://issues.apache.org/jira/browse/SOLR-15131
> Project: Solr
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SOLR-15055 and SOLR-15130 implement per-collection behavior in the placement 
> plugins.
> In SOLR-15055 I decided to put this configuration in the plugin config 
> itself, using a {{withCollection}} property. The advantage of this approach 
> is that no other place in the code base knows about this configuration except 
> for the plugin itself.
> However, there are some disadvantages to it as well:
> * when collection is deleted it leaves the dangling bit of config in the 
> placement plugin config (an entry in {{withCollection}} that no longer refers 
> to any existing collection)
> * what's worse, when a new collection is created that uses the same name, the 
> old config suddenly applies to the new collection, which is something the 
> user may not have expected or wanted.
> * the configuration of the plugin becomes more complicated if there are many 
> per-collection entries.
> The alternative approach is to keep these per-collection configuration bits 
> in the collection itself, using collection properties. The advantages are:
> * plugin configuration becomes very simple
> * when a collection is deleted the corresponding placement config parts are 
> deleted too (similarly as the "policy" property in 8x)
> The disadvantages of this approach are:
> * collection configuration exposes bits of the plugin configuration
> * when the placement plugin is changed (eg. a different one is configured) 
> the old pieces of config still remain in the collection properties and may 
> interfere with the new plugin config.
> I'm open to suggestions which way is the "more proper" way to address this 
> issue.






[jira] [Commented] (SOLR-15131) Use collection properties for per-collection configuration of placement plugins

2021-02-25 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17290812#comment-17290812
 ] 

Andrzej Bialecki commented on SOLR-15131:
-

On further investigation, and having implemented a POC, it looks like using 
collection properties would lead to poor efficiency for some tasks (like 
tracking the reverse mapping of {{withCollection}}), so I'm closing this issue 
as Won't Fix.




[jira] [Commented] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-02-15 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17284648#comment-17284648
 ] 

Andrzej Bialecki commented on SOLR-15056:
-

[~wunder] Can you please create a pull request? This is a sizeable patch and a 
PR would make it easier for someone to review and discuss the changes.

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Assignee: Atri Sharma
>Priority: Major
>  Labels: Metrics
> Attachments: 
> 0001-SOLR-15056-Circuit-Breakers-use-CPU-utilization-inst.patch, 
> 0002-SOLR-15056-clean-up-linkage-to-SolrCore-add-back-loa.patch, 
> SOLR-15056.patch
>
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> > in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> > during the recent period of time observed, while a value of 1.0 means that 
> > all CPUs were actively running 100% of the time during the recent period 
> > being observed. All values between 0.0 and 1.0 are possible depending on 
> > the activities going on in the system. If the system recent cpu usage is not 
> > available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  
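To make the contrast above concrete, here is a minimal, hypothetical sketch (not the Solr patch itself): the standard {{java.lang.management.OperatingSystemMXBean}} exposes {{getSystemLoadAverage()}}, which is unbounded and must be normalized by the CPU count before comparing against a 0-100% threshold, whereas {{getSystemCpuLoad()}} (on the {{com.sun.management}} subinterface) is already in [0.0, 1.0].

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper, not part of the Solr patch: turn the unbounded load
// average into a rough 0..1 utilization proxy by dividing by the number of
// CPUs and clamping, mirroring the range of getSystemCpuLoad().
public class CpuMetricSketch {
    public static double normalizedLoad(double loadAverage, int cpus) {
        if (loadAverage < 0) {
            return -1.0; // load average unavailable (e.g. on Windows)
        }
        return Math.min(loadAverage / cpus, 1.0);
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // A load average of 8 means very different things on 32 vs. 2 CPUs:
        System.out.println(normalizedLoad(8.0, 32)); // prints 0.25
        System.out.println(normalizedLoad(8.0, 2));  // prints 1.0
        System.out.println(normalizedLoad(os.getSystemLoadAverage(),
                os.getAvailableProcessors()));
    }
}
```

This illustrates why a load average of 8 is harmless on a 32-CPU host but critical on a 2-CPU host.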






[jira] [Resolved] (SOLR-14234) Unhelpful message in RemoteExecutionException

2021-02-10 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14234.
-
Fix Version/s: 8.9
   Resolution: Fixed

> Unhelpful message in RemoteExecutionException
> -
>
> Key: SOLR-14234
> URL: https://issues.apache.org/jira/browse/SOLR-14234
> Project: Solr
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.9
>
> Attachments: exception.patch
>
>
> Unusually, the details of this exception are passed not in the message or 
> {{getCause()}} but in a separate field, {{meta}}.
> The problem is that in many contexts where {{toString()}} is used these 
> details are completely ignored, which produces very confusing and incomplete 
> messages, like this:
> {code}
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteExecutionException: 
> Error from server at https://127.0.0.1:59006/solr: Error in command payload
>   at 
> __randomizedtesting.SeedInfo.seed([8AD470708D05DCDB:EF407342F1EBA057]:0)
>   at 
> org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteExecutionException.create(BaseHttpSolrClient.java:66)
> {code}
> I propose to add the details to the message.
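A minimal illustration of the proposal (class name and message format are hypothetical, not the actual patch): fold the {{meta}} contents into the message, so that {{toString()}} carries the details.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposal: append the 'meta' details to the
// exception message instead of keeping them only in a separate field.
public class RemoteErrorSketch {
    public static String buildMessage(String baseMessage, Map<String, Object> meta) {
        if (meta == null || meta.isEmpty()) {
            return baseMessage; // nothing extra to report
        }
        return baseMessage + ", errorMetadata=" + meta;
    }

    public static void main(String[] args) {
        Map<String, Object> meta = new LinkedHashMap<>();
        meta.put("error", "Error in command payload");
        // The message now carries the details that toString() previously dropped:
        System.out.println(buildMessage("Error from server", meta));
    }
}
```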






[jira] [Commented] (SOLR-9854) Collect metrics for index merges and index store IO

2021-02-09 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-9854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281682#comment-17281682
 ] 

Andrzej Bialecki commented on SOLR-9854:


A metrics Counter can only go forward, but these integers must be able to move 
both ways because they represent the number of *currently* running merges (and 
the current number of docs / segments involved in the running merges), which 
naturally may vary from 0 to N.
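A small sketch of the pattern described above (all names hypothetical, not Solr's implementation): a gauge backed by an {{AtomicInteger}} can move both ways as merges start and finish, which a forward-only counter cannot.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: track the number of currently running merges with a
// value that can both rise and fall, suitable for exposing as a gauge.
public class RunningMergesGauge {
    private final AtomicInteger running = new AtomicInteger();

    public void onMergeStart()  { running.incrementAndGet(); }
    public void onMergeFinish() { running.decrementAndGet(); }

    // Read as a gauge: reports the *current* count, anywhere from 0 to N.
    public int currentlyRunning() { return running.get(); }

    public static void main(String[] args) {
        RunningMergesGauge gauge = new RunningMergesGauge();
        gauge.onMergeStart();
        gauge.onMergeStart();
        gauge.onMergeFinish();
        System.out.println(gauge.currentlyRunning()); // prints 1
    }
}
```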

> Collect metrics for index merges and index store IO
> ---
>
> Key: SOLR-9854
> URL: https://issues.apache.org/jira/browse/SOLR-9854
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 6.4, 7.0
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Minor
> Fix For: 6.4, 7.0
>
> Attachments: SOLR-9854.patch, SOLR-9854.patch
>
>
> Using API for metrics management developed in SOLR-4735 we should also start 
> collecting metrics for major aspects of {{IndexWriter}} operation, such as 
> read / write IO rates, number of minor and major merges and IO during these 
> operations, etc.
> This will provide a better insight into resource consumption and load at the 
> IO level.






[jira] [Commented] (LUCENE-9406) Make it simpler to track IndexWriter's events

2021-02-04 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278726#comment-17278726
 ] 

Andrzej Bialecki commented on LUCENE-9406:
--

FWIW, {{SolrIndexWriter}} already uses the available methods for collecting 
merge metrics. However, this required subclassing, so a listener-like 
interface would be even better.

> Make it simpler to track IndexWriter's events
> -
>
> Key: LUCENE-9406
> URL: https://issues.apache.org/jira/browse/LUCENE-9406
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Priority: Major
>
> This is the second spinoff from a [controversial PR to add a new index-time 
> feature to Lucene to merge small segments during 
> commit|https://github.com/apache/lucene-solr/pull/1552].  That change can 
> substantially reduce the number of small index segments to search.
> In that PR, there was a new proposed interface, {{IndexWriterEvents}}, giving 
> the application a chance to track when {{IndexWriter}} kicked off merges 
> during commit, how many, how long it waited, how often it gave up waiting, 
> etc.
> Such telemetry from production usage is really helpful when tuning settings 
> like which merges (e.g. a size threshold) to attempt on commit, and how long 
> to wait during commit, etc.
> I am splitting out this issue to explore possible approaches to do this.  
> E.g. [~simonw] proposed using a statistics class instead, but if I understood 
> that correctly, I think that would put the role of aggregation inside 
> {{IndexWriter}}, which is not ideal.
> Many interesting events, e.g. how many merges are being requested, how large 
> are they, how long did they take to complete or fail, etc., can be gleaned by 
> wrapping expert Lucene classes like {{MergePolicy}} and {{MergeScheduler}}.  
> But for those events that cannot (e.g. {{IndexWriter}} stopped waiting for 
> merges during commit), it would be very helpful to have some simple way to 
> track so applications can better tune.
> It is also possible to subclass {{IndexWriter}} and override key methods, but 
> I think that is inherently risky as {{IndexWriter}}'s protected methods are 
> not considered to be a stable API, and the synchronization used by 
> {{IndexWriter}} is confusing.
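As a rough sketch of the listener-style approach (the {{IndexWriterEvents}} name comes from the PR discussion; everything else here is hypothetical and not actual Lucene API), the application supplies the listener and does its own aggregation, keeping that role out of {{IndexWriter}}:

```java
// Hypothetical sketch of a listener-like interface as discussed above;
// none of these names are part of the actual Lucene API.
public class MergeEventsSketch {
    public interface IndexWriterEvents {
        void beginMergeOnCommit();               // merges were kicked off during commit
        void abandonedMergesOnCommit(int count); // stopped waiting for this many merges
    }

    // Aggregation lives in the application, not inside IndexWriter.
    public static class CountingEvents implements IndexWriterEvents {
        public int mergesStarted;
        public int mergesAbandoned;

        @Override public void beginMergeOnCommit() { mergesStarted++; }
        @Override public void abandonedMergesOnCommit(int count) { mergesAbandoned += count; }
    }

    public static void main(String[] args) {
        CountingEvents events = new CountingEvents();
        events.beginMergeOnCommit();
        events.abandonedMergesOnCommit(2);
        System.out.println(events.mergesStarted + " " + events.mergesAbandoned); // prints "1 2"
    }
}
```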






[jira] [Assigned] (SOLR-14234) Unhelpful message in RemoteExecutionException

2021-02-03 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reassigned SOLR-14234:
---

Assignee: Andrzej Bialecki

> Unhelpful message in RemoteExecutionException
> -
>
> Key: SOLR-14234
> URL: https://issues.apache.org/jira/browse/SOLR-14234
> Project: Solr
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: exception.patch
>
>
> Unusually, the details of this exception are passed not in the message or 
> {{getCause()}} but in a separate field, {{meta}}.
> The problem is that in many contexts where {{toString()}} is used these 
> details are completely ignored, which produces very confusing and incomplete 
> messages, like this:
> {code}
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteExecutionException: 
> Error from server at https://127.0.0.1:59006/solr: Error in command payload
>   at 
> __randomizedtesting.SeedInfo.seed([8AD470708D05DCDB:EF407342F1EBA057]:0)
>   at 
> org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteExecutionException.create(BaseHttpSolrClient.java:66)
> {code}
> I propose to add the details to the message.






[jira] [Created] (SOLR-15131) Use collection properties for per-collection configuration of placement plugins

2021-02-03 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15131:
---

 Summary: Use collection properties for per-collection 
configuration of placement plugins
 Key: SOLR-15131
 URL: https://issues.apache.org/jira/browse/SOLR-15131
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


SOLR-15055 and SOLR-15130 implement per-collection behavior in the placement 
plugins.

In SOLR-15055 I decided to put this configuration in the plugin config itself, 
using a {{withCollection}} property. The advantage of this approach is that no 
other place in the code base knows about this configuration except for the 
plugin itself.

However, there are some disadvantages to it as well:
* when a collection is deleted, it leaves a dangling bit of config in the 
placement plugin config (an entry in {{withCollection}} that no longer refers 
to any existing collection)
* worse, when a new collection is created under the same name, the old config 
suddenly applies to the new collection, which the user may not have expected 
or wanted
* the configuration of the plugin becomes more complicated if there are many 
per-collection entries.

The alternative approach is to keep these per-collection configuration bits in 
the collection itself, using collection properties. The advantages are:
* plugin configuration becomes very simple
* when a collection is deleted the corresponding placement config parts are 
deleted too (similarly as the "policy" property in 8x)

The disadvantages of this approach are:
* collection configuration exposes bits of the plugin configuration
* when the placement plugin is changed (e.g. a different one is configured), 
the old pieces of config remain in the collection properties and may 
interfere with the new plugin config.

I'm open to suggestions about which is the more proper way to address this 
issue.
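For illustration, under the current approach the per-collection mapping sits inside the plugin config; the shape below is a hedged sketch based on the cluster plugin registration API, and the exact payload and values are illustrative:

```json
{
  "add": {
    "name": ".placement-plugin",
    "class": "org.apache.solr.cluster.placement.plugins.AffinityPlacementFactory",
    "config": {
      "withCollection": { "primaryColl": "secondaryColl" }
    }
  }
}
```

Under the alternative approach, the {{withCollection}} entry would instead live as a property on {{primaryColl}} itself (e.g. a hypothetical {{placement.withCollection=secondaryColl}} collection property), leaving the plugin config empty and letting the mapping be deleted together with the collection.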






[jira] [Created] (SOLR-15130) Allow per-collection replica placement node sets

2021-02-03 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15130:
---

 Summary: Allow per-collection replica placement node sets
 Key: SOLR-15130
 URL: https://issues.apache.org/jira/browse/SOLR-15130
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


This is an extension of the existing {{replica_type}} concept in the 
{{AffinityPlacementPlugin}}.

Currently this concept allows users to distribute the placement of new replicas 
by type (NRT, TLOG, PULL) if the target nodes specify the list of allowed 
replica types that they accept. This can be easily extended to support any 
other system property value that the node reports, and pair it with any other 
collection property instead of replica type.

The motivation for this is the use case where a cluster is logically divided 
into nodes with different types of workload (e.g. searching, indexing and 
analytics). Currently it's not possible to configure the placement plugin in a 
way that automatically puts some collections on specific node sets - instead 
users always have to specify the appropriate node set in every CREATE / 
ADD / MOVE replica request.






[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277044#comment-17277044
 ] 

Andrzej Bialecki commented on SOLR-15122:
-

I'll leave this open to see if the fix works.

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[jira] [Resolved] (SOLR-15068) RefGuide documentation for replica placement plugins

2021-02-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15068.
-
Resolution: Fixed

> RefGuide documentation for replica placement plugins
> 
>
> Key: SOLR-15068
> URL: https://issues.apache.org/jira/browse/SOLR-15068
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>







[jira] [Assigned] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reassigned SOLR-15122:
---

Assignee: Andrzej Bialecki

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki
>Priority: Major
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[jira] [Updated] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-02 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15122:

Fix Version/s: master (9.0)

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[jira] [Commented] (SOLR-15122) ClusterEventProducerTest.testEvents is unstable

2021-02-01 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276255#comment-17276255
 ] 

Andrzej Bialecki commented on SOLR-15122:
-

Hmm. I suspect this is caused by the plugin config update being sent to ZK via 
a different node than the Overseer, and the node then going down before the 
Overseer node synchronizes with ZK. I'll work on a fix.

> ClusterEventProducerTest.testEvents is unstable
> ---
>
> Key: SOLR-15122
> URL: https://issues.apache.org/jira/browse/SOLR-15122
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Tests
>Reporter: Mike Drob
>Priority: Major
>
> This test looks to be unstable according to Jenkins since about Nov 5. I just 
> started seeing occasional failures locally when running the whole suite but 
> cannot reproduce when running in isolation.
> https://lists.apache.org/thread.html/rf0c16b257bc3236ea414be51451806352b55f15d4949f4fd54a3b71a%40%3Cbuilds.lucene.apache.org%3E






[jira] [Updated] (SOLR-15068) RefGuide documentation for replica placement plugins

2021-01-26 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15068:

Fix Version/s: master (9.0)

> RefGuide documentation for replica placement plugins
> 
>
> Key: SOLR-15068
> URL: https://issues.apache.org/jira/browse/SOLR-15068
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>







[jira] [Resolved] (SOLR-15055) Re-implement 'withCollection'

2021-01-26 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15055.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

> Re-implement 'withCollection'
> -
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
>  * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
>  * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups or as a temporary relief measure.
>  
> Both these constraints are collection-specific so they should be configured 
> e.g. as collection properties.
>  
> Edit: {{maxShardsPerNode}} constraint will be addressed in another issue.






[jira] [Updated] (SOLR-15055) Re-implement 'withCollection'

2021-01-26 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15055:

Summary: Re-implement 'withCollection'  (was: Re-implement 'withCollection' 
and 'maxShardsPerNode')

> Re-implement 'withCollection'
> -
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
> * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups or as a temporary relief measure.
>  
> Both these constraints are collection-specific so they should be configured 
> e.g. as collection properties.






[jira] [Updated] (SOLR-15055) Re-implement 'withCollection'

2021-01-26 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15055:

Description: 
Solr 8x replica placement provided two settings that are very useful in certain 
scenarios:
 * {{withCollection}} constraint specified that replicas should be placed on 
the same nodes where replicas of another collection are located. In the 8x 
implementation this was limited in practice to co-locating single-shard 
secondary collections used for joins or other lookups from the main collection 
(which could be multi-sharded).

 * {{maxShardsPerNode}} - this constraint specified the maximum number of 
replicas per shard that can be placed on the same node. In most scenarios this 
was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica of any 
given shard would be placed on any given node). Changing this constraint to 
values > 1 would reduce fault-tolerance but may be desired in test setups or as 
a temporary relief measure.

 

Both these constraints are collection-specific so they should be configured 
e.g. as collection properties.

 

Edit: {{maxShardsPerNode}} constraint will be addressed in another issue.

  was:
Solr 8x replica placement provided two settings that are very useful in certain 
scenarios:

* {{withCollection}} constraint specified that replicas should be placed on the 
same nodes where replicas of another collection are located. In the 8x 
implementation this was limited in practice to co-locating single-shard 
secondary collections used for joins or other lookups from the main collection 
(which could be multi-sharded).

* {{maxShardsPerNode}} - this constraint specified the maximum number of 
replicas per shard that can be placed on the same node. In most scenarios this 
was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica of any 
given shard would be placed on any given node). Changing this constraint to 
values > 1 would reduce fault-tolerance but may be desired in test setups or as 
a temporary relief measure.

 

Both these constraints are collection-specific so they should be configured 
e.g. as collection properties.


> Re-implement 'withCollection'
> -
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
>  * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
>  * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups or as a temporary relief measure.
>  
> Both these constraints are collection-specific so they should be configured 
> e.g. as collection properties.
>  
> Edit: {{maxShardsPerNode}} constraint will be addressed in another issue.






[jira] [Resolved] (SOLR-15076) Inconsistent metric types in ReplicationHandler

2021-01-25 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15076.
-
Resolution: Fixed

> Inconsistent metric types in ReplicationHandler
> ---
>
> Key: SOLR-15076
> URL: https://issues.apache.org/jira/browse/SOLR-15076
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.9
>
>
> As pointed out by [~dsmiley] in SOLR-14924 there are cases when 
> ReplicaHandler returns unexpected type of a metric (string instead of a 
> number):
> {quote}
> There are test failures in TestReplicationHandler introduced by this change 
> (I think). See 
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-Tests-8.x/1255/
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.String
>  at __randomizedtesting.SeedInfo.seed([754427253A1E4E95:F190450AC46671D]:0)
>  at 
> org.apache.solr.handler.TestReplicationHandler.doTestDetails(TestReplicationHandler.java:361)
> The test could be made to convert to a string. But it suggests an 
> inconsistency that ought to be fixed – apparently ReplicationHandler 
> sometimes returns its details using all strings and othertimes with the typed 
> variants – and that's bad.
> {quote}
> Reproducing seed from David:
> {quote}
> gradlew :solr:core:test --tests 
> "org.apache.solr.handler.TestReplicationHandler.doTestDetails" -Ptests.jvms=6 
> -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=5135EC61BF449203 
> -Ptests.file.encoding=ISO-8859-1
> {quote}
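The direction of the fix can be illustrated with a small, hypothetical sketch (not {{ReplicationHandler}}'s actual code): always report numeric details with their typed values, never sometimes as strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: report numeric details consistently as numbers,
// so callers never hit Integer-vs-String surprises like the one quoted above.
public class DetailsSketch {
    public static Map<String, Object> details(long indexSize, long generation) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("indexSize", indexSize);   // always a Long, never "12345"
        d.put("generation", generation);
        return d;
    }

    public static void main(String[] args) {
        System.out.println(details(12345L, 7L));
    }
}
```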






[jira] [Commented] (SOLR-15094) Replace all code references of coreNodeName to replicaName

2021-01-21 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17269391#comment-17269391
 ] 

Andrzej Bialecki commented on SOLR-15094:
-

Yes, please! This was always very confusing.

We can also change the message key in 9.0.

> Replace all code references of coreNodeName to replicaName
> --
>
> Key: SOLR-15094
> URL: https://issues.apache.org/jira/browse/SOLR-15094
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>
> This variable is too confusing. I know that we can't change the actual key 
> name in messages because of backward incompatibility, but we will change the 
> internal variables/method names.






[jira] [Updated] (SOLR-15076) Inconsistent metric types in ReplicationHandler

2021-01-21 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15076:

Fix Version/s: 8.9

> Inconsistent metric types in ReplicationHandler
> ---
>
> Key: SOLR-15076
> URL: https://issues.apache.org/jira/browse/SOLR-15076
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.9
>
>
> As pointed out by [~dsmiley] in SOLR-14924 there are cases when 
> ReplicaHandler returns unexpected type of a metric (string instead of a 
> number):
> {quote}
> There are test failures in TestReplicationHandler introduced by this change 
> (I think). See 
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-Tests-8.x/1255/
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.String
>  at __randomizedtesting.SeedInfo.seed([754427253A1E4E95:F190450AC46671D]:0)
>  at 
> org.apache.solr.handler.TestReplicationHandler.doTestDetails(TestReplicationHandler.java:361)
> The test could be made to convert to a string. But it suggests an 
> inconsistency that ought to be fixed – apparently ReplicationHandler 
> sometimes returns its details using all strings and othertimes with the typed 
> variants – and that's bad.
> {quote}
> Reproducing seed from David:
> {quote}
> gradlew :solr:core:test --tests 
> "org.apache.solr.handler.TestReplicationHandler.doTestDetails" -Ptests.jvms=6 
> -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=5135EC61BF449203 
> -Ptests.file.encoding=ISO-8859-1
> {quote}






[jira] [Commented] (SOLR-15055) Re-implement 'withCollection' and 'maxShardsPerNode'

2021-01-20 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268615#comment-17268615
 ] 

Andrzej Bialecki commented on SOLR-15055:
-

[~ichattopadhyaya] verification of co-location at query time has several 
drawbacks:
 * it doesn't guarantee that the request will ever be handled
 * it puts the burden of maintaining the co-location on the operator, without any 
help from Solr. ADDREPLICA for the primary collection may put replicas 
anywhere, so the operator has to discover the placement and "chase" it 
with the secondary replicas - and even then this opens a time window during 
which requests can be denied

Regarding the complexity of the PR:
 * please note that some of the unrelated files are modified because I moved 
{{PROPERTY_PREFIX}} to a better location.
 * the actual modifications in collection Cmd-s are minimal - it's true that I 
did some refactoring in {{DeleteReplicaCmd}} for clarity but the substantial 
changes are just a few lines long.
 * the changes in the {{placement.*}} package are to enable placement plugins 
to have some control of operations other than ADDREPLICA. Sooner or later this 
functionality would be needed anyway.

So, if you disregard the added functionality in the placement plugins, the 
overall changes to support {{withCollection}} are limited just to the 
{{AffinityPlacementFactory}} plugin implementation, and the rest of Solr code 
has no idea that this functionality is supported. IMHO this is a good level of 
separation from core and encapsulation of optional capability.

> Re-implement 'withCollection' and 'maxShardsPerNode'
> 
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
> * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups or as a temporary relief measure.
>  
> Both these constraints are collection-specific so they should be configured 
> e.g. as collection properties.






[jira] [Commented] (SOLR-15055) Re-implement 'withCollection' and 'maxShardsPerNode'

2021-01-13 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264023#comment-17264023
 ] 

Andrzej Bialecki commented on SOLR-15055:
-

From the Slack discussions on #solr-dev it looks like the least intrusive 
option for now is to provide a way for the placement plugins to veto 
collection layout changes (adding / removing replicas and shards) if they 
would violate the constraint, and to delegate the responsibility for meeting 
the constraint to the operator (by manually adding the necessary number of 
secondary replicas, or by manually removing them first from the nodes where 
the primary replicas are to be deleted).

Other options would introduce a lot of complexity to the existing collection 
admin commands.

> Re-implement 'withCollection' and 'maxShardsPerNode'
> 
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
> * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups or as a temporary relief measure.
>  
> Both these constraints are collection-specific so they should be configured 
> e.g. as collection properties.






[jira] [Commented] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-13 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264021#comment-17264021
 ] 

Andrzej Bialecki commented on SOLR-15056:
-

{quote}Under 100% CPU, load average doesn't tell us much, but CPU usage is very 
useful. Over 100% CPU, CPU utilization doesn't tell us much, but load average 
tells us a lot. It tells us how much work is waiting to run.
{quote}
Well said! Indeed these are two very different metrics, and having two separate 
breaker implementations is not a bad idea (we could use one impl plus a 
switch, but that could be confusing and error-prone).

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>  Labels: Metrics
> Attachments: SOLR-15056.patch
>
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  






[jira] [Commented] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-12 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263548#comment-17263548
 ] 

Andrzej Bialecki commented on SOLR-15056:
-

...there's always this: if you don't know what to choose then make it an option 
;)

Seriously, though - Walter is correct that the regular sysLoadAvg is unbounded; 
it may reach e.g. the tens or even hundreds, so it's difficult to properly 
adjust the threshold.

{{getSystemCpuLoad}} is supported on Oracle/OpenJDK, Amazon Corretto and IBM J9. 
I'm not sure about Zulu but I would guess it's supported there too, so indeed 
it looks like a good default with a predictable range. However, we could provide 
an option to use sysLoadAvg if that's what the user prefers (or if 
systemCpuLoad is not available) - it just needs a few if-else statements, but 
most of all it needs SOLID documentation.
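A minimal sketch of that fallback logic (the helper name, signature and clamping behavior are my assumptions, not Solr code): prefer {{systemCpuLoad}} when it is available, otherwise normalize the load average by the CPU count so the threshold stays in a predictable [0.0, 1.0] range.

```java
/**
 * Illustrative sketch only - not Solr code. Picks a CPU "usage" value in
 * [0.0, 1.0]: prefer systemCpuLoad when available (>= 0), otherwise fall
 * back to load average normalized by CPU count, clamped to 1.0.
 */
public class CpuUsageSketch {
  static double effectiveCpuUsage(double systemCpuLoad, double loadAverage, int cpus) {
    if (systemCpuLoad >= 0.0) {
      return systemCpuLoad;                     // already in [0.0, 1.0]
    }
    if (loadAverage < 0.0 || cpus <= 0) {
      return -1.0;                              // neither metric available
    }
    return Math.min(loadAverage / cpus, 1.0);   // unbounded metric, so clamp
  }

  public static void main(String[] args) {
    // systemCpuLoad available: use it directly
    System.out.println(effectiveCpuUsage(0.25, 8.0, 4));   // 0.25
    // not available (negative): fall back to loadAvg / cpus
    System.out.println(effectiveCpuUsage(-1.0, 2.0, 4));   // 0.5
  }
}
```

Either way, the documentation should spell out which metric the configured threshold is compared against.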

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>  Labels: Metrics
> Attachments: SOLR-15056.patch
>
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  






[jira] [Commented] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-12 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17263224#comment-17263224
 ] 

Andrzej Bialecki commented on SOLR-15056:
-

This change is not needed and it makes the API ugly and more difficult to unit 
test:
{code:java}
-  public boolean checkAnyTripped() {
+  public boolean checkAnyTripped(SolrCore core) {
 {code}
The {{CircuitBreakerManager}} is created in {{SolrCore}} so its life-cycle is 
the same as SolrCore. This API misleadingly suggests that you could use it with 
any arbitrary SolrCore.

Instead, the manager can already hold a reference to the SolrCore when it's 
created, and use it to construct the specific breaker implementations - in this 
case you could pass SolrCore to the constructor of CPUCircuitBreaker, or make 
all classes that need it implement {{SolrCoreAware}} and initialize them in the 
manager's ctor in a uniform fashion.
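A tiny self-contained sketch of that second option (all types here are illustrative stubs I made up, not the real Solr classes): the manager takes the core once at construction and informs any registered breaker that declares it needs one, so check methods keep their no-argument signatures.

```java
// Illustrative stubs only - not the real Solr classes.
interface SolrCoreAware {               // stub standing in for Solr's SolrCoreAware
  void inform(Object core);
}

class CpuBreakerStub implements SolrCoreAware {
  Object core;                          // the core handed over by the manager
  public void inform(Object core) { this.core = core; }
  boolean isTripped() { return false; } // placeholder check
}

class BreakerManagerStub {
  private final Object core;
  private final java.util.List<Object> breakers = new java.util.ArrayList<>();

  // The manager's lifecycle matches the core's, so take the core once here...
  BreakerManagerStub(Object core) { this.core = core; }

  // ...and initialize each breaker uniformly; no core parameter on check methods.
  void register(Object breaker) {
    if (breaker instanceof SolrCoreAware) {
      ((SolrCoreAware) breaker).inform(core);
    }
    breakers.add(breaker);
  }
}
```

With this shape a unit test can pass a mock core to the constructor instead of threading one through every check call.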

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>  Labels: Metrics
> Attachments: SOLR-15056.patch
>
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  






[jira] [Commented] (SOLR-15055) Re-implement 'withCollection' and 'maxShardsPerNode'

2021-01-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262624#comment-17262624
 ] 

Andrzej Bialecki commented on SOLR-15055:
-

Requirement #1 from the list above can be implemented just in the placement 
plugin (filter candidate nodes to ensure that they contain at least one 
secondary replica). However, the other requirements need modifications to 
{{DeleteReplicaCmd}}.

> Re-implement 'withCollection' and 'maxShardsPerNode'
> 
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
> * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups or as a temporary relief measure.
>  
> Both these constraints are collection-specific so they should be configured 
> e.g. as collection properties.






[jira] [Comment Edited] (SOLR-15055) Re-implement 'withCollection' and 'maxShardsPerNode'

2021-01-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262615#comment-17262615
 ] 

Andrzej Bialecki edited comment on SOLR-15055 at 1/11/21, 12:47 PM:


So, let's take a step back, and start with use-cases and requirements.

Cross-collection joins is the main use-case because it benefits greatly from 
co-location of replicas for the primary (search) and secondary (join) 
collections.

 1. when creating the primary collection (or adding new replicas to it) the 
placement plugin must take this into account and enforce the co-location of new 
primary replicas if possible.

* in 8x the framework would automatically create secondary replicas as 
necessary to satisfy this requirement (at the cost of code complexity).
* now at minimum the placement plugin should fail if it's not possible to 
satisfy this constraint with the current cluster layout, because then some 
primary replicas would have to use secondary replicas from other nodes and this 
would greatly (and unevenly) increase the query latency.
* if the placement of the primary collection fails because this constraint 
cannot be satisfied (there are too few secondary replicas) then the operator 
must manually add as many secondary replicas as needed. This is the weakness of 
this minimal approach - it would be better to somehow automate it to eliminate 
the manual intervention.

 2. removal of secondary replicas should be prevented if they are actively used 
under this constraint by any primary replicas on a specific node - allowing 
this would cause performance issues as described above.
 3. removal of primary replicas should cause the secondary replicas to be 
removed as well, if they are no longer in use on a specific node - but ensuring 
that at least N replicas of the secondary collection remain (to prevent data 
loss).
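The deletion rules in requirements 2 and 3 can be sketched as a pair of pure checks (names and signatures are mine, purely illustrative - the real logic would live in {{DeleteReplicaCmd}}): a secondary replica is removable only if no primary replica on its node still uses it and at least N secondary replicas would remain.

```java
/** Illustrative sketch of the co-location deletion rules - not Solr code. */
public class ColocationGuardSketch {
  /** Rule 2: block removal of a secondary replica still used by primaries on its node. */
  static boolean canDeleteSecondary(int primaryReplicasOnNode,
                                    int totalSecondaryReplicas, int minSecondary) {
    if (primaryReplicasOnNode > 0) return false;          // actively used on this node
    return totalSecondaryReplicas - 1 >= minSecondary;    // rule 3: keep at least N
  }

  /** Rule 3: after removing primaries, cascade-delete the node's secondary only if unused. */
  static boolean shouldCascadeDeleteSecondary(int remainingPrimariesOnNode,
                                              int totalSecondaryReplicas, int minSecondary) {
    return remainingPrimariesOnNode == 0
        && totalSecondaryReplicas - 1 >= minSecondary;
  }
}
```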


was (Author: ab):
So, let's take a step back, and start with use-cases and requirements:
* cross-collection joins is the main use-case because it benefits greatly from 
co-location of replicas for the primary (search) and secondary (join) 
collections.
* when creating the primary collection (or adding new replicas to it) the 
placement plugin must take this into account and enforce the co-location of new 
primary replicas if possible.
** in 8x the framework would automatically create secondary replicas as 
necessary to satisfy this requirement (at the cost of code complexity).
** now at minimum the placement plugin should fail if it's not possible to 
satisfy this constraint with the current cluster layout, because then some 
primary replicas would have to use secondary replicas from other nodes and this 
would greatly (and unevenly) increase the query latency.
** if the placement of the primary collection fails because this constraint 
cannot be satisfied (there are too few secondary replicas) then the operator 
must manually add as many secondary replicas as needed. This is the weakness of 
this minimal approach - it would be better to somehow automate it to eliminate 
the manual intervention. 
* removal of secondary replicas should be prevented if they are actively used 
under this constraint by any primary replicas on a specific node - allowing 
this would cause performance issues as described above.
* removal of primary replicas should cause the secondary replicas to be removed 
as well, if they are no longer in use on a specific node - but ensuring that at 
least N replicas of the secondary collection remain (to prevent data loss).

> Re-implement 'withCollection' and 'maxShardsPerNode'
> 
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
> * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups or as a temporary relief measure.

[jira] [Commented] (SOLR-15055) Re-implement 'withCollection' and 'maxShardsPerNode'

2021-01-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262615#comment-17262615
 ] 

Andrzej Bialecki commented on SOLR-15055:
-

So, let's take a step back, and start with use-cases and requirements:
* cross-collection joins is the main use-case because it benefits greatly from 
co-location of replicas for the primary (search) and secondary (join) 
collections.
* when creating the primary collection (or adding new replicas to it) the 
placement plugin must take this into account and enforce the co-location of new 
primary replicas if possible.
** in 8x the framework would automatically create secondary replicas as 
necessary to satisfy this requirement (at the cost of code complexity).
** now at minimum the placement plugin should fail if it's not possible to 
satisfy this constraint with the current cluster layout, because then some 
primary replicas would have to use secondary replicas from other nodes and this 
would greatly (and unevenly) increase the query latency.
** if the placement of the primary collection fails because this constraint 
cannot be satisfied (there are too few secondary replicas) then the operator 
must manually add as many secondary replicas as needed. This is the weakness of 
this minimal approach - it would be better to somehow automate it to eliminate 
the manual intervention. 
* removal of secondary replicas should be prevented if they are actively used 
under this constraint by any primary replicas on a specific node - allowing 
this would cause performance issues as described above.
* removal of primary replicas should cause the secondary replicas to be removed 
as well, if they are no longer in use on a specific node - but ensuring that at 
least N replicas of the secondary collection remain (to prevent data loss).

> Re-implement 'withCollection' and 'maxShardsPerNode'
> 
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
> * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups or as a temporary relief measure.
>  
> Both these constraints are collection-specific so they should be configured 
> e.g. as collection properties.






[jira] [Comment Edited] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262521#comment-17262521
 ] 

Andrzej Bialecki edited comment on SOLR-15056 at 1/11/21, 9:36 AM:
---

The equivalent Java code would be something like this:

{code}
coreContainer
  .getSolrMetricManager()
  .registry("solr.jvm")
  .getMetrics()
  .get("os.systemCpuLoad")
{code}


was (Author: ab):
The equivalent Java code would be something like this:

{code}
corerContainer
  .getSolrMetricManager()
  .registry("solr.jvm")
  .getMetrics()
  .get("os.systemCpuLoad")
{code}

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  






[jira] [Commented] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262521#comment-17262521
 ] 

Andrzej Bialecki commented on SOLR-15056:
-

The equivalent Java code would be something like this:

{code}
corerContainer
  .getSolrMetricManager()
  .registry("solr.jvm")
  .getMetrics()
  .get("os.systemCpuLoad")
{code}

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  






[jira] [Comment Edited] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262468#comment-17262468
 ] 

Andrzej Bialecki edited comment on SOLR-15056 at 1/11/21, 8:12 AM:
---

{{systemCpuLoad}} is already supported and returned as one of the metrics. This 
comes from the (somewhat convoluted) code in {{MetricUtils.addMxBeanMetrics}} 
where it tries to use all known implementations and accumulates any unique bean 
properties that they expose.

For example:
{code}
http://localhost:8983/solr/admin/metrics?group=jvm=os

{
"responseHeader": {
"status": 0,
"QTime": 1
},
"metrics": {
"solr.jvm": {
"os.arch": "x86_64",
"os.availableProcessors": 12,
"os.committedVirtualMemorySize": 8402419712,
"os.freePhysicalMemorySize": 41504768,
"os.freeSwapSpaceSize": 804519936,
"os.maxFileDescriptorCount": 8192,
"os.name": "Mac OS X",
"os.openFileDescriptorCount": 195,
"os.processCpuLoad": 0.0017402379609634876,
"os.processCpuTime": 1049201,
"os.systemCpuLoad": 0.1268950796343933,
"os.systemLoadAverage": 4.00439453125,
"os.totalPhysicalMemorySize": 34359738368,
"os.totalSwapSpaceSize": 7516192768,
"os.version": "10.16"
}
}
}
{code}


was (Author: ab):
{{systtemCpuLoad}} is already supported and returned as one of the metrics. 
This comes from the (somewhat convoluted) code in 
{{MetricUtils.addMxBeanMetrics}} where it tries to use all known 
implementations and accumulates any unique bean properties that they expose.

For example:
{code}
http://localhost:8983/solr/admin/metrics?group=jvm=os

{
"responseHeader": {
"status": 0,
"QTime": 1
},
"metrics": {
"solr.jvm": {
"os.arch": "x86_64",
"os.availableProcessors": 12,
"os.committedVirtualMemorySize": 8402419712,
"os.freePhysicalMemorySize": 41504768,
"os.freeSwapSpaceSize": 804519936,
"os.maxFileDescriptorCount": 8192,
"os.name": "Mac OS X",
"os.openFileDescriptorCount": 195,
"os.processCpuLoad": 0.0017402379609634876,
"os.processCpuTime": 1049201,
"os.systemCpuLoad": 0.1268950796343933,
"os.systemLoadAverage": 4.00439453125,
"os.totalPhysicalMemorySize": 34359738368,
"os.totalSwapSpaceSize": 7516192768,
"os.version": "10.16"
}
}
}
{code}

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  
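The CPU-count dependence argued above can be made concrete with a trivial normalization. This is a hypothetical sketch, not anything Solr actually implements; the class and method names are illustrative:

```java
public class LoadCheck {
    // A load-average threshold is only meaningful relative to the CPU count:
    // dividing by availableProcessors yields a scale where 1.0 means
    // "all CPUs busy on average", recovering a bounded usage-like value.
    static double normalizedLoad(double loadAverage, int cpus) {
        return loadAverage / cpus;
    }

    public static void main(String[] args) {
        // Load average 8 is fine on a 32-CPU host but critical on a 2-CPU host.
        System.out.println(normalizedLoad(8.0, 32)); // prints 0.25
        System.out.println(normalizedLoad(8.0, 2));  // prints 4.0
    }
}
```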




[jira] [Commented] (SOLR-15056) CPU circuit breaker needs to use CPU utilization, not Unix load average

2021-01-11 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262468#comment-17262468
 ] 

Andrzej Bialecki commented on SOLR-15056:
-

{{systemCpuLoad}} is already supported and returned as one of the metrics. 
This comes from the (somewhat convoluted) code in 
{{MetricUtils.addMxBeanMetrics}} where it tries to use all known 
implementations and accumulates any unique bean properties that they expose.

For example:
{code}
http://localhost:8983/solr/admin/metrics?group=jvm=os

{
"responseHeader": {
"status": 0,
"QTime": 1
},
"metrics": {
"solr.jvm": {
"os.arch": "x86_64",
"os.availableProcessors": 12,
"os.committedVirtualMemorySize": 8402419712,
"os.freePhysicalMemorySize": 41504768,
"os.freeSwapSpaceSize": 804519936,
"os.maxFileDescriptorCount": 8192,
"os.name": "Mac OS X",
"os.openFileDescriptorCount": 195,
"os.processCpuLoad": 0.0017402379609634876,
"os.processCpuTime": 1049201,
"os.systemCpuLoad": 0.1268950796343933,
"os.systemLoadAverage": 4.00439453125,
"os.totalPhysicalMemorySize": 34359738368,
"os.totalSwapSpaceSize": 7516192768,
"os.version": "10.16"
}
}
}
{code}
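For reference, the {{os.systemCpuLoad}} value in the metrics output above comes from the HotSpot-specific {{com.sun.management.OperatingSystemMXBean}} subinterface. A minimal sketch of reading both metrics directly, guarding the cast since that subinterface is not guaranteed on every JVM:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class CpuLoadProbe {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

        // Unbounded run-queue average; Unix-specific, -1.0 where unsupported.
        System.out.println("systemLoadAverage = " + os.getSystemLoadAverage());

        // The com.sun.management subinterface exposes the bounded usage metric.
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            com.sun.management.OperatingSystemMXBean sunOs =
                    (com.sun.management.OperatingSystemMXBean) os;
            // Value in [0.0, 1.0], or negative if not yet available.
            System.out.println("systemCpuLoad = " + sunOs.getSystemCpuLoad());
        }
    }
}
```

Note that on JDK 14+ the same value is also exposed as {{getCpuLoad()}}; {{getSystemCpuLoad()}} remains available but is deprecated there.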

> CPU circuit breaker needs to use CPU utilization, not Unix load average
> ---
>
> Key: SOLR-15056
> URL: https://issues.apache.org/jira/browse/SOLR-15056
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 8.7
>Reporter: Walter Underwood
>Priority: Major
>
> The config range, 50% to 95%, assumes that the circuit breaker is triggered 
> by a CPU utilization metric that goes from 0% to 100%. But the code uses the 
> metric OperatingSystemMXBean.getSystemLoadAverage(). That is an average of 
> the count of processes waiting to run. It is effectively unbounded. I've seen 
> it as high as 50 to 100. It is not bound by 1.0 (100%).
> A good limit for load average would need to be aware of the number of CPUs 
> available to the JVM. A load average of 8 is no problem for a 32 CPU host. It 
> is a critical situation for a 2 CPU host.
> Also, load average is a Unix OS metric. I don't know if it is even available 
> on Windows.
> Instead, use a CPU utilization metric that goes from 0.0 to 1.0. A good 
> choice is OperatingSystemMXBean.getSystemCPULoad(). This name also uses 
> "load", but it is a usage metric.
> From the Javadoc:
> > Returns the "recent cpu usage" for the whole system. This value is a double 
> >in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle 
> >during the recent period of time observed, while a value of 1.0 means that 
> >all CPUs were actively running 100% of the time during the recent period 
> >being observed. All values betweens 0.0 and 1.0 are possible depending of 
> >the activities going on in the system. If the system recent cpu usage is not 
> >available, the method returns a negative value.
> https://docs.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/OperatingSystemMXBean.html#getSystemCpuLoad()
> Also update the documentation to explain which JMX metrics are used for the 
> memory and CPU circuit breakers.
>  






[jira] [Commented] (SOLR-15055) Re-implement 'withCollection' and 'maxShardsPerNode'

2021-01-10 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17262452#comment-17262452
 ] 

Andrzej Bialecki commented on SOLR-15055:
-

[~mdrob] in 8x `maxShardsPerNode` was totally broken and didn't do what you 
would expect it to. It didn't actually affect the placement at all - it was 
only a crude upfront check of whether the total number of replicas exceeded 
the number of nodes x maxShardsPerNode, regardless of where the placements 
would land. That's why almost all tests set it to -1 or a large number, 
simply to bypass this check.
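A hypothetical sketch of the kind of upfront total-count check described above; the names are illustrative and this is not the actual 8x code:

```java
public class MaxShardsCheck {
    // The 8x-era check only compared totals before any placement happened;
    // it said nothing about where replicas would actually land, so it could
    // not enforce a true per-node limit.
    static boolean totalCapacityOk(int totalReplicas, int numNodes, int maxShardsPerNode) {
        if (maxShardsPerNode < 0) {
            return true; // -1 disables the check entirely, as many tests did
        }
        return totalReplicas <= numNodes * maxShardsPerNode;
    }

    public static void main(String[] args) {
        System.out.println(totalCapacityOk(6, 3, 2));    // prints true
        System.out.println(totalCapacityOk(7, 3, 2));    // prints false
        System.out.println(totalCapacityOk(100, 3, -1)); // prints true
    }
}
```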

> Re-implement 'withCollection' and 'maxShardsPerNode'
> 
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
> * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups or as a temporary relief measure.
>  
> Both these constraints are collection-specific so they should be configured 
> e.g. as collection properties.






[jira] [Updated] (SOLR-15076) Inconsistent metric types in ReplicationHandler

2021-01-07 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15076:

Description: 
As pointed out by [~dsmiley] in SOLR-14924 there are cases where 
ReplicationHandler returns an unexpected metric type (a string instead of a 
number):
{quote}
There are test failures in TestReplicationHandler introduced by this change (I 
think). See 
https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-Tests-8.x/1255/

java.lang.ClassCastException: java.lang.Integer cannot be cast to 
java.lang.String
 at __randomizedtesting.SeedInfo.seed([754427253A1E4E95:F190450AC46671D]:0)
 at 
org.apache.solr.handler.TestReplicationHandler.doTestDetails(TestReplicationHandler.java:361)
The test could be made to convert to a string. But it suggests an inconsistency 
that ought to be fixed – apparently ReplicationHandler sometimes returns its 
details using all strings and other times with the typed variants – and that's 
bad.
{quote}

Reproducing seed from David:
{quote}
gradlew :solr:core:test --tests 
"org.apache.solr.handler.TestReplicationHandler.doTestDetails" -Ptests.jvms=6 
-Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=5135EC61BF449203 
-Ptests.file.encoding=ISO-8859-1
{quote}

  was:
As pointed out by [~dsmiley] in SOLR-14924 there are cases where 
ReplicationHandler returns an unexpected metric type (a string instead of a 
number):
{quote}
There are test failures in TestReplicationHandler introduced by this change (I 
think). See 
https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-Tests-8.x/1255/

java.lang.ClassCastException: java.lang.Integer cannot be cast to 
java.lang.String
 at __randomizedtesting.SeedInfo.seed([754427253A1E4E95:F190450AC46671D]:0)
 at 
org.apache.solr.handler.TestReplicationHandler.doTestDetails(TestReplicationHandler.java:361)
The test could be made to convert to a string. But it suggests an inconsistency 
that ought to be fixed – apparently ReplicationHandler sometimes returns its 
details using all strings and other times with the typed variants – and that's 
bad.
{quote}


> Inconsistent metric types in ReplicationHandler
> ---
>
> Key: SOLR-15076
> URL: https://issues.apache.org/jira/browse/SOLR-15076
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> As pointed out by [~dsmiley] in SOLR-14924 there are cases where 
> ReplicationHandler returns an unexpected metric type (a string instead of a 
> number):
> {quote}
> There are test failures in TestReplicationHandler introduced by this change 
> (I think). See 
> https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-Tests-8.x/1255/
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.String
>  at __randomizedtesting.SeedInfo.seed([754427253A1E4E95:F190450AC46671D]:0)
>  at 
> org.apache.solr.handler.TestReplicationHandler.doTestDetails(TestReplicationHandler.java:361)
> The test could be made to convert to a string. But it suggests an 
> inconsistency that ought to be fixed – apparently ReplicationHandler 
> sometimes returns its details using all strings and other times with the typed 
> variants – and that's bad.
> {quote}
> Reproducing seed from David:
> {quote}
> gradlew :solr:core:test --tests 
> "org.apache.solr.handler.TestReplicationHandler.doTestDetails" -Ptests.jvms=6 
> -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=5135EC61BF449203 
> -Ptests.file.encoding=ISO-8859-1
> {quote}






[jira] [Created] (SOLR-15076) Inconsistent metric types in ReplicationHandler

2021-01-07 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15076:
---

 Summary: Inconsistent metric types in ReplicationHandler
 Key: SOLR-15076
 URL: https://issues.apache.org/jira/browse/SOLR-15076
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


As pointed out by [~dsmiley] in SOLR-14924 there are cases where 
ReplicationHandler returns an unexpected metric type (a string instead of a 
number):
{quote}
There are test failures in TestReplicationHandler introduced by this change (I 
think). See 
https://ci-builds.apache.org/job/Lucene/job/Lucene-Solr-Tests-8.x/1255/

java.lang.ClassCastException: java.lang.Integer cannot be cast to 
java.lang.String
 at __randomizedtesting.SeedInfo.seed([754427253A1E4E95:F190450AC46671D]:0)
 at 
org.apache.solr.handler.TestReplicationHandler.doTestDetails(TestReplicationHandler.java:361)
The test could be made to convert to a string. But it suggests an inconsistency 
that ought to be fixed – apparently ReplicationHandler sometimes returns its 
details using all strings and other times with the typed variants – and that's 
bad.
{quote}






[jira] [Created] (SOLR-15068) RefGuide documentation for replica placement plugins

2021-01-05 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15068:
---

 Summary: RefGuide documentation for replica placement plugins
 Key: SOLR-15068
 URL: https://issues.apache.org/jira/browse/SOLR-15068
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki









[jira] [Commented] (SOLR-15055) Re-implement 'withCollection' and 'maxShardsPerNode'

2021-01-04 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258370#comment-17258370
 ] 

Andrzej Bialecki commented on SOLR-15055:
-

Additional notes on how {{withCollection}} was implemented in 8x.

Let's first establish the naming:
 * collection A (primary) is the one that wants the other collection to be 
always co-located with it, e.g. to implement faster cross-collection joins.
 * collection B (secondary) is an auxiliary collection that is used by 
collection A (primary). In 8x this collection had to be single-sharded.

In 8x collection A can be marked (by setting a collection property) with 
{{withCollection: B}}. Collection B must already exist. This constraint causes 
all ADDREPLICA commands for collection A (including its initial creation) to 
also automatically invoke ADDREPLICA for a replica of collection B (of its only 
shard), placed on the same node as the new A replica, whenever a B replica is 
missing on that target node.

This relationship in 8x was always supposed to be 1:1, i.e. a single primary 
collection could specify at most a single {{withCollection: B}}.

A reverse relationship was also created in collection B using 
{{COLOCATED_WITH}} property. This property would point to collection A and it 
would prevent collection B from being deleted while in use by collection A.

That implementation was not ideal, for several reasons:
* additional replicas of the secondary collection B were never removed when 
primary replicas were deleted or moved around.
* the code would always add an NRT replica for the B collection; there was no 
way to request other replica types.
* AFAIK the placement could fail because the B replica placements bypassed 
the usual placement policy calculations (including free disk space checks).
* for the same reason the placement of the A replica could be sub-optimal 
because it didn't consider the combined metrics of A+B replicas (combined 
replica size, combined number of cores, etc).
* only a 1:1 relationship was officially supported - if multiple primary 
collections pointed to the same B collection, the {{COLOCATED_WITH}} property in 
B would point only to the most recently created primary collection. This means 
users could accidentally bypass B's deletion-prevention mechanism by deleting 
that latest primary collection while the other, previously defined primary 
collections were still in use.
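The ADDREPLICA side effect described above can be sketched as follows, with hypothetical names and in-memory state standing in for the real cluster; this is not the actual 8x implementation:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WithCollectionSketch {
    // Nodes that already host a replica of the secondary collection B.
    static final Set<String> nodesWithB = new HashSet<>();
    // Record of the B replicas we were forced to add as a side effect.
    static final List<String> addedBReplicas = new ArrayList<>();

    // Placing a replica of primary collection A on a node triggers an
    // automatic ADDREPLICA of B's only shard on the same node, if missing.
    static void addReplicaOfA(String node) {
        if (nodesWithB.add(node)) {
            // 8x always added an NRT replica here; the type was not configurable.
            addedBReplicas.add(node);
        }
    }

    public static void main(String[] args) {
        addReplicaOfA("node1");
        addReplicaOfA("node1"); // B already co-located, nothing added
        addReplicaOfA("node2");
        System.out.println(addedBReplicas); // prints [node1, node2]
    }
}
```

Note how the sketch also exhibits the first problem listed above: nothing ever removes entries from {{addedBReplicas}} when an A replica goes away.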

> Re-implement 'withCollection' and 'maxShardsPerNode'
> 
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
> * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups or as a temporary relief measure.
>  
> Both these constraints are collection-specific so they should be configured 
> e.g. as collection properties.






[jira] [Assigned] (SOLR-14977) Container plugins need a way to be configured

2021-01-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reassigned SOLR-14977:
---

Assignee: Andrzej Bialecki

> Container plugins need a way to be configured
> -
>
> Key: SOLR-14977
> URL: https://issues.apache.org/jira/browse/SOLR-14977
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Plugin system
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-14977.patch
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Container plugins are defined in {{/clusterprops.json:/plugin}} using a 
> simple {{PluginMeta}} bean. This is sufficient for implementations that don't 
> need any configuration except for the {{pathPrefix}} but insufficient for 
> anything else that needs more configuration parameters.
> An example would be a {{CollectionsRepairEventListener}} plugin proposed in 
> PR-1962, which needs parameters such as the list of collections, {{waitFor}}, 
> maximum operations allowed, etc. to properly function.
> This issue proposes to extend the {{PluginMeta}} bean to allow a {{Map}} of 
> configuration parameters.
> There is an interface that we could potentially use ({{MapInitializedPlugin}}) 
> but it works only with {{String}} values. This is not optimal because it 
> requires additional type-safety validation from the consumers. The existing 
> {{PluginInfo}} / {{PluginInfoInitialized}} interface is too complex for this 
> purpose.
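A minimal sketch of what the extended bean could look like; the field names are illustrative, not the actual {{PluginMeta}} API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape of an extended plugin descriptor: the existing
// identity fields plus a Map so plugins receive typed configuration
// values instead of String-only ones.
public class PluginMetaSketch {
    String name;
    String klass;
    final Map<String, Object> config = new HashMap<>();

    public static void main(String[] args) {
        PluginMetaSketch meta = new PluginMetaSketch();
        meta.name = "collections-repair";
        meta.klass = "org.example.CollectionsRepairEventListener";
        meta.config.put("waitFor", 30);       // keeps its Integer type
        meta.config.put("maxOperations", 10);
        // No string parsing or per-consumer type validation needed:
        int waitFor = (Integer) meta.config.get("waitFor");
        System.out.println(waitFor); // prints 30
    }
}
```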






[jira] [Resolved] (SOLR-14977) Container plugins need a way to be configured

2021-01-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14977.
-
Resolution: Fixed

> Container plugins need a way to be configured
> -
>
> Key: SOLR-14977
> URL: https://issues.apache.org/jira/browse/SOLR-14977
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Plugin system
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: SOLR-14977.patch
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Container plugins are defined in {{/clusterprops.json:/plugin}} using a 
> simple {{PluginMeta}} bean. This is sufficient for implementations that don't 
> need any configuration except for the {{pathPrefix}} but insufficient for 
> anything else that needs more configuration parameters.
> An example would be a {{CollectionsRepairEventListener}} plugin proposed in 
> PR-1962, which needs parameters such as the list of collections, {{waitFor}}, 
> maximum operations allowed, etc. to properly function.
> This issue proposes to extend the {{PluginMeta}} bean to allow a {{Map}} of 
> configuration parameters.
> There is an interface that we could potentially use ({{MapInitializedPlugin}}) 
> but it works only with {{String}} values. This is not optimal because it 
> requires additional type-safety validation from the consumers. The existing 
> {{PluginInfo}} / {{PluginInfoInitialized}} interface is too complex for this 
> purpose.






[jira] [Updated] (SOLR-14977) Container plugins need a way to be configured

2021-01-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-14977:

Fix Version/s: master (9.0)

> Container plugins need a way to be configured
> -
>
> Key: SOLR-14977
> URL: https://issues.apache.org/jira/browse/SOLR-14977
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Plugin system
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: SOLR-14977.patch
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Container plugins are defined in {{/clusterprops.json:/plugin}} using a 
> simple {{PluginMeta}} bean. This is sufficient for implementations that don't 
> need any configuration except for the {{pathPrefix}} but insufficient for 
> anything else that needs more configuration parameters.
> An example would be a {{CollectionsRepairEventListener}} plugin proposed in 
> PR-1962, which needs parameters such as the list of collections, {{waitFor}}, 
> maximum operations allowed, etc. to properly function.
> This issue proposes to extend the {{PluginMeta}} bean to allow a {{Map}} of 
> configuration parameters.
> There is an interface that we could potentially use ({{MapInitializedPlugin}}) 
> but it works only with {{String}} values. This is not optimal because it 
> requires additional type-safety validation from the consumers. The existing 
> {{PluginInfo}} / {{PluginInfoInitialized}} interface is too complex for this 
> purpose.






[jira] [Resolved] (SOLR-15004) Unit tests for the replica placement API

2021-01-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15004.
-
Resolution: Fixed

> Unit tests for the replica placement API
> 
>
> Key: SOLR-15004
> URL: https://issues.apache.org/jira/browse/SOLR-15004
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is a follow-up to SOLR-14613. Both the APIs and the sample 
> implementations need unit tests.






[jira] [Updated] (SOLR-15004) Unit tests for the replica placement API

2021-01-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15004:

Fix Version/s: master (9.0)

> Unit tests for the replica placement API
> 
>
> Key: SOLR-15004
> URL: https://issues.apache.org/jira/browse/SOLR-15004
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is a follow-up to SOLR-14613. Both the APIs and the sample 
> implementations need unit tests.






[jira] [Updated] (SOLR-15019) Replica placement API needs a way to fetch existing replica metrics

2021-01-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15019:

Fix Version/s: master (9.0)

> Replica placement API needs a way to fetch existing replica metrics
> ---
>
> Key: SOLR-15019
> URL: https://issues.apache.org/jira/browse/SOLR-15019
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Replica placement API was introduced in SOLR-14613. It offers a few sample 
> (and simple) implementations of placement plugins.
> However, this API doesn't offer support for retrieving per-replica metrics, 
> which are required for calculating more realistic placements. For example, 
> when calculating placements for ADDREPLICA on an already existing collection 
> the plugin should know the size of a replica in order to avoid placing 
> large replicas on nodes with insufficient free disk space.
> After discussing this with [~ilan] we propose the following additions to the 
> API:
> * use the existing {{AttributeFetcher}} interface as a facade for retrieving 
> per-replica values (currently it only retrieves per-node values)
> * add {{ShardValues}} interface to represent strongly-typed API for key 
> metrics, such as replica size, number of docs, number of update and search 
> requests.
> Plugins could then use this API like this:
> {code}
> AttributeFetcher attributeFetcher = ...
> SolrCollection solrCollection = ...
> Set metricNames = ...
> attributeFetcher.requestCollectionMetrics(solrCollection, 
> solrCollection.getShardNames(), metricNames);
> AttributeValues attributeValues = attributeFetcher.fetchAttributes();
> ShardValues shardValues = 
> attributeValues.getShardMetrics(solrCollection.getName(), shardName);
> int sizeInGB = shardValues.getSizeInGB(); // retrieves shard leader metrics
> int replicaSizeInGB = shardValues.getSizeInGB(replica);
> {code} 






[jira] [Resolved] (SOLR-15019) Replica placement API needs a way to fetch existing replica metrics

2021-01-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15019.
-
Resolution: Fixed

> Replica placement API needs a way to fetch existing replica metrics
> ---
>
> Key: SOLR-15019
> URL: https://issues.apache.org/jira/browse/SOLR-15019
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Replica placement API was introduced in SOLR-14613. It offers a few sample 
> (and simple) implementations of placement plugins.
> However, this API doesn't offer support for retrieving per-replica metrics, 
> which are required for calculating more realistic placements. For example, 
> when calculating placements for ADDREPLICA on an already existing collection 
> the plugin should know the size of a replica in order to avoid placing 
> large replicas on nodes with insufficient free disk space.
> After discussing this with [~ilan] we propose the following additions to the 
> API:
> * use the existing {{AttributeFetcher}} interface as a facade for retrieving 
> per-replica values (currently it only retrieves per-node values)
> * add {{ShardValues}} interface to represent strongly-typed API for key 
> metrics, such as replica size, number of docs, number of update and search 
> requests.
> Plugins could then use this API like this:
> {code}
> AttributeFetcher attributeFetcher = ...
> SolrCollection solrCollection = ...
> Set metricNames = ...
> attributeFetcher.requestCollectionMetrics(solrCollection, 
> solrCollection.getShardNames(), metricNames);
> AttributeValues attributeValues = attributeFetcher.fetchAttributes();
> ShardValues shardValues = 
> attributeValues.getShardMetrics(solrCollection.getName(), shardName);
> int sizeInGB = shardValues.getSizeInGB(); // retrieves shard leader metrics
> int replicaSizeInGB = shardValues.getSizeInGB(replica);
> {code} 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (SOLR-15019) Replica placement API needs a way to fetch existing replica metrics

2021-01-04 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki reassigned SOLR-15019:
---

Assignee: Andrzej Bialecki

> Replica placement API needs a way to fetch existing replica metrics
> ---
>
> Key: SOLR-15019
> URL: https://issues.apache.org/jira/browse/SOLR-15019
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Replica placement API was introduced in SOLR-14613. It offers a few sample 
> (and simple) implementations of placement plugins.
> However, this API doesn't offer support for retrieving per-replica metrics, 
> which are required for calculating more realistic placements. For example, 
> when calculating placements for ADDREPLICA on an already existing collection 
> the plugin should know the size of each replica in order to avoid placing 
> large replicas on nodes with insufficient free disk space.
> After discussing this with [~ilan] we propose the following additions to the 
> API:
> * use the existing {{AttributeFetcher}} interface as a facade for retrieving 
> per-replica values (currently it only retrieves per-node values)
> * add a {{ShardValues}} interface to represent a strongly-typed API for key 
> metrics, such as replica size, number of docs, and number of update and 
> search requests.
> Plugins could then use this API like this:
> {code}
> AttributeFetcher attributeFetcher = ...
> SolrCollection solrCollection = ...
> Set<String> metricNames = ...
> attributeFetcher.requestCollectionMetrics(solrCollection, 
> solrCollection.getShardNames(), metricNames);
> AttributeValues attributeValues = attributeFetcher.fetchAttributes();
> ShardValues shardValues = 
> attributeValues.getShardMetrics(solrCollection.getName(), shardName);
> int sizeInGB = shardValues.getSizeInGB(); // retrieves shard leader metrics
> int replicaSizeInGB = shardValues.getSizeInGB(replica);
> {code}






[jira] [Updated] (SOLR-15055) Re-implement 'withCollection' and 'maxShardsPerNode'

2020-12-17 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15055:

Description: 
Solr 8x replica placement provided two settings that are very useful in certain 
scenarios:

* The {{withCollection}} constraint specified that replicas should be placed on 
the same nodes where replicas of another collection are located. In the 8x 
implementation this was limited in practice to co-locating single-shard 
secondary collections used for joins or other lookups from the main collection 
(which could be multi-sharded).

* {{maxShardsPerNode}} - this constraint specified the maximum number of 
replicas per shard that can be placed on the same node. In most scenarios this 
was set to 1 in order to ensure fault tolerance (i.e. at most 1 replica of any 
given shard would be placed on any given node). Changing this constraint to 
values > 1 would reduce fault tolerance but may be desired in test setups or as 
a temporary relief measure.

Both of these constraints are collection-specific, so they should be configured 
e.g. as collection properties.
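A re-implemented {{maxShardsPerNode}} check reduces to counting, per node, the replicas of a given shard that a placement would produce. The following is a minimal standalone sketch under that reading; the class and method names are illustrative, not the actual Solr placement-plugin API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the maxShardsPerNode constraint: given the nodes already
// hosting replicas of one shard, check whether placing another replica on a
// candidate node would exceed the per-node limit. Names are illustrative,
// not the actual Solr placement-plugin API.
public class MaxShardsPerNodeSketch {

    public static boolean canPlace(List<String> nodesWithReplica,
                                   String candidateNode,
                                   int maxShardsPerNode) {
        Map<String, Integer> perNode = new HashMap<>();
        for (String node : nodesWithReplica) {
            perNode.merge(node, 1, Integer::sum);
        }
        // Placing one more replica of this shard on candidateNode must not
        // push its count past the configured limit.
        return perNode.getOrDefault(candidateNode, 0) + 1 <= maxShardsPerNode;
    }

    public static void main(String[] args) {
        List<String> placed = List.of("node1", "node2");
        System.out.println(canPlace(placed, "node3", 1)); // true
        System.out.println(canPlace(placed, "node1", 1)); // false
        System.out.println(canPlace(placed, "node1", 2)); // true
    }
}
```

With {{maxShardsPerNode}} = 1 this enforces the fault-tolerant default (no two replicas of the same shard on one node), while values > 1 relax the check as described above.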

> Re-implement 'withCollection' and 'maxShardsPerNode'
> 
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
> * The {{withCollection}} constraint specified that replicas should be placed 
> on the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault tolerance (i.e. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault tolerance but may be desired in 
> test setups or as a temporary relief measure.
> Both of these constraints are collection-specific, so they should be 
> configured e.g. as collection properties.






[jira] [Created] (SOLR-15055) Re-implement 'withCollection' and 'maxShardsPerNode'

2020-12-17 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15055:
---

 Summary: Re-implement 'withCollection' and 'maxShardsPerNode'
 Key: SOLR-15055
 URL: https://issues.apache.org/jira/browse/SOLR-15055
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki









[jira] [Commented] (SOLR-15019) Replica placement API needs a way to fetch existing replica metrics

2020-12-07 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245374#comment-17245374
 ] 

Andrzej Bialecki commented on SOLR-15019:
-

Which common strongly-typed metrics should we expose in {{ShardValues}}? 
Previously we used just these:
 * {{INDEX.sizeInBytes}} - for easier use I propose to expose only "size in GB" 
as that is more practical for any calculations.
 * {{QUERY./select.requestTimes:1minRate}} - this was used in the 
{{SearchRateTrigger}}; it reflects the search load of the replica.
 * {{UPDATE./update.requestTimes:1minRate}} - this is new; it reflects the 
indexing load of the replica.

Also, IMHO in addition to these strongly-typed metrics the {{ShardValues}} 
interface should allow fetching arbitrary weakly-typed metrics.
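The proposed "size in GB" accessor is just a unit conversion over the raw {{INDEX.sizeInBytes}} metric. A minimal sketch of that conversion, with illustrative names rather than the actual Solr API:

```java
// Minimal sketch of the proposed size conversion: the raw metric is
// INDEX.sizeInBytes, while the placement API would expose size in GB.
// Class and method names here are illustrative, not the actual Solr API.
public class ShardSizeSketch {
    private static final double BYTES_PER_GB = 1024.0 * 1024.0 * 1024.0;

    // Convert a raw sizeInBytes metric value to whole gigabytes, rounding up
    // so even a tiny index still "occupies" 1 GB of the disk-space budget.
    public static int sizeInGB(long sizeInBytes) {
        return (int) Math.ceil(sizeInBytes / BYTES_PER_GB);
    }

    public static void main(String[] args) {
        System.out.println(sizeInGB(5L * 1024 * 1024 * 1024)); // 5
        System.out.println(sizeInGB(1L));                      // 1
    }
}
```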

> Replica placement API needs a way to fetch existing replica metrics
> ---
>
> Key: SOLR-15019
> URL: https://issues.apache.org/jira/browse/SOLR-15019
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Priority: Major
>
> Replica placement API was introduced in SOLR-14613. It offers a few sample 
> (and simple) implementations of placement plugins.
> However, this API doesn't offer support for retrieving per-replica metrics, 
> which are required for calculating more realistic placements. For example, 
> when calculating placements for ADDREPLICA on an already existing collection 
> the plugin should know the size of each replica in order to avoid placing 
> large replicas on nodes with insufficient free disk space.
> After discussing this with [~ilan] we propose the following additions to the 
> API:
> * use the existing {{AttributeFetcher}} interface as a facade for retrieving 
> per-replica values (currently it only retrieves per-node values)
> * add a {{ShardValues}} interface to represent a strongly-typed API for key 
> metrics, such as replica size, number of docs, and number of update and 
> search requests.
> Plugins could then use this API like this:
> {code}
> AttributeFetcher attributeFetcher = ...
> SolrCollection solrCollection = ...
> Set<String> metricNames = ...
> attributeFetcher.requestCollectionMetrics(solrCollection, 
> solrCollection.getShardNames(), metricNames);
> AttributeValues attributeValues = attributeFetcher.fetchAttributes();
> ShardValues shardValues = 
> attributeValues.getShardMetrics(solrCollection.getName(), shardName);
> int sizeInGB = shardValues.getSizeInGB(); // retrieves shard leader metrics
> int replicaSizeInGB = shardValues.getSizeInGB(replica);
> {code}






[jira] [Comment Edited] (SOLR-15019) Replica placement API needs a way to fetch existing replica metrics

2020-12-07 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245374#comment-17245374
 ] 

Andrzej Bialecki edited comment on SOLR-15019 at 12/7/20, 5:46 PM:
---

Which common strongly-typed metrics should we expose in {{ShardValues}}? 
Previously we used just these:
 * {{INDEX.sizeInBytes}} - for easier use I propose to expose only "size in GB" 
as that is more practical for any calculations.
 * {{QUERY./select.requestTimes:1minRate}} - this was used in the 
{{SearchRateTrigger}}; it reflects the search load of the replica.
 * {{UPDATE./update.requestTimes:1minRate}} - this is new; it reflects the 
indexing load of the replica.

Also, IMHO in addition to these strongly-typed metrics the {{ShardValues}} 
interface should allow fetching arbitrary weakly-typed replica metrics.


was (Author: ab):
Which common strongly-typed metrics should we expose in {{ShardValues}}? 
Previously we used just these:
 * {{INDEX.sizeInBytes}} - for easier use I propose to expose only "size in GB" 
as that is more practical for any calculations.
 * {{QUERY./select.requestTimes:1minRate}} - this was used in the 
{{SearchRateTrigger}}; it reflects the search load of the replica.
 * {{UPDATE./update.requestTimes:1minRate}} - this is new; it reflects the 
indexing load of the replica.

Also, IMHO in addition to these strongly-typed metrics the {{ShardValues}} 
interface should allow fetching arbitrary weakly-typed metrics.

> Replica placement API needs a way to fetch existing replica metrics
> ---
>
> Key: SOLR-15019
> URL: https://issues.apache.org/jira/browse/SOLR-15019
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Priority: Major
>
> Replica placement API was introduced in SOLR-14613. It offers a few sample 
> (and simple) implementations of placement plugins.
> However, this API doesn't offer support for retrieving per-replica metrics, 
> which are required for calculating more realistic placements. For example, 
> when calculating placements for ADDREPLICA on an already existing collection 
> the plugin should know the size of each replica in order to avoid placing 
> large replicas on nodes with insufficient free disk space.
> After discussing this with [~ilan] we propose the following additions to the 
> API:
> * use the existing {{AttributeFetcher}} interface as a facade for retrieving 
> per-replica values (currently it only retrieves per-node values)
> * add a {{ShardValues}} interface to represent a strongly-typed API for key 
> metrics, such as replica size, number of docs, and number of update and 
> search requests.
> Plugins could then use this API like this:
> {code}
> AttributeFetcher attributeFetcher = ...
> SolrCollection solrCollection = ...
> Set<String> metricNames = ...
> attributeFetcher.requestCollectionMetrics(solrCollection, 
> solrCollection.getShardNames(), metricNames);
> AttributeValues attributeValues = attributeFetcher.fetchAttributes();
> ShardValues shardValues = 
> attributeValues.getShardMetrics(solrCollection.getName(), shardName);
> int sizeInGB = shardValues.getSizeInGB(); // retrieves shard leader metrics
> int replicaSizeInGB = shardValues.getSizeInGB(replica);
> {code}






[jira] [Resolved] (SOLR-15022) RefGuide documentation for /cluster/plugin API

2020-12-07 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15022.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

> RefGuide documentation for /cluster/plugin API
> --
>
> Key: SOLR-15022
> URL: https://issues.apache.org/jira/browse/SOLR-15022
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Plugin system
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>
> The {{/cluster/plugin}} API needs user-level documentation.






[jira] [Commented] (SOLR-14182) Move metric reporters config from solr.xml to ZK cluster properties

2020-12-07 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245123#comment-17245123
 ] 

Andrzej Bialecki commented on SOLR-14182:
-

Ok, I see your point Tomas - let's work on SOLR-14843 first.

> Move metric reporters config from solr.xml to ZK cluster properties
> ---
>
> Key: SOLR-14182
> URL: https://issues.apache.org/jira/browse/SOLR-14182
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 8.4
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Metric reporters are currently configured statically in solr.xml, which makes 
> it difficult to change dynamically or in a containerized environment.
> We should move this section to ZK /cluster.properties and add a back-compat 
> migration shim.






[jira] [Resolved] (SOLR-15016) Replica placement plugins should use container plugins API / configs

2020-12-07 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-15016.
-
Fix Version/s: master (9.0)
   Resolution: Fixed

> Replica placement plugins should use container plugins API / configs
> 
>
> Key: SOLR-15016
> URL: https://issues.apache.org/jira/browse/SOLR-15016
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Plugin system
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Replica placement API currently uses its own way of loading plugin 
> implementations and their config.
> Instead it should use a more robust mechanism supported by 
> {{ContainerPluginsAPI}} and {{ContainerPluginsRegistry}}.






[jira] [Updated] (SOLR-14613) Provide a clean API for pluggable replica assignment implementations

2020-12-03 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-14613:

Fix Version/s: master (9.0)

> Provide a clean API for pluggable replica assignment implementations
> 
>
> Key: SOLR-14613
> URL: https://issues.apache.org/jira/browse/SOLR-14613
> Project: Solr
>  Issue Type: Improvement
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Ilan Ginzburg
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 41h 20m
>  Remaining Estimate: 0h
>
> As described in SIP-8 the current autoscaling Policy implementation has 
> several limitations that make it difficult to use for very large clusters and 
> very large collections. SIP-8 also mentions the possible migration path by 
> providing alternative implementations of the placement strategies that are 
> less complex but more efficient in these very large environments.
> We should review the existing APIs that the current autoscaling engine uses 
> ({{SolrCloudManager}} , {{AssignStrategy}} , {{Suggester}} and related 
> interfaces) to see if they provide a sufficient and minimal API for plugging 
> in alternative autoscaling placement strategies, and if necessary refactor 
> the existing APIs.
> Since these APIs are internal it should be possible to do this without 
> breaking back-compat.






[jira] [Commented] (SOLR-14182) Move metric reporters config from solr.xml to ZK cluster properties

2020-12-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242285#comment-17242285
 ] 

Andrzej Bialecki commented on SOLR-14182:
-

I'd like to start working on this. As I see it, this issue needs to address the 
following:
 * mark {{solr.xml:/solr/metrics}} as deprecated and remove it in 9.1.
 * general metrics configuration (such as enable/disable and metric supplier 
options) should move to {{/clusterprops.json:/metrics}}
 * metric reporters configuration should be moved to container-level plugins, 
ie. {{/clusterprops.json:/plugin}} and the corresponding API. This will make 
the reporters easier to configure and change dynamically without restarting 
Solr nodes.
 * precedence: {{MetricsConfig}} will be initialized from {{solr.xml}} as 
before. Then, if any clusterprops configuration is present, it will REPLACE the 
one from {{solr.xml}}. I don't want to attempt any fusion of these two, and I 
think migration is easier if the configs are not merged. This approach means 
that defining anything in the new locations will automatically turn off the old 
{{solr.xml}} config.
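The replace-not-merge precedence rule is simple to express: if the clusterprops section exists, it wins outright; otherwise fall back to {{solr.xml}}. A minimal sketch of that rule (the {{Map}}-based shape is illustrative, not the actual {{MetricsConfig}} API):

```java
import java.util.Map;

// Minimal sketch of the proposed precedence rule: a non-empty metrics section
// in clusterprops.json REPLACES the solr.xml config wholesale; the two are
// never merged. The Map-based shape is illustrative, not the actual
// MetricsConfig API.
public class MetricsConfigPrecedence {

    public static Map<String, Object> effectiveConfig(
            Map<String, Object> fromSolrXml,
            Map<String, Object> fromClusterProps) {
        // Any clusterprops config, even a partial one, turns off solr.xml.
        if (fromClusterProps != null && !fromClusterProps.isEmpty()) {
            return fromClusterProps;
        }
        return fromSolrXml;
    }

    public static void main(String[] args) {
        Map<String, Object> solrXml = Map.of("enabled", true, "reporter", "legacy");
        Map<String, Object> cluster = Map.of("enabled", false);
        // clusterprops wins; solr.xml entries are NOT merged in:
        System.out.println(effectiveConfig(solrXml, cluster).size()); // 1
        System.out.println(effectiveConfig(solrXml, Map.of()).size()); // 2
    }
}
```

Note the deliberate consequence shown in the first call: a partial clusterprops config does not inherit the remaining {{solr.xml}} settings, which keeps migration behavior predictable.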

> Move metric reporters config from solr.xml to ZK cluster properties
> ---
>
> Key: SOLR-14182
> URL: https://issues.apache.org/jira/browse/SOLR-14182
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 8.4
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Metric reporters are currently configured statically in solr.xml, which makes 
> it difficult to change dynamically or in a containerized environment.
> We should move this section to ZK /cluster.properties and add a back-compat 
> migration shim.






[jira] [Commented] (SOLR-4735) Improve Solr metrics reporting

2020-12-02 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-4735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242229#comment-17242229
 ] 

Andrzej Bialecki commented on SOLR-4735:


[~Pavithrad] please open a new Jira issue and describe the problem in more 
detail, including the Solr version, environment, and Solr logs - this issue is 
closed.

> Improve Solr metrics reporting
> --
>
> Key: SOLR-4735
> URL: https://issues.apache.org/jira/browse/SOLR-4735
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Alan Woodward
>Assignee: Andrzej Bialecki
>Priority: Minor
> Fix For: 6.4, 7.0
>
> Attachments: SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, 
> SOLR-4735.patch, SOLR-4735.patch, SOLR-4735.patch, screenshot-2.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Following on from a discussion on the mailing list:
> http://search-lucene.com/m/IO0EI1qdyJF1/codahale=Solr+metrics+in+Codahale+metrics+and+Graphite+
> It would be good to make Solr play more nicely with existing devops 
> monitoring systems, such as Graphite or Ganglia.  Stats monitoring at the 
> moment is poll-only, either via JMX or through the admin stats page.  I'd 
> like to refactor things a bit to make this more pluggable.
> This patch is a start.  It adds a new interface, InstrumentedBean, which 
> extends SolrInfoMBean to return a 
> [[Metrics|http://metrics.codahale.com/manual/core/]] MetricRegistry, and a 
> couple of MetricReporters (which basically just duplicate the JMX and admin 
> page reporting that's there at the moment, but which should be more 
> extensible).  The patch includes a change to RequestHandlerBase showing how 
> this could work.  The idea would be to eventually replace the getStatistics() 
> call on SolrInfoMBean with this instead.
> The next step would be to allow more MetricReporters to be defined in 
> solrconfig.xml.  The Metrics library comes with ganglia and graphite 
> reporting modules, and we can add contrib plugins for both of those.
> There's some more general cleanup that could be done around SolrInfoMBean 
> (we've got two plugin handlers at /mbeans and /plugins that basically do the 
> same thing, and the beans themselves have some weirdly inconsistent data on 
> them - getVersion() returns different things for different impls, and 
> getSource() seems pretty useless), but maybe that's for another issue.






[jira] [Commented] (SOLR-14613) Provide a clean API for pluggable replica assignment implementations

2020-12-01 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17241685#comment-17241685
 ] 

Andrzej Bialecki commented on SOLR-14613:
-

[~noble.paul] tests are added in SOLR-15004. [~ilan] I think this issue can be 
resolved as fixed?

> Provide a clean API for pluggable replica assignment implementations
> 
>
> Key: SOLR-14613
> URL: https://issues.apache.org/jira/browse/SOLR-14613
> Project: Solr
>  Issue Type: Improvement
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Ilan Ginzburg
>Priority: Major
>  Time Spent: 41h 20m
>  Remaining Estimate: 0h
>
> As described in SIP-8 the current autoscaling Policy implementation has 
> several limitations that make it difficult to use for very large clusters and 
> very large collections. SIP-8 also mentions the possible migration path by 
> providing alternative implementations of the placement strategies that are 
> less complex but more efficient in these very large environments.
> We should review the existing APIs that the current autoscaling engine uses 
> ({{SolrCloudManager}} , {{AssignStrategy}} , {{Suggester}} and related 
> interfaces) to see if they provide a sufficient and minimal API for plugging 
> in alternative autoscaling placement strategies, and if necessary refactor 
> the existing APIs.
> Since these APIs are internal it should be possible to do this without 
> breaking back-compat.






[jira] [Commented] (SOLR-15022) RefGuide documentation for /cluster/plugin API

2020-11-30 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240682#comment-17240682
 ] 

Andrzej Bialecki commented on SOLR-15022:
-

This documentation depends on some of the changes in the linked issues.

> RefGuide documentation for /cluster/plugin API
> --
>
> Key: SOLR-15022
> URL: https://issues.apache.org/jira/browse/SOLR-15022
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Plugin system
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> The {{/cluster/plugin}} API needs user-level documentation.






[jira] [Created] (SOLR-15022) RefGuide documentation for /cluster/plugin API

2020-11-30 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15022:
---

 Summary: RefGuide documentation for /cluster/plugin API
 Key: SOLR-15022
 URL: https://issues.apache.org/jira/browse/SOLR-15022
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Plugin system
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


The {{/cluster/plugin}} API needs user-level documentation.






[jira] [Created] (SOLR-15019) Replica placement API needs a way to fetch existing replica metrics

2020-11-26 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15019:
---

 Summary: Replica placement API needs a way to fetch existing 
replica metrics
 Key: SOLR-15019
 URL: https://issues.apache.org/jira/browse/SOLR-15019
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Andrzej Bialecki


Replica placement API was introduced in SOLR-14613. It offers a few sample (and 
simple) implementations of placement plugins.

However, this API doesn't offer support for retrieving per-replica metrics, 
which are required for calculating more realistic placements. For example, when 
calculating placements for ADDREPLICA on an already existing collection the 
plugin should know the size of each replica in order to avoid placing large 
replicas on nodes with insufficient free disk space.

After discussing this with [~ilan] we propose the following additions to the 
API:

* use the existing {{AttributeFetcher}} interface as a facade for retrieving 
per-replica values (currently it only retrieves per-node values)
* add a {{ShardValues}} interface to represent a strongly-typed API for key 
metrics, such as replica size, number of docs, and number of update and search 
requests.

Plugins could then use this API like this:
{code}
AttributeFetcher attributeFetcher = ...
SolrCollection solrCollection = ...
Set<String> metricNames = ...
attributeFetcher.requestCollectionMetrics(solrCollection, 
solrCollection.getShardNames(), metricNames);

AttributeValues attributeValues = attributeFetcher.fetchAttributes();
ShardValues shardValues = 
attributeValues.getShardMetrics(solrCollection.getName(), shardName);
int sizeInGB = shardValues.getSizeInGB(); // retrieves shard leader metrics
int replicaSizeInGB = shardValues.getSizeInGB(replica);
{code}
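The {{ShardValues}} accessors used above can be sketched as a small interface with a trivial map-backed implementation. This only illustrates the proposed shape; it is not the actual Solr API, and the real interface would take a {{Replica}} type rather than a plain name:

```java
import java.util.Map;

// Illustrative sketch of the proposed ShardValues shape: the no-arg accessor
// returns the shard leader's metric, and the per-replica overload returns the
// value for a specific replica. Not the actual Solr API; the real interface
// would take a Replica type rather than a String name.
interface ShardValues {
    int getSizeInGB();                    // shard leader metric
    int getSizeInGB(String replicaName);  // per-replica metric
}

public class ShardValuesSketch implements ShardValues {
    private final int leaderSizeInGB;
    private final Map<String, Integer> replicaSizesInGB;

    public ShardValuesSketch(int leaderSizeInGB, Map<String, Integer> replicaSizesInGB) {
        this.leaderSizeInGB = leaderSizeInGB;
        this.replicaSizesInGB = replicaSizesInGB;
    }

    @Override public int getSizeInGB() { return leaderSizeInGB; }

    @Override public int getSizeInGB(String replicaName) {
        // Fall back to the leader's size when no per-replica value was fetched.
        return replicaSizesInGB.getOrDefault(replicaName, leaderSizeInGB);
    }

    public static void main(String[] args) {
        ShardValues shard = new ShardValuesSketch(20, Map.of("shard1_replica_n2", 18));
        System.out.println(shard.getSizeInGB());                     // 20
        System.out.println(shard.getSizeInGB("shard1_replica_n2"));  // 18
    }
}
```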






[jira] [Created] (SOLR-15016) Replica placement plugins should use container plugins API / configs

2020-11-25 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15016:
---

 Summary: Replica placement plugins should use container plugins 
API / configs
 Key: SOLR-15016
 URL: https://issues.apache.org/jira/browse/SOLR-15016
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Plugin system
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki


Replica placement API currently uses its own way of loading plugin 
implementations and their config.

Instead it should use a more robust mechanism supported by 
{{ContainerPluginsAPI}} and {{ContainerPluginsRegistry}}.






[jira] [Commented] (SOLR-15007) Aggregate core handler=/select and /update metrics at the node level metric too

2020-11-18 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234635#comment-17234635
 ] 

Andrzej Bialecki commented on SOLR-15007:
-

Please check out {{MetricsCollectorHandler}}, {{SolrShardReporter}} and 
{{SolrClusterReporter}}. They were created to handle a slightly different 
scenario (aggregating metrics from nodes into the shard leader / Overseer 
leader), but perhaps they can be reused.

They also use {{AggregateMetric}} to represent aggregated numeric values 
together with individual contributions.
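The node-level aggregation the reporter asks for amounts to summing each core's counter for the same handler into one node-level value. A minimal standalone sketch, assuming a simple name-to-count view of the registries (the metric-name shapes are illustrative, not actual Solr registry keys):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of node-level aggregation of per-core handler counters:
// sum each core's counter for the same handler into a single node-level
// value. Metric-name shapes are illustrative, not actual Solr registry keys.
public class NodeLevelAggregationSketch {

    // coreMetrics: core name -> (handler name -> request count)
    public static Map<String, Long> aggregate(Map<String, Map<String, Long>> coreMetrics) {
        Map<String, Long> nodeLevel = new HashMap<>();
        for (Map<String, Long> handlers : coreMetrics.values()) {
            handlers.forEach((handler, count) -> nodeLevel.merge(handler, count, Long::sum));
        }
        return nodeLevel;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> cores = Map.of(
            "coll1_shard1_replica_n1", Map.of("/select", 100L, "/update", 10L),
            "coll2_shard1_replica_n1", Map.of("/select", 50L));
        // Prints the summed per-handler counters (e.g. /select=150, /update=10).
        System.out.println(aggregate(cores));
    }
}
```

An {{AggregateMetric}}-style structure would additionally retain the individual per-core contributions alongside the sum, as noted above.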

> Aggregate core handler=/select and /update metrics at the node level metric 
> too
> ---
>
> Key: SOLR-15007
> URL: https://issues.apache.org/jira/browse/SOLR-15007
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: master (9.0)
>Reporter: Mathieu Marie
>Priority: Minor
>
> At my company, we anticipate a huge number of cores and would like to report 
> an aggregated view at the node level instead of the core level, which will 
> grow exponentially.
> Right now, we're aggregating all of the solr.cores metrics to compute 
> per-cluster dashboards.
> But given that there are many admin handlers already reporting metrics at the 
> node level, I wonder if we could aggregate _/update_, _/select_ and all the 
> other handler counters in Solr and expose them at the solr.node level too.
> It would require (a lot) less data to transport, store and aggregate later, 
> while still giving access to per-core metrics.






[jira] [Updated] (SOLR-15004) Unit tests for the replica placement API

2020-11-16 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15004:

Description: This is a follow-up to SOLR-14613. Both the APIs and the 
sample implementations need unit tests.  (was: This is a follow-up to 
SOLR-14613.)

> Unit tests for the replica placement API
> 
>
> Key: SOLR-15004
> URL: https://issues.apache.org/jira/browse/SOLR-15004
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> This is a follow-up to SOLR-14613. Both the APIs and the sample 
> implementations need unit tests.






[jira] [Created] (SOLR-15004) Unit tests for the replica placement API

2020-11-16 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-15004:
---

 Summary: Unit tests for the replica placement API
 Key: SOLR-15004
 URL: https://issues.apache.org/jira/browse/SOLR-15004
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: AutoScaling
Reporter: Andrzej Bialecki
Assignee: Andrzej Bialecki









[jira] [Updated] (SOLR-15004) Unit tests for the replica placement API

2020-11-16 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-15004:

Description: This is a follow-up to SOLR-14613.

> Unit tests for the replica placement API
> 
>
> Key: SOLR-15004
> URL: https://issues.apache.org/jira/browse/SOLR-15004
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> This is a follow-up to SOLR-14613.






[jira] [Resolved] (SOLR-12821) IndexSizeTriggerTest.testMixedBounds() failures

2020-11-11 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-12821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-12821.
-
Resolution: Won't Fix

Autoscaling framework (the code in 8x) is EOL.

> IndexSizeTriggerTest.testMixedBounds() failures
> ---
>
> Key: SOLR-12821
> URL: https://issues.apache.org/jira/browse/SOLR-12821
> Project: Solr
>  Issue Type: Bug
>  Components: Tests
>Reporter: Steven Rowe
>Assignee: Andrzej Bialecki
>Priority: Major
>
> From [https://jenkins.thetaphi.de/job/Lucene-Solr-master-Solaris/2077/], 
> reproduced 5/5 iterations for me on Linux:
> {noformat}
> Checking out Revision 03c9c04353ce1b5ace33fddd5bd99059e63ed507 
> (refs/remotes/origin/master)
> [...]
>[junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=IndexSizeTriggerTest -Dtests.method=testMixedBounds 
> -Dtests.seed=9336AB152EE44632 -Dtests.slow=true -Dtests.locale=hr 
> -Dtests.timezone=America/Guayaquil -Dtests.asserts=true 
> -Dtests.file.encoding=US-ASCII
>[junit4] FAILURE 50.8s J1 | IndexSizeTriggerTest.testMixedBounds <<<
>[junit4]> Throwable #1: java.lang.AssertionError: 
> expected: but was:
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([9336AB152EE44632:99B514B8635F4D68]:0)
>[junit4]>  at 
> org.apache.solr.cloud.autoscaling.IndexSizeTriggerTest.testMixedBounds(IndexSizeTriggerTest.java:669)
>[junit4]>  at java.lang.Thread.run(Thread.java:748)
> [...]
>[junit4]   2> NOTE: test params are: codec=Asserting(Lucene80): 
> {foo=PostingsFormat(name=MockRandom), id=PostingsFormat(name=Direct)}, 
> docValues:{_version_=DocValuesFormat(name=Asserting), 
> foo=DocValuesFormat(name=Asserting), id=DocValuesFormat(name=Lucene70)}, 
> maxPointsInLeafNode=452, maxMBSortInHeap=5.552665847709986, 
> sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@cc0bab0),
>  locale=hr, timezone=America/Guayaquil
>[junit4]   2> NOTE: SunOS 5.11 amd64/Oracle Corporation 1.8.0_172 
> (64-bit)/cpus=3,threads=1,free=191495432,total=518979584
> {noformat}






[jira] [Resolved] (SOLR-13641) Undocumented and untested "cleanupThread" functionality in LFUCache and FastLRUCache

2020-11-11 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-13641.
-
Resolution: Won't Fix

LFUCache is EOL in Solr 8x and removed in Solr 9.0.

> Undocumented and untested "cleanupThread" functionality in LFUCache and 
> FastLRUCache
> 
>
> Key: SOLR-13641
> URL: https://issues.apache.org/jira/browse/SOLR-13641
> Project: Solr
>  Issue Type: Bug
>Reporter: Andrzej Bialecki
>Priority: Major
>
> Both LFUCache and FastLRUCache support a functionality for running evictions 
> asynchronously, in a thread different from the one that executes a {{put(K, 
> V)}} operation.
> Additionally, these asynchronous evictions can use either a one-off thread 
> created after each put, or a single long-running cleanup thread.
> However, this functionality is not documented anywhere and it's not tested. 
> It should either be removed, if it's not used, or properly documented and 
> tested.
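The two eviction variants described above can be sketched roughly as follows. Names and the eviction policy are deliberately naive and illustrative; this is not the actual LFUCache/FastLRUCache code:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AsyncEvictionSketch {
    private final Map<String, String> map = new ConcurrentHashMap<>();
    private final int maxSize;
    private final boolean oneOffThread;

    AsyncEvictionSketch(int maxSize, boolean oneOffThread) {
        this.maxSize = maxSize;
        this.oneOffThread = oneOffThread;
    }

    void put(String k, String v) {
        map.put(k, v);
        if (map.size() > maxSize) {
            if (oneOffThread) {
                // Variant 1: a one-off thread created after each put.
                Thread t = new Thread(this::evict);
                t.start();
                try { t.join(); } catch (InterruptedException ignored) {} // joined only to keep this demo deterministic
            } else {
                // Variant 2: in the real design this would signal a single
                // long-running cleanup thread; done inline here for brevity.
                evict();
            }
        }
    }

    private void evict() {
        // Naive eviction: drop arbitrary entries until under the cap.
        Iterator<String> it = map.keySet().iterator();
        while (map.size() > maxSize && it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    int size() { return map.size(); }

    public static void main(String[] args) {
        AsyncEvictionSketch cache = new AsyncEvictionSketch(2, true);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3"); // exceeds the cap, triggers async eviction
        System.out.println(cache.size() <= 2); // prints true
    }
}
```

Even this toy version shows why the feature needs documentation and tests: the one-off variant pays thread-creation cost on every overflowing put, and both variants evict concurrently with readers.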






[jira] [Resolved] (SOLR-14233) JsonSchemaCreator should support Map payloads

2020-11-11 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14233.
-
Resolution: Fixed

This has been fixed in SOLR-14871.

> JsonSchemaCreator should support Map payloads
> -
>
> Key: SOLR-14233
> URL: https://issues.apache.org/jira/browse/SOLR-14233
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 8.4.1
>Reporter: Andrzej Bialecki
>Priority: Major
> Attachments: schema.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> While working on v2 API for SOLR-13579 I discovered that it's currently not 
> possible to use API methods where the payload content is a {{java.util.Map}}. 
> This is needed when passing arguments that are arbitrary key-value maps and 
> not strictly-defined beans.
> Specifically, I needed a method like this:
> {code}
> @Command(name = "setparams")
> public void setParams(SolrQueryRequest req, SolrQueryResponse rsp, 
> PayloadObj<Map<String, Object>> payload) {
> ...
> }
> {code}
> But this declaration produced confusing errors during API registration.
> Upon further digging I discovered that {{JsonSchemaCreator}} doesn't support 
> Map payloads.
> Attached patch seems to fix it.






[jira] [Resolved] (SOLR-14275) Policy calculations are very slow for large clusters and large operations

2020-11-11 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14275.
-
Resolution: Won't Fix

This implementation is EOL in Solr 8x.

> Policy calculations are very slow for large clusters and large operations
> -
>
> Key: SOLR-14275
> URL: https://issues.apache.org/jira/browse/SOLR-14275
> Project: Solr
>  Issue Type: Bug
>  Components: AutoScaling
>Affects Versions: 7.7.2, 8.4.1
>Reporter: Andrzej Bialecki
>Priority: Major
>  Labels: scaling
> Attachments: SOLR-14275.patch, scenario.txt
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Replica placement calculations performed during collection creation take 
> extremely long time (several minutes) when using a large cluster and creating 
> a large collection (eg. 1000 nodes, 500 shards, 4 replicas).
> Profiling shows that most of the time is spent in 
> {{Row.computeCacheIfAbsent}}, which probably doesn't reuse this cache as much 
> as it should.






[jira] [Resolved] (SOLR-14948) Autoscaling maxComputeOperations override causes exceptions

2020-11-11 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14948.
-
Resolution: Fixed

> Autoscaling maxComputeOperations override causes exceptions
> ---
>
> Key: SOLR-14948
> URL: https://issues.apache.org/jira/browse/SOLR-14948
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.6.3
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> The maximum number of operations to calculate in {{ComputePlanAction}} is 
> estimated based on the average collection replication factor and the size of 
> the cluster.
> In some cases this estimate may be insufficient, and there's an override 
> property that can be defined in {{autoscaling.json}} named 
> {{maxComputeOperations}}. However, the code in {{ComputePlanAction}} makes an 
> explicit cast to {{Integer}} whereas the value coming from a parsed JSON is 
> of type {{Long}}. This results in a {{ClassCastException}} being thrown.
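The failure mode is easy to reproduce: JSON parsers typically deserialize integral numbers as {{Long}}, so a blind {{(Integer)}} cast throws at runtime, while casting through {{Number}} works for any numeric type. A minimal illustration (not the {{ComputePlanAction}} code itself):

```java
import java.util.HashMap;
import java.util.Map;

public class NumberCastDemo {
    // Robust conversion: accept whatever Number subtype the JSON parser produced.
    static int asInt(Object parsedValue) {
        return ((Number) parsedValue).intValue();
    }

    public static void main(String[] args) {
        Map<String, Object> parsed = new HashMap<>();
        parsed.put("maxComputeOperations", 500L); // parsed JSON yields a Long

        Object v = parsed.get("maxComputeOperations");
        boolean threw = false;
        try {
            Integer bad = (Integer) v; // the bug: Long cannot be cast to Integer
        } catch (ClassCastException e) {
            threw = true;
        }
        System.out.println(threw + " " + asInt(v)); // prints "true 500"
    }
}
```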






[jira] [Resolved] (SOLR-14749) Provide a clean API for cluster-level event processing

2020-11-11 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14749.
-
Resolution: Fixed

> Provide a clean API for cluster-level event processing
> --
>
> Key: SOLR-14749
> URL: https://issues.apache.org/jira/browse/SOLR-14749
> Project: Solr
>  Issue Type: Improvement
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Labels: clean-api
> Fix For: master (9.0)
>
>  Time Spent: 22h
>  Remaining Estimate: 0h
>
> This is a companion issue to SOLR-14613 and it aims at providing a clean, 
> strongly typed API for the functionality formerly known as "triggers" - that 
> is, a component for generating cluster-level events corresponding to changes 
> in the cluster state, and a pluggable API for processing these events.
> The 8x triggers have been removed so this functionality is currently missing 
> in 9.0. However, this functionality is crucial for implementing the automatic 
> collection repair and re-balancing as the cluster state changes (nodes going 
> down / up, becoming overloaded / unused / decommissioned, etc).
> For this reason we need this API and a default implementation of triggers 
> that at least can perform automatic collection repair (maintaining the 
> desired replication factor in presence of live node changes).
> As before, the actual changes to the collections will be executed using 
> existing CollectionAdmin API, which in turn may use the placement plugins 
> from SOLR-14613.
> h3. Division of responsibility
>  * built-in Solr components (non-pluggable):
>  ** cluster state monitoring and event generation,
>  ** simple scheduler to periodically generate scheduled events
>  * plugins:
>  ** automatic collection repair on {{nodeLost}} events (provided by default)
>  ** re-balancing of replicas (periodic or on {{nodeAdded}} events)
>  ** reporting (eg. requesting additional node provisioning)
>  ** scheduled maintenance (eg. removing inactive shards after split)
> h3. Other considerations
> These plugins (unlike the placement plugins) need to execute on one 
> designated node in the cluster. Currently the easiest way to implement this 
> is to run them on the Overseer leader node.
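A strongly typed listener API along the lines described might look like the following sketch. All names here are illustrative assumptions, not the final Solr interfaces:

```java
public class ClusterEventSketch {
    enum EventType { NODE_LOST, NODE_ADDED, SCHEDULED }

    // Strongly typed event, in contrast to the loosely typed 8x trigger payloads.
    record ClusterEvent(EventType type, String nodeName) {}

    interface ClusterEventListener {
        // Returns a description of the action taken, for demo purposes.
        String onEvent(ClusterEvent event);
    }

    // Sketch of a default repair plugin reacting to nodeLost events,
    // as described above; real repair would go through CollectionAdmin APIs.
    static final ClusterEventListener REPAIR = e ->
        e.type() == EventType.NODE_LOST
            ? "repair replicas lost on " + e.nodeName()
            : "ignore";

    public static void main(String[] args) {
        System.out.println(REPAIR.onEvent(new ClusterEvent(EventType.NODE_LOST, "node42")));
        // prints: repair replicas lost on node42
    }
}
```

Running listeners only on the Overseer leader, as suggested, keeps this single-threaded per cluster without needing extra leader election.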






[jira] [Updated] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values

2020-11-11 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-14683:

Fix Version/s: (was: master (9.0))
   8.8

> Review the metrics API to ensure consistent placeholders for missing values
> ---
>
> Key: SOLR-14683
> URL: https://issues.apache.org/jira/browse/SOLR-14683
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.8
>
> Attachments: SOLR-14683.patch, SOLR-14683.patch
>
>
> Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an 
> unknown state at some points in time, eg. during SolrCore startup or shutdown.
> Currently the API returns placeholders with either impossible values for 
> numeric gauges (such as index size -1) or empty maps / strings for other 
> non-numeric gauges.
> [~hossman] noticed that the values for these placeholders may be misleading, 
> depending on how the user treats them - if the client has no special logic to 
> treat them as "missing values" it may erroneously treat them as valid data. 
> E.g. numeric values of -1 or 0 may severely skew averages and produce 
> misleading peaks / valleys in metrics histories.
> On the other hand returning a literal {{null}} value instead of the expected 
> number may also cause unexpected client issues - although in this case it's 
> clearer that there's actually no data available, so long-term this may be a 
> better strategy than returning impossible values, even if it means that the 
> client should learn to handle {{null}} values appropriately.






[jira] [Resolved] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values

2020-11-10 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki resolved SOLR-14683.
-
Resolution: Fixed

> Review the metrics API to ensure consistent placeholders for missing values
> ---
>
> Key: SOLR-14683
> URL: https://issues.apache.org/jira/browse/SOLR-14683
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: SOLR-14683.patch, SOLR-14683.patch
>
>
> Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an 
> unknown state at some points in time, eg. during SolrCore startup or shutdown.
> Currently the API returns placeholders with either impossible values for 
> numeric gauges (such as index size -1) or empty maps / strings for other 
> non-numeric gauges.
> [~hossman] noticed that the values for these placeholders may be misleading, 
> depending on how the user treats them - if the client has no special logic to 
> treat them as "missing values" it may erroneously treat them as valid data. 
> E.g. numeric values of -1 or 0 may severely skew averages and produce 
> misleading peaks / valleys in metrics histories.
> On the other hand returning a literal {{null}} value instead of the expected 
> number may also cause unexpected client issues - although in this case it's 
> clearer that there's actually no data available, so long-term this may be a 
> better strategy than returning impossible values, even if it means that the 
> client should learn to handle {{null}} values appropriately.






[jira] [Updated] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values

2020-11-10 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-14683:

Fix Version/s: master (9.0)

> Review the metrics API to ensure consistent placeholders for missing values
> ---
>
> Key: SOLR-14683
> URL: https://issues.apache.org/jira/browse/SOLR-14683
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: SOLR-14683.patch, SOLR-14683.patch
>
>
> Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an 
> unknown state at some points in time, eg. during SolrCore startup or shutdown.
> Currently the API returns placeholders with either impossible values for 
> numeric gauges (such as index size -1) or empty maps / strings for other 
> non-numeric gauges.
> [~hossman] noticed that the values for these placeholders may be misleading, 
> depending on how the user treats them - if the client has no special logic to 
> treat them as "missing values" it may erroneously treat them as valid data. 
> E.g. numeric values of -1 or 0 may severely skew averages and produce 
> misleading peaks / valleys in metrics histories.
> On the other hand returning a literal {{null}} value instead of the expected 
> number may also cause unexpected client issues - although in this case it's 
> clearer that there's actually no data available, so long-term this may be a 
> better strategy than returning impossible values, even if it means that the 
> client should learn to handle {{null}} values appropriately.






[jira] [Commented] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values

2020-11-09 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228768#comment-17228768
 ] 

Andrzej Bialecki commented on SOLR-14683:
-

This patch adds configurable placeholders for missing values of different 
types, all returning {{null}} by default. They are configured in 
{{solr.xml:solr/metrics/missingValues}} section, per Ref Guide doc (see example 
there).

If there are no objections I'll commit this shortly.

> Review the metrics API to ensure consistent placeholders for missing values
> ---
>
> Key: SOLR-14683
> URL: https://issues.apache.org/jira/browse/SOLR-14683
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-14683.patch, SOLR-14683.patch
>
>
> Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an 
> unknown state at some points in time, eg. during SolrCore startup or shutdown.
> Currently the API returns placeholders with either impossible values for 
> numeric gauges (such as index size -1) or empty maps / strings for other 
> non-numeric gauges.
> [~hossman] noticed that the values for these placeholders may be misleading, 
> depending on how the user treats them - if the client has no special logic to 
> treat them as "missing values" it may erroneously treat them as valid data. 
> E.g. numeric values of -1 or 0 may severely skew averages and produce 
> misleading peaks / valleys in metrics histories.
> On the other hand returning a literal {{null}} value instead of the expected 
> number may also cause unexpected client issues - although in this case it's 
> clearer that there's actually no data available, so long-term this may be a 
> better strategy than returning impossible values, even if it means that the 
> client should learn to handle {{null}} values appropriately.






[jira] [Updated] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values

2020-11-09 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-14683:

Attachment: SOLR-14683.patch

> Review the metrics API to ensure consistent placeholders for missing values
> ---
>
> Key: SOLR-14683
> URL: https://issues.apache.org/jira/browse/SOLR-14683
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-14683.patch, SOLR-14683.patch
>
>
> Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an 
> unknown state at some points in time, eg. during SolrCore startup or shutdown.
> Currently the API returns placeholders with either impossible values for 
> numeric gauges (such as index size -1) or empty maps / strings for other 
> non-numeric gauges.
> [~hossman] noticed that the values for these placeholders may be misleading, 
> depending on how the user treats them - if the client has no special logic to 
> treat them as "missing values" it may erroneously treat them as valid data. 
> E.g. numeric values of -1 or 0 may severely skew averages and produce 
> misleading peaks / valleys in metrics histories.
> On the other hand returning a literal {{null}} value instead of the expected 
> number may also cause unexpected client issues - although in this case it's 
> clearer that there's actually no data available, so long-term this may be a 
> better strategy than returning impossible values, even if it means that the 
> client should learn to handle {{null}} values appropriately.






[jira] [Updated] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values

2020-11-09 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-14683:

Attachment: SOLR-14683.patch

> Review the metrics API to ensure consistent placeholders for missing values
> ---
>
> Key: SOLR-14683
> URL: https://issues.apache.org/jira/browse/SOLR-14683
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-14683.patch
>
>
> Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an 
> unknown state at some points in time, eg. during SolrCore startup or shutdown.
> Currently the API returns placeholders with either impossible values for 
> numeric gauges (such as index size -1) or empty maps / strings for other 
> non-numeric gauges.
> [~hossman] noticed that the values for these placeholders may be misleading, 
> depending on how the user treats them - if the client has no special logic to 
> treat them as "missing values" it may erroneously treat them as valid data. 
> E.g. numeric values of -1 or 0 may severely skew averages and produce 
> misleading peaks / valleys in metrics histories.
> On the other hand returning a literal {{null}} value instead of the expected 
> number may also cause unexpected client issues - although in this case it's 
> clearer that there's actually no data available, so long-term this may be a 
> better strategy than returning impossible values, even if it means that the 
> client should learn to handle {{null}} values appropriately.






[jira] [Comment Edited] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values

2020-11-09 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228512#comment-17228512
 ] 

Andrzej Bialecki edited comment on SOLR-14683 at 11/9/20, 10:58 AM:


{quote}Solr's JSON Response writer already has long standing support to output 
{{Float.NaN}} as a quoted string {{"NaN"}}
{quote}
Therein lies the problem ;) since there is no standard way to do it Solr 
decided to use a STRING (quoted) value of {{"NaN"}}... but some other libraries 
(and the popular extended spec [http://json5.org]) use 
unquoted values for {{NaN, Infinity, -Infinity}}. Some other parsers use {{nan, 
inf, -inf}} - that's the beauty of standards, there are so many of them to 
choose from... /s

Taking all this into account serializing NaN as {{null}} seems like the safest 
option, unless we add this configurability to our JSONWriter.

Also, from the point of view of metrics it seems it conveys the same message 
when it returns NaN or null when the value is unknown - so for simplicity and 
easier compatibility we could always return {{null}} as a metric value, 
regardless of how it's serialized.


was (Author: ab):
{quote}Solr's JSON Response writer already has long standing support to output 
{{Float.NaN}} as a quoted string {{"NaN"}}
{quote}
Therein lies the problem ;) since there is no standard way to do it Solr 
decided to use a STRING (quoted) value of {{"NaN"}}... but some other libraries 
(and the popular extended spec [http://json5.org]) use 
unquoted values for {{NaN, Infinity, -Infinity}}. Some other parsers use {{nan, 
inf, -inf}} - that's the beauty of standards, there are so many of them to 
choose from... /s

Taking all this into account serializing NaN as {{null}} seems like the safest 
option, unless we add this configurability to our JSONWriter.

Also, since from the point of view of metrics it seems it conveys the same 
message when it returns NaN or null when the value is unknown - so for 
simplicity and easier compatibility we could always return {{null}} as a metric 
value, regardless of how it's serialized.

> Review the metrics API to ensure consistent placeholders for missing values
> ---
>
> Key: SOLR-14683
> URL: https://issues.apache.org/jira/browse/SOLR-14683
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an 
> unknown state at some points in time, eg. during SolrCore startup or shutdown.
> Currently the API returns placeholders with either impossible values for 
> numeric gauges (such as index size -1) or empty maps / strings for other 
> non-numeric gauges.
> [~hossman] noticed that the values for these placeholders may be misleading, 
> depending on how the user treats them - if the client has no special logic to 
> treat them as "missing values" it may erroneously treat them as valid data. 
> E.g. numeric values of -1 or 0 may severely skew averages and produce 
> misleading peaks / valleys in metrics histories.
> On the other hand returning a literal {{null}} value instead of the expected 
> number may also cause unexpected client issues - although in this case it's 
> clearer that there's actually no data available, so long-term this may be a 
> better strategy than returning impossible values, even if it means that the 
> client should learn to handle {{null}} values appropriately.






[jira] [Comment Edited] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values

2020-11-09 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228512#comment-17228512
 ] 

Andrzej Bialecki edited comment on SOLR-14683 at 11/9/20, 10:57 AM:


{quote}Solr's JSON Response writer already has long standing support to output 
{{Float.NaN}} as a quoted string {{"NaN"}}
{quote}
Therein lies the problem ;) since there is no standard way to do it Solr 
decided to use a STRING (quoted) value of {{"NaN"}}... but some other libraries 
(and the popular extended spec [http://json5.org]) use 
unquoted values for {{NaN, Infinity, -Infinity}}. Some other parsers use {{nan, 
inf, -inf}} - that's the beauty of standards, there are so many of them to 
choose from... /s

Taking all this into account serializing NaN as {{null}} seems like the safest 
option, unless we add this configurability to our JSONWriter.

Also, since from the point of view of metrics it seems it conveys the same 
message when it returns NaN or null when the value is unknown - so for 
simplicity and easier compatibility we could always return {{null}} as a metric 
value, regardless of how it's serialized.


was (Author: ab):
{quote}Solr's JSON Response writer already has long standing support to output 
{{Float.NaN}} as a quoted string {{"NaN"}}
{quote}
Therein lies the problem ;) since there is no standard way to do it Solr 
decided to use a STRING (quoted) value of {{"NaN"}}... but some other libraries 
(and the popular extended spec [http://json5.org]) use 
unquoted values for {{NaN, Infinity, -Infinity}}. Some other parsers use {{nan, 
inf, -inf}} - that's the beauty of standards, there are so many of them to 
choose from... /s

Taking all this into account returning {{null}} for NaN or undefined seems like 
the safest option.

> Review the metrics API to ensure consistent placeholders for missing values
> ---
>
> Key: SOLR-14683
> URL: https://issues.apache.org/jira/browse/SOLR-14683
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an 
> unknown state at some points in time, eg. during SolrCore startup or shutdown.
> Currently the API returns placeholders with either impossible values for 
> numeric gauges (such as index size -1) or empty maps / strings for other 
> non-numeric gauges.
> [~hossman] noticed that the values for these placeholders may be misleading, 
> depending on how the user treats them - if the client has no special logic to 
> treat them as "missing values" it may erroneously treat them as valid data. 
> E.g. numeric values of -1 or 0 may severely skew averages and produce 
> misleading peaks / valleys in metrics histories.
> On the other hand returning a literal {{null}} value instead of the expected 
> number may also cause unexpected client issues - although in this case it's 
> clearer that there's actually no data available, so long-term this may be a 
> better strategy than returning impossible values, even if it means that the 
> client should learn to handle {{null}} values appropriately.






[jira] [Commented] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values

2020-11-09 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228512#comment-17228512
 ] 

Andrzej Bialecki commented on SOLR-14683:
-

{quote}Solr's JSON Response writer already has long standing support to output 
{{Float.NaN}} as a quoted string {{"NaN"}}
{quote}
Therein lies the problem ;) since there is no standard way to do it, Solr 
decided to use a STRING (quoted) value of {{"NaN"}}... but some other libraries 
(and the popular extended spec [http://json5.org]) use unquoted values for 
{{NaN, Infinity, -Infinity}}. Still other parsers use {{nan, inf, -inf}} - 
that's the beauty of standards, there are so many of them to choose from... /s

Taking all this into account, returning {{null}} for NaN or undefined values 
seems like the safest option.
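That approach could be sketched roughly as follows (the helper class and method 
names are assumptions for illustration, not the actual Solr code): numeric gauge 
values that are NaN or infinite get mapped to {{null}} before serialization, so 
standards-compliant JSON clients see an explicit "no data" marker.

```java
// Hypothetical helper, not actual Solr code: map undefined numeric
// gauge values to null so the JSON output stays standard-compliant.
public class GaugePlaceholders {
    public static Object sanitize(Object value) {
        if (value instanceof Double) {
            double d = (Double) value;
            if (Double.isNaN(d) || Double.isInfinite(d)) return null;
        }
        if (value instanceof Float) {
            float f = (Float) value;
            if (Float.isNaN(f) || Float.isInfinite(f)) return null;
        }
        return value; // legitimate values (including 0) pass through unchanged
    }
}
```

A client then needs only one rule - {{null}} means "no data" - instead of 
guessing whether a -1 or 0 placeholder is real.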

> Review the metrics API to ensure consistent placeholders for missing values
> ---
>
> Key: SOLR-14683
> URL: https://issues.apache.org/jira/browse/SOLR-14683
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>
> Spin-off from SOLR-14657. Some gauges can legitimately be missing or in an 
> unknown state at some points in time, eg. during SolrCore startup or shutdown.
> Currently the API returns placeholders with either impossible values for 
> numeric gauges (such as index size -1) or empty maps / strings for other 
> non-numeric gauges.
> [~hossman] noticed that the values for these placeholders may be misleading, 
> depending on how the user treats them - if the client has no special logic to 
> treat them as "missing values" it may erroneously treat them as valid data. 
> E.g. numeric values of -1 or 0 may severely skew averages and produce 
> misleading peaks / valleys in metrics histories.
> On the other hand returning a literal {{null}} value instead of the expected 
> number may also cause unexpected client issues - although in this case it's 
> clearer that there's actually no data available, so long-term this may be a 
> better strategy than returning impossible values, even if it means that the 
> client should learn to handle {{null}} values appropriately.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14683) Review the metrics API to ensure consistent placeholders for missing values

2020-11-05 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226729#comment-17226729
 ] 

Andrzej Bialecki commented on SOLR-14683:
-

Prometheus best practices recommend "avoiding missing metrics" (as if that were 
always possible... what about metrics missing due to network connectivity?), 
and recommend reporting 0 or NaN for missing numeric metrics:

{quote}
Avoid missing metrics
Time series that are not present until something happens are difficult to deal 
with, as the usual simple operations are no longer sufficient to correctly 
handle them. To avoid this, export 0 (or NaN, if 0 would be misleading) for any 
time series you know may exist in advance.

Most Prometheus client libraries (including Go, Java, and Python) will 
automatically export a 0 for you for metrics with no labels.
{quote}

For frequently occurring events, where the average value of the metric may be 
high, reporting 0 WILL skew the stats more than reporting NaN. Reporting NaN 
also clearly indicates that the data is not available, as opposed to 0 which 
may be a legitimate value of the metric.

The problem is that serializing NaN is not supported by the JSON standard, only 
by extensions such as JSON 5 (http://json5.org). The current JSON standard 
ECMA-404 says: "Numeric values that cannot be represented as sequences of 
digits (such as Infinity and NaN) are not permitted."

So the only standard option left in JSON to indicate that the data is missing 
is to return {{null}}.







[jira] [Commented] (SOLR-14749) Provide a clean API for cluster-level event processing

2020-11-05 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226674#comment-17226674
 ] 

Andrzej Bialecki commented on SOLR-14749:
-

Thanks [~noble.paul] and [~ilan] for comments and reviews!

> Provide a clean API for cluster-level event processing
> --
>
> Key: SOLR-14749
> URL: https://issues.apache.org/jira/browse/SOLR-14749
> Project: Solr
>  Issue Type: Improvement
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
>  Labels: clean-api
> Fix For: master (9.0)
>
>  Time Spent: 22h
>  Remaining Estimate: 0h
>
> This is a companion issue to SOLR-14613 and it aims at providing a clean, 
> strongly typed API for the functionality formerly known as "triggers" - that 
> is, a component for generating cluster-level events corresponding to changes 
> in the cluster state, and a pluggable API for processing these events.
> The 8x triggers have been removed so this functionality is currently missing 
> in 9.0. However, this functionality is crucial for implementing the automatic 
> collection repair and re-balancing as the cluster state changes (nodes going 
> down / up, becoming overloaded / unused / decommissioned, etc).
> For this reason we need this API and a default implementation of triggers 
> that at least can perform automatic collection repair (maintaining the 
> desired replication factor in presence of live node changes).
> As before, the actual changes to the collections will be executed using 
> existing CollectionAdmin API, which in turn may use the placement plugins 
> from SOLR-14613.
> h3. Division of responsibility
>  * built-in Solr components (non-pluggable):
>  ** cluster state monitoring and event generation,
>  ** simple scheduler to periodically generate scheduled events
>  * plugins:
>  ** automatic collection repair on {{nodeLost}} events (provided by default)
>  ** re-balancing of replicas (periodic or on {{nodeAdded}} events)
>  ** reporting (eg. requesting additional node provisioning)
>  ** scheduled maintenance (eg. removing inactive shards after split)
> h3. Other considerations
> These plugins (unlike the placement plugins) need to execute on one 
> designated node in the cluster. Currently the easiest way to implement this 
> is to run them on the Overseer leader node.
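The division of responsibility above could be sketched roughly like this (all 
names here are assumptions that mirror the description, not the final Solr 
API): a built-in producer generates typed events, and plugins subscribe as 
listeners.

```java
// Hypothetical shapes only: these names sketch the division of
// responsibility described above and are not the final Solr API.

// An event generated by the built-in (non-pluggable) monitoring side.
interface ClusterEvent {
    enum EventType { NODE_LOST, NODE_ADDED, SCHEDULED }
    EventType getType();
}

// Built-in side: watches cluster state and fans events out to listeners.
interface ClusterEventProducer {
    void registerListener(ClusterEventListener listener);
}

// Pluggable side: e.g. a default collection-repair plugin would react
// to NODE_LOST events and restore the desired replication factor.
interface ClusterEventListener {
    void onEvent(ClusterEvent event);
}
```

With strongly typed events like these, a repair plugin only needs to implement 
one listener method, and the actual collection changes would still go through 
the existing CollectionAdmin API.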






[jira] [Commented] (SOLR-14977) Container plugins need a way to be configured

2020-11-03 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17225524#comment-17225524
 ] 

Andrzej Bialecki commented on SOLR-14977:
-

Thanks [~ilan], this is very helpful; I think we should reuse this approach.

BTW, I think that the placement plugin info should live in the same place as 
other plugins; there should be no need to put it in a separate sub-section of 
/clusterprops.json. Then we could reuse the already existing mechanism for 
plugin config monitoring, plugin reloading, etc. in {{CustomContainerPlugins}}. 
This is what {{ClusterSingleton}} and the upcoming {{ClusterEventProducer}} 
use in order to enable dynamic reloading of plugins. But that's a separate 
issue :)

> Container plugins need a way to be configured
> -
>
> Key: SOLR-14977
> URL: https://issues.apache.org/jira/browse/SOLR-14977
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Plugin system
>Reporter: Andrzej Bialecki
>Priority: Major
>
> Container plugins are defined in {{/clusterprops.json:/plugin}} using a 
> simple {{PluginMeta}} bean. This is sufficient for implementations that don't 
> need any configuration except for the {{pathPrefix}} but insufficient for 
> anything else that needs more configuration parameters.
> An example would be a {{CollectionsRepairEventListener}} plugin proposed in 
> PR-1962, which needs parameters such as the list of collections, {{waitFor}}, 
> maximum operations allowed, etc. to properly function.
> This issue proposes to extend the {{PluginMeta}} bean to allow a {{Map}} of 
> configuration parameters.
> There is an interface that we could potentially use ({{MapInitializedPlugin}}) 
> but it works only with {{String}} values. This is not optimal because it 
> requires additional type-safety validation from the consumers. The existing 
> {{PluginInfo}} / {{PluginInfoInitialized}} interface is too complex for this 
> purpose.






[jira] [Created] (SOLR-14977) Container plugins need a way to be configured

2020-11-02 Thread Andrzej Bialecki (Jira)
Andrzej Bialecki created SOLR-14977:
---

 Summary: Container plugins need a way to be configured
 Key: SOLR-14977
 URL: https://issues.apache.org/jira/browse/SOLR-14977
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Plugin system
Reporter: Andrzej Bialecki


Container plugins are defined in {{/clusterprops.json:/plugin}} using a simple 
{{PluginMeta}} bean. This is sufficient for implementations that don't need any 
configuration except for the {{pathPrefix}} but insufficient for anything else 
that needs more configuration parameters.

An example would be a {{CollectionsRepairEventListener}} plugin proposed in 
PR-1962, which needs parameters such as the list of collections, {{waitFor}}, 
maximum operations allowed, etc. to properly function.

This issue proposes to extend the {{PluginMeta}} bean to allow a {{Map}} of configuration parameters.

There is an interface that we could potentially use ({{MapInitializedPlugin}}) 
but it works only with {{String}} values. This is not optimal because it 
requires additional type-safety validation from the consumers. The existing 
{{PluginInfo}} / {{PluginInfoInitialized}} interface is too complex for this 
purpose.
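A minimal sketch of the proposed extension (the {{config}} field name is an 
assumption, and the other fields are simplified): keeping values as {{Object}} 
preserves the types produced by the JSON parser, unlike the String-only 
{{MapInitializedPlugin}}.

```java
import java.util.Map;

// Hypothetical sketch of the proposed change - the "config" field name is
// an assumption and the bean is simplified for illustration. A Map with
// Object values keeps numbers, booleans and lists typed as produced by
// the JSON parser, so consumers need no string-to-type re-validation.
public class PluginMeta {
    public String name;        // plugin name, e.g. a repair listener
    public String klass;       // implementation class name
    public String pathPrefix;  // already sufficient for path-only plugins
    public Map<String, Object> config; // proposed addition
}
```

A parsed /clusterprops.json entry could then carry typed parameters such as a 
numeric {{waitFor}} or a list of collection names directly.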






[jira] [Commented] (SOLR-14749) Provide a clean API for cluster-level event processing

2020-10-28 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1702#comment-1702
 ] 

Andrzej Bialecki commented on SOLR-14749:
-

This is not, strictly speaking, user-level functionality. As we discussed 
offline, it's true that we need to provide comprehensive documentation for 
the whole plugin subsystem, but in my view this falls somewhere between 
admin-level and developer-level.

Admin-level docs would include the list of plugin categories, bundled 
implementations and how to configure them. {{ClusterSingleton}} is a 
developer-level concept, which is not exposed to admins and not configurable 
(at least for now). The {{ClusterEventProducer}} implementation and 
{{ClusterEventListener}}-s are configurable, and I can add some docs 
(somewhere???) but their functionality can only be changed by providing a 
different plugin implementation ... which is again a developer-level doc.

I don't think we have any good place to put developer-level documentation yet - 
the Ref Guide doesn't seem ideal, its focus is on user- and admin-level 
documentation.







[jira] [Commented] (SOLR-14749) Provide a clean API for cluster-level event processing

2020-10-27 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17221730#comment-17221730
 ] 

Andrzej Bialecki commented on SOLR-14749:
-

[~noble.paul] "this" meaning what part, specifically? So far this is an 
internal API, unless you mean the ability to define non-API plugins?







[jira] [Updated] (SOLR-14948) Autoscaling maxComputeOperations override causes exceptions

2020-10-19 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-14948:

Fix Version/s: 8.7 (was: 8.8)

> Autoscaling maxComputeOperations override causes exceptions
> ---
>
> Key: SOLR-14948
> URL: https://issues.apache.org/jira/browse/SOLR-14948
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: AutoScaling
>Affects Versions: 8.6.3
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.7
>
>
> The maximum number of operations to calculate in {{ComputePlanAction}} is 
> estimated based on the average collection replication factor and the size of 
> the cluster.
> In some cases this estimate may be insufficient, and there's an override 
> property that can be defined in {{autoscaling.json}} named 
> {{maxComputeOperations}}. However, the code in {{ComputePlanAction}} makes an 
> explicit cast to {{Integer}} whereas the value coming from a parsed JSON is 
> of type {{Long}}. This results in a {{ClassCastException}} being thrown.
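The failure mode and a defensive fix can be sketched as follows (class, method, 
and default names are illustrative assumptions, not the actual 
{{ComputePlanAction}} code): casting the parsed override to {{Number}} instead 
of {{Integer}} accepts the {{Long}} that JSON parsers produce.

```java
import java.util.Map;

// Illustrative sketch of the bug and a defensive fix - names and the
// default value are assumptions, not the actual ComputePlanAction code.
public class MaxOpsOverride {
    static final int DEFAULT_MAX_OPS = 10;

    static int readMaxComputeOperations(Map<String, Object> autoscalingJson) {
        Object override = autoscalingJson.get("maxComputeOperations");
        if (override == null) return DEFAULT_MAX_OPS;
        // JSON parsers return integral numbers as Long, so an explicit
        // "(Integer) override" cast throws ClassCastException here.
        // Casting to the common supertype Number is safe for both types.
        return ((Number) override).intValue();
    }
}
```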





