Re: [PR] SolrJ HTTP/2 Async API using CompletableFuture (update for 2024) [solr]

2024-04-11 Thread via GitHub


dsmiley commented on code in PR #2402:
URL: https://github.com/apache/solr/pull/2402#discussion_r156203


##
solr/solrj/src/test/org/apache/solr/client/solrj/impl/LBHttp2SolrClientTest.java:
##
@@ -51,4 +67,249 @@ public void testLBHttp2SolrClientWithTheseParamNamesInTheUrl() {
   http2SolrClient.getUrlParamNames().toArray());
 }
   }
+
+  @Test
+  public void testAsyncDeprecated() {
+    testAsync(true);
+  }
+
+  @Test
+  public void testAsync() {
+    testAsync(false);
+  }
+
+  @Test
+  public void testAsyncWithFailures() {
+
+    // TODO: This demonstrates that the failing endpoint always gets retried, but
+    // I would expect it to be labelled as a "zombie" and be skipped with additional iterations.
+
+    LBSolrClient.Endpoint ep1 = new LBSolrClient.Endpoint("http://endpoint.one");
+    LBSolrClient.Endpoint ep2 = new LBSolrClient.Endpoint("http://endpoint.two");
+    List<LBSolrClient.Endpoint> endpointList = List.of(ep1, ep2);
+
+    Http2SolrClient.Builder b = new Http2SolrClient.Builder("http://base.url");
+    try (MockHttp2SolrClient client = new MockHttp2SolrClient("http://base.url", b);
+        LBHttp2SolrClient testClient = new LBHttp2SolrClient.Builder(client, ep1, ep2).build()) {
+
+      for (int j = 0; j < 2; j++) {
+        // j: first time Endpoint One will return error code 500.
+        // second time Endpoint One will be healthy
+
+        String basePathToSucceed;
+        if (j == 0) {
+          client.basePathToFail = ep1.getBaseUrl();
+          basePathToSucceed = ep2.getBaseUrl();
+        } else {
+          client.basePathToFail = ep2.getBaseUrl();
+          basePathToSucceed = ep1.getBaseUrl();
+        }
+
+        for (int i = 0; i < 10; i++) {
+          // i: we'll try 10 times to see if it behaves the same every time.
+
+          QueryRequest queryRequest = new QueryRequest(new MapSolrParams(Map.of("q", "" + i)));
+          LBSolrClient.Req req = new LBSolrClient.Req(queryRequest, endpointList);
+          String iterMessage = "iter j/i " + j + "/" + i;
+          try {
+            testClient.requestAsync(req).get(1, TimeUnit.MINUTES);
+          } catch (InterruptedException ie) {
+            Thread.currentThread().interrupt();
+            fail("interrupted");
+          } catch (TimeoutException | ExecutionException e) {
+            fail(iterMessage + " Response ended in failure: " + e);
+          }
+          if (j == 0) {
+            // The first endpoint gives an exception, so it retries.
+            assertEquals(iterMessage, 2, client.lastBasePaths.size());
+
+            String failedBasePath = client.lastBasePaths.remove(0);
+            assertEquals(iterMessage, client.basePathToFail, failedBasePath);
+          } else {
+            // The first endpoint does not give the exception, it doesn't retry.
+            assertEquals(iterMessage, 1, client.lastBasePaths.size());
+          }
+          String successBasePath = client.lastBasePaths.remove(0);
+          assertEquals(iterMessage, basePathToSucceed, successBasePath);
+        }
+      }
+    }
+  }
+
+  private void testAsync(boolean useDeprecatedApi) {
+    LBSolrClient.Endpoint ep1 = new LBSolrClient.Endpoint("http://endpoint.one");
+    LBSolrClient.Endpoint ep2 = new LBSolrClient.Endpoint("http://endpoint.two");
+    List<LBSolrClient.Endpoint> endpointList = List.of(ep1, ep2);
+
+    Http2SolrClient.Builder b = new Http2SolrClient.Builder("http://base.url");
+    try (MockHttp2SolrClient client = new MockHttp2SolrClient("http://base.url", b);
+        LBHttp2SolrClient testClient = new LBHttp2SolrClient.Builder(client, ep1, ep2).build()) {
+
+      int limit = 10; // For simplicity use an even limit
+      int halfLimit = limit / 2; // see TODO below
+
+      CountDownLatch cdl = new CountDownLatch(limit); // deprecated API use
+      List<LBTestAsyncListener> listeners = new ArrayList<>(); // deprecated API use
+      List<CompletableFuture<LBSolrClient.Rsp>> responses = new ArrayList<>();
+
+      for (int i = 0; i < limit; i++) {
+        QueryRequest queryRequest = new QueryRequest(new MapSolrParams(Map.of("q", "" + i)));
+        LBSolrClient.Req req = new LBSolrClient.Req(queryRequest, endpointList);
+        if (useDeprecatedApi) {
+          LBTestAsyncListener listener = new LBTestAsyncListener(cdl);
+          listeners.add(listener);
+          testClient.asyncReq(req, listener);
+        } else {
+          responses.add(testClient.requestAsync(req));
+        }
+      }
+
+      if (useDeprecatedApi) {
+        try {
+          // This is just a formality.  This is a single-threaded test.
+          cdl.await(1, TimeUnit.MINUTES);
+        } catch (InterruptedException ie) {
+          Thread.currentThread().interrupt();
+          fail("interrupted");
+        }
+      }
+
+      QueryRequest[] queryRequests = new QueryRequest[limit];
+      int numEndpointOne = 0;
+      int numEndpointTwo = 0; // see TODO below
+      for (int i = 0; i < limit; i++) {
+        SolrRequest<?> lastSolrReq = client.lastSolrRequests.get(i);

Re: [PR] SolrJ HTTP/2 Async API using CompletableFuture (update for 2024) [solr]

2024-04-11 Thread via GitHub


dsmiley commented on code in PR #2402:
URL: https://github.com/apache/solr/pull/2402#discussion_r1562016978


##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
##
@@ -417,67 +419,101 @@ public void send(OutStream outStream, SolrRequest<?> req, String collection) thr
     outStream.flush();
   }
 
-  @SuppressWarnings("StaticAssignmentOfThrowable")
-  private static final Exception CANCELLED_EXCEPTION = new Exception();
-
-  private static final Cancellable FAILED_MAKING_REQUEST_CANCELLABLE = () -> {};
-
   @Override
   public Cancellable asyncRequest(
       SolrRequest<?> solrRequest,
       String collection,
       AsyncListener<NamedList<Object>> asyncListener) {
-    MDCCopyHelper mdcCopyHelper = new MDCCopyHelper();
 
-    Request req;
+    asyncListener.onStart();
+    CompletableFuture<NamedList<Object>> cf =
+        requestAsync(solrRequest, collection)
+            .whenComplete(
+                (nl, t) -> {
+                  if (t != null) {
+                    asyncListener.onFailure(t);
+                  } else {
+                    asyncListener.onSuccess(nl);
+                  }
+                });
+    return () -> cf.cancel(true);
+  }
+
+  @Override
+  public CompletableFuture<NamedList<Object>> requestAsync(
+      final SolrRequest<?> solrRequest, String collection) {
+    if (ClientUtils.shouldApplyDefaultCollection(collection, solrRequest)) {
+      collection = defaultCollection;
+    }
+    MDCCopyHelper mdcCopyHelper = new MDCCopyHelper();
+    CompletableFuture<NamedList<Object>> future = new CompletableFuture<>();
+    final MakeRequestReturnValue mrrv;
+    final String url;
     try {
-      String url = getRequestPath(solrRequest, collection);
-      InputStreamResponseListener listener =
-          new InputStreamReleaseTrackingResponseListener() {
-            @Override
-            public void onHeaders(Response response) {
-              super.onHeaders(response);
-              executor.execute(
-                  () -> {
-                    InputStream is = getInputStream();
-                    try {
-                      NamedList<Object> body =
-                          processErrorsAndResponse(solrRequest, response, is, url);
-                      mdcCopyHelper.onBegin(null);
-                      log.debug("response processing success");
-                      asyncListener.onSuccess(body);
-                    } catch (RemoteSolrException e) {
-                      if (SolrException.getRootCause(e) != CANCELLED_EXCEPTION) {
+      url = getRequestPath(solrRequest, collection);
+      mrrv = makeRequest(solrRequest, url, true);
+    } catch (SolrServerException | IOException e) {
+      future.completeExceptionally(e);
+      return future;
+    }
+    final ResponseParser parser =
+        solrRequest.getResponseParser() == null ? this.parser : solrRequest.getResponseParser();
+    mrrv.request
+        .onRequestQueued(asyncTracker.queuedListener)
+        .onComplete(asyncTracker.completeListener)
+        .send(
+            new InputStreamResponseListener() {
+              @Override
+              public void onHeaders(Response response) {
+                super.onHeaders(response);
+                InputStreamResponseListener listener = this;
+                executor.execute(
+                    () -> {
+                      InputStream is = listener.getInputStream();
+
+                      // TODO: Original PR had this, but we do not close the stream.
+                      // The legacy implementation did not track the input stream,
+                      // or close it.
+                      // assert ObjectReleaseTracker.track(is);
+
+                      try {
+                        NamedList<Object> body =
+                            processErrorsAndResponse(solrRequest, response, is, url);
+                        mdcCopyHelper.onBegin(null);
+                        log.debug("response processing success");
+                        future.complete(body);
+                      } catch (RemoteSolrException | SolrServerException e) {
                         mdcCopyHelper.onBegin(null);
                         log.debug("response processing failed", e);
-                        asyncListener.onFailure(e);
+                        future.completeExceptionally(e);
+                      } finally {
+                        log.debug("response processing completed");

Review Comment:
   for consistency with the 2 cases right above, these 2 lines should be reversed.



##
solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
##
@@ -247,7 +247,9 @@ private HttpClient createHttpClient(Builder builder) {
     this.authenticationStore = new AuthenticationStoreHolder();
     httpClient.setAuthenticationStore(this.authenticationStore);
 
-    httpClient.setConnectTimeout(builder.connectionTimeoutMillis);
+    if (builder.connectionTimeoutMillis != null) {

Review Comment:
   out-of-scope of this PR?




[jira] [Commented] (SOLR-10654) Expose Metrics in Prometheus format DIRECTLY from Solr

2024-04-11 Thread Matthew Biscocho (Jira)


[ https://issues.apache.org/jira/browse/SOLR-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836390#comment-17836390 ]

Matthew Biscocho commented on SOLR-10654:
-

Thanks for the info, [~dsmiley]! I actually came back to this recently, since I 
wrote that last comment, and I agree: a parallel registry is way too much overhead 
and would probably make things much more difficult. Currently I have a lot of 
work written to transform the dropwizard registries to Prometheus with relevant 
tags for Solr! Looks pretty similar to the Prometheus exporter. Eager to post 
the first PR to get some feedback on my implementation. Hoping to get it out 
soon. 

> Expose Metrics in Prometheus format DIRECTLY from Solr
> --
>
> Key: SOLR-10654
> URL: https://issues.apache.org/jira/browse/SOLR-10654
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Keith Laban
>Priority: Major
> Attachments: prometheus_metrics.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Expose metrics via a `wt=prometheus` response type.
> Example scrape_config in prometheus.yml:
> {code}
> scrape_configs:
>   - job_name: 'solr'
> metrics_path: '/solr/admin/metrics'
> params:
>   wt: ["prometheus"]
> static_configs:
>   - targets: ['localhost:8983']
> {code}






Re: [PR] SOLR-16866: Reorder nested directory deletions to avoid NoSuchFileException [solr]

2024-04-11 Thread via GitHub


dsmiley commented on code in PR #2349:
URL: https://github.com/apache/solr/pull/2349#discussion_r1561834394


##
solr/core/src/java/org/apache/solr/core/CachingDirectoryFactory.java:
##
@@ -302,6 +294,53 @@ private boolean closeCacheValue(CacheValue cacheValue) {
     return cl;
   }
 
+  private static Iterable<CacheValue> sorted(Set<CacheValue> vals) {
+    if (vals.size() < 2) {
+      return vals;
+    }
+    // here we reverse-sort entries by path, in order to trivially ensure that
+    // subpaths are removed before parent paths.
+    CacheValue[] ret = vals.toArray(new CacheValue[0]);
+    Arrays.sort(ret, (a, b) -> b.path.compareTo(a.path));
+    return Arrays.asList(ret);

Review Comment:
   minor: this would be nicer as a single statement using Java streams
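   
   For illustration, a minimal sketch of that single-statement form (assuming 
`CacheValue.path` is a `String` as the diff above suggests; `toList()` needs 
Java 16+, otherwise use `collect(Collectors.toList())`):
   
   ```java
   // reverse-sort by path so subpaths come before their parent paths
   private static Iterable<CacheValue> sorted(Set<CacheValue> vals) {
     return vals.stream()
         .sorted(Comparator.comparing((CacheValue v) -> v.path).reversed())
         .toList();
   }
   ```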



##
solr/core/src/java/org/apache/solr/core/CachingDirectoryFactory.java:
##
@@ -70,13 +71,9 @@ public CacheValue(String path, Directory directory) {
     public boolean closeCacheValueCalled = false;
     public boolean doneWithDir = false;
     private boolean deleteAfterCoreClose = false;
-    public Set<CacheValue> removeEntries = new HashSet<>();

Review Comment:
   It's nice to see one less data structure to track






Re: [PR] SOLR-16866: Deleting a non-existent directory should not fail [solr]

2024-04-11 Thread via GitHub


dsmiley commented on PR #2336:
URL: https://github.com/apache/solr/pull/2336#issuecomment-2050682221

   Although I haven't looked at this in as much detail as others, I find 
Michael's arguments persuasive and would prefer we do #2349 instead.





Re: [PR] Update GC logging options for Zookeeper [solr]

2024-04-11 Thread via GitHub


dsmiley merged PR #2364:
URL: https://github.com/apache/solr/pull/2364





Re: [PR] Update GC logging options for Zookeeper [solr]

2024-04-11 Thread via GitHub


dsmiley commented on PR #2364:
URL: https://github.com/apache/solr/pull/2364#issuecomment-2050664409

   Thanks Radu!





Re: [PR] SOLR-17158 Terminate distributed processing quickly when query limit is reached - Initial impl [solr]

2024-04-11 Thread via GitHub


dsmiley commented on code in PR #2379:
URL: https://github.com/apache/solr/pull/2379#discussion_r1561815614


##
solr/core/src/java/org/apache/solr/core/SolrXmlConfig.java:
##
@@ -384,6 +384,9 @@ private static NodeConfig fillSolrSection(NodeConfig.NodeConfigBuilder builder,
         log.warn("solr.xml transientCacheSize -- transient cores is deprecated");
         builder.setTransientCacheSize(it.intVal(-1));
         break;
+      case "allowPartialResultsDefault":
+        builder.setAllowPartialResultsDefault(it.boolVal(true));
+        break;

Review Comment:
   I would prefer you not bother with all the machinery in solr.xml/NodeConfig 
and instead use a simple EnvUtils lookup (env var or sys prop) to enable it.  It's just a 
boolean; it doesn't have any interesting config to it.  Yes, there are other 
booleans here, but I think we Solr maintainers should consider how NodeConfig 
might systematically / dynamically understand primitive values (boolean, 
integer, ...) and map them to EnvUtils automatically without having to touch the 
NodeConfig class (which is kind of a pain; that builder!).  For example, imagine 
"solr.search.partialResults" being settable as a system property, or also 
settable in solr.xml automatically via "searchPartialResults".
   
   CC @janhoy 
   
   Another take on this: shouldn't the user simply go to their 
solrconfig.xml and set a default partialResults, like we all do for 
miscellaneous search settings?
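   
   A rough sketch of that simpler route (hedged: assumes EnvUtils' 
`getPropertyAsBool` helper; the property name is just the example above, not an 
existing flag):
   
   ```java
   // Read the default once from an env var or system property --
   // no solr.xml/NodeConfig plumbing needed.
   boolean allowPartialResultsDefault =
       EnvUtils.getPropertyAsBool("solr.search.partialResults", Boolean.TRUE);
   ```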






[jira] [Commented] (SOLR-10654) Expose Metrics in Prometheus format DIRECTLY from Solr

2024-04-11 Thread David Smiley (Jira)


[ https://issues.apache.org/jira/browse/SOLR-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836368#comment-17836368 ]

David Smiley commented on SOLR-10654:
-

Cool to see [your PR|https://github.com/apache/solr/pull/2375] Matthew :-)
You didn't title that PR starting with this JIRA issue, so it didn't auto-link. 
Also it's good to comment in the JIRA issue about a PR anyway, since there is 
no notification whatsoever to those watching this JIRA issue even when the link 
is auto-added.

Disclaimer:  I haven't been looking closely at metrics lately.
A parallel registry seems painful -- overhead and the synchronization risks.
Moving to Micrometer -- do you think it would affect most metrics publishers in 
Solr (thus touch tons of source files) or only the metrics internals/plumbing?  
Either way, probably for Solr 10 if we go this way.
Maybe there could be a hard-coded algorithmic approach that can convert the raw 
name to a tagged/labelled metric?

> Expose Metrics in Prometheus format DIRECTLY from Solr
> --
>
> Key: SOLR-10654
> URL: https://issues.apache.org/jira/browse/SOLR-10654
> Project: Solr
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Keith Laban
>Priority: Major
> Attachments: prometheus_metrics.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Expose metrics via a `wt=prometheus` response type.
> Example scrape_config in prometheus.yml:
> {code}
> scrape_configs:
>   - job_name: 'solr'
> metrics_path: '/solr/admin/metrics'
> params:
>   wt: ["prometheus"]
> static_configs:
>   - targets: ['localhost:8983']
> {code}






Re: [PR] SOLR-17194: CloudHttp2SolrClient doesn't properly use httpClient Builder on certain paths [solr]

2024-04-11 Thread via GitHub


dsmiley commented on code in PR #2378:
URL: https://github.com/apache/solr/pull/2378#discussion_r1561775883


##
solr/core/src/test/org/apache/solr/cli/TestExportTool.java:
##
@@ -218,6 +231,66 @@ public void testVeryLargeCluster() throws Exception {
 }
   }
 
+  @Test
+  public void testWithBasicAuth() throws Exception {
+    String COLLECTION_NAME = "secureCollection";
+    String USER = "solr";
+    String PASS = "SolrRocksAgain";
+    final String SECURITY_JSON =
+        Utils.toJSONString(
+            Map.of(
+                "authorization",
+                Map.of(
+                    "class",
+                    RuleBasedAuthorizationPlugin.class.getName(),
+                    "user-role",
+                    singletonMap(USER, "admin"),
+                    "permissions",
+                    singletonList(Map.of("name", "all", "role", "admin"))),
+                "authentication",
+                Map.of(
+                    "class",
+                    BasicAuthPlugin.class.getName(),
+                    "blockUnknown",
+                    true,
+                    "credentials",
+                    singletonMap(USER, getSaltedHashedValue(PASS)))));
+
+    configureCluster(2)
+        .addConfig("conf", configset("cloud-minimal"))
+        .withSecurityJson(SECURITY_JSON)
+        .configure();
+
+    CloudSolrClient cloudSolrClient = cluster.getSolrClient();
+
+    CollectionAdminRequest.createCollection(COLLECTION_NAME, "conf", 2, 1)
+        .setBasicAuthCredentials(USER, PASS)
+        .process(cluster.getSolrClient());
+    cluster.waitForActiveCollection(COLLECTION_NAME, 8, 8);

Review Comment:
   You won't get 8 replicas anymore; you changed this.  The test probably fails 
now.  I hate `waitForActiveCollection`; the collection is already active when 
createCollection finishes.  The risk this method mitigates is that the 
SolrClient returned by `cluster.getSolrClient` doesn't know it yet, but this 
test doesn't use the client henceforth (unlike most tests).



##
solr/core/src/test/org/apache/solr/cli/TestExportTool.java:
##
@@ -218,6 +231,66 @@ public void testVeryLargeCluster() throws Exception {
 }
   }
 
+  @Test
+  public void testWithBasicAuth() throws Exception {
+    String COLLECTION_NAME = "secureCollection";
+    String USER = "solr";
+    String PASS = "SolrRocksAgain";
+    final String SECURITY_JSON =
+        Utils.toJSONString(
+            Map.of(
+                "authorization",
+                Map.of(
+                    "class",
+                    RuleBasedAuthorizationPlugin.class.getName(),
+                    "user-role",
+                    singletonMap(USER, "admin"),
+                    "permissions",
+                    singletonList(Map.of("name", "all", "role", "admin"))),
+                "authentication",
+                Map.of(
+                    "class",
+                    BasicAuthPlugin.class.getName(),
+                    "blockUnknown",
+                    true,
+                    "credentials",
+                    singletonMap(USER, getSaltedHashedValue(PASS)))));
+
+    configureCluster(2)
+        .addConfig("conf", configset("cloud-minimal"))
+        .withSecurityJson(SECURITY_JSON)
+        .configure();
+
+    CloudSolrClient cloudSolrClient = cluster.getSolrClient();
+
+    CollectionAdminRequest.createCollection(COLLECTION_NAME, "conf", 2, 1)

Review Comment:
   Two shards doesn't seem pertinent to what is being tested.






Re: [PR] SOLR-17194: CloudHttp2SolrClient doesn't properly use httpClient Builder on certain paths [solr]

2024-04-11 Thread via GitHub


dsmiley commented on code in PR #2378:
URL: https://github.com/apache/solr/pull/2378#discussion_r1561769753


##
solr/core/src/test/org/apache/solr/cli/TestExportTool.java:
##
@@ -218,6 +231,66 @@ public void testVeryLargeCluster() throws Exception {
 }
   }
 
+  @Test
+  public void testWithBasicAuth() throws Exception {
+    String COLLECTION_NAME = "secureCollection";
+    String USER = "solr";
+    String PASS = "SolrRocksAgain";
+    final String SECURITY_JSON =
+        Utils.toJSONString(
+            Map.of(
+                "authorization",
+                Map.of(
+                    "class",
+                    RuleBasedAuthorizationPlugin.class.getName(),
+                    "user-role",
+                    singletonMap(USER, "admin"),
+                    "permissions",
+                    singletonList(Map.of("name", "all", "role", "admin"))),
+                "authentication",
+                Map.of(
+                    "class",
+                    BasicAuthPlugin.class.getName(),
+                    "blockUnknown",
+                    true,
+                    "credentials",
+                    singletonMap(USER, getSaltedHashedValue(PASS)))));
+
+    configureCluster(2)

Review Comment:
   Can the behavior be tested with one node instead of two?  Authentication 
seems unrelated to the size of the cluster.






[jira] [Commented] (SOLR-8393) Component for Solr resource usage planning

2024-04-11 Thread David Smiley (Jira)


[ https://issues.apache.org/jira/browse/SOLR-8393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836360#comment-17836360 ]

David Smiley commented on SOLR-8393:


What about Solr's metrics API; has that been explored as a solution to the 
problem/need?  We can consider adding more metrics.

I don't see why a query should return core metrics.

> Component for Solr resource usage planning
> --
>
> Key: SOLR-8393
> URL: https://issues.apache.org/jira/browse/SOLR-8393
> Project: Solr
>  Issue Type: Improvement
>Reporter: Steve Molloy
>Priority: Major
> Attachments: SOLR-8393-1.patch, SOLR-8393.patch, SOLR-8393.patch, 
> SOLR-8393.patch, SOLR-8393.patch, SOLR-8393.patch, SOLR-8393.patch, 
> SOLR-8393.patch, SOLR-8393.patch, SOLR-8393.patch, SOLR-8393_tag_7.5.0.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> One question that keeps coming back is how much disk and RAM do I need to run 
> Solr. The most common response is that it highly depends on your data. While 
> true, it makes for frustrated users trying to plan their deployments. 
> The idea I'm bringing is to create a new component that will attempt to 
> extrapolate resources needed in the future by looking at resources currently 
> used. By adding a parameter for the target number of documents, current 
> resources are adapted by a ratio relative to current number of documents.






Re: [PR] SOLR-8393: Component for resource usage planning [solr]

2024-04-11 Thread via GitHub


igiguere commented on PR #1638:
URL: https://github.com/apache/solr/pull/1638#issuecomment-2050511702

   @gerlowskija
   > Does `size-estimator-lucene-solr.xls` actually work for folks? Do you use 
it regularly @igiguere ? Have you found it to be pretty accurate? Any other 
folks have experience with it?
   
   Me, personally, no, I don't use it ;).  I'll try to find out from 
client-facing people in the company.  I doubt anyone has compiled usage vs 
success statistics.

   > ... the community's only response to sizing questions has always been 
pretty much "You'll have to Guess-and-Check".
   
   The cluster sizing feature is documented to estimate (i.e.: guess) resource 
usage.  We could make the documentation clearer that it's not a fool-proof 
measure.  But, at least it beats holding a finger to the wind.  And it's a bit 
less complicated than the xls and a calculator.
   
   > And secondarily, because the spreadsheet this is all based off of was 
added in 2011 and hasn't really seen much iteration in the decade since. 
There's an absolute ton that's changed in both Lucene and Solr since then.
   
   We're only calculating RAM, disk size, document size.  Whatever has changed 
in Solr and Lucene, if it has an effect on RAM, disk space, doc size, then it 
should be reflected in the results... No?
   
   Note that this feature is meant to be used on a current "staging" 
deployment, to evaluate the eventual size of a "production" environment, for 
the same version of Solr.  No one is expected to draw conclusions from a 
previous version, so changes from one version to another are not a concern in 
that way.
   
   As a more general note, I should add that I'm a linguist converted to Java 
dev.  Not a mathematician ;)  If there's an error in the math, I will never see 
it.
   
   





[PR] SolrJ HTTP/2 Async API using CompletableFuture (update for 2024) [solr]

2024-04-11 Thread via GitHub


jdyer1 opened a new pull request, #2402:
URL: https://github.com/apache/solr/pull/2402

   This adds a standard `CompletableFuture`-based API to `Http2SolrClient`, 
`HttpJdkSolrClient` and `LBHttp2SolrClient`.  The existing methods `asyncReq` 
are marked `@deprecated` and defer to the new API methods.  Also deprecated are 
interfaces `AsyncListener` and `Cancellable`.  Uses within Solr of these 
deprecated methods/interfaces are migrated to the new API.  Also test coverage 
for these async methods is improved.
   
   This is based loosely on [PR 
1770](https://github.com/apache/lucene-solr/pull/1770).  This PR does not add 
async methods to `CloudHttp2SolrClient`, neither does it attempt to unify the 
async API between `LBHttp2SolrClient` and  `Http2SolrClient`.  
   
   There are several TODOs around `LBHttpSolrClient` & `LBHttp2SolrClient` to 
investigate and fix possible bugs and improve test coverage.  These TODOs are 
beyond the scope of this PR.
   
   Once merged, we should follow up with a separate PR to remove the deprecated 
methods and interfaces in main/Solr10 only.
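   
   A hedged usage sketch of the new API shape (the `requestAsync` signature is 
from this PR's diff; the client/request setup here is illustrative):
   
   ```java
   // Fire a query asynchronously and react when it completes.
   CompletableFuture<NamedList<Object>> future =
       client.requestAsync(new QueryRequest(params), "techproducts");
   future.whenComplete(
       (body, err) -> {
         if (err != null) {
           err.printStackTrace(); // request failed or was cancelled
         } else {
           System.out.println(body.get("response")); // raw Solr response
         }
       });
   ```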
   
   cc: [rishisankar](https://github.com/rishisankar)
   





Re: [PR] SOLR-8393: Component for resource usage planning [solr]

2024-04-11 Thread via GitHub


gerlowskija commented on PR #1638:
URL: https://github.com/apache/solr/pull/1638#issuecomment-2050163434

   About to take a look at the code and see if I can help with the v2 side of 
things, but before I dive into that I figured it was worth asking:
   
   Does `size-estimator-lucene-solr.xls` actually work for folks?  Do you use 
it regularly @igiguere ?  Have you found it to be pretty accurate?  Any other 
folks have experience with it?
   
   I'm happy to be wrong if we have several groups of folks out there in the 
wild that are using it, but my initial reaction is to be a little skeptical 
that it's reliable enough to incorporate into Solr.
   
   Primarily because, well, modeling resource-usage is a really really tough 
problem.  There's a reason that the community's only response to sizing 
questions has always been pretty much "You'll have to Guess-and-Check".
   
   And secondarily, because the spreadsheet this is all based off of was added 
in 2011 and hasn't really seen much iteration in the decade since.  There's an 
absolute ton that's changed in both Lucene and Solr since then.





Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]

2024-04-11 Thread via GitHub


dsmiley commented on PR #2395:
URL: https://github.com/apache/solr/pull/2395#issuecomment-2050095834

   I don't like the complexity in this URP relating to tolerance of where the 
URP is placed in the chain; I'd feel better if the URP were simplified from 
that concern and we expected the user to place it at an appropriate spot.  We 
don't have complexity like that elsewhere and/or I argue it's not the right 
approach.
   
   I sympathize with why an URP feels right.  Okay.  On each `addDoc`, don't we 
just need to do the check on the current SolrIndexSearcher but remember who the 
searcher was so that the next `addDoc` can see it's the same searcher and if so 
don't do the check?  It'd need to cache the searcher reference for the instance 
equality; use a `WeakReference` so we allow the searcher to close & GC.  If we 
do this, we don't need a commit event listener, thus don't need an additional 
component / confusing interaction complexity.  Don't get the SolrIndexSearcher 
from the request (we don't want a per-request cache instance), get it from the 
SolrCore so we see a possibly evolving SolrIndexSearcher.
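   
   A rough sketch of that idea (hedged: the field and `checkFieldLimit` helper 
are hypothetical, not existing Solr API; imports omitted):
   
   ```java
   // Hypothetical sketch: run the field-limit check at most once per searcher.
   private WeakReference<SolrIndexSearcher> lastChecked = new WeakReference<>(null);
   
   void maybeCheckFieldLimit(SolrCore core) {
     RefCounted<SolrIndexSearcher> ref = core.getSearcher(); // possibly-evolving searcher
     try {
       SolrIndexSearcher current = ref.get();
       if (lastChecked.get() != current) { // instance equality, as suggested above
         checkFieldLimit(current); // hypothetical helper: count fields, throw or warn
         lastChecked = new WeakReference<>(current); // weak: lets the searcher close & GC
       }
     } finally {
       ref.decref();
     }
   }
   ```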





Re: [PR] SOLR-17232: PropertiesOutputStream overrides write(byte[], int, int). [solr]

2024-04-11 Thread via GitHub


epugh commented on code in PR #2400:
URL: https://github.com/apache/solr/pull/2400#discussion_r1561301167


##
solr/core/src/java/org/apache/solr/util/IndexOutputOutputStream.java:
##
@@ -20,11 +20,12 @@
 import java.io.OutputStream;
 import org.apache.lucene.store.IndexOutput;
 
-public class PropertiesOutputStream extends OutputStream {
+/** Wraps an {@link IndexOutput} to expose it as an {@link OutputStream}. */

Review Comment:
   thank you!   Helps me understand what it's for (and why the double "Output" 
in the name!)...






Re: [PR] SOLR-17232: PropertiesOutputStream overrides write(byte[], int, int). [solr]

2024-04-11 Thread via GitHub


bruno-roustant commented on PR #2400:
URL: https://github.com/apache/solr/pull/2400#issuecomment-2050033287

   Renamed. It was used only in two internal locations.





Re: [PR] SOLR-17232: PropertiesOutputStream overrides write(byte[], int, int). [solr]

2024-04-11 Thread via GitHub


epugh commented on PR #2400:
URL: https://github.com/apache/solr/pull/2400#issuecomment-2050014227

   The name "IndexOutputOutputStream", while slightly funny, is fine, with just 
a bit of docs to highlight what you said "wrap an IndexOutput to expose it as 
an OutputStream" would have helped me grok it!   





Re: [PR] SOLR-17232: PropertiesOutputStream overrides write(byte[], int, int). [solr]

2024-04-11 Thread via GitHub


dsmiley commented on PR #2400:
URL: https://github.com/apache/solr/pull/2400#issuecomment-2049990316

   Let's rename this in main; separate no-JIRA PR.





Re: [PR] SOLR-17232: PropertiesOutputStream overrides write(byte[], int, int). [solr]

2024-04-11 Thread via GitHub


bruno-roustant commented on PR #2400:
URL: https://github.com/apache/solr/pull/2400#issuecomment-2049987103

   I think PropertiesOutputStream could be renamed IndexOutputOutputStream 
because it is used to wrap an IndexOutput to expose it as an OutputStream.
   I can add some JavaDoc. Do you agree to rename it to IndexOutputOutputStream?





Re: [PR] SOLR-17093: Collection restore API returns requestid when executed asynchronously [solr]

2024-04-11 Thread via GitHub


mariemat commented on code in PR #2111:
URL: https://github.com/apache/solr/pull/2111#discussion_r1561202416


##
solr/core/src/java/org/apache/solr/handler/admin/api/RestoreCollectionAPI.java:
##
@@ -144,10 +144,14 @@ public SubResponseAccumulatingJerseyResponse restoreCollection(
       throw remoteResponse.getException();
     }
 
+    if (requestBody.async != null) {
+      response.requestId = requestBody.async;
+      return response;

Review Comment:
   I am curious to know if there is any reason to return here and skip the 2 
lines below?
   






Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]

2024-04-11 Thread via GitHub


gerlowskija commented on PR #2395:
URL: https://github.com/apache/solr/pull/2395#issuecomment-2049901831

   > Sorry if I rained on your parade, attempting to get this in by EOW.
   
   Oh no worries; there's no particular rush.  That was just how long I was 
planning to wait if no one commented in the interim.  I'm glad you both chimed 
in! 
   
   Though tbh I can't quite tell where you are on the PR approach overall:
   
   1. Are you '-1' this being an URP at all?
   2. Are you OK with this being an URP, with the minor tweaks you mentioned 
(e.g. documenting that it goes _after_ DUP, etc.)?
   3. Are you OK with this being an URP, but only if we make some 
improvements to the URP framework (e.g. to ensure certain URPs always follow 
DUP)?
   4. ...something else?
   
   Please let me know!
   
   One last reply inline:
   
   > To block all indexing, it could set SolrCore.readOnly = true and be done 
with it
   
   IMO `SolrCore.readOnly` is too blunt a tool for the "field limiting" this PR 
attempts.  First because `readOnly` would block deletions and operations that 
we'd want to allow.  And second because a user blocked by 'readOnly' would have 
their `/update` requests fail with a message that's pretty inscrutable or at 
least unrelated to the root cause.
   
   That's what I like about the URP approach - it gives us more granularity 
into the different types of "/update" operations and allows us to get a nice 
(or at least, nicer) error message back to users.
   





Re: [PR] SOLR-13350: Multithreaded search [solr]

2024-04-11 Thread via GitHub


dsmiley commented on PR #2248:
URL: https://github.com/apache/solr/pull/2248#issuecomment-2049895628

   I figure that's somewhat pseudocode... so I don't want to over-critique it, 
but it should have the XOR of the boundary long in between each `arraycopy`.  
In practice you will likely loop the FixedBitSets, skipping the first in order 
to XOR the last long of the previous FixedBitSet with the first long of the 
current one.





Re: [PR] SOLR-17232: PropertiesOutputStream overrides write(byte[], int, int). [solr]

2024-04-11 Thread via GitHub


epugh commented on PR #2400:
URL: https://github.com/apache/solr/pull/2400#issuecomment-2049846868

   Some JavaDocs on this class would be nice..   I looked at it, and I still 
don't quite grok why it is needed for a properties file?   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org



Re: [PR] remove stray backticks in solrcloud-distributed-requests.adoc [solr]

2024-04-11 Thread via GitHub


cpoerschke merged PR #2401:
URL: https://github.com/apache/solr/pull/2401





Re: [PR] SOLR-13350: Multithreaded search [solr]

2024-04-11 Thread via GitHub


cpoerschke commented on PR #2248:
URL: https://github.com/apache/solr/pull/2248#issuecomment-2049819054

   > Yes I mean _the_ FixedBitSet (otherwise I just would have said a bit set). 
No I don't know of similar code doing this, and I don't expect it's in 
Lucene/Solr already. Lucene is very segment oriented so there's no need for an 
entire index bit set view. Solr has forever clung to index level bit sets, for 
better/worse (I think worse). I'm tempted to help but not sure I have the time. 
It's not a blocker anyway; the code can be improved.
   
   Okay, I've got the basic outline of the idea I think:
   
   ```
   final FixedBitSet bitsetA = new FixedBitSet(numBits);
   bitsetA.set(...);
   
   final int skipWords = 8;
   final int skipBits = skipWords * 64;
   
   final FixedBitSet bitsetB = new FixedBitSet(numBits - skipBits);
   bitsetB.set(... - skipBits);
   
   final long[] storedBits = new long[FixedBitSet.bits2words(numBits)];
   System.arraycopy(bitsetA.getBits(), 0, storedBits, 0, bitsetA.getBits().length);
   System.arraycopy(bitsetB.getBits(), 0, storedBits, skipWords, bitsetB.getBits().length);
   
   final FixedBitSet bitsetC = new FixedBitSet(storedBits, numBits);
   ```
   
   And then as a next step I'll have a go at applying that to the collecting 
code here on this PR.





Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]

2024-04-11 Thread via GitHub


gerlowskija commented on code in PR #2395:
URL: https://github.com/apache/solr/pull/2395#discussion_r1560999237


##
solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessor.java:
##
@@ -0,0 +1,140 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.update.processor;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.Locale;
+import org.apache.solr.cloud.CloudDescriptor;
+import org.apache.solr.cloud.ZkController;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.cloud.Replica;
+import org.apache.solr.common.cloud.Slice;
+import org.apache.solr.common.util.StrUtils;
+import org.apache.solr.core.CoreDescriptor;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.update.AddUpdateCommand;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class NumFieldLimitingUpdateRequestProcessor extends UpdateRequestProcessor {
+  private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  private SolrQueryRequest req;
+  private int fieldThreshold;
+  private int currentNumFields;
+  private boolean warnOnly;
+
+  public NumFieldLimitingUpdateRequestProcessor(
+  SolrQueryRequest req,
+  UpdateRequestProcessor next,
+  int fieldThreshold,
+  int currentNumFields,
+  boolean warnOnly) {
+    super(next);
+    this.req = req;
+    this.fieldThreshold = fieldThreshold;
+    this.currentNumFields = currentNumFields;
+    this.warnOnly = warnOnly;
+  }
+
+  public void processAdd(AddUpdateCommand cmd) throws IOException {
+    if (!isCloudMode() || /* implicit isCloudMode==true */ isLeaderThatOwnsTheDoc(cmd)) {
+      if (coreExceedsFieldLimit()) {
+        throwExceptionOrLog(cmd.getPrintableId());
+      } else {
+        if (log.isDebugEnabled()) {
+          log.debug(
+              "Allowing document {}, since current core is under the max-field limit (numFields={}, maxThreshold={})",

Review Comment:
   Done  



##
solr/core/src/java/org/apache/solr/update/processor/NumFieldsMonitor.java:
##
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.update.processor;
+
+import org.apache.solr.core.AbstractSolrEventListener;
+import org.apache.solr.core.SolrCore;
+import org.apache.solr.search.SolrIndexSearcher;
+import org.apache.solr.util.RefCounted;
+
+public class NumFieldsMonitor extends AbstractSolrEventListener {
+
+  private int numFields;
+
+  public NumFieldsMonitor(SolrCore core) {
+    super(core);
+    this.numFields = -1;
+  }
+
+  @Override
+  public void postCommit() {
+    RefCounted<SolrIndexSearcher> indexSearcher = null;
+    try {
+      indexSearcher = getCore().getSearcher();
+      // Get the number of fields directly from the IndexReader instead of the Schema object to also
+      // include the

Review Comment:
   > formatting?
   
   Done  
   
   > are there any other interesting places in Solr to use a NumFieldsMonitor?
   
   Good question.  It does seem like a kind of neat thing to have around, but I 
can't think of any other use-cases off the top of my head...



##
solr/core/src/test/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactoryTest.java:
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license 

Re: [PR] SOLR-17192: Add "field-limiting" URP to catch ill-designed schemas [solr]

2024-04-11 Thread via GitHub


gerlowskija commented on code in PR #2395:
URL: https://github.com/apache/solr/pull/2395#discussion_r1561009284


##
solr/core/src/java/org/apache/solr/update/processor/NumFieldLimitingUpdateRequestProcessorFactory.java:
##
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.update.processor;
+
+import org.apache.solr.common.util.NamedList;
+import org.apache.solr.core.SolrCore;
+import org.apache.solr.request.SolrQueryRequest;
+import org.apache.solr.response.SolrQueryResponse;
+import org.apache.solr.search.SolrIndexSearcher;
+import org.apache.solr.util.plugin.SolrCoreAware;
+
+/**
+ * This factory generates an UpdateRequestProcessor which fails update requests once a core has
+ * exceeded a configurable maximum number of fields. Meant as a safeguard to help users notice
+ * potentially-dangerous schema design before performance and stability problems start to occur.
+ *
+ * <p>The URP uses the core's {@link SolrIndexSearcher} to judge the current number of fields.
+ * Accordingly, it undercounts the number of fields in the core - missing all fields added since the
+ * previous searcher was opened. As such, the URP's request-blocking is "best effort" - it cannot be
+ * relied on as a precise limit on the number of fields.
+ *
+ * <p>Additionally, the field-counting includes all documents present in the index, including any
+ * deleted docs that haven't yet been purged via segment merging. Note that this can differ
+ * significantly from the number of fields defined in managed-schema.xml - especially when dynamic
+ * fields are enabled. The only way to reduce this field count is to delete documents and wait until
+ * the deleted documents have been removed by segment merges. Users may of course speed up this
+ * process by tweaking Solr's segment-merging, triggering an "optimize" operation, etc.
+ *
+ * <p>{@link NumFieldLimitingUpdateRequestProcessorFactory} accepts two configuration parameters:
+ *
+ * <ul>
+ *   <li><code>maxFields</code> - (required) The maximum number of fields before update requests
+ *       should be aborted. Once this limit has been exceeded, additional update requests will fail
+ *       until fields have been removed or the "maxFields" is increased.
+ *   <li><code>warnOnly</code> - (optional) If <code>true</code> then the URP logs verbose warnings
+ *       about the limit being exceeded but doesn't abort update requests. Defaults to
+ *       <code>false</code> if not specified
+ * </ul>
+ *
+ * @since 9.6.0
+ */
+public class NumFieldLimitingUpdateRequestProcessorFactory extends UpdateRequestProcessorFactory
+    implements SolrCoreAware {
+
+  private static final String MAXIMUM_FIELDS_PARAM = "maxFields";
+  private static final String WARN_ONLY_PARAM = "warnOnly";
+
+  private NumFieldsMonitor numFieldsMonitor;
+  private int maximumFields;
+  private boolean warnOnly;
+
+  @Override
+  public void inform(final SolrCore core) {
+    // register a commit callback for monitoring the number of fields in the schema
+    numFieldsMonitor = new NumFieldsMonitor(core);
+    core.getUpdateHandler().registerCommitCallback(numFieldsMonitor);
+    core.registerNewSearcherListener(numFieldsMonitor);
+  }
+
+  public void init(NamedList<?> args) {
+    warnOnly = args.indexOf(WARN_ONLY_PARAM, 0) > 0 ? args.getBooleanArg(WARN_ONLY_PARAM) : false;
+
+    if (args.indexOf(MAXIMUM_FIELDS_PARAM, 0) < 0) {
+      throw new IllegalArgumentException(
+          "The "
+              + MAXIMUM_FIELDS_PARAM
+              + " parameter is required for "
+              + getClass().getName()
+              + ", but no value was provided.");
+    }
+    final Object rawMaxFields = args.get(MAXIMUM_FIELDS_PARAM);

Review Comment:
   Agreed - it's really unfortunate.  I looked at a number of URP's and 
NamedListInitializedPlugin's, but couldn't find anything I liked.
   
   Maybe we could file a ticket for that separately; it'd be a really nice 
cleanup for all our NLIP's to use a set of utilities like:
   
   ```
   int getRequiredNamedListValueAsInteger(NamedList nl, String paramName);
   Integer getNamedListValueAsInteger(NamedList nl, String paramName);
   boolean getRequiredNamedListValueAsBoolean(NamedList nl, String paramName);
   ```

Re: [PR] Fix/solr 17018 branch 9 1 [solr]

2024-04-11 Thread via GitHub


alessandrobenedetti commented on PR #2358:
URL: https://github.com/apache/solr/pull/2358#issuecomment-2049618343

   > I glossed over this while simultaneously glossing over the branch_9x 
commit to see that you went to the same source files and made similar-ish 
looking changes. You did. In the branch_9x test I see 
`ltr_expensiveFeatureRescoringAndPartialResultsNotTolerated_shouldRaiseException`
 but you didn't port that test specifically (albeit did the others). Why not?
   
   it seemed to me that the possibility of not tolerating partial results via a 
dedicated parameter was introduced later on (as part of the query limit work). 
As far as I could verify, it was not possible in 9.1.





Re: [PR] Fix/solr 17018 branch 9 1 [solr]

2024-04-11 Thread via GitHub


dsmiley commented on PR #2358:
URL: https://github.com/apache/solr/pull/2358#issuecomment-2049611339

   I glossed over this while simultaneously glossing over the branch_9x commit 
to see that you went to the same source files and made similar-ish looking 
changes.  You did.  In the branch_9x test I see 
`ltr_expensiveFeatureRescoringAndPartialResultsNotTolerated_shouldRaiseException`
 but you didn't port that test specifically (albeit did the others).  Why not?





Re: [PR] Update dependency commons-codec:commons-codec to v1.16.1 [solr]

2024-04-11 Thread via GitHub


cpoerschke commented on PR #2266:
URL: https://github.com/apache/solr/pull/2266#issuecomment-2049418174

   > ... We have #2049 pending to upgrade to the latter.
   
   That's merged now, so I've tentatively requested a rebase and will 
tentatively mark this as ready-to-review then.





Re: [PR] Update org.apache.logging.log4j:* to v2.23.1 [solr]

2024-04-11 Thread via GitHub


cpoerschke commented on PR #2047:
URL: https://github.com/apache/solr/pull/2047#issuecomment-2049412817

   Similar to #1821 i.e. blocked on JDK 17+ -- if I find a third solrbot PR 
like it I'll create a JIRA and then close out these in favour of that.
   
   https://github.com/apache/solr/pull/1821#issuecomment-2049409176





Re: [PR] Update dependency com.adobe.testing:s3mock-junit4 to v3 [solr]

2024-04-11 Thread via GitHub


cpoerschke commented on PR #1821:
URL: https://github.com/apache/solr/pull/1821#issuecomment-2049409176

   Wondering if we'd like to
   * close this,
   * create a JIRA for a manual upgrade later, and
   * exclude it from solrbot trying again
   * (and then the exclusion would be removed as part of the JIRA)?





[PR] remove stray backticks in solrcloud-distributed-requests.adoc [solr]

2024-04-11 Thread via GitHub


cpoerschke opened a new pull request, #2401:
URL: https://github.com/apache/solr/pull/2401

   (no comment)





Re: [PR] Fix/solr 17018 branch 9 1 [solr]

2024-04-11 Thread via GitHub


alessandrobenedetti commented on PR #2358:
URL: https://github.com/apache/solr/pull/2358#issuecomment-2049310875

   Adding @dsmiley as a reviewer as I noticed some work in the same area back 
then: 
   https://github.com/apache/lucene-solr/pull/1726/files
   
   The rationale behind this additional porting is that a client of mine agreed 
to sponsor this contribution (already merged in 10.0 and 9.6: 
https://github.com/apache/solr/pull/2348) only if it is also backported to 9.1.x (I 
mentioned that a few weeks ago in the Solr Slack).
   The idea is to follow up once merged in 9.1.x with 9.1.2 release including 
this bugfix.
   
   In terms of the contribution, this porting required a bit of additional time 
as the mechanism for query limits is massively different from 9.x and 10.x, 
specifically:
   
   1) I originally thought I could leverage the Lucene TimeLimitingCollector 
(and the related TimeExceededException). But after some investigation and 
debugging I realised this is only valid for first-stage retrieval and 
scoring, and it's not easily usable in the reranking phase.
   
   2) Then I thought of using the 
ExitableDirectoryReader.ExitingReaderException mechanism, but with this 
approach the Solr document result list is emptied in 
org/apache/solr/handler/component/SearchHandler.java:440 when returning partial 
results. 
   Changing those parts to align with the expectations here looked invasive, 
and I didn't like the outcome.
   
   So I ended up with this solution, building on top of the 
SolrQueryTimeoutImpl from @dsmiley  past contribution.
   It's decently clean and allowed me to mimic the same approach we have in 
10.x and 9.x.
   
   Happy to gather some feedback; unless there are any concerns, I'll proceed 
with the merge to 9.1.x in a week.
   
   
   
   





[PR] SOLR-17232: PropertiesOutputStream overrides write(byte[], int, int). [solr]

2024-04-11 Thread via GitHub


bruno-roustant opened a new pull request, #2400:
URL: https://github.com/apache/solr/pull/2400

   https://issues.apache.org/jira/browse/SOLR-17232





[jira] [Created] (SOLR-17232) PropertiesOutputStream does not override write(byte[], int, int)

2024-04-11 Thread Bruno Roustant (Jira)
Bruno Roustant created SOLR-17232:
-

 Summary: PropertiesOutputStream does not override write(byte[], 
int, int)
 Key: SOLR-17232
 URL: https://issues.apache.org/jira/browse/SOLR-17232
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Bruno Roustant


PropertiesOutputStream implements only write(int) but does not override 
write(byte[], int, int), which makes it inefficient to write larger content.
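
A minimal sketch of the missing override (hedged: assumes the wrapped IndexOutput 
field is named "out", per the wrapping design discussed in the PR):

{code}
@Override
public void write(byte[] b, int off, int len) throws IOException {
  // delegate to the wrapped IndexOutput's bulk write instead of one byte at a time
  out.writeBytes(b, off, len);
}
{code}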


