[jira] [Comment Edited] (SOLR-11475) Endless loop and OOM in PeerSync
[ https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264959#comment-16264959 ] Pushkar Raste edited comment on SOLR-11475 at 11/24/17 5:55 PM:

Here is a working patch. I came up with a completely hypothetical scenario where one replica has a version *ver* and the other has version *-ver*. [~shalinmangar] / [~noble.paul] / [~ichattopadhyaya] Can you please take a look at the patch? I still don't understand how one can get into this scenario, but a robustness check wouldn't hurt.

> Endless loop and OOM in PeerSync
> --------------------------------
>
>                 Key: SOLR-11475
>                 URL: https://issues.apache.org/jira/browse/SOLR-11475
>             Project: Solr
>          Issue Type: Bug
>   Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Andrey Kudryavtsev
>      Attachments: SOLR-11475.patch
>
> After the problem described in SOLR-11459, I restarted the cluster and got an OOM on start.
> [PeerSync#handleVersionsWithRanges|https://github.com/apache/lucene-solr/blob/68bda0be421ce18811e03b229781fd6152fcc04a/solr/core/src/java/org/apache/solr/update/PeerSync.java#L539] contains this logic:
> {code}
> while (otherUpdatesIndex >= 0) {
>   // we have run out of ourUpdates, pick up all the remaining versions from the other versions
>   if (ourUpdatesIndex < 0) {
>     String range = otherVersions.get(otherUpdatesIndex) + "..." + otherVersions.get(0);
>     rangesToRequest.add(range);
>     totalRequestedVersions += otherUpdatesIndex + 1;
>     break;
>   }
>
>   // stop when the entries get old enough that reorders may lead us to see updates we don't need
>   if (!completeList && Math.abs(otherVersions.get(otherUpdatesIndex)) < ourLowThreshold) break;
>
>   if (ourUpdates.get(ourUpdatesIndex).longValue() == otherVersions.get(otherUpdatesIndex).longValue()) {
>     ourUpdatesIndex--;
>     otherUpdatesIndex--;
>   } else if (Math.abs(ourUpdates.get(ourUpdatesIndex)) < Math.abs(otherVersions.get(otherUpdatesIndex))) {
>     ourUpdatesIndex--;
>   } else {
>     long rangeStart = otherVersions.get(otherUpdatesIndex);
>     while ((otherUpdatesIndex < otherVersions.size())
>         && (Math.abs(otherVersions.get(otherUpdatesIndex)) < Math.abs(ourUpdates.get(ourUpdatesIndex)))) {
>       otherUpdatesIndex--;
>       totalRequestedVersions++;
>     }
>     // construct range here
>     rangesToRequest.add(rangeStart + "..." + otherVersions.get(otherUpdatesIndex + 1));
>   }
> }
> {code}
> If at some point {code}ourUpdates.get(ourUpdatesIndex) == -otherVersions.get(otherUpdatesIndex){code} the loop will never end. It will add the same string into {{rangesToRequest}} again and again until the process runs out of memory.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
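To see why the loop gets stuck, consider a simplified, hypothetical re-implementation of one iteration of the branch logic above. This is illustration code, not the actual PeerSync source; the class and method names are made up, and the "both cursors advance" case only moves the other-side cursor for brevity. When the replica holds version X and the other side holds -X, no branch moves {{otherUpdatesIndex}}, and the same range string is appended on every pass:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PeerSyncStalemate {

  // One iteration of the range-building branch logic, simplified.
  // Returns the new otherUpdatesIndex; if it equals the old one, the loop is stuck.
  static int step(List<Long> ourUpdates, int ourIdx,
                  List<Long> otherVersions, int otherIdx,
                  List<String> rangesToRequest) {
    long ours = ourUpdates.get(ourIdx);
    long theirs = otherVersions.get(otherIdx);
    if (ours == theirs) {
      return otherIdx - 1;                        // equal versions: cursor advances
    } else if (Math.abs(ours) < Math.abs(theirs)) {
      return otherIdx;                            // only the our-side cursor would advance
    } else {
      // ours != theirs, but Math.abs(ours) == Math.abs(theirs) when ours == -theirs:
      // the inner while's condition is false immediately, otherIdx never moves,
      // and the same range is re-added on every iteration.
      long rangeStart = theirs;
      while (otherIdx < otherVersions.size()
          && Math.abs(otherVersions.get(otherIdx)) < Math.abs(ours)) {
        otherIdx--;
      }
      rangesToRequest.add(rangeStart + "..." + otherVersions.get(otherIdx + 1));
      return otherIdx;
    }
  }

  public static void main(String[] args) {
    List<Long> ours = Arrays.asList(5L, 9L);      // replica has add with version 5
    List<Long> theirs = Arrays.asList(-5L, 9L);   // other side has delete -5
    List<String> ranges = new ArrayList<>();
    int idx = step(ours, 0, theirs, 0, ranges);
    // idx is still 0 and ranges now holds "-5...9"; repeating this forever is the OOM.
    System.out.println("index moved: " + (idx != 0) + ", ranges: " + ranges);
  }
}
```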
[jira] [Comment Edited] (SOLR-11475) Endless loop and OOM in PeerSync
[ https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264959#comment-16264959 ] Pushkar Raste edited comment on SOLR-11475 at 11/24/17 6:03 AM:

Here is a working patch. I came up with a completely hypothetical scenario where one replica has a version *ver* and the other has version *-ver*. [~shalinmangar] / [~noble.paul] / [~ichattopadhyaya] Can you please take a look at the patch? I still don't understand how one can get into this scenario, but a robustness check wouldn't hurt.
[jira] [Updated] (SOLR-11475) Endless loop and OOM in PeerSync
[ https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-11475:
---------------------------------
    Attachment: SOLR-11475.patch

Here is a working patch. I came up with a completely hypothetical scenario where one replica has a version *ver* and the other has version *-ver*. [~shalinmangar] / [~noble.paul] Can you please take a look at the patch? I still don't understand how one can get into this scenario, but a robustness check wouldn't hurt.
[jira] [Commented] (SOLR-11216) Make PeerSync more robust
[ https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260117#comment-16260117 ] Pushkar Raste commented on SOLR-11216:
--------------------------------------
[~caomanhdat] Can this fail if the leader processes updates out of order? E.g., what if the leader has processed update 6 but has yet to process 5. Now the replica requests update 6. However, the leader has just finished processing 5 (including a soft/hard commit), so when the leader calculates the index fingerprint up to 6, the leader's fingerprint will include version 5 as well. Considering all the race conditions, I think making the fingerprint robust is tricky.

> Make PeerSync more robust
> -------------------------
>
>                 Key: SOLR-11216
>                 URL: https://issues.apache.org/jira/browse/SOLR-11216
>             Project: Solr
>          Issue Type: Improvement
>   Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Cao Manh Dat
>
> First of all, I will change the issue's title to a better name when I have one.
> When digging into SOLR-10126, I found a case that can make PeerSync fail:
> * leader and replica receive updates 1 to 4
> * replica stops
> * replica misses updates 5, 6
> * replica starts recovery
> ## replica buffers updates 7, 8
> ## replica requests versions from the leader
> ## replica gets the recent versions, which are 1,2,3,4,7,8
> ## at the same time the leader receives update 9, so it returns updates 1 to 9 (for the versions request)
> ## replica does peersync and requests updates 5, 6, 9 from the leader
> ## replica applies updates 5, 6, 9. Its index does not have updates 7, 8, and maxVersionSpecified for the fingerprint is 9, therefore the fingerprint comparison will fail
> My idea here is: why does the replica request update 9 (step 6) while it knows that updates with lower versions (updates 7, 8) are in its buffering tlog? Should we request only updates lower than the lowest update in its buffering tlog (< 7)?
> Someone may ask: what if the replica never receives update 9? In that case, the leader will put the replica into LIR state, so the replica will run the recovery process again.
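The proposal in the quoted description, requesting only the missing updates below the lowest version buffered in the tlog, can be sketched as follows. This is a hypothetical helper for illustration, not actual Solr code; the method and parameter names are invented:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PeerSyncRequestFilter {

  // Versions the replica should request: reported by the leader, absent locally,
  // and (per the proposal) strictly below the lowest version buffered in the tlog.
  static List<Long> versionsToRequest(List<Long> leaderVersions,
                                      List<Long> ourVersions,
                                      List<Long> bufferedVersions) {
    long lowestBuffered = bufferedVersions.isEmpty()
        ? Long.MAX_VALUE
        : Collections.min(bufferedVersions);
    List<Long> result = new ArrayList<>();
    for (long v : leaderVersions) {
      if (!ourVersions.contains(v) && !bufferedVersions.contains(v) && v < lowestBuffered) {
        result.add(v);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Scenario from the description: replica has 1-4, buffered 7-8, leader reports 1-9.
    List<Long> leader = List.of(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L);
    List<Long> ours = List.of(1L, 2L, 3L, 4L);
    List<Long> buffered = List.of(7L, 8L);
    // Updates 5 and 6 are requested; update 9 is not, since 9 > min(buffered) = 7.
    System.out.println(versionsToRequest(leader, ours, buffered));
  }
}
```

With this rule, update 9 is deliberately left out and, as the description notes, the replica relies on LIR-triggered recovery if 9 never arrives.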
[jira] [Commented] (SOLR-11475) Endless loop and OOM in PeerSync
[ https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215049#comment-16215049 ] Pushkar Raste commented on SOLR-11475:
--------------------------------------
If you are blocked, you can try turning off the use of version ranges and fall back to using individual versions. If you can wait for a code fix, I will take a stab at it this weekend.

The solution I am thinking of is keeping a counter, incrementing it on every iteration, and throwing an exception if we don't break out of the outermost {{while}} loop before {{counter > Math.max(ourUpdates.size(), otherVersions.size())}}.

Alternatively, in the {{else}} branch, before we create a new range, add a check for X and -X and throw an exception if that is true.
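The counter guard suggested above could look roughly like this. It is an illustrative sketch rather than the committed patch: the real branch logic is replaced by a configurable stand-in so the stuck case can be simulated, and the class and method names are invented:

```java
public class LoopGuardSketch {

  // Simulates a loop whose body may fail to make progress (as in the X / -X case).
  // progressPerIteration = 0 models the stuck loop; the counter guard turns the
  // would-be endless loop (and eventual OOM) into a fast, explicit failure.
  static int runGuarded(int ourSize, int otherSize, int progressPerIteration) {
    int otherUpdatesIndex = otherSize - 1;
    int counter = 0;
    int maxIterations = Math.max(ourSize, otherSize);
    while (otherUpdatesIndex >= 0) {
      if (++counter > maxIterations) {
        throw new IllegalStateException("no progress in " + maxIterations + " iterations");
      }
      otherUpdatesIndex -= progressPerIteration; // stand-in for the real branch logic
    }
    return counter;
  }

  public static void main(String[] args) {
    System.out.println(runGuarded(4, 4, 1)); // healthy loop terminates normally
    try {
      runGuarded(4, 4, 0);                   // stuck loop trips the guard
    } catch (IllegalStateException e) {
      System.out.println("aborted: " + e.getMessage());
    }
  }
}
```

Since every legitimate iteration consumes at least one entry from one of the two lists, any run longer than the larger list size proves the cursors have stopped moving.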
[jira] [Commented] (SOLR-11475) Endless loop and OOM in PeerSync
[ https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212935#comment-16212935 ] Pushkar Raste commented on SOLR-11475:
--------------------------------------
Version numbers are monotonically increasing sequence numbers, and for deletes the sequence number is multiplied by -1. I don't think we would ever have version number X in a replica's tlog and -X in the leader's (or any other replica's) tlog.

Can you provide a valid test case for your issue? I am not in front of a computer right now, but IIRC the tests have "PeerSync" in the name.

On Oct 20, 2017 5:54 AM, "Andrey Kudryavtsev (JIRA)" wrote:

> [ https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Kudryavtsev mentioned you on SOLR-11475:
> I think throwing an exception in case of {{ourUpdates.get(ourUpdatesIndex) == -otherVersions.get(otherUpdatesIndex)}} is better than an OOM. [~praste], [~shalinmangar] What do you think?
[jira] [Commented] (SOLR-10922) NPE in PeerSync
[ https://issues.apache.org/jira/browse/SOLR-10922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056741#comment-16056741 ] Pushkar Raste commented on SOLR-10922:
--------------------------------------
Is this a duplicate of SOLR-9915?

> NPE in PeerSync
> ---------------
>
>                 Key: SOLR-10922
>                 URL: https://issues.apache.org/jira/browse/SOLR-10922
>             Project: Solr
>          Issue Type: Bug
>   Security Level: Public (Default Security Level. Issues are Public)
>    Affects Versions: 6.6
>            Reporter: Markus Jelsma
>            Priority: Minor
>             Fix For: master (7.0)
>
> {code}
> Error while trying to recover. core=search_shard2_replica2:java.lang.NullPointerException
>         at org.apache.solr.update.PeerSync.alreadyInSync(PeerSync.java:381)
>         at org.apache.solr.update.PeerSync.sync(PeerSync.java:251)
>         at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:439)
>         at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:284)
>         at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
[jira] [Commented] (SOLR-10169) PeerSync will hit an NPE on no response errors when looking for fingerprint.
[ https://issues.apache.org/jira/browse/SOLR-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056738#comment-16056738 ] Pushkar Raste commented on SOLR-10169:
--------------------------------------
Is this a duplicate of SOLR-9915?

> PeerSync will hit an NPE on no response errors when looking for fingerprint.
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-10169
>                 URL: https://issues.apache.org/jira/browse/SOLR-10169
>             Project: Solr
>          Issue Type: Bug
>   Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Mark Miller
[jira] [Comment Edited] (SOLR-10873) Explore a utility for periodically checking the document counts for replicas of a shard
[ https://issues.apache.org/jira/browse/SOLR-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047280#comment-16047280 ] Pushkar Raste edited comment on SOLR-10873 at 6/13/17 1:03 AM:

There are other advantages too:
* You can compute the index fingerprint up to any arbitrary version. Depending on tolerance, you can check whether the fingerprint matches up to the last version in the second-from-last tlog. No need to defer commits in this case.
* The index fingerprint is cached in the SolrCore class, so even if the frequency of sync checks is high you may not have to recompute the fingerprint every single time.

{{RealTimeGetComponent}} already supports a {{processGetFingerprint}} call, added while working on SOLR-9446.

> Explore a utility for periodically checking the document counts for replicas of a shard
> ---------------------------------------------------------------------------------------
>
>                 Key: SOLR-10873
>                 URL: https://issues.apache.org/jira/browse/SOLR-10873
>             Project: Solr
>          Issue Type: Improvement
>   Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Erick Erickson
>
> We've had several situations "in the field" and on the user's list where the number of documents on different replicas of the same shard differ. I've also seen situations where the numbers are wildly different (two orders of magnitude). I can force this situation by, say, taking down nodes, adding replicas that become the leader, then starting the nodes back up. But it doesn't matter whether the discrepancy is a result of "pilot error" or a problem with the code; in either case it would be useful to flag it.
> Straw-man proposal:
> We create a processor (modeled on DocExpirationUpdateProcessorFactory perhaps?) that periodically wakes up and checks that each replica in the given shard has the same document count (and perhaps other checks TBD?). Send some kind of notification if a problem was detected.
> Issues:
> 1> This will require some way to deal with the differing commit times.
> 1a> If we require a timestamp on each document, we could check the config file to see the autocommit interval and, say, check NOW-(2 x opensearcher interval). In that case the config would just require the field to use be specified.
> 1b> We could require that part of the configuration is a query to use to check document counts. I kind of like this one.
> 2> How to let the admins know a discrepancy was found? E-mail? ERROR-level log message? Other?
> 3> How does this fit into the autoscaling initiative? This is a "monitor the system and do something" item. If we go forward with this we should do it with an eye toward fitting it in that framework.
> 3a> Is there anything we can do to auto-correct this situation? Auto-correction could be tricky. Heuristics like "make the replica with the most documents the leader and force full index replication on all the replicas that don't agree" seem dangerous.
> 4> How to keep the impact minimal? The simple approach would be for each replica to check all other replicas in the shard. So say there are 10 replicas on a single shard; that would be 90 queries. It would suffice for just one of those to check the other 9, not have all 10 check the other nine. Maybe restrict the checker to be the leader? Or otherwise just make it one replica/shard that does the checking?
> 5> It's probably useful to add a collections API call to fire this off manually. Or maybe as part of CHECKSTATUS?
> What do people think?
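The advantage of comparing fingerprints capped at an arbitrary version, rather than raw document counts, can be shown with a toy example. The sketch below is not Solr's actual IndexFingerprint algorithm; it is a hypothetical stand-in that only illustrates why capping at a version both replicas have seen sidesteps differing commit points:

```java
import java.util.List;

public class FingerprintSketch {

  // Toy stand-in for an index fingerprint: combines all versions whose absolute
  // value is <= maxVersion. The sum makes the combination order-independent.
  static long fingerprintUpTo(List<Long> versions, long maxVersion) {
    long hash = 0;
    for (long v : versions) {
      if (Math.abs(v) <= maxVersion) {
        hash += v * 31 + Long.hashCode(v);
      }
    }
    return hash;
  }

  public static void main(String[] args) {
    List<Long> leader = List.of(1L, 2L, 3L, 4L, 5L);   // leader already committed update 5
    List<Long> replica = List.of(1L, 2L, 3L, 4L);      // replica has not seen it yet
    // Capped at 4, the replicas agree even though their commit points differ:
    System.out.println(fingerprintUpTo(leader, 4) == fingerprintUpTo(replica, 4));
    // Capped at 5, the genuinely missing update shows up as a mismatch:
    System.out.println(fingerprintUpTo(leader, 5) == fingerprintUpTo(replica, 5));
  }
}
```

A count-based check would either have to wait out the commit lag or report a false discrepancy here, which is the point made in the comment above.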
[jira] [Comment Edited] (SOLR-10873) Explore a utility for periodically checking the document counts for replicas of a shard
[ https://issues.apache.org/jira/browse/SOLR-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047280#comment-16047280 ] Pushkar Raste edited comment on SOLR-10873 at 6/13/17 1:02 AM:

There are other advantages too:
* You can compute the index fingerprint up to any arbitrary version. Depending on tolerance, you can check whether the fingerprint matches up to the last version in the second-from-last tlog. No need to defer commits in this case.
* The index fingerprint is cached in the SolrCore class, so even if the frequency of sync checks is high you may not have to recompute the fingerprint every single time.

{{RealTimeGetComponent}} already supports a {{processGetFingerprint}} call, added while working on SOLR-9446.
[jira] [Commented] (SOLR-10873) Explore a utility for periodically checking the document counts for replicas of a shard
[ https://issues.apache.org/jira/browse/SOLR-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047280#comment-16047280 ] Pushkar Raste commented on SOLR-10873:
--------------------------------------
There are other advantages too:
* You can compute the index fingerprint up to any arbitrary version. Depending on tolerance, you can check whether the fingerprint matches up to the last version in the second-from-last tlog. No need to defer commits in this case.
* The index fingerprint is cached in the SolrCore class, so even if the frequency of sync checks is high you may not have to recompute the fingerprint every single time.

{{RealTimeGetComponent}} already supports a {{processGetFingerprint}} call, added while working on SOLR-9446.
[jira] [Commented] (SOLR-10873) Explore a utility for periodically checking the document counts for replicas of a shard
[ https://issues.apache.org/jira/browse/SOLR-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047218#comment-16047218 ] Pushkar Raste commented on SOLR-10873: -- What if the count is the same but the actual data is different? Can we use the index fingerprint instead to verify whether replicas are in sync? > Explore a utility for periodically checking the document counts for replicas > of a shard > --- > > Key: SOLR-10873 > URL: https://issues.apache.org/jira/browse/SOLR-10873 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Erick Erickson
[jira] [Commented] (LUCENE-7750) Findbugs bug fixes
[ https://issues.apache.org/jira/browse/LUCENE-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941971#comment-15941971 ] Pushkar Raste commented on LUCENE-7750: --- Did you get a chance to look at my patch for SOLR-10080 and, as mentioned in one of the mails, the work I had done in my fork https://github.com/praste/lucene-solr/tree/findbugs-lucene ? > Findbugs bug fixes > -- > > Key: LUCENE-7750 > URL: https://issues.apache.org/jira/browse/LUCENE-7750 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Daniel Jelinski >Priority: Minor > > Holder issue to keep track of Findbugs-related fixes
[jira] (SOLR-10080) Fix some issues reported by findbugs
[ https://issues.apache.org/jira/browse/SOLR-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-10080: - Attachment: SOLR-10080.patch I have fixed some of the issues reported by findbugs (and added descriptive comments in some cases). In some cases, I wasn't sure whether the code findbugs flagged was intentionally written that way or not. Once I get feedback, I will polish the patch to remove unwanted comments (or remove the code that I have just commented out for now). > Fix some issues reported by findbugs > - > > Key: SOLR-10080 > URL: https://issues.apache.org/jira/browse/SOLR-10080 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Priority: Minor > Attachments: SOLR-10080.patch > > > I ran a findbugs analysis on code and found a lot of issues. I found issues > in both lucene and solr, however, I am not sure whether Solr developers > should fix issues like this in the lucene code or not.
[jira] (SOLR-10080) Fix some issues reported by findbugs
Pushkar Raste created SOLR-10080: Summary: Fix some issues reported by findbugs Key: SOLR-10080 URL: https://issues.apache.org/jira/browse/SOLR-10080 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Pushkar Raste Priority: Minor I ran a findbugs analysis on code and found a lot of issues. I found issues in both lucene and solr, however, I am not sure whether Solr developers should fix issues like this in the lucene code or not. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9555) Recovery can hang if a node is put into LIR as it is starting up
[ https://issues.apache.org/jira/browse/SOLR-9555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828380#comment-15828380 ] Pushkar Raste commented on SOLR-9555: - Can't we just set the timeout in the test to, let's say, 200 > Recovery can hang if a node is put into LIR as it is starting up > > > Key: SOLR-9555 > URL: https://issues.apache.org/jira/browse/SOLR-9555 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Alan Woodward > > See > https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/17888/consoleFull > for an example
[jira] [Commented] (SOLR-9555) Recovery can hang if a node is put into LIR as it is starting up
[ https://issues.apache.org/jira/browse/SOLR-9555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827013#comment-15827013 ] Pushkar Raste commented on SOLR-9555: - I assume this would happen in a prod deployment (outside of the test itself) as well. Does the {{PeerSyncReplicationTest.waitTillNodesActive}} method look good, or does the method itself have a bug? > Recovery can hang if a node is put into LIR as it is starting up > > > Key: SOLR-9555 > URL: https://issues.apache.org/jira/browse/SOLR-9555 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Alan Woodward > > See > https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/17888/consoleFull > for an example
[jira] [Comment Edited] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824089#comment-15824089 ] Pushkar Raste edited comment on SOLR-9906 at 1/16/17 2:48 PM: -- [~romseygeek] - Thank you for catching the bug. I think the check can be fixed by changing {{slice.getState() == State.ACTIVE}} to {{slice.getLeader().getState() == Replica.State.ACTIVE}}. Let me know if that is correct and I will attach a patch to fix it. (Not sure if I should attach a patch for this issue in its entirety or just a patch to fix the slice vs replica state.) By "log message is badly set up", do you mean the line {{log.debug("Old leader {}, new leader. New leader got elected in {} ms", oldLeader, slice.getLeader(),timeOut.timeElapsed(MILLISECONDS) );}} is missing a {} placeholder for the new leader? was (Author: praste): [~romseygeek] - Thank you for catching the bug. I think the check can be fixed by changing {{slice.getState() == State.ACTIVE}} to {{slice.getLeader().getState() == Replica.State.ACTIVE}}. Let me know if that is correct and I will attach a patch to fix it. (Not sure if I should attach a patch for this issue in its entirety or just a patch to fix the slice vs replica state.) What do you mean by "log message is badly setup"? > Use better check to validate if node recovered via PeerSync or Replication > -- > > Key: SOLR-9906 > URL: https://issues.apache.org/jira/browse/SOLR-9906 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Assignee: Noble Paul >Priority: Minor > Fix For: 6.4 > > Attachments: SOLR-9906.patch, SOLR-9906.patch, > SOLR-PeerSyncVsReplicationTest.diff > > > Tests {{LeaderFailureAfterFreshStartTest}} and {{PeerSyncReplicationTest}} > currently rely on number of requests made to the leader's replication handler > to check if node recovered via PeerSync or replication. This check is not > very reliable and we have seen failures in the past.
> While tinkering with different way to write a better test I found > [SOLR-9859|SOLR-9859]. Now that SOLR-9859 is fixed, here is idea for better > way to distinguish recovery via PeerSync vs Replication. > * For {{PeerSyncReplicationTest}}, if node successfully recovers via > PeerSync, then file {{replication.properties}} should not exist > For {{LeaderFailureAfterFreshStartTest}}, if the freshly replicated node does > not go into replication recovery after the leader failure, contents > {{replication.properties}} should not change
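The slice-state vs replica-state distinction in the comment above can be illustrated with a minimal sketch. The types below are stand-ins for illustration only, not Solr's real ClusterState classes:

```java
// Sketch of the proposed fix (stand-in types, not Solr's real ClusterState
// classes): wait on the leader replica's own state, not the slice's state,
// since a slice can report ACTIVE while its elected leader is still recovering.
public class LeaderCheckSketch {
    enum State { ACTIVE, RECOVERING, DOWN }

    static class Replica {
        final State state;
        Replica(State s) { state = s; }
        State getState() { return state; }
    }

    static class Slice {
        final State sliceState;
        final Replica leader;
        Slice(State s, Replica l) { sliceState = s; leader = l; }
        State getState() { return sliceState; }
        Replica getLeader() { return leader; }
    }

    // Buggy check: looks only at the slice.
    static boolean buggyCheck(Slice slice) {
        return slice.getState() == State.ACTIVE;
    }

    // Fixed check: looks at the elected leader replica itself.
    static boolean fixedCheck(Slice slice) {
        return slice.getLeader() != null
            && slice.getLeader().getState() == State.ACTIVE;
    }

    public static void main(String[] args) {
        Slice s = new Slice(State.ACTIVE, new Replica(State.RECOVERING));
        System.out.println(buggyCheck(s)); // true  (wrongly reports ready)
        System.out.println(fixedCheck(s)); // false (leader still recovering)
    }
}
```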
[jira] [Commented] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824089#comment-15824089 ] Pushkar Raste commented on SOLR-9906: - [~romseygeek] - Thank you for catching the bug. I think the check can be fixed by changing {{slice.getState() == State.ACTIVE}} to {{slice.getLeader().getState() == Replica.State.ACTIVE}}. Let me know if that is correct and I will attach a patch to fix it. (Not sure if I should attach a patch for this issue in its entirety or just a patch to fix the slice vs replica state.) What do you mean by "log message is badly setup"? > Use better check to validate if node recovered via PeerSync or Replication > -- > > Key: SOLR-9906 > URL: https://issues.apache.org/jira/browse/SOLR-9906 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Assignee: Noble Paul >Priority: Minor > Fix For: 6.4 > > Attachments: SOLR-9906.patch, SOLR-9906.patch, > SOLR-PeerSyncVsReplicationTest.diff
[jira] [Commented] (SOLR-9453) NullPointerException on PeerSync recovery
[ https://issues.apache.org/jira/browse/SOLR-9453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812322#comment-15812322 ] Pushkar Raste commented on SOLR-9453: - Try switching to 6.3 > NullPointerException on PeerSync recovery > - > > Key: SOLR-9453 > URL: https://issues.apache.org/jira/browse/SOLR-9453 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 6.2 >Reporter: Michael Braun >Assignee: Shalin Shekhar Mangar > > Just updated to 6.2.0 (previously using 6.1.0) and we restarted the cluster a > few times - for one replica trying to sync on a shard, we got this on a > bootup and it's seemingly stuck. Cluster has 96 shards, 2 replicas per shard. > Shard 51 is where this issue occurred for us. It looks like the replica > eventually recovers, but we probably shouldn't see a NullPointerException. > {code} > java.lang.NullPointerException > at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:605) > at org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:344) > at org.apache.solr.update.PeerSync.sync(PeerSync.java:257) > at > org.apache.solr.handler.component.RealTimeGetComponent.processSync(RealTimeGetComponent.java:658) > at > org.apache.solr.handler.component.RealTimeGetComponent.processGetVersions(RealTimeGetComponent.java:623) > at > org.apache.solr.handler.component.RealTimeGetComponent.process(RealTimeGetComponent.java:117) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257) > at > 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.eclipse.jetty.server.Server.handle(Server.java:518) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) > at > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156) > at > 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) > at java.lang.Thread.run(Thread.java:745) > {code} > Before it in the log , pasting some relevant lines with full IPs redacted: > {code}ERROR - 2016-08-29 15:10:28.940; org.apache.solr.common.SolrException; > Error while trying to recover. > core=ourcollection_shard51_replica2:org.apache.solr.common.SolrException: No > registered leader was found after waiting for 4000ms , collection: > ourcollection slice: shard51 > at >
[jira] [Commented] (SOLR-9453) NullPointerException on PeerSync recovery
[ https://issues.apache.org/jira/browse/SOLR-9453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812164#comment-15812164 ] Pushkar Raste commented on SOLR-9453: - Looks like NPE is coming from a log statement https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.2.0/solr/core/src/java/org/apache/solr/update/PeerSync.java#L605 > NullPointerException on PeerSync recovery > - > > Key: SOLR-9453 > URL: https://issues.apache.org/jira/browse/SOLR-9453 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 6.2 >Reporter: Michael Braun >Assignee: Shalin Shekhar Mangar > > Just updated to 6.2.0 (previously using 6.1.0) and we restarted the cluster a > few times - for one replica trying to sync on a shard, we got this on a > bootup and it's seemingly stuck. Cluster has 96 shards, 2 replicas per shard. > Shard 51 is where this issue occurred for us. It looks like the replica > eventually recovers, but we probably shouldn't see a NullPointerException. 
> {code} > java.lang.NullPointerException > at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:605) > at org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:344) > at org.apache.solr.update.PeerSync.sync(PeerSync.java:257) > at > org.apache.solr.handler.component.RealTimeGetComponent.processSync(RealTimeGetComponent.java:658) > at > org.apache.solr.handler.component.RealTimeGetComponent.processGetVersions(RealTimeGetComponent.java:623) > at > org.apache.solr.handler.component.RealTimeGetComponent.process(RealTimeGetComponent.java:117) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089) > at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652) > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092) > at > 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.eclipse.jetty.server.Server.handle(Server.java:518) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) > at > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572) > at java.lang.Thread.run(Thread.java:745) > {code} > Before it in the log , pasting some relevant lines with full IPs redacted: > {code}ERROR - 2016-08-29 15:10:28.940; org.apache.solr.common.SolrException; > Error while trying to recover. > core=ourcollection_shard51_replica2:org.apache.solr.common.SolrException: No > registered leader was found after
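The failure mode pointed out above (an NPE thrown from a log statement) can be illustrated with a hypothetical helper; this is not the actual PeerSync code, just the shape of the bug and of the obvious null guard:

```java
// Hypothetical illustration of the failure mode, not the actual PeerSync
// code: a debug log statement that dereferences a possibly-null update list
// throws an NPE before the real work runs. The guard mirrors the obvious fix.
import java.util.List;

public class SafeLogSketch {
    static String describe(List<Long> updates) {
        // Buggy form would be: "updates=" + updates.size()  -> NPE if null
        return (updates == null) ? "updates=null" : "updates=" + updates.size();
    }

    public static void main(String[] args) {
        System.out.println(describe(null));        // updates=null
        System.out.println(describe(List.of(1L))); // updates=1
    }
}
```

With SLF4J-style parameterized logging ({{log.debug("got {} updates", updates)}}), passing the null reference as a placeholder argument is safe; only explicit dereferences like {{updates.size()}} inside the log call can throw.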
[jira] [Updated] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9906: Attachment: SOLR-9906.patch > Use better check to validate if node recovered via PeerSync or Replication > -- > > Key: SOLR-9906 > URL: https://issues.apache.org/jira/browse/SOLR-9906 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Priority: Minor > Attachments: SOLR-9906.patch, SOLR-PeerSyncVsReplicationTest.diff > > > Tests {{LeaderFailureAfterFreshStartTest}} and {{PeerSyncReplicationTest}} > currently rely on number of requests made to the leader's replication handler > to check if node recovered via PeerSync or replication. This check is not > very reliable and we have seen failures in the past. > While tinkering with different way to write a better test I found > [SOLR-9859|SOLR-9859]. Now that SOLR-9859 is fixed, here is idea for better > way to distinguish recovery via PeerSync vs Replication. > * For {{PeerSyncReplicationTest}}, if node successfully recovers via > PeerSync, then file {{replication.properties}} should not exist > For {{LeaderFailureAfterFreshStartTest}}, if the freshly replicated node does > not go into replication recovery after the leader failure, contents > {{replication.properties}} should not change -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15788716#comment-15788716 ] Pushkar Raste commented on SOLR-9835: - How are we handling leader failure here? If replicas are somewhat out of sync with the original leader, how would we elect a new leader? When the leader fails and a new leader gets elected, the new leader asks all the replicas to sync with it. My understanding is, "since we are replicating the index by fetching segments from the leader, most of the segments on all the replicas should look the same, hence the replicas will not go into full index copying". Is that correct? > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat >Assignee: Shalin Shekhar Mangar > Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch > > > The current replication mechanism of SolrCloud is called state machine, which > replicas start in same initial state and for each input, the input is > distributed across replicas so all replicas will end up with same next state. > But this type of replication have some drawbacks > - The commit (which costly) have to run on all replicas > - Slow recovery, because if replica miss more than N updates on its down > time, the replica have to download entire index from its leader. > So we create create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. In basically > - Leader distribute the update to other replicas, but the leader only apply > the update to IW, other replicas just store the update to UpdateLog (act like > replication). > - Replicas frequently polling the latest segments from leader. > Pros: > - Lightweight for indexing, because only leader are running the commit, > updates.
> - Very fast recovery, replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9886) Add ability to turn off/on caches
[ https://issues.apache.org/jira/browse/SOLR-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786323#comment-15786323 ] Pushkar Raste commented on SOLR-9886: - [~noble.paul] My only concern with adding a legend to {{EditableSolrConfigAttributes.json}} is that, if we ever parse this file using a JSON parser, we will have to move the legend somewhere else. > Add ability to turn off/on caches > -- > > Key: SOLR-9886 > URL: https://issues.apache.org/jira/browse/SOLR-9886 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Assignee: Noble Paul >Priority: Minor > Attachments: EnableDisableCacheAttribute.patch, SOLR-9886.patch, > SOLR-9886.patch > > > There is no elegant way to turn off caches (filterCache, queryResultCache > etc) from the solrconfig. When I tried setting size and initialSize to zero, > it resulted in caches of size 2. Here is the code that overrides setting zero > sized cache. > https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L61-L73 > Only way to disable cache right now is by removing cache configs from the > solrConfig, but we can simply provide an attribute to disable cache, so that > we can override it using a system property. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
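The enable/disable attribute discussed in this issue can be sketched roughly as follows. This is a hypothetical stand-in for a cache factory, not Solr's actual FastLRUCache, and the property name "filterCache.enabled" is made up for illustration:

```java
// Rough sketch of the "enabled" attribute idea, NOT Solr's actual cache
// factory: consult a config flag, overridable via a system property (the
// property name "filterCache.enabled" is hypothetical), and hand back a
// no-op cache when disabled instead of clamping the size to a minimum of 2.
import java.util.LinkedHashMap;
import java.util.Map;

public class CacheConfigSketch {
    interface SimpleCache {
        Object get(Object k);
        void put(Object k, Object v);
    }

    static final SimpleCache NO_OP = new SimpleCache() {
        public Object get(Object k) { return null; } // never hits
        public void put(Object k, Object v) { }      // never stores
    };

    static SimpleCache create(Map<String, String> args) {
        // System property wins over the solrconfig attribute, mirroring how
        // such an attribute could be overridden at startup.
        String enabled = System.getProperty("filterCache.enabled",
                args.getOrDefault("enabled", "true"));
        if (!Boolean.parseBoolean(enabled)) return NO_OP;
        Map<Object, Object> lru = new LinkedHashMap<>(16, 0.75f, true);
        return new SimpleCache() {
            public Object get(Object k) { return lru.get(k); }
            public void put(Object k, Object v) { lru.put(k, v); }
        };
    }

    public static void main(String[] args) {
        SimpleCache off = create(Map.of("enabled", "false"));
        off.put("q", "result");
        System.out.println(off.get("q")); // null: cache disabled
    }
}
```

Returning a no-op implementation keeps all call sites unchanged: callers still get and put as usual, the disabled cache just never hits.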
[jira] [Updated] (SOLR-9886) Add ability to turn off/on caches
[ https://issues.apache.org/jira/browse/SOLR-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9886: Attachment: SOLR-9886.patch Updated patch with a test. > Add ability to turn off/on caches > -- > > Key: SOLR-9886 > URL: https://issues.apache.org/jira/browse/SOLR-9886 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Assignee: Noble Paul >Priority: Minor > Attachments: EnableDisableCacheAttribute.patch, SOLR-9886.patch, > SOLR-9886.patch > > > There is no elegant way to turn off caches (filterCache, queryResultCache > etc) from the solrconfig. When I tried setting size and initialSize to zero, > it resulted in caches of size 2. Here is the code that overrides setting zero > sized cache. > https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L61-L73 > Only way to disable cache right now is by removing cache configs from the > solrConfig, but we can simply provide an attribute to disable cache, so that > we can override it using a system property. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
[ https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9906: Attachment: SOLR-PeerSyncVsReplicationTest.diff Here is a patch. I have also fixed bugs in the tests I came across. > Use better check to validate if node recovered via PeerSync or Replication > -- > > Key: SOLR-9906 > URL: https://issues.apache.org/jira/browse/SOLR-9906 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Priority: Minor > Attachments: SOLR-PeerSyncVsReplicationTest.diff > > > Tests {{LeaderFailureAfterFreshStartTest}} and {{PeerSyncReplicationTest}} > currently rely on number of requests made to the leader's replication handler > to check if node recovered via PeerSync or replication. This check is not > very reliable and we have seen failures in the past. > While tinkering with different way to write a better test I found > [SOLR-9859|SOLR-9859]. Now that SOLR-9859 is fixed, here is idea for better > way to distinguish recovery via PeerSync vs Replication. > * For {{PeerSyncReplicationTest}}, if node successfully recovers via > PeerSync, then file {{replication.properties}} should not exist > For {{LeaderFailureAfterFreshStartTest}}, if the freshly replicated node does > not go into replication recovery after the leader failure, contents > {{replication.properties}} should not change -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication
Pushkar Raste created SOLR-9906: --- Summary: Use better check to validate if node recovered via PeerSync or Replication Key: SOLR-9906 URL: https://issues.apache.org/jira/browse/SOLR-9906 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Pushkar Raste Priority: Minor Tests {{LeaderFailureAfterFreshStartTest}} and {{PeerSyncReplicationTest}} currently rely on number of requests made to the leader's replication handler to check if node recovered via PeerSync or replication. This check is not very reliable and we have seen failures in the past. While tinkering with different way to write a better test I found [SOLR-9859|SOLR-9859]. Now that SOLR-9859 is fixed, here is idea for better way to distinguish recovery via PeerSync vs Replication. * For {{PeerSyncReplicationTest}}, if node successfully recovers via PeerSync, then file {{replication.properties}} should not exist For {{LeaderFailureAfterFreshStartTest}}, if the freshly replicated node does not go into replication recovery after the leader failure, contents {{replication.properties}} should not change -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
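The check proposed in the issue description could look roughly like this in a test. Paths and helper names are hypothetical; the only assumption taken from the issue is that replication (unlike PeerSync) writes a {{replication.properties}} file:

```java
// Sketch of the test idea above (paths and helper names are hypothetical):
// a recovery via PeerSync should leave no replication.properties file in the
// core's data directory, so the test can simply assert on its absence.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class RecoveryCheckSketch {
    // PeerSync recovery never writes replication.properties; replication does.
    static boolean recoveredViaPeerSync(Path coreDataDir) {
        return !Files.exists(coreDataDir.resolve("replication.properties"));
    }

    public static void main(String[] args) throws IOException {
        Path dataDir = Files.createTempDirectory("core-data");
        System.out.println(recoveredViaPeerSync(dataDir)); // true: file absent
        Files.createFile(dataDir.resolve("replication.properties"));
        System.out.println(recoveredViaPeerSync(dataDir)); // false: replicated
    }
}
```

For the LeaderFailureAfterFreshStartTest variant, the same idea extends to recording the file's contents before the leader failure and asserting they are unchanged afterwards.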
[jira] [Commented] (SOLR-9859) replication.properties cannot be updated after being written and neither replication.properties or index.properties are durable in the face of a crash
[ https://issues.apache.org/jira/browse/SOLR-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770591#comment-15770591 ] Pushkar Raste commented on SOLR-9859: - Yeah. Looks like these are the same issue > replication.properties cannot be updated after being written and neither > replication.properties or index.properties are durable in the face of a crash > -- > > Key: SOLR-9859 > URL: https://issues.apache.org/jira/browse/SOLR-9859 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 5.5.3, 6.3 >Reporter: Pushkar Raste >Assignee: Mark Miller >Priority: Minor > Attachments: SOLR-9859.patch, SOLR-9859.patch, SOLR-9859.patch, > SOLR-9859.patch, SOLR-9859.patch > > > If a shard recovers via replication (vs PeerSync) a file named > {{replication.properties}} gets created. If the same shard recovers once more > via replication, IndexFetcher fails to write latest replication information > as it tries to create {{replication.properties}} but as file already exists. 
> Here is the stack trace I saw > {code} > java.nio.file.FileAlreadyExistsException: > \shard-3-001\cores\collection1\data\replication.properties > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) > at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) > at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) > at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source) > at java.nio.file.spi.FileSystemProvider.newOutputStream(Unknown Source) > at java.nio.file.Files.newOutputStream(Unknown Source) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) > at > org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) > at > org.apache.solr.handler.IndexFetcher.logReplicationTimeAndConfFiles(IndexFetcher.java:689) > at > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:501) > at > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:265) > at > org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397) > at > org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:157) > at > org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409) > at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222) > at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) > at java.util.concurrent.FutureTask.run(Unknown Source) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$0(ExecutorUtil.java:229) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: 
dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9886) Add ability to turn off/on caches
[ https://issues.apache.org/jira/browse/SOLR-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768331#comment-15768331 ] Pushkar Raste commented on SOLR-9886: - Check the attached patch. I think we may need to make changes to {{EditableSolrConfigAttributes.json}} as well. I don't understand the mapping between the attribute names and the associated numbers. > Add ability to turn off/on caches > -- > > Key: SOLR-9886 > URL: https://issues.apache.org/jira/browse/SOLR-9886 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Priority: Minor > Attachments: EnableDisableCacheAttribute.patch > > > There is no elegant way to turn off caches (filterCache, queryResultCache > etc) from the solrconfig. When I tried setting size and initialSize to zero, > it resulted in caches of size 2. Here is the code that overrides a zero-sized > cache setting. > https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L61-L73 > The only way to disable a cache right now is by removing its config from the > solrConfig, but we could simply provide an attribute to disable the cache, so that > we can override it using a system property. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-9886) Add ability to turn off/on caches
[ https://issues.apache.org/jira/browse/SOLR-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9886: Attachment: EnableDisableCacheAttribute.patch > Add ability to turn off/on caches > -- > > Key: SOLR-9886 > URL: https://issues.apache.org/jira/browse/SOLR-9886 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Priority: Minor > Attachments: EnableDisableCacheAttribute.patch > > > There is no elegant way to turn off caches (filterCache, queryResultCache > etc) from the solrconfig. When I tried setting size and initialSize to zero, > it resulted in caches of size 2. Here is the code that overrides setting zero > sized cache. > https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L61-L73 > Only way to disable cache right now is by removing cache configs from the > solrConfig, but we can simply provide an attribute to disable cache, so that > we can override it using a system property. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-9886) Add ability to turn off/on caches
Pushkar Raste created SOLR-9886: --- Summary: Add ability to turn off/on caches Key: SOLR-9886 URL: https://issues.apache.org/jira/browse/SOLR-9886 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Pushkar Raste Priority: Minor There is no elegant way to turn off caches (filterCache, queryResultCache etc) from the solrconfig. When I tried setting size and initialSize to zero, it resulted in caches of size 2. Here is the code that overrides a zero-sized cache setting. https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L61-L73 The only way to disable a cache right now is by removing its config from the solrConfig, but we could simply provide an attribute to disable the cache, so that we can override it using a system property. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
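A minimal sketch of what such an attribute could look like (the {{enabled}} attribute name and the system-property key are illustrative assumptions, not existing Solr API): the cache is constructed only when the flag resolves to true, sidestepping the minimum-size clamping entirely.

```java
import java.util.Map;

public class CacheConfigSketch {
    // Hypothetical check: a cache is built only when its optional "enabled"
    // attribute resolves to true. The attribute name and the system-property
    // fallback key are assumptions for this sketch, not Solr's actual config.
    static boolean cacheEnabled(Map<String, String> args) {
        String raw = args.getOrDefault("enabled", "true");
        // An unsubstituted ${...} placeholder falls back to a system property,
        // so the cache can be toggled per deployment without editing the XML.
        if (raw.startsWith("${")) {
            raw = System.getProperty("solr.cache.enabled", "true");
        }
        return Boolean.parseBoolean(raw);
    }

    public static void main(String[] args) {
        System.out.println(cacheEnabled(Map.of("enabled", "false"))); // false
        System.out.println(cacheEnabled(Map.of())); // true (default)
    }
}
```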
[jira] [Commented] (SOLR-9859) replication.properties cannot be updated after being written and neither replication.properties or index.properties are durable in the face of a crash
[ https://issues.apache.org/jira/browse/SOLR-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762069#comment-15762069 ] Pushkar Raste commented on SOLR-9859: - Looks good to me. Can we write a test to validate the patch? > replication.properties cannot be updated after being written and neither > replication.properties or index.properties are durable in the face of a crash > -- > > Key: SOLR-9859 > URL: https://issues.apache.org/jira/browse/SOLR-9859 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 5.5.3, 6.3 >Reporter: Pushkar Raste >Assignee: Mark Miller >Priority: Minor > Attachments: SOLR-9859.patch, SOLR-9859.patch, SOLR-9859.patch, > SOLR-9859.patch, SOLR-9859.patch > > > If a shard recovers via replication (vs PeerSync) a file named > {{replication.properties}} gets created. If the same shard recovers once more > via replication, IndexFetcher fails to write the latest replication information > as it tries to create {{replication.properties}}, which already exists. 
> Here is the stack trace I saw > {code} > java.nio.file.FileAlreadyExistsException: > \shard-3-001\cores\collection1\data\replication.properties > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) > at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) > at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) > at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source) > at java.nio.file.spi.FileSystemProvider.newOutputStream(Unknown Source) > at java.nio.file.Files.newOutputStream(Unknown Source) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) > at > org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) > at > org.apache.solr.handler.IndexFetcher.logReplicationTimeAndConfFiles(IndexFetcher.java:689) > at > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:501) > at > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:265) > at > org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397) > at > org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:157) > at > org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409) > at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222) > at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) > at java.util.concurrent.FutureTask.run(Unknown Source) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$0(ExecutorUtil.java:229) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: 
dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9859) replication.properties does not get updated the second time around if index recovers via replication
[ https://issues.apache.org/jira/browse/SOLR-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751997#comment-15751997 ] Pushkar Raste commented on SOLR-9859: - [~markrmil...@gmail.com] looks like in `atomicRename` you are deleting the existing file and then renaming the temp file. How is this better than just deleting the file and writing a new file, if we crash at the wrong time (as you have mentioned above)? Would we need to manually rename the temp file in such a scenario? > replication.properties does not get updated the second time around if index > recovers via replication > > > Key: SOLR-9859 > URL: https://issues.apache.org/jira/browse/SOLR-9859 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 5.5.3, 6.3 >Reporter: Pushkar Raste >Assignee: Mark Miller >Priority: Minor > Attachments: SOLR-9859.patch, SOLR-9859.patch > > > If a shard recovers via replication (vs PeerSync) a file named > {{replication.properties}} gets created. If the same shard recovers once more > via replication, IndexFetcher fails to write the latest replication information > as it tries to create {{replication.properties}}, which already exists. 
> Here is the stack trace I saw > {code} > java.nio.file.FileAlreadyExistsException: > \shard-3-001\cores\collection1\data\replication.properties > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) > at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) > at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) > at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source) > at java.nio.file.spi.FileSystemProvider.newOutputStream(Unknown Source) > at java.nio.file.Files.newOutputStream(Unknown Source) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) > at > org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) > at > org.apache.solr.handler.IndexFetcher.logReplicationTimeAndConfFiles(IndexFetcher.java:689) > at > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:501) > at > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:265) > at > org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397) > at > org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:157) > at > org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409) > at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222) > at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) > at java.util.concurrent.FutureTask.run(Unknown Source) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$0(ExecutorUtil.java:229) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: 
dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9859) replication.properties does not get updated the second time around if index recovers via replication
[ https://issues.apache.org/jira/browse/SOLR-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749175#comment-15749175 ] Pushkar Raste commented on SOLR-9859: - Is there a way we can write a temp file and do a {{mv}} to rename/overwrite {{replication.properties}}? An alternate solution would be to keep appending to the existing file and read the latest stats from it. > replication.properties does not get updated the second time around if index > recovers via replication > > > Key: SOLR-9859 > URL: https://issues.apache.org/jira/browse/SOLR-9859 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 5.5.3, 6.3 >Reporter: Pushkar Raste >Assignee: Mark Miller >Priority: Minor > > If a shard recovers via replication (vs PeerSync) a file named > {{replication.properties}} gets created. If the same shard recovers once more > via replication, IndexFetcher fails to write the latest replication information > as it tries to create {{replication.properties}}, which already exists. 
> Here is the stack trace I saw > {code} > java.nio.file.FileAlreadyExistsException: > \shard-3-001\cores\collection1\data\replication.properties > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) > at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) > at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) > at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source) > at java.nio.file.spi.FileSystemProvider.newOutputStream(Unknown Source) > at java.nio.file.Files.newOutputStream(Unknown Source) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) > at > org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) > at > org.apache.solr.handler.IndexFetcher.logReplicationTimeAndConfFiles(IndexFetcher.java:689) > at > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:501) > at > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:265) > at > org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397) > at > org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:157) > at > org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409) > at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222) > at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) > at java.util.concurrent.FutureTask.run(Unknown Source) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$0(ExecutorUtil.java:229) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: 
dev-h...@lucene.apache.org
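The temp-file-then-rename idea floated above can be sketched with {{java.nio}} (a sketch, not the actual patch): write the new contents to a sibling temp file and move it over the old one, so a second recovery replaces {{replication.properties}} instead of failing to create it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicPropsWrite {
    // Write to a sibling temp file, then rename over the target. On POSIX
    // filesystems ATOMIC_MOVE is a rename(2), which atomically replaces an
    // existing file; where atomic moves are unsupported we fall back to a
    // plain replacing move (a small, acknowledged durability gap).
    static void writeAtomically(Path target, String contents) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
        try {
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("repl");
        Path props = dir.resolve("replication.properties");
        writeAtomically(props, "indexReplicatedAt=1\n");
        writeAtomically(props, "indexReplicatedAt=2\n"); // no FileAlreadyExistsException
        System.out.println(Files.readString(props).trim());
    }
}
```

If the process crashes between writing the temp file and the rename, the old {{replication.properties}} is still intact and the stale {{.tmp}} file is simply overwritten on the next attempt, which addresses the crash concern raised in the comment.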
[jira] [Updated] (SOLR-9859) replication.properties does not get updated the second time around if index recovers via replication
[ https://issues.apache.org/jira/browse/SOLR-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9859: Summary: replication.properties does not get updated the second time around if index recovers via replication (was: replication.properties does get updated the second time around if index recovers via replication) > replication.properties does not get updated the second time around if index > recovers via replication > > > Key: SOLR-9859 > URL: https://issues.apache.org/jira/browse/SOLR-9859 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 5.5.3, 6.3 >Reporter: Pushkar Raste >Priority: Minor > > If a shard recovers via replication (vs PeerSync) a file named > {{replication.properties}} gets created. If the same shard recovers once more > via replication, IndexFetcher fails to write latest replication information > as it tries to create {{replication.properties}} but as file already exists. 
> Here is the stack trace I saw > {code} > java.nio.file.FileAlreadyExistsException: > \shard-3-001\cores\collection1\data\replication.properties > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) > at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) > at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) > at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source) > at java.nio.file.spi.FileSystemProvider.newOutputStream(Unknown Source) > at java.nio.file.Files.newOutputStream(Unknown Source) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) > at > org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) > at > org.apache.solr.handler.IndexFetcher.logReplicationTimeAndConfFiles(IndexFetcher.java:689) > at > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:501) > at > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:265) > at > org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397) > at > org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:157) > at > org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409) > at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222) > at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) > at java.util.concurrent.FutureTask.run(Unknown Source) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$0(ExecutorUtil.java:229) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: 
dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9859) replication.properties does get updated the second time around if index recovers via replication
[ https://issues.apache.org/jira/browse/SOLR-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742967#comment-15742967 ] Pushkar Raste commented on SOLR-9859: - Proposed solution: delete the existing file and create a new one? > replication.properties does get updated the second time around if index > recovers via replication > > > Key: SOLR-9859 > URL: https://issues.apache.org/jira/browse/SOLR-9859 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 5.5.3, 6.3 >Reporter: Pushkar Raste >Priority: Minor > > If a shard recovers via replication (vs PeerSync) a file named > {{replication.properties}} gets created. If the same shard recovers once more > via replication, IndexFetcher fails to write the latest replication information > as it tries to create {{replication.properties}}, which already exists. > Here is the stack trace I saw > {code} > java.nio.file.FileAlreadyExistsException: > \shard-3-001\cores\collection1\data\replication.properties > at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) > at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) > at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) > at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source) > at java.nio.file.spi.FileSystemProvider.newOutputStream(Unknown Source) > at java.nio.file.Files.newOutputStream(Unknown Source) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) > at > org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) > at > org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) > at > org.apache.solr.handler.IndexFetcher.logReplicationTimeAndConfFiles(IndexFetcher.java:689) > at > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:501) > at > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:265) > at > 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397) > at > org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:157) > at > org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409) > at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222) > at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) > at java.util.concurrent.FutureTask.run(Unknown Source) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$0(ExecutorUtil.java:229) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-9859) replication.properties does get updated the second time around if index recovers via replication
Pushkar Raste created SOLR-9859: --- Summary: replication.properties does get updated the second time around if index recovers via replication Key: SOLR-9859 URL: https://issues.apache.org/jira/browse/SOLR-9859 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 6.3, 5.5.3 Reporter: Pushkar Raste Priority: Minor If a shard recovers via replication (vs PeerSync) a file named {{replication.properties}} gets created. If the same shard recovers once more via replication, IndexFetcher fails to write latest replication information as it tries to create {{replication.properties}} but as file already exists. Here is the stack trace I saw {code} java.nio.file.FileAlreadyExistsException: \shard-3-001\cores\collection1\data\replication.properties at sun.nio.fs.WindowsException.translateToIOException(Unknown Source) at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source) at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source) at java.nio.file.spi.FileSystemProvider.newOutputStream(Unknown Source) at java.nio.file.Files.newOutputStream(Unknown Source) at org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413) at org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409) at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253) at org.apache.solr.handler.IndexFetcher.logReplicationTimeAndConfFiles(IndexFetcher.java:689) at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:501) at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:265) at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397) at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:157) at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409) at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$0(ExecutorUtil.java:229) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736051#comment-15736051 ] Pushkar Raste commented on SOLR-9835: - Instead of periodic polling, can the leader, upon receiving and processing a commit command, send a notification to the replicas asking them to sync up? > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat > > The current replication mechanism of SolrCloud is called state machine: > replicas start in the same initial state and, for each input, the input is > distributed across replicas so all replicas end up in the same next state. > But this type of replication has some drawbacks > - The commit (which is costly) has to run on all replicas > - Slow recovery, because if a replica misses more than N updates during its > downtime, it has to download the entire index from its leader. > So we create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. Basically: > - The leader distributes the update to the other replicas, but only the leader > applies the update to the IndexWriter; the other replicas just store the update in > the UpdateLog (acting like replication). > - Replicas frequently poll the latest segments from the leader. > Pros: > - Lightweight indexing, because only the leader runs the commits and > updates. > - Very fast recovery: replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732174#comment-15732174 ] Pushkar Raste commented on SOLR-9835: - I am curious to know how soft commits (in-memory segments) would be handled. > Create another replication mode for SolrCloud > - > > Key: SOLR-9835 > URL: https://issues.apache.org/jira/browse/SOLR-9835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Cao Manh Dat > > The current replication mechanism of SolrCloud is called state machine: > replicas start in the same initial state and, for each input, the input is > distributed across replicas so all replicas end up in the same next state. > But this type of replication has some drawbacks > - The commit (which is costly) has to run on all replicas > - Slow recovery, because if a replica misses more than N updates during its > downtime, it has to download the entire index from its leader. > So we create another replication mode for SolrCloud called state > transfer, which acts like master/slave replication. Basically: > - The leader distributes the update to the other replicas, but only the leader > applies the update to the IndexWriter; the other replicas just store the update in > the UpdateLog (acting like replication). > - Replicas frequently poll the latest segments from the leader. > Pros: > - Lightweight indexing, because only the leader runs the commits and > updates. > - Very fast recovery: replicas just have to download the missing segments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15702936#comment-15702936 ] Pushkar Raste commented on SOLR-9546: - Looks like we stepped on each other's toes when I was fixing the {{CloudMLTQParser}} class. Please check the updated patch. > There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class > -- > > Key: SOLR-9546 > URL: https://issues.apache.org/jira/browse/SOLR-9546 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Assignee: Noble Paul >Priority: Minor > Attachments: SOLR-9546.patch, SOLR-9546_CloudMLTQParser.patch > > > Here is an excerpt > {code} > public Long getLong(String param, Long def) { > String val = get(param); > try { > return val == null ? def : Long.parseLong(val); > } > catch( Exception ex ) { > throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, > ex.getMessage(), ex ); > } > } > {code} > {{Long.parseLong()}} returns a primitive type, but since the method is expected to > return a {{Long}}, it needs to be wrapped. There are many more methods like > that. We might be creating a lot of unnecessary objects here. > I am not sure if the JVM catches on to it and somehow optimizes it if these > methods are called enough times (or maybe the compiler does some modifications > at compile time). > Let me know if I am thinking of some premature optimization -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
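For illustration, here is the boxing version from the excerpt next to a primitive-returning overload (the overload's name is an assumption for this sketch, and the parameter lookup is stubbed with a plain map rather than the real {{SolrParams}} internals):

```java
import java.util.Map;

public class ParamsSketch {
    // Hypothetical stand-in for the parameter storage behind SolrParams.get(name)
    static final Map<String, String> PARAMS = Map.of("rows", "10");

    // Boxing version, as in the excerpt above: Long.parseLong returns a
    // primitive long, which is auto-boxed to satisfy the Long return type,
    // and a boxed default must be supplied by the caller.
    static Long getLong(String param, Long def) {
        String val = PARAMS.get(param);
        return val == null ? def : Long.parseLong(val);
    }

    // Primitive overload (method name is an assumption, not the Solr API):
    // no wrapper object is created for either the parsed value or the default.
    static long getPrimitiveLong(String param, long def) {
        String val = PARAMS.get(param);
        return val == null ? def : Long.parseLong(val);
    }

    public static void main(String[] args) {
        System.out.println(getLong("rows", 0L));           // 10 (boxed)
        System.out.println(getPrimitiveLong("start", 5L)); // 5 (no boxing)
    }
}
```

Whether the allocation matters in practice depends on escape analysis and the {{Long}} cache for small values, which is presumably why the comment hedges about the JVM optimizing it away.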
[jira] [Updated] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9546: Attachment: SOLR-9546_CloudMLTQParser.patch > There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class > -- > > Key: SOLR-9546 > URL: https://issues.apache.org/jira/browse/SOLR-9546 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Assignee: Noble Paul >Priority: Minor > Attachments: SOLR-9546.patch, SOLR-9546_CloudMLTQParser.patch > > > Here is an excerpt > {code} > public Long getLong(String param, Long def) { > String val = get(param); > try { > return val== null ? def : Long.parseLong(val); > } > catch( Exception ex ) { > throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, > ex.getMessage(), ex ); > } > } > {code} > {{Long.parseLong()}} returns a primitive type but since method expect to > return a {{Long}}, it needs to be wrapped. There are many more method like > that. We might be creating a lot of unnecessary objects here. > I am not sure if JVM catches upto it and somehow optimizes it if these > methods are called enough times (or may be compiler does some modifications > at compile time) > Let me know if I am thinking of some premature optimization -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9546: Attachment: (was: SOLR-9546_CloudMLTQParser.patch)
[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1566#comment-1566 ] Pushkar Raste commented on SOLR-9546: - Are you still talking about the CloudMLTQParser patch? If it was applied, how come I still see code that uses boxed objects? https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/mlt/CloudMLTQParser.java#L72-L91
[jira] [Commented] (SOLR-9511) Retire using individual versions to request updates during PeerSync
[ https://issues.apache.org/jira/browse/SOLR-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696682#comment-15696682 ] Pushkar Raste commented on SOLR-9511: - I don't think the individual-versions API is being used anywhere. I agree with keeping the old API around for maybe another major version (8.X), but I don't see much harm in getting rid of it in 8.X itself, since the old API is there in 6.X and 7.X.
> Retire using individual versions to request updates during PeerSync
> ---
>
> Key: SOLR-9511
> URL: https://issues.apache.org/jira/browse/SOLR-9511
> Project: Solr
> Issue Type: Wish
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: master (7.0)
> Reporter: Pushkar Raste
> Priority: Minor
> Fix For: master (7.0)
> Attachments: SOLR-9511.patch
>
> We started using version ranges to request updates during PeerSync in [SOLR-9207|https://issues.apache.org/jira/browse/SOLR-9207]. Using version ranges was also made the default.
> There is no need to have code that uses individual versions starting with Solr 7. Decommission it (remove the unnecessary code).
[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696663#comment-15696663 ] Pushkar Raste commented on SOLR-9546: - I think you reverted the changes in the {{CloudMLTQParser}} class because some tests were failing. I added a patch, {{SOLR-9546_CloudMLTQParser.patch}}, covering only the {{CloudMLTQParser}} class.
[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync
[ https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9689: Attachment: parallelize-peersync.patch
> Process updates concurrently during PeerSync
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch, SOLR-9689.patch2, parallelize-peersync.patch
>
> This came up during discussion with [~shalinmangar].
> During {{PeerSync}}, updates are applied one at a time by looping through the updates received from the leader. This is slow and could keep a node in recovery for a long time if the number of updates to apply is large.
> We can apply updates concurrently; this should be no different from what can happen during normal indexing (we can't really ensure that a replica will process updates in the same order as the leader or other replicas).
> There are a few corner cases around DBQ (delete-by-query) we should be careful about.
[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync
[ https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9689: Attachment: (was: parallelize-peersync.patch)
[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync
[ https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9689: Attachment: parallelize-peersync.patch Attached a working patch. In my tests I didn't see much improvement with parallelization (in fact, in some cases performance degraded), and I could not find any hotspot in the profiler. My theory is that the documents in the test are so short and simple that, although parallelization works functionally, we need to test with more complex documents to verify the performance gains. Most of the parallelization parameters are subjective, and people will need to verify which settings work better for them. It also seems performance would suffer if there are relatively many DBQs to apply during PeerSync, since updates are applied out of order.
[jira] [Comment Edited] (SOLR-9689) Process updates concurrently during PeerSync
[ https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15641712#comment-15641712 ] Pushkar Raste edited comment on SOLR-9689 at 11/6/16 12:29 PM: --- [~ichattopadhyaya] -
* Even for normal operations, updates from the leader can arrive at a replica in a different order, and we already have a way to handle that. We currently store 100 DBQs to handle reordered updates; if reordered DBQs are detected, the DBQs are applied along with an add: https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java#L201
* I think even for partial updates, the corresponding full update is stored in the tlog. I don't think the tlog ever stores partial updates like an "inc" or "set" on a field; it always contains the entire document with updated values.
* I am creating batches of only 100 updates, and only the 100 updates in a batch are applied concurrently, so I don't think there will be any issues. We can make the size of the DBQ list in {{DirectUpdateHandler2}} configurable as well.
[jira] [Commented] (SOLR-9689) Process updates concurrently during PeerSync
[ https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15641712#comment-15641712 ] Pushkar Raste commented on SOLR-9689: - [~ichattopadhyaya] -
* Even for normal operations, updates from the leader can arrive at a replica in a different order, and we already have a way to handle that. We currently store 100 DBQs to handle reordered updates; if reordered DBQs are detected, the DBQs are applied along with an add: https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java#L201
* I think even for partial updates, the corresponding full update is stored in the tlog. I don't think the tlog ever stores partial updates like an "inc" or "set" on a field; it always contains the entire document with updated values.
* I am creating batches of only 100 updates, and only the 100 updates in a batch are applied concurrently, so I don't think there will be any issues. We can make the size of the DBQ list in {{DirectUpdateHandler2}} configurable as well.
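The batching scheme described in the comments above can be sketched generically (hypothetical `BatchApplier` with `Runnable` standing in for an update; this is not the actual Solr patch): group the updates into batches of 100, run each batch's updates concurrently, and wait for a batch to finish before starting the next, so no update is ever applied more than one batch out of order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class BatchApplier {
    static final int BATCH_SIZE = 100;

    // Applies updates in batches of BATCH_SIZE. Updates within a batch run
    // concurrently on the pool, but the next batch starts only after the
    // current one completes, bounding how far reordering can drift.
    static void apply(List<Runnable> updates, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        for (int from = 0; from < updates.size(); from += BATCH_SIZE) {
            int to = Math.min(from + BATCH_SIZE, updates.size());
            List<Future<?>> batch = new ArrayList<>();
            for (Runnable u : updates.subList(from, to)) {
                batch.add(pool.submit(u));
            }
            for (Future<?> f : batch) {
                f.get(); // wait for the whole batch before moving on
            }
        }
    }
}
```

The batch boundary is what keeps the reordering window small: with a batch size of 100, an update can only race against at most 99 neighbors, which matches the rationale for keeping 100 recent DBQs around.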
[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync
[ https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9689: Attachment: SOLR-9689.patch2 A new patch with a configurable threshold for parallelism.
[jira] [Comment Edited] (SOLR-9689) Process updates concurrently during PeerSync
[ https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603424#comment-15603424 ] Pushkar Raste edited comment on SOLR-9689 at 10/24/16 10:28 PM: POC for applying updates concurrently. Please review it and let me know if there are glaring issues. I would also appreciate any suggestions for handling out-of-order {{DBQs}} (I think by default we keep a few {{DBQs}} around to account for out-of-order updates); maybe we can increase the number of {{DBQs}} we keep around if a {{DBQ}} has the {{PEER_SYNC}} flag set on it.
[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync
[ https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9689: Attachment: SOLR-9689.patch POC for applying updates concurrently. Please review it and let me know if there are glaring issues. I would also appreciate any suggestions for handling out-of-order {{DBQs}} (I think by default we keep a few {{DBQs}} around to account for out-of-order updates); maybe we can increase the number of {{DBQs}} we keep around if a {{DBQ}} has the {{PEER_SYNC}} flag set on it.
[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync
[ https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9689: Summary: Process updates concurrently during PeerSync (was: Process updates concurrently during {{PeerSync}})
[jira] [Created] (SOLR-9689) Process updates concurrently during {{PeerSync}}
Pushkar Raste created SOLR-9689: --- Summary: Process updates concurrently during {{PeerSync}} Key: SOLR-9689 URL: https://issues.apache.org/jira/browse/SOLR-9689 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Pushkar Raste
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15599826#comment-15599826 ] Pushkar Raste commented on SOLR-9506: - Yeah, I looked into it. I will try that approach if I can get to it before [~noble.paul] applies the patch.
> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch, SOLR-9506_final.patch
>
> The IndexFingerprint is cached per index searcher. It is quite useless during high-throughput indexing. If the fingerprint is cached per segment instead, computing the fingerprint becomes vastly more efficient.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596148#comment-15596148 ] Pushkar Raste commented on SOLR-9506: - Don't use the patch with the parallelized computation. Parallel streams use the shared common fork-join pool, so a bad actor can create havoc.
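The hazard mentioned above comes from `Stream.parallel()` running its work on the JVM-wide `ForkJoinPool.commonPool()`, which every parallel stream in the process shares. A common Java workaround (a general idiom, not something from this patch) is to invoke the stream's terminal operation from inside a dedicated `ForkJoinPool`, so one slow or hostile computation cannot starve unrelated users of the common pool. Note this relies on a long-standing implementation detail: a parallel stream uses the fork-join pool of the thread that runs its terminal operation.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

class DedicatedPoolDemo {
    // Computes the sum of squares 1..n with a parallel stream, but runs it
    // inside a private ForkJoinPool instead of the shared common pool.
    static long sumSquares(long n, int parallelism) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // submit() makes the terminal op run on a pool worker thread,
            // so the stream's tasks are scheduled on this pool.
            return pool.submit(
                () -> LongStream.rangeClosed(1, n).parallel()
                                .map(x -> x * x)
                                .sum()
            ).get();
        } finally {
            pool.shutdown(); // don't leak pool threads
        }
    }
}
```

An alternative that avoids the implementation-detail dependency entirely is to drop parallel streams and submit explicit tasks to an `ExecutorService`, which is the more conservative choice for server code like Solr.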
[jira] [Updated] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9546: Attachment: SOLR-9546_CloudMLTQParser.patch Patch for CloudMLTQParser.
[jira] [Updated] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9506: Attachment: SOLR-9506.patch Patch with parallelized computation.
[jira] [Updated] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9506: Attachment: SOLR-9506.patch
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589649#comment-15589649 ] Pushkar Raste commented on SOLR-9506: - [~noble.paul] and [~yo...@apache.org], I was able to put together a test showing that the current implementation is broken. I will update the patch with the test and a fix by EOD today.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585736#comment-15585736 ] Pushkar Raste commented on SOLR-9506: - There is a lot of confusion going on here. Would the above test fail if we didn't cache the per-segment IndexFingerprint? If yes, then we should revert the commit; if not, we should open a new issue to fix the IndexFingerprint computation altogether.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585707#comment-15585707 ] Pushkar Raste commented on SOLR-9506: - I think what Yonik is implying is that, if for some reason a replica does not apply a delete properly, the index fingerprint would still check out and that would be a problem. Considering the issues with {{PeerSync}}, should we add the option {{recoverWithReplicationOnly}}? For most setups I doubt people have hundreds of thousands of records in the updateLog, in which case almost no one is using {{PeerSync}} anyway.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585688#comment-15585688 ] Pushkar Raste commented on SOLR-9506: - i.e. we really need to fix the IndexFingerprint computation, whether or not we cache. I will open a separate issue to fix it in that case.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585677#comment-15585677 ] Pushkar Raste commented on SOLR-9506: - I don't see why caching the index fingerprint per segment and using it later would be different from computing the fingerprint on the entire index by going through one segment at a time. I tried to come up with scenarios where the caching solution would fail and the original solution would not, but could not think of any.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585612#comment-15585612 ] Pushkar Raste commented on SOLR-9506: - I did not upload the patch with parallelStream. In SolrIndexSearcher, where we compute and cache the per-segment index fingerprint, try switching from {{stream()}} to {{parallelStream()}} and you will see {{PeerSyncTest}} fail.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556139#comment-15556139 ] Pushkar Raste commented on SOLR-9506: - I computed the hash w/o regard to deleted docs and cached it. All the tests pass even without doing steps #2 and #3. I also verified that the index fingerprint computed on the entire index matches the fingerprint computed from individual segments (even after deletions).
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556132#comment-15556132 ] Pushkar Raste commented on SOLR-9506: - I also found some weird behavior. If I use {{parallelStream}} to compute segment fingerprints in parallel and then reduce them to the index fingerprint on the index searcher, the test fails. Why should the order of computation and reduction matter in this case?
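One plausible explanation (my speculation, not verified against the Solr code): {{Stream.reduce}} only guarantees a stable result under {{parallelStream}} when the accumulator and combiner are associative. A minimal, self-contained sketch with a deliberately non-associative operation (subtraction) shows how splitting the work across threads changes the answer:

```java
import java.util.stream.IntStream;

public class ParallelReduceOrder {

    // Subtraction is not associative, so reduce() gives no guarantee
    // about the result once the stream is split across threads.
    static int sequentialReduce() {
        return IntStream.rangeClosed(1, 100).reduce(0, (a, b) -> a - b);
    }

    static int parallelReduce() {
        return IntStream.rangeClosed(1, 100).parallel().reduce(0, (a, b) -> a - b);
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + sequentialReduce()); // -5050
        System.out.println("parallel:   " + parallelReduce());   // usually differs
    }
}
```

If the fingerprint reduction folds state in an order-dependent way, {{parallelStream()}} would produce different fingerprints from run to run, which would match the failure described above.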
[jira] [Commented] (SOLR-9591) Shards and replicas go down when indexing large number of files
[ https://issues.apache.org/jira/browse/SOLR-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553195#comment-15553195 ] Pushkar Raste commented on SOLR-9591: - Are you using MMapDirectory? MMapDirectory keeps the index off heap and reduces pressure on the garbage collector. In my experience, G1GC with {{ParallelRefProcEnabled}} helps a lot in keeping GC pauses short. > Shards and replicas go down when indexing large number of files > --- > > Key: SOLR-9591 > URL: https://issues.apache.org/jira/browse/SOLR-9591 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 5.5.2 >Reporter: Khalid Alharbi > Attachments: solr_log_20161002_1504 > > > Solr shards and replicas go down when indexing a large number of text files > using the default [extracting request > handler|https://cwiki.apache.org/confluence/x/c4DxAQ]. > {code} > curl > 'http://localhost:8983/solr/myCollection/update/extract?literal.id=someId' -F > "myfile=/data/file1.txt" > {code} > and committing after indexing 5,000 files using: > {code} > curl 'http://localhost:8983/solr/myCollection/update?commit=true=json' > {code} > This was on Solr (SolrCloud) version 5.5.2 with an external zookeeper cluster > of five nodes. I also tried this on a single node SolrCloud with the embedded > ZooKeeper but the collection went down as well. In both cases the error > message is always "ERROR null DistributedUpdateProcessor ClusterState says we > are the leader,​ but locally we don't think so" > I managed to come up with a work around that helped me index over 400K files > without getting replicas down with that error message. The work around is to > index 5K files, restart Solr, wait for shards and replicas to get active, > then index the next 5K files, and repeat the previous steps. > If this is not enough to investigate this issue, I will be happy to provide > more details regarding this issue.
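For reference, a typical way to try the GC suggestion above (the flag names are standard HotSpot options; the {{SOLR_OPTS}} variable is how {{solr.in.sh}} usually passes JVM flags, but verify against your install before relying on this):

```shell
# Enable G1 with parallel reference processing, plus basic GC logging so
# pause lengths can be confirmed (JDK 8 style flags shown; JDK 9+ uses -Xlog:gc*).
SOLR_OPTS="$SOLR_OPTS -XX:+UseG1GC -XX:+ParallelRefProcEnabled"
SOLR_OPTS="$SOLR_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
```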
[jira] [Commented] (SOLR-9591) Shards and replicas go down when indexing large number of files
[ https://issues.apache.org/jira/browse/SOLR-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552791#comment-15552791 ] Pushkar Raste commented on SOLR-9591: - Have you looked into the GC logs to see if there are any long GC pauses?
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552125#comment-15552125 ] Pushkar Raste commented on SOLR-9506: - Updated patch; added a scenario in {{PeerSyncTest}} where a replica misses an update. Looks like we don't need to remove the live docs check {{if (liveDocs != null && !liveDocs.get(doc)) continue;}}
[jira] [Updated] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9506: Attachment: SOLR-9506.patch
[jira] [Updated] (SOLR-9511) Retire using individual versions to request updates during PeerSync
[ https://issues.apache.org/jira/browse/SOLR-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9511: Attachment: SOLR-9511.patch > Retire using individual versions to request updates during PeerSync > --- > > Key: SOLR-9511 > URL: https://issues.apache.org/jira/browse/SOLR-9511 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: master (7.0) >Reporter: Pushkar Raste >Priority: Minor > Fix For: master (7.0) > > Attachments: SOLR-9511.patch > > > We started using version ranges to request updates during PeerSync in > [SOLR-9207| https://issues.apache.org/jira/browse/SOLR-9207]. Using version > ranges was also made the default. > There is no need to keep code that uses individual versions starting with Solr 7. > Decommission (remove the unnecessary code)
[jira] [Updated] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9546: Attachment: SOLR-9546.patch > There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class > -- > > Key: SOLR-9546 > URL: https://issues.apache.org/jira/browse/SOLR-9546 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Priority: Minor > Attachments: SOLR-9546.patch > > > Here is an excerpt > {code} > public Long getLong(String param, Long def) { > String val = get(param); > try { > return val == null ? def : Long.parseLong(val); > } > catch( Exception ex ) { > throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, > ex.getMessage(), ex ); > } > } > {code} > {{Long.parseLong()}} returns a primitive type, but since the method is expected to > return a {{Long}}, the result needs to be wrapped. There are many more methods like > that. We might be creating a lot of unnecessary objects here. > I am not sure if the JVM catches up to it and somehow optimizes it if these > methods are called enough times (or maybe the compiler does some modifications > at compile time). > Let me know if I am thinking of some premature optimization
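To illustrate the boxing concern with a hedged sketch (hypothetical helper names, not SolrParams' actual API): the boxed variant auto-boxes the parsed {{long}} into a {{Long}} wrapper on every successful parse, while a primitive-returning overload allocates nothing:

```java
public class ParamParsing {

    // Primitive overload: Long.parseLong returns long, and it is returned
    // as-is, so no wrapper object is created.
    static long getPrimitiveLong(String val, long def) {
        return val == null ? def : Long.parseLong(val);
    }

    // Boxed variant mirroring the excerpt above: the long result of
    // Long.parseLong is auto-boxed into a Long to satisfy the return type.
    static Long getBoxedLong(String val, Long def) {
        return val == null ? def : Long.parseLong(val);
    }

    public static void main(String[] args) {
        System.out.println(getPrimitiveLong("42", 0L)); // 42
        System.out.println(getBoxedLong(null, 7L));     // 7
    }
}
```

Note that the JVM does cache small boxed values ({{Long.valueOf}} reuses -128..127), so the extra allocations only show up for values outside that range.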
[jira] [Updated] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9506: Attachment: SOLR-9506.patch
[jira] [Updated] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9506: Attachment: (was: SOLR-9506.patch)
[jira] [Updated] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9506: Attachment: SOLR-9506.patch
[jira] [Commented] (SOLR-9036) Solr slave is doing full replication (entire index) of index after master restart
[ https://issues.apache.org/jira/browse/SOLR-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15544142#comment-15544142 ] Pushkar Raste commented on SOLR-9036: - Does the fix for [SOLR-9446|https://issues.apache.org/jira/browse/SOLR-9446] help in this situation? > Solr slave is doing full replication (entire index) of index after master > restart > - > > Key: SOLR-9036 > URL: https://issues.apache.org/jira/browse/SOLR-9036 > Project: Solr > Issue Type: Bug > Components: replication (java) >Affects Versions: 5.3.1, 6.0 >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar >Priority: Critical > Labels: impact-high > Fix For: 5.5.2, 5.6, 6.0.1, 6.1, master (7.0) > > Attachments: SOLR-9036.patch, SOLR-9036.patch, SOLR-9036.patch > > > This was first described in the following email: > https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3ccafgnfoyn+xmpxwzwbjuzddeuz7tjqhqktek6q7u8xgstqy3...@mail.gmail.com%3E > I tried Solr 5.3.1 and Solr 6 and I can reproduce the problem. If the master > comes back online before the next polling interval then the slave finds > itself in sync with the master but if the master is down for at least one > polling interval then the slave pulls the entire full index from the master > even if the index has not changed on the master.
[jira] [Comment Edited] (SOLR-9036) Solr slave is doing full replication (entire index) of index after master restart
[ https://issues.apache.org/jira/browse/SOLR-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15544142#comment-15544142 ] Pushkar Raste edited comment on SOLR-9036 at 10/4/16 2:45 AM: -- Does the fix for [SOLR-9446|https://issues.apache.org/jira/browse/SOLR-9446] help in this situation as well? was (Author: praste): Does fix for [SOLR-9446|https://issues.apache.org/jira/browse/SOLR-9446] helps in this situation?
[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543018#comment-15543018 ] Pushkar Raste commented on SOLR-9546: - This is not a critical issue, and I might be doing premature optimization.
[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543017#comment-15543017 ] Pushkar Raste commented on SOLR-9546: - This is not a critical issue, and I might be doing premature optimization.
[jira] [Comment Edited] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511196#comment-15511196 ] Pushkar Raste edited comment on SOLR-9546 at 10/3/16 3:29 PM: -- Got you. I will fix the {{Long getLong(String param, Long def)}} method only. It is not as bad as I initially thought. I don't even think that method is needed. Calling {{Long getLong(String param)}} would do the same thing, won't it? was (Author: praste): Got you. I will fix the {{Long getLong(String param, Long def)}} method only. It is not as bad as initially thought. I don't even think that method is needed. Calling {{Long getLong(String param)}} would do the same thing, won't it?
[jira] [Commented] (SOLR-9511) Retire using individual versions to request updates during PeerSync
[ https://issues.apache.org/jira/browse/SOLR-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542481#comment-15542481 ] Pushkar Raste commented on SOLR-9511: - We are planning to set the number of records in the ulog to a very high number. If that number is too high, the leader may run into issues (throw OOM) when a replica asks for a high number of updates. In such a case we will have to request updates in chunks/batches. In preparation for that, we should keep the {{PeerSync.requestVersions()}} logic simple. This ticket is to track the effort of removing the old way of using individual versions to request updates.
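A minimal sketch of the chunking idea mentioned above (hypothetical {{UpdateBatcher}} class, not part of any patch): split the versions to be requested into fixed-size batches so no single response has to materialize everything at once:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UpdateBatcher {

    // Partition the requested versions into batches of at most batchSize,
    // so each request to the leader stays bounded.
    static List<List<Long>> chunk(List<Long> versions, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < versions.size(); i += batchSize) {
            batches.add(versions.subList(i, Math.min(i + batchSize, versions.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Long> versions = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        System.out.println(chunk(versions, 2)); // [[1, 2], [3, 4], [5]]
    }
}
```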
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532555#comment-15532555 ] Pushkar Raste commented on SOLR-9506: - Discussed with [~noble.paul]. We should cache the fingerprint for a segment only if *maxVersion specified* > *max version in the segment*.
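A hedged sketch of that rule (hypothetical class and method names; the real patch works on Lucene segment readers): cache only when the requested {{maxVersion}} exceeds the largest version in the segment, since only then does the fingerprint cover the whole segment and stay valid for later requests:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SegmentFingerprintCache {

    private final Map<String, Long> cache = new ConcurrentHashMap<>();

    // Cache only when maxVersionSpecified exceeds the segment's own max
    // version: the fingerprint then covers every version in the segment
    // and can be reused by any later request with the same property.
    long fingerprint(String segment, long maxVersionInSegment, long maxVersionSpecified) {
        if (maxVersionSpecified > maxVersionInSegment) {
            return cache.computeIfAbsent(segment, k -> compute(k, maxVersionInSegment));
        }
        // Partial view of the segment: recompute on every call.
        return compute(segment, maxVersionSpecified);
    }

    // Stand-in for the real hash over all versions <= maxVersion.
    private long compute(String segment, long maxVersion) {
        return segment.hashCode() * 31L + maxVersion;
    }

    public static void main(String[] args) {
        SegmentFingerprintCache c = new SegmentFingerprintCache();
        // Whole segment requested twice: the second call is served from cache.
        System.out.println(c.fingerprint("_0", 10L, 20L));
        System.out.println(c.fingerprint("_0", 10L, 20L));
    }
}
```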
[jira] [Commented] (SOLR-9369) SolrCloud should not compare commitTimeMSec to see if replicas are in sync
[ https://issues.apache.org/jira/browse/SOLR-9369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530492#comment-15530492 ] Pushkar Raste commented on SOLR-9369: - [SOLR-9446|https://issues.apache.org/jira/browse/SOLR-9446] might help, as it provides an alternate way to check whether replicas are in sync. > SolrCloud should not compare commitTimeMSec to see if replicas are in sync > -- > > Key: SOLR-9369 > URL: https://issues.apache.org/jira/browse/SOLR-9369 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Varun Thacker > > Today the replication code compares whether two replicas are in sync by checking > the commit timestamp ( "commitTimeMSec" ) > This made sense for master slave but I don't think it is useful for SolrCloud > since different replicas will commit at different times. We should not check > for this in SolrCloud mode. > Ramkumar noted this on SOLR-7859 as well.
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523564#comment-15523564 ] Pushkar Raste commented on SOLR-9506: - I think what [~ichattopadhyaya] is hinting at is that if {{numDocs}} accounts only for live (active) docs, then once documents are deleted in a segment, {{numDocs}} in the cached fingerprint might be wrong. Surprisingly, the following test cases passed with my POC: 1. {{PeerSyncTest}} 2. {{PeerSyncReplicationTest}} 3. {{SyncSliceTest}} In the worst case, we can at least parallelize fingerprint computation. > cache IndexFingerprint for each segment > --- > > Key: SOLR-9506 > URL: https://issues.apache.org/jira/browse/SOLR-9506 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul > Attachments: SOLR-9506_POC.patch > > > The IndexFingerprint is cached per index searcher. it is quite useless during > high throughput indexing. If the fingerprint is cached per segment it will > make it vastly more efficient to compute the fingerprint -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
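The "at least parallelize" fallback mentioned above could look like the sketch below: compute each segment's contribution independently and combine with an order-independent sum, so segments can be processed on parallel streams. The class, the sum-style hash, and the `long[]`-per-segment representation are all assumptions for illustration, not Solr's actual fingerprint code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of parallelizing fingerprint computation across
// segments: each segment's hash is independent, and the combine (a sum)
// is commutative, so parallel and sequential computation agree.
public class ParallelFingerprint {

    // Order-independent per-segment hash: sum of per-version hashes.
    static long segmentHash(long[] versions) {
        long h = 0;
        for (long v : versions) h += Long.hashCode(v);
        return h;
    }

    // Combine per-segment hashes on a parallel stream.
    public static long fingerprint(List<long[]> segments) {
        return segments.parallelStream()
                       .mapToLong(ParallelFingerprint::segmentHash)
                       .sum();
    }

    public static void main(String[] args) {
        List<long[]> segs = Arrays.asList(new long[]{100, 101, 102},
                                          new long[]{103, 104, 105});
        // Parallel per-segment combine equals one pass over all versions,
        // because the sum is commutative and associative.
        long allAtOnce = segmentHash(new long[]{100, 101, 102, 103, 104, 105});
        if (fingerprint(segs) != allAtOnce) throw new AssertionError();
    }
}
```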
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523566#comment-15523566 ] Pushkar Raste commented on SOLR-9506: - Adding [~ysee...@gmail.com] in the loop > cache IndexFingerprint for each segment > --- > > Key: SOLR-9506 > URL: https://issues.apache.org/jira/browse/SOLR-9506 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul > Attachments: SOLR-9506_POC.patch > > > The IndexFingerprint is cached per index searcher. it is quite useless during > high throughput indexing. If the fingerprint is cached per segment it will > make it vastly more efficient to compute the fingerprint -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Deleted] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9506: Comment: was deleted (was: In short you are suggesting that when we cache fingerprint for individual segments, we keep a list of version numbers in those segments around? That would be billions of {{Long}} values cached, which might be counter-productive,) > cache IndexFingerprint for each segment > --- > > Key: SOLR-9506 > URL: https://issues.apache.org/jira/browse/SOLR-9506 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul > Attachments: SOLR-9506_POC.patch > > > The IndexFingerprint is cached per index searcher. it is quite useless during > high throughput indexing. If the fingerprint is cached per segment it will > make it vastly more efficient to compute the fingerprint -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523273#comment-15523273 ] Pushkar Raste commented on SOLR-9506: - In short, you are suggesting that when we cache the fingerprint for individual segments, we keep a list of the version numbers in those segments around? That would be billions of {{Long}} values cached, which might be counter-productive. > cache IndexFingerprint for each segment > --- > > Key: SOLR-9506 > URL: https://issues.apache.org/jira/browse/SOLR-9506 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul > Attachments: SOLR-9506_POC.patch > > > The IndexFingerprint is cached per index searcher. it is quite useless during > high throughput indexing. If the fingerprint is cached per segment it will > make it vastly more efficient to compute the fingerprint -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
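The "billions of {{Long}} values" concern can be put in rough numbers. The 16 bytes per boxed {{Long}} used below is a typical HotSpot figure (object header plus payload, with compressed oops), assumed here for a back-of-the-envelope estimate, not a measured value.

```java
// Back-of-the-envelope cost of caching one boxed Long per version.
// ASSUMPTION: ~16 bytes per Long instance (typical HotSpot layout with
// compressed oops), before any container (list/array) overhead.
public class LongCacheCost {
    static final long BYTES_PER_BOXED_LONG = 16L;

    public static long cacheBytes(long versionCount) {
        return versionCount * BYTES_PER_BOXED_LONG;
    }

    public static void main(String[] args) {
        long oneBillion = 1_000_000_000L;
        long gib = cacheBytes(oneBillion) / (1024L * 1024L * 1024L);
        // On the order of 15 GB of heap just for the boxed values - which is
        // why caching every version number looks counter-productive.
        System.out.println(gib + " GiB");
    }
}
```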
[jira] [Comment Edited] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15521130#comment-15521130 ] Pushkar Raste edited comment on SOLR-9506 at 9/25/16 5:14 PM: -- POC/Initial commit - https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13 There are two issues we still need to solve. * How to compute {{versionsInHash}} from {{versionsInHash}} of individual segments. We cannot use the current {{versionsHash}} (unless we cache all the individual version numbers), as it is not additive. Consider the following scenario *Leader segments, versions and versionsHash* *seg1* : versions: 100, 101, 102 versionHash: hash(100) + hash(101) + hash(102) *seg2*: versions: 103, 104, 105 versionHash: hash(103) + hash(104) + hash(105) \\ \\ *Replica segments, versions and hash* *seg1*: versions: 100, 101 versionHash: hash(100) + hash(101) *seg2*: versions: 102, 103, 104, 105 versionHash: hash(102) + hash(103) + hash(104) + hash(105) \\ \\Leader and Replica are essentially in sync; however, using the current method there is no way to compute and ensure the cumulative {{versionHash}} of leader and replica would match. \\ \\Even if we decide not to cache {{IndexFingerprint}} per segment but just parallelize the computation, I think we would still run into the issue mentioned above. * I still need to figure out how to keep a cache in {{DefaultSolrCoreState}}, so that we can reuse the {{IndexFingerprint}} of individual segments when a new Searcher is opened. was (Author: praste): POC/Initial commit - https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13 There are two issues we still need to solve. * How to compute `versionsInHash` from `versionsInHash` of individual segments. We can not use current `versionsHash` (unless we cache all the individual version numbers), as it is not additive. 
Consider following scenario *Leader segments, versions and hash* *seg1* : versions: 100, 101, 102 versionHash: hash(100) + hash(101) + hash(102) *seg2*: versions: 103, 104, 105 versionHash: hash(103) + hash(104) + hash(105) \\ \\ *Replica segments, versions and hash* *seg1*: versions: 100, 101 versionHash: hash(100) + hash(101) *seg2*: versions: 102, 103, 104, 105 versionHash: hash(102) + hash(103) + hash(104) + hash(105) \\ \\Leader and Replica are essentially in sync, however using current method there is no way to compute and ensure cumulative `versionHash` of leader and replica would match * I still need to figure out how to keep cache in `DefaultSolrCoreState`, so that we can reuse `IndexFingerprint` of individual segments when a new Searcher is opened. > cache IndexFingerprint for each segment > --- > > Key: SOLR-9506 > URL: https://issues.apache.org/jira/browse/SOLR-9506 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul > > The IndexFingerprint is cached per index searcher. it is quite useless during > high throughput indexing. If the fingerprint is cached per segment it will > make it vastly more efficient to compute the fingerprint -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
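The leader/replica scenario above can be made concrete. The rolling hash below is a stand-in for a grouping-sensitive combine, not Solr's actual versionsHash computation; it illustrates why per-segment caching needs a combine function that is indifferent to where segment boundaries fall, and why a plain sum of per-version hashes (as in the scenario's notation) has that property.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in hashes (NOT Solr's actual versionsHash) illustrating the
// combining problem above: a grouping-sensitive combine gives different
// totals for leader and replica even though they hold the same versions,
// while an additive sum is indifferent to segment boundaries.
public class CombineDemo {

    // Grouping-sensitive: h = h*31 + v within each segment, then summed.
    static long rollingCombined(List<long[]> segments) {
        long total = 0;
        for (long[] seg : segments) {
            long h = 0;
            for (long v : seg) h = 31 * h + v;
            total += h;
        }
        return total;
    }

    // Grouping-insensitive: plain sum of per-version hashes.
    static long sumCombined(List<long[]> segments) {
        long total = 0;
        for (long[] seg : segments)
            for (long v : seg) total += Long.hashCode(v);
        return total;
    }

    public static void main(String[] args) {
        // Same six versions, split across segments differently, as above.
        List<long[]> leader  = Arrays.asList(new long[]{100, 101, 102},
                                             new long[]{103, 104, 105});
        List<long[]> replica = Arrays.asList(new long[]{100, 101},
                                             new long[]{102, 103, 104, 105});

        // Grouping-sensitive combine disagrees: cached per-segment values
        // cannot be recomposed into a comparable cumulative hash.
        if (rollingCombined(leader) == rollingCombined(replica))
            throw new AssertionError();
        // Additive combine agrees regardless of segment boundaries.
        if (sumCombined(leader) != sumCombined(replica))
            throw new AssertionError();
    }
}
```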
[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment
[ https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15521130#comment-15521130 ] Pushkar Raste commented on SOLR-9506: - POC/Initial commit - https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13 There are two issues we still need to solve. * How to compute `versionsInHash` from `versionsInHash` of individual segments. We can not use current `versionsHash` (unless we cache all the individual version numbers), as it is not additive. Consider following scenario *Leader segments, versions and hash* *seg1* : versions: 100, 101, 102 versionHash: hash(100) + hash(101) + hash(102) *seg2*: versions: 103, 104, 105 versionHash: hash(103) + hash(104) + hash(105) \\ \\ *Replica segments, versions and hash* *seg1*: versions: 100, 101 versionHash: hash(100) + hash(101) *seg2*: versions: 102, 103, 104, 105 versionHash: hash(102) + hash(103) + hash(104) + hash(105) \\ \\Leader and Replica are essentially in sync, however using current method there is no way to compute and ensure cumulative `versionHash` of leader and replica would match * I still need to figure out how to keep cache in `DefaultSolrCoreState`, so that we can reuse `IndexFingerprint` of individual segments when a new Searcher is opened. > cache IndexFingerprint for each segment > --- > > Key: SOLR-9506 > URL: https://issues.apache.org/jira/browse/SOLR-9506 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul > > The IndexFingerprint is cached per index searcher. it is quite useless during > high throughput indexing. If the fingerprint is cached per segment it will > make it vastly more efficient to compute the fingerprint -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-9310) PeerSync fails on a node restart due to IndexFingerPrint mismatch
[ https://issues.apache.org/jira/browse/SOLR-9310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520747#comment-15520747 ] Pushkar Raste edited comment on SOLR-9310 at 9/25/16 12:51 PM: --- I went through logs at https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-MacOSX/429/consoleFull If PeerSync was unsuccessful I would expect to see a line like {{o.a.s.u.PeerSync Fingerprint comparison: -1}} However, I don't see such a line. I could think of two scenarios that could break the test * the data directory could get deleted while a node is brought down, since the data directory is created in {{temp}}. Upon restart the replica would have no frame of reference and will have to fall back on replication. * we need a better check than relying on the number of requests made to {{ReplicationHandler}} was (Author: praste): I went through logs in the failed test email notification but those are truncated. Where can I look at the entire build.log for the test. Only thing I could think of at this point is data directory could get deleted while a node is brought down, since data directory is created in {{temp}}. Upon restart replica would have no frame of reference and will have to fall back on replication. > PeerSync fails on a node restart due to IndexFingerPrint mismatch > - > > Key: SOLR-9310 > URL: https://issues.apache.org/jira/browse/SOLR-9310 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Assignee: Noble Paul > Fix For: 5.5.3, 6.3, trunk > > Attachments: PeerSync_3Node_Setup.jpg, PeerSync_Experiment.patch, > SOLR-9310.patch, SOLR-9310.patch, SOLR-9310.patch, SOLR-9310.patch, > SOLR-9310.patch, SOLR-9310.patch, SOLR-9310_3ReplicaTest.patch, > SOLR-9310_5x.patch, SOLR-9310_final.patch > > > I found that Peer Sync fails if a node restarts and documents were indexed > while node was down. IndexFingerPrint check fails after recovering node > applies updates. 
> This happens only when node restarts and not if node just misses updates due > reason other than it being down. > Please check attached patch for the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9310) PeerSync fails on a node restart due to IndexFingerPrint mismatch
[ https://issues.apache.org/jira/browse/SOLR-9310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520747#comment-15520747 ] Pushkar Raste commented on SOLR-9310: - I went through logs in the failed test email notification but those are truncated. Where can I look at the entire build.log for the test. Only thing I could think of at this point is data directory could get deleted while a node is brought down, since data directory is created in {{temp}}. Upon restart replica would have no frame of reference and will have to fall back on replication. > PeerSync fails on a node restart due to IndexFingerPrint mismatch > - > > Key: SOLR-9310 > URL: https://issues.apache.org/jira/browse/SOLR-9310 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Assignee: Noble Paul > Fix For: 5.5.3, 6.3, trunk > > Attachments: PeerSync_3Node_Setup.jpg, PeerSync_Experiment.patch, > SOLR-9310.patch, SOLR-9310.patch, SOLR-9310.patch, SOLR-9310.patch, > SOLR-9310.patch, SOLR-9310.patch, SOLR-9310_3ReplicaTest.patch, > SOLR-9310_5x.patch, SOLR-9310_final.patch > > > I found that Peer Sync fails if a node restarts and documents were indexed > while node was down. IndexFingerPrint check fails after recovering node > applies updates. > This happens only when node restarts and not if node just misses updates due > reason other than it being down. > Please check attached patch for the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516844#comment-15516844 ] Pushkar Raste edited comment on SOLR-9546 at 9/23/16 4:08 PM: -- I went through usage of most of the methods that return wrapper types. I think the `SolrParams` class is encouraging usage of wrapper types (or people might just be missing the fact that they end up creating a lot of wrapper objects). Here are some places which can use primitive types by passing a default value {{SolrParams.getInt()}} * {{HashQParser.parse()}} * {{TextLogisticRegressionQParser.parse()}} * {{CloudMLTQParser.parse()}} * {{SimpleMLTQParser.parse()}} {{getBool()}} * {{ZkController.rejoinShardElection()}} * {{DumpRequestHandler.handleRequestBody()}} * {{PingRequestHandler.handleRequestBody()}} * {{MoreLikeThisComponent.process()}} * {{BinaryResponseWriter.write()}} * {{JSONResponseWriter.write()}} * {{PHPResponseWriter.write()}} * {{XMLResponseWriter.write()}} The JVM might do something smart for the `Boolean` type, since there are only two possible values. There are some *test* classes as well. There are some other classes that do depend upon values being `null`. * I can modify all the places mentioned above to call the get(param, df) version, or * We can simply add `getPrimitive()` methods that return a default value in the absence of a param, to make it clear that these methods return a primitive Another possibility: I am overthinking here :-), and this ticket can be closed. was (Author: praste): I went through usage of most of the methods that return Wrapper types. I think `SolrParams` class is encouraging usage of wrapper types (or people might are just missing the fact that they might end up creating lot of wrapper objects). 
Here are few are some places which can use primitive types by passing a default value {{SolrParams.getInt()}} * {{HashQParser.parse()}} * {{TextLogisticRegressionQParser.parse()}} * {{CloudMLTQParser.parse()}} * {{SimpleMLTQParser.parse()}} {{getBool()}} * {{ZkController.rejoinShardElection()}} * {{DumpRequestHandler.handleRequestBody()}} * {{PingRequestHandler.handleRequestBody()}} * {{MoreLikeThisComponent.process()}} * {{BinaryResponseWriter.write()}} * {{JSONResponseWriter.write()}} * {{PHPResponseWriter.write()}} * {{XMLResponseWriter.write()}} There are some *test* classes as well. There are some other classes that do depend upon values being `null`. * I can modify all the places mentioned above to call get(param, df) version, or * We can simply add `getPrimitive()` methods that return default value in absence of a param, to make it clear that these methods would return a primitive Another possibility, I am overthinking here :-), and this ticket can be closed. > There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class > -- > > Key: SOLR-9546 > URL: https://issues.apache.org/jira/browse/SOLR-9546 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Priority: Minor > > Here is an excerpt > {code} > public Long getLong(String param, Long def) { > String val = get(param); > try { > return val== null ? def : Long.parseLong(val); > } > catch( Exception ex ) { > throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, > ex.getMessage(), ex ); > } > } > {code} > {{Long.parseLong()}} returns a primitive type but since method expect to > return a {{Long}}, it needs to be wrapped. There are many more method like > that. We might be creating a lot of unnecessary objects here. 
> I am not sure if JVM catches upto it and somehow optimizes it if these > methods are called enough times (or may be compiler does some modifications > at compile time) > Let me know if I am thinking of some premature optimization -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516844#comment-15516844 ] Pushkar Raste commented on SOLR-9546: - I went through usage of most of the methods that return wrapper types. I think the `SolrParams` class is encouraging usage of wrapper types (or people might just be missing the fact that they end up creating a lot of wrapper objects). Here are some places which can use primitive types by passing a default value {{SolrParams.getInt()}} * {{HashQParser.parse()}} * {{TextLogisticRegressionQParser.parse()}} * {{CloudMLTQParser.parse()}} * {{SimpleMLTQParser.parse()}} {{getBool()}} * {{ZkController.rejoinShardElection()}} * {{DumpRequestHandler.handleRequestBody()}} * {{PingRequestHandler.handleRequestBody()}} * {{MoreLikeThisComponent.process()}} * {{BinaryResponseWriter.write()}} * {{JSONResponseWriter.write()}} * {{PHPResponseWriter.write()}} * {{XMLResponseWriter.write()}} There are some *test* classes as well. There are some other classes that do depend upon values being `null`. * I can modify all the places mentioned above to call the get(param, df) version, or * We can simply add `getPrimitive()` methods that return a default value in the absence of a param, to make it clear that these methods return a primitive Another possibility: I am overthinking here :-), and this ticket can be closed. > There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class > -- > > Key: SOLR-9546 > URL: https://issues.apache.org/jira/browse/SOLR-9546 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Priority: Minor > > Here is an excerpt > {code} > public Long getLong(String param, Long def) { > String val = get(param); > try { > return val== null ? 
def : Long.parseLong(val); > } > catch( Exception ex ) { > throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, > ex.getMessage(), ex ); > } > } > {code} > {{Long.parseLong()}} returns a primitive type but since method expect to > return a {{Long}}, it needs to be wrapped. There are many more method like > that. We might be creating a lot of unnecessary objects here. > I am not sure if JVM catches upto it and somehow optimizes it if these > methods are called enough times (or may be compiler does some modifications > at compile time) > Let me know if I am thinking of some premature optimization -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511196#comment-15511196 ] Pushkar Raste edited comment on SOLR-9546 at 9/21/16 9:16 PM: -- Got you. I will fix the {{Long getLong(String param, Long def)}} method only. It is not as bad as initially thought. I don't even think that method is needed. Calling {{Long getLong(String param)}} would do the same thing, won't it? was (Author: praste): Got you. I will fix the {{Long getLong(String param, Long def)}} method only. It is not as bad as initially thought > There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class > -- > > Key: SOLR-9546 > URL: https://issues.apache.org/jira/browse/SOLR-9546 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Priority: Minor > > Here is an excerpt > {code} > public Long getLong(String param, Long def) { > String val = get(param); > try { > return val== null ? def : Long.parseLong(val); > } > catch( Exception ex ) { > throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, > ex.getMessage(), ex ); > } > } > {code} > {{Long.parseLong()}} returns a primitive type but since method expect to > return a {{Long}}, it needs to be wrapped. There are many more method like > that. We might be creating a lot of unnecessary objects here. > I am not sure if JVM catches upto it and somehow optimizes it if these > methods are called enough times (or may be compiler does some modifications > at compile time) > Let me know if I am thinking of some premature optimization -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511196#comment-15511196 ] Pushkar Raste commented on SOLR-9546: - Got you. I will fix the {{Long getLong(String param, Long def)}} method only. It is not as bad as initially thought > There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class > -- > > Key: SOLR-9546 > URL: https://issues.apache.org/jira/browse/SOLR-9546 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Priority: Minor > > Here is an excerpt > {code} > public Long getLong(String param, Long def) { > String val = get(param); > try { > return val== null ? def : Long.parseLong(val); > } > catch( Exception ex ) { > throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, > ex.getMessage(), ex ); > } > } > {code} > {{Long.parseLong()}} returns a primitive type but since method expect to > return a {{Long}}, it needs to be wrapped. There are many more method like > that. We might be creating a lot of unnecessary objects here. > I am not sure if JVM catches upto it and somehow optimizes it if these > methods are called enough times (or may be compiler does some modifications > at compile time) > Let me know if I am thinking of some premature optimization -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Deleted] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pushkar Raste updated SOLR-9546: Comment: was deleted (was: That was just one example. Check {{getBool()}}, {{getFieldBool()}} methods those have the exact same problem, and there are many more. I am not sure which way we should go (primitive vs Wrapped types) but I am inclined towards primitive types.) > There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class > -- > > Key: SOLR-9546 > URL: https://issues.apache.org/jira/browse/SOLR-9546 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Priority: Minor > > Here is an excerpt > {code} > public Long getLong(String param, Long def) { > String val = get(param); > try { > return val== null ? def : Long.parseLong(val); > } > catch( Exception ex ) { > throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, > ex.getMessage(), ex ); > } > } > {code} > {{Long.parseLong()}} returns a primitive type but since method expect to > return a {{Long}}, it needs to be wrapped. There are many more method like > that. We might be creating a lot of unnecessary objects here. > I am not sure if JVM catches upto it and somehow optimizes it if these > methods are called enough times (or may be compiler does some modifications > at compile time) > Let me know if I am thinking of some premature optimization -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511132#comment-15511132 ] Pushkar Raste edited comment on SOLR-9546 at 9/21/16 8:49 PM: -- That was just one example. Check {{getBool()}}, {{getFieldBool()}} methods those have the exact same problem, and there are many more. I am not sure which way we should go (primitive vs Wrapped types) but I am inclined towards primitive types. was (Author: praste): That was just one example check {{getBool()}}, {{getFieldBool()}} methods those have the exact same problem, and there are many more. I am not sure which way we should go (primitive vs Wrapped types) but I am inclined towards primitive types. > There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class > -- > > Key: SOLR-9546 > URL: https://issues.apache.org/jira/browse/SOLR-9546 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Priority: Minor > > Here is an excerpt > {code} > public Long getLong(String param, Long def) { > String val = get(param); > try { > return val== null ? def : Long.parseLong(val); > } > catch( Exception ex ) { > throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, > ex.getMessage(), ex ); > } > } > {code} > {{Long.parseLong()}} returns a primitive type but since method expect to > return a {{Long}}, it needs to be wrapped. There are many more method like > that. We might be creating a lot of unnecessary objects here. > I am not sure if JVM catches upto it and somehow optimizes it if these > methods are called enough times (or may be compiler does some modifications > at compile time) > Let me know if I am thinking of some premature optimization -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
[ https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511132#comment-15511132 ] Pushkar Raste commented on SOLR-9546: - That was just one example check {{getBool()}}, {{getFieldBool()}} methods those have the exact same problem, and there are many more. I am not sure which way we should go (primitive vs Wrapped types) but I am inclined towards primitive types. > There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class > -- > > Key: SOLR-9546 > URL: https://issues.apache.org/jira/browse/SOLR-9546 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pushkar Raste >Priority: Minor > > Here is an excerpt > {code} > public Long getLong(String param, Long def) { > String val = get(param); > try { > return val== null ? def : Long.parseLong(val); > } > catch( Exception ex ) { > throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, > ex.getMessage(), ex ); > } > } > {code} > {{Long.parseLong()}} returns a primitive type but since method expect to > return a {{Long}}, it needs to be wrapped. There are many more method like > that. We might be creating a lot of unnecessary objects here. > I am not sure if JVM catches upto it and somehow optimizes it if these > methods are called enough times (or may be compiler does some modifications > at compile time) > Let me know if I am thinking of some premature optimization -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org