[jira] [Comment Edited] (SOLR-11475) Endless loop and OOM in PeerSync

2017-11-24 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264959#comment-16264959
 ] 

Pushkar Raste edited comment on SOLR-11475 at 11/24/17 5:55 PM:


Here is a working patch.
I came up with a completely hypothetical scenario where one replica has a
version *ver* and the other has version *-ver*.

[~shalinmangar] / [~noble.paul] / [~ichattopadhyaya]
Can you please take a look at the patch?

I still don't understand how one can get into this scenario, but a robust check
wouldn't hurt.
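
To make the hypothetical concrete: below is a minimal, self-contained sketch
(not part of the patch) that feeds the loop quoted in the issue description
below a pair of made-up version lists where one side holds *10* and the other
holds *-10*. The lists, the threshold and the iteration cap are purely
illustrative; the cap is only there so the demo terminates instead of filling
the heap.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PeerSyncLoopRepro {
  public static void main(String[] args) {
    // Illustrative version lists, newest first: both sides agree on 20,
    // but one side has 10 (an add) and the other has -10 (a delete).
    List<Long> ourUpdates = Arrays.asList(20L, -10L);
    List<Long> otherVersions = Arrays.asList(20L, 10L, 5L);
    long ourLowThreshold = 1;      // low enough that the early-exit check never fires
    boolean completeList = false;

    List<String> rangesToRequest = new ArrayList<>();
    int ourUpdatesIndex = ourUpdates.size() - 1;
    int otherUpdatesIndex = otherVersions.size() - 1;
    long totalRequestedVersions = 0;
    int iterations = 0;

    // Same structure as the quoted PeerSync#handleVersionsWithRanges loop,
    // plus an iteration cap so this demo stops after a few rounds.
    while (otherUpdatesIndex >= 0 && iterations++ < 10) {
      if (ourUpdatesIndex < 0) {
        rangesToRequest.add(otherVersions.get(otherUpdatesIndex) + "..." + otherVersions.get(0));
        totalRequestedVersions += otherUpdatesIndex + 1;
        break;
      }
      if (!completeList && Math.abs(otherVersions.get(otherUpdatesIndex)) < ourLowThreshold) break;
      if (ourUpdates.get(ourUpdatesIndex).longValue() == otherVersions.get(otherUpdatesIndex).longValue()) {
        ourUpdatesIndex--;
        otherUpdatesIndex--;
      } else if (Math.abs(ourUpdates.get(ourUpdatesIndex)) < Math.abs(otherVersions.get(otherUpdatesIndex))) {
        ourUpdatesIndex--;
      } else {
        long rangeStart = otherVersions.get(otherUpdatesIndex);
        while ((otherUpdatesIndex < otherVersions.size())
            && (Math.abs(otherVersions.get(otherUpdatesIndex)) < Math.abs(ourUpdates.get(ourUpdatesIndex)))) {
          otherUpdatesIndex--;
          totalRequestedVersions++;
        }
        // once ourUpdates points at -10 and otherVersions points at 10, neither
        // index moves and the same range is appended on every pass
        rangesToRequest.add(rangeStart + "..." + otherVersions.get(otherUpdatesIndex + 1));
      }
    }
    System.out.println(rangesToRequest); // [5...5, 10...5, 10...5, ...] until the cap
  }
}
{code}

Without the cap, the last branch repeats forever and {{rangesToRequest}} grows
until the process runs out of memory, which is the behaviour the reporter saw.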



was (Author: praste):
Here is a working patch.
I came up with a completely hypothetical scenario where one replica has a
version *ver* and the other has version *-ver*.

[~shalinmangar] / [~noble.paul] / [~ichattopadhyaya]
Can you please take a look at the patch?

I still don't understand how one can get into this scenario, but a robust check
wouldn't hurt.


> Endless loop and OOM in PeerSync
> 
>
> Key: SOLR-11475
> URL: https://issues.apache.org/jira/browse/SOLR-11475
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrey Kudryavtsev
> Attachments: SOLR-11475.patch
>
>
> After the problem described in SOLR-11459, I restarted the cluster and got an
> OOM on start.
> [PeerSync#handleVersionsWithRanges|https://github.com/apache/lucene-solr/blob/68bda0be421ce18811e03b229781fd6152fcc04a/solr/core/src/java/org/apache/solr/update/PeerSync.java#L539]
> contains this logic:
> {code}
> while (otherUpdatesIndex >= 0) {
>   // we have run out of ourUpdates, pick up all the remaining versions from the other versions
>   if (ourUpdatesIndex < 0) {
>     String range = otherVersions.get(otherUpdatesIndex) + "..." + otherVersions.get(0);
>     rangesToRequest.add(range);
>     totalRequestedVersions += otherUpdatesIndex + 1;
>     break;
>   }
>   // stop when the entries get old enough that reorders may lead us to see updates we don't need
>   if (!completeList && Math.abs(otherVersions.get(otherUpdatesIndex)) < ourLowThreshold) break;
>   if (ourUpdates.get(ourUpdatesIndex).longValue() == otherVersions.get(otherUpdatesIndex).longValue()) {
>     ourUpdatesIndex--;
>     otherUpdatesIndex--;
>   } else if (Math.abs(ourUpdates.get(ourUpdatesIndex)) < Math.abs(otherVersions.get(otherUpdatesIndex))) {
>     ourUpdatesIndex--;
>   } else {
>     long rangeStart = otherVersions.get(otherUpdatesIndex);
>     while ((otherUpdatesIndex < otherVersions.size())
>         && (Math.abs(otherVersions.get(otherUpdatesIndex)) < Math.abs(ourUpdates.get(ourUpdatesIndex)))) {
>       otherUpdatesIndex--;
>       totalRequestedVersions++;
>     }
>     // construct range here
>     rangesToRequest.add(rangeStart + "..." + otherVersions.get(otherUpdatesIndex + 1));
>   }
> }
> {code}
> If at some point
> {code}
> ourUpdates.get(ourUpdatesIndex) == -otherVersions.get(otherUpdatesIndex)
> {code}
> the loop will never end. It will add the same string again and again into
> {{rangesToRequest}} until the process runs out of memory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-11475) Endless loop and OOM in PeerSync

2017-11-23 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264959#comment-16264959
 ] 

Pushkar Raste edited comment on SOLR-11475 at 11/24/17 6:03 AM:


Here is a working patch.
I came up with a completely hypothetical scenario where one replica has a
version *ver* and the other has version *-ver*.

[~shalinmangar] / [~noble.paul] / [~ichattopadhyaya]
Can you please take a look at the patch?

I still don't understand how one can get into this scenario, but a robust check
wouldn't hurt.



was (Author: praste):
Here is a working patch.
I came up with a completely hypothetical scenario where one replica has a
version *ver* and the other has version *-ver*.

[~shalinmangar] / [~noble.paul] Can you please take a look at the patch?

I still don't understand how one can get into this scenario, but a robust check
wouldn't hurt.


> Endless loop and OOM in PeerSync
> 
>
> Key: SOLR-11475
> URL: https://issues.apache.org/jira/browse/SOLR-11475
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrey Kudryavtsev
> Attachments: SOLR-11475.patch
>
>
> After the problem described in SOLR-11459, I restarted the cluster and got an
> OOM on start.
> [PeerSync#handleVersionsWithRanges|https://github.com/apache/lucene-solr/blob/68bda0be421ce18811e03b229781fd6152fcc04a/solr/core/src/java/org/apache/solr/update/PeerSync.java#L539]
> contains this logic:
> {code}
> while (otherUpdatesIndex >= 0) {
>   // we have run out of ourUpdates, pick up all the remaining versions from the other versions
>   if (ourUpdatesIndex < 0) {
>     String range = otherVersions.get(otherUpdatesIndex) + "..." + otherVersions.get(0);
>     rangesToRequest.add(range);
>     totalRequestedVersions += otherUpdatesIndex + 1;
>     break;
>   }
>   // stop when the entries get old enough that reorders may lead us to see updates we don't need
>   if (!completeList && Math.abs(otherVersions.get(otherUpdatesIndex)) < ourLowThreshold) break;
>   if (ourUpdates.get(ourUpdatesIndex).longValue() == otherVersions.get(otherUpdatesIndex).longValue()) {
>     ourUpdatesIndex--;
>     otherUpdatesIndex--;
>   } else if (Math.abs(ourUpdates.get(ourUpdatesIndex)) < Math.abs(otherVersions.get(otherUpdatesIndex))) {
>     ourUpdatesIndex--;
>   } else {
>     long rangeStart = otherVersions.get(otherUpdatesIndex);
>     while ((otherUpdatesIndex < otherVersions.size())
>         && (Math.abs(otherVersions.get(otherUpdatesIndex)) < Math.abs(ourUpdates.get(ourUpdatesIndex)))) {
>       otherUpdatesIndex--;
>       totalRequestedVersions++;
>     }
>     // construct range here
>     rangesToRequest.add(rangeStart + "..." + otherVersions.get(otherUpdatesIndex + 1));
>   }
> }
> {code}
> If at some point
> {code}
> ourUpdates.get(ourUpdatesIndex) == -otherVersions.get(otherUpdatesIndex)
> {code}
> the loop will never end. It will add the same string again and again into
> {{rangesToRequest}} until the process runs out of memory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11475) Endless loop and OOM in PeerSync

2017-11-23 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-11475:
-
Attachment: SOLR-11475.patch

Here is a working patch.
I came up with a completely hypothetical scenario where one replica has a
version *ver* and the other has version *-ver*.

[~shalinmangar] / [~noble.paul] Can you please take a look at the patch?

I still don't understand how one can get into this scenario, but a robust check
wouldn't hurt.


> Endless loop and OOM in PeerSync
> 
>
> Key: SOLR-11475
> URL: https://issues.apache.org/jira/browse/SOLR-11475
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrey Kudryavtsev
> Attachments: SOLR-11475.patch
>
>
> After the problem described in SOLR-11459, I restarted the cluster and got an
> OOM on start.
> [PeerSync#handleVersionsWithRanges|https://github.com/apache/lucene-solr/blob/68bda0be421ce18811e03b229781fd6152fcc04a/solr/core/src/java/org/apache/solr/update/PeerSync.java#L539]
> contains this logic:
> {code}
> while (otherUpdatesIndex >= 0) {
>   // we have run out of ourUpdates, pick up all the remaining versions from the other versions
>   if (ourUpdatesIndex < 0) {
>     String range = otherVersions.get(otherUpdatesIndex) + "..." + otherVersions.get(0);
>     rangesToRequest.add(range);
>     totalRequestedVersions += otherUpdatesIndex + 1;
>     break;
>   }
>   // stop when the entries get old enough that reorders may lead us to see updates we don't need
>   if (!completeList && Math.abs(otherVersions.get(otherUpdatesIndex)) < ourLowThreshold) break;
>   if (ourUpdates.get(ourUpdatesIndex).longValue() == otherVersions.get(otherUpdatesIndex).longValue()) {
>     ourUpdatesIndex--;
>     otherUpdatesIndex--;
>   } else if (Math.abs(ourUpdates.get(ourUpdatesIndex)) < Math.abs(otherVersions.get(otherUpdatesIndex))) {
>     ourUpdatesIndex--;
>   } else {
>     long rangeStart = otherVersions.get(otherUpdatesIndex);
>     while ((otherUpdatesIndex < otherVersions.size())
>         && (Math.abs(otherVersions.get(otherUpdatesIndex)) < Math.abs(ourUpdates.get(ourUpdatesIndex)))) {
>       otherUpdatesIndex--;
>       totalRequestedVersions++;
>     }
>     // construct range here
>     rangesToRequest.add(rangeStart + "..." + otherVersions.get(otherUpdatesIndex + 1));
>   }
> }
> {code}
> If at some point
> {code}
> ourUpdates.get(ourUpdatesIndex) == -otherVersions.get(otherUpdatesIndex)
> {code}
> the loop will never end. It will add the same string again and again into
> {{rangesToRequest}} until the process runs out of memory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11216) Make PeerSync more robust

2017-11-20 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260117#comment-16260117
 ] 

Pushkar Raste commented on SOLR-11216:
--

[~caomanhdat]
Can this fail if the leader processes updates out of order? E.g. what if the
leader has processed update 6 but has yet to process 5. Now the replica requests
updates up to 6. However, the leader has just finished processing 5 (including a
soft/hard commit), so when the leader calculates the index fingerprint up to 6,
the leader's fingerprint will include version 5 as well.

Considering all the race conditions, I think making the fingerprint check robust
is tricky.

> Make PeerSync more robust
> -
>
> Key: SOLR-11216
> URL: https://issues.apache.org/jira/browse/SOLR-11216
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>
> First of all, I will change the issue's title to a better name when I have one.
> While digging into SOLR-10126, I found a case that can make peerSync fail:
> * leader and replica receive updates 1 to 4
> * replica stops
> * replica misses updates 5, 6
> * replica starts recovery
> ## replica buffers updates 7, 8
> ## replica requests versions from the leader
> ## replica gets the recent versions, which are 1,2,3,4,7,8
> ## at the same time the leader receives update 9, so it returns updates from 1
> to 9 (for the versions request)
> ## replica does peersync and requests updates 5, 6, 9 from the leader
> ## replica applies updates 5, 6, 9. Its index does not have updates 7, 8 and
> maxVersionSpecified for the fingerprint is 9, therefore the fingerprint
> comparison will fail
> My idea here is: why does the replica request update 9 (step 6) while it knows
> that updates with lower versions (updates 7, 8) are in its buffering tlog?
> Should we request only updates lower than the lowest update in its buffering
> tlog ( < 7 )? (See the sketch after this description.)
> Someone may ask: what if the replica never receives update 9? In that case, the
> leader will put the replica into the LIR state, so the replica will run the
> recovery process again.
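
As a rough sketch of the idea proposed in the description above (request only
versions below the lowest version buffered in the recovering replica's tlog),
here is what that filter could look like. The class, method and parameter names
are purely illustrative and are not existing Solr APIs.

{code}
import java.util.List;
import java.util.stream.Collectors;

class PeerSyncRequestFilter {
  /**
   * Keep only the versions the replica should ask the leader for: anything at
   * or above the lowest version already buffered in the tlog is either buffered
   * already or, per the last paragraph of the description, will be handled by a
   * later recovery. Names are illustrative, not Solr APIs.
   */
  static List<Long> versionsToRequest(List<Long> missingOnReplica, List<Long> bufferedInTlog) {
    long lowestBuffered = bufferedInTlog.stream()
        .mapToLong(Math::abs)
        .min()
        .orElse(Long.MAX_VALUE);    // nothing buffered -> request everything missing
    return missingOnReplica.stream()
        .filter(v -> Math.abs(v) < lowestBuffered)
        .collect(Collectors.toList());
  }
}
{code}

With the numbers from the description, asking for {5, 6, 9} while {7, 8} are
buffered would yield {5, 6}, i.e. update 9 is not requested.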



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11475) Endless loop and OOM in PeerSync

2017-10-23 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215049#comment-16215049
 ] 

Pushkar Raste commented on SOLR-11475:
--

If you are blocked, you can try turning versionRanges off and falling back to
using individual versions.

If you can wait for a code fix, I will take a stab at it this weekend. The
solution I am thinking of is keeping a counter, incrementing it on every
iteration, and throwing an exception if we don't break out of the outermost
`while` loop before `counter > Math.max(ourUpdates.size(), otherVersions.size())`.

Or, in the `else` branch, before we create a new range, add a check for X and -X
and throw an exception if that is true.
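
As a rough illustration of the first idea, here is a minimal sketch of such a
guard around the existing loop; the bound follows the comment above, and the
exception type and message are placeholders rather than the actual patch.

{code}
// Hypothetical guard (not the committed fix) around the existing loop in
// PeerSync#handleVersionsWithRanges: count iterations and bail out once we
// exceed the bound suggested above instead of looping until OOM.
int counter = 0;
final int maxIterations = Math.max(ourUpdates.size(), otherVersions.size());

while (otherUpdatesIndex >= 0) {
  if (++counter > maxIterations) {
    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
        "PeerSync stopped making progress while building version ranges");
  }
  // ... existing loop body unchanged ...
}
{code}

Either variant turns a silent endless loop into a loud failure.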

> Endless loop and OOM in PeerSync
> 
>
> Key: SOLR-11475
> URL: https://issues.apache.org/jira/browse/SOLR-11475
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrey Kudryavtsev
>
> After the problem described in SOLR-11459, I restarted the cluster and got an
> OOM on start.
> [PeerSync#handleVersionsWithRanges|https://github.com/apache/lucene-solr/blob/68bda0be421ce18811e03b229781fd6152fcc04a/solr/core/src/java/org/apache/solr/update/PeerSync.java#L539]
> contains this logic:
> {code}
> while (otherUpdatesIndex >= 0) {
>   // we have run out of ourUpdates, pick up all the remaining versions from the other versions
>   if (ourUpdatesIndex < 0) {
>     String range = otherVersions.get(otherUpdatesIndex) + "..." + otherVersions.get(0);
>     rangesToRequest.add(range);
>     totalRequestedVersions += otherUpdatesIndex + 1;
>     break;
>   }
>   // stop when the entries get old enough that reorders may lead us to see updates we don't need
>   if (!completeList && Math.abs(otherVersions.get(otherUpdatesIndex)) < ourLowThreshold) break;
>   if (ourUpdates.get(ourUpdatesIndex).longValue() == otherVersions.get(otherUpdatesIndex).longValue()) {
>     ourUpdatesIndex--;
>     otherUpdatesIndex--;
>   } else if (Math.abs(ourUpdates.get(ourUpdatesIndex)) < Math.abs(otherVersions.get(otherUpdatesIndex))) {
>     ourUpdatesIndex--;
>   } else {
>     long rangeStart = otherVersions.get(otherUpdatesIndex);
>     while ((otherUpdatesIndex < otherVersions.size())
>         && (Math.abs(otherVersions.get(otherUpdatesIndex)) < Math.abs(ourUpdates.get(ourUpdatesIndex)))) {
>       otherUpdatesIndex--;
>       totalRequestedVersions++;
>     }
>     // construct range here
>     rangesToRequest.add(rangeStart + "..." + otherVersions.get(otherUpdatesIndex + 1));
>   }
> }
> {code}
> If at some point
> {code}
> ourUpdates.get(ourUpdatesIndex) == -otherVersions.get(otherUpdatesIndex)
> {code}
> the loop will never end. It will add the same string again and again into
> {{rangesToRequest}} until the process runs out of memory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11475) Endless loop and OOM in PeerSync

2017-10-20 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212935#comment-16212935
 ] 

Pushkar Raste commented on SOLR-11475:
--

Version numbers are monotonically increasing sequence numbers, and for
deletes the sequence number is multiplied by -1.

I don't think we would ever have version number X in a replica's tlog and -X
in the leader's (or any other replica's) tlog.

Can you provide a valid test case for your issue? I am not in front of a
computer right now; however, IIRC the tests have the token PeerSync in the name.

On Oct 20, 2017 5:54 AM, "Andrey Kudryavtsev (JIRA)" 
wrote:


 [ https://issues.apache.org/jira/browse/SOLR-11475?page=
com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Kudryavtsev mentioned you on SOLR-11475
--

I think throwing an exception in case of {{ourUpdates.get(ourUpdatesIndex) ==
-otherVersions.get(otherUpdatesIndex)}} is better than an OOM.

[~praste], [~shalinmangar] What do you think?



Hint: You can mention someone in an issue description or comment by typing
"@" in front of their username.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


> Endless loop and OOM in PeerSync
> 
>
> Key: SOLR-11475
> URL: https://issues.apache.org/jira/browse/SOLR-11475
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrey Kudryavtsev
>
> After the problem described in SOLR-11459, I restarted the cluster and got an
> OOM on start.
> [PeerSync#handleVersionsWithRanges|https://github.com/apache/lucene-solr/blob/68bda0be421ce18811e03b229781fd6152fcc04a/solr/core/src/java/org/apache/solr/update/PeerSync.java#L539]
> contains this logic:
> {code}
> while (otherUpdatesIndex >= 0) {
>   // we have run out of ourUpdates, pick up all the remaining versions from the other versions
>   if (ourUpdatesIndex < 0) {
>     String range = otherVersions.get(otherUpdatesIndex) + "..." + otherVersions.get(0);
>     rangesToRequest.add(range);
>     totalRequestedVersions += otherUpdatesIndex + 1;
>     break;
>   }
>   // stop when the entries get old enough that reorders may lead us to see updates we don't need
>   if (!completeList && Math.abs(otherVersions.get(otherUpdatesIndex)) < ourLowThreshold) break;
>   if (ourUpdates.get(ourUpdatesIndex).longValue() == otherVersions.get(otherUpdatesIndex).longValue()) {
>     ourUpdatesIndex--;
>     otherUpdatesIndex--;
>   } else if (Math.abs(ourUpdates.get(ourUpdatesIndex)) < Math.abs(otherVersions.get(otherUpdatesIndex))) {
>     ourUpdatesIndex--;
>   } else {
>     long rangeStart = otherVersions.get(otherUpdatesIndex);
>     while ((otherUpdatesIndex < otherVersions.size())
>         && (Math.abs(otherVersions.get(otherUpdatesIndex)) < Math.abs(ourUpdates.get(ourUpdatesIndex)))) {
>       otherUpdatesIndex--;
>       totalRequestedVersions++;
>     }
>     // construct range here
>     rangesToRequest.add(rangeStart + "..." + otherVersions.get(otherUpdatesIndex + 1));
>   }
> }
> {code}
> If at some point
> {code}
> ourUpdates.get(ourUpdatesIndex) == -otherVersions.get(otherUpdatesIndex)
> {code}
> the loop will never end. It will add the same string again and again into
> {{rangesToRequest}} until the process runs out of memory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10922) NPE in PeerSync

2017-06-20 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056741#comment-16056741
 ] 

Pushkar Raste commented on SOLR-10922:
--

Is this a duplicate of SOLR-9915?

> NPE in PeerSync
> ---
>
> Key: SOLR-10922
> URL: https://issues.apache.org/jira/browse/SOLR-10922
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.6
>Reporter: Markus Jelsma
>Priority: Minor
> Fix For: master (7.0)
>
>
> {code}
> Error while trying to recover. 
> core=search_shard2_replica2:java.lang.NullPointerException
>   at org.apache.solr.update.PeerSync.alreadyInSync(PeerSync.java:381)
>   at org.apache.solr.update.PeerSync.sync(PeerSync.java:251)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:439)
>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:284)
>   at 
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10169) PeerSync will hit an NPE on no response errors when looking for fingerprint.

2017-06-20 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056738#comment-16056738
 ] 

Pushkar Raste commented on SOLR-10169:
--

Is this a duplicate of SOLR-9915?

> PeerSync will hit an NPE on no response errors when looking for fingerprint.
> 
>
> Key: SOLR-10169
> URL: https://issues.apache.org/jira/browse/SOLR-10169
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Mark Miller
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-10873) Explore a utility for periodically checking the document counts for replicas of a shard

2017-06-12 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047280#comment-16047280
 ] 

Pushkar Raste edited comment on SOLR-10873 at 6/13/17 1:03 AM:
---

There are other advantages too:
* You can compute the index fingerprint up to any arbitrary version. Depending
on the tolerance, you can check whether the fingerprint matches up to the last
version in the second-from-last tlog. No need to worry about differing commits
in this case.

* The index fingerprint is cached in the SolrCore class, hence even if the
frequency of the sync check is high you may not have to recompute the
fingerprint every single time.

`RealTimeGetComponent` already supports a `processGetFingerprint` call, added
while working on SOLR-9446.
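
For context, here is a hedged SolrJ sketch of what such a fingerprint check
could look like, assuming the {{getFingerprint}} request parameter and
{{fingerprint}} response key that SOLR-9446 added to the /get handler; the core
URL and the max version are illustrative.

{code}
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class FingerprintProbe {
  public static void main(String[] args) throws Exception {
    // Ask one core's /get handler for its index fingerprint up to a max version;
    // comparing this value across replicas of a shard is the check discussed above.
    try (SolrClient client = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/collection1_shard1_replica1").build()) {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set("qt", "/get");
      params.set("distrib", "false");
      params.set("getFingerprint", String.valueOf(Long.MAX_VALUE)); // up to any version
      NamedList<Object> rsp = client.request(new QueryRequest(params));
      System.out.println(rsp.get("fingerprint"));
    }
  }
}
{code}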

 


was (Author: praste):
There are other advantages too:
* You can compute the index fingerprint up to any arbitrary version. Depending
on the tolerance, you can check whether the fingerprint matches up to the last
version in the second-from-last tlog. No need to worry about differing commits
in this case.

* The index fingerprint is cached in the SolrCore class, hence even if the
frequency of the sync check is high you may not have to recompute the
fingerprint every single time.

`RealTimeGetComponent` already supports a `processGetFingerprint` call, added
while working on SOLR-9446.

 

> Explore a utility for periodically checking the document counts for replicas 
> of a shard
> ---
>
> Key: SOLR-10873
> URL: https://issues.apache.org/jira/browse/SOLR-10873
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>
> We've had several situations "in the field" and on the user's list where the 
> number of documents on different replicas of the same shard differ. I've also 
> seen situations where the numbers are wildly different (two orders of 
> magnitude). I can force this situation by, say, taking down nodes, adding 
> replicas that become the leader then starting the nodes back up. But it 
> doesn't matter whether the discrepancy is a result of "pilot error" or a 
> problem with the code, in either case it would be useful to flag it.
> Straw-man proposal:
> We create a processor (modeled on DocExpirationUpdateProcessorFactory 
> perhaps?) that periodically wakes up and checks that each replica in the 
> given shard has the same document count (and perhaps other checks TBD?). Send 
> some kind of notification if a problem was detected.
> Issues:
> 1> this will require some way to deal with the differing commit times. 
> 1a> If we require a timestamp on each document we could check the config file 
> to see the autocommit interval and, say, check NOW-(2 x opensearcher 
> interval). In that case the config would just require the field to use be 
> specified.
> 1b> we could require that part of the configuration is a query to use to 
> check document counts. I kind of like this one.
> 2> How to let the admins know a discrepancy was found? e-mail? ERROR level 
> log message? Other?
> 3> How does this fit into the autoscaling initiative? This is a "monitor the 
> system and do something" item. If we go forward with this we should do it 
> with an eye toward fitting it in that framework.
> 3a> Is there anything we can do to auto-correct this situation? 
> Auto-correction could be tricky. Heuristics like "make the replica with the 
> most documents the leader and force full index replication on all the 
> replicas that don't agree" seem dangerous. 
> 4> How to keep the impact minimal? The simple approach would be for each 
> replica to check all other replicas in the shard. So say there are 10 
> replicas on a single shard, that would be 90 queries. It would suffice for 
> just one of those to check the other 9, not have all 10 check the other nine. 
> Maybe restrict the checker to be the leader? Or otherwise just make it one 
> replica/shard that does the checking?
> 5> It's probably useful to add a collections API call to fire this off 
> manually. Or maybe as part of CHECKSTATUS?
> What do people think?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-10873) Explore a utility for periodically checking the document counts for replicas of a shard

2017-06-12 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047280#comment-16047280
 ] 

Pushkar Raste edited comment on SOLR-10873 at 6/13/17 1:02 AM:
---

There are other advantages too:
* You can compute the index fingerprint up to any arbitrary version. Depending
on the tolerance, you can check whether the fingerprint matches up to the last
version in the second-from-last tlog. No need to worry about differing commits
in this case.

* The index fingerprint is cached in the SolrCore class, hence even if the
frequency of the sync check is high you may not have to recompute the
fingerprint every single time.

`RealTimeGetComponent` already supports a `processGetFingerprint` call, added
while working on SOLR-9446.

 


was (Author: praste):
There are other advantages too:
* You can compute the index fingerprint up to any arbitrary version. Depending
on the tolerance, you can check whether the fingerprint matches up to the last
version in the second-from-last tlog. No need to worry about differing commits
in this case.

* The index fingerprint is cached in the SolrCore class, hence even if the
frequency of the sync check is high you may not have to recompute the
fingerprint every single time.

`RealTimeGetComponent` already supports a `processGetFingerprint` call, added
while working on SOLR-9446.

 

> Explore a utility for periodically checking the document counts for replicas 
> of a shard
> ---
>
> Key: SOLR-10873
> URL: https://issues.apache.org/jira/browse/SOLR-10873
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>
> We've had several situations "in the field" and on the user's list where the 
> number of documents on different replicas of the same shard differ. I've also 
> seen situations where the numbers are wildly different (two orders of 
> magnitude). I can force this situation by, say, taking down nodes, adding 
> replicas that become the leader then starting the nodes back up. But it 
> doesn't matter whether the discrepancy is a result of "pilot error" or a 
> problem with the code, in either case it would be useful to flag it.
> Straw-man proposal:
> We create a processor (modeled on DocExpirationUpdateProcessorFactory 
> perhaps?) that periodically wakes up and checks that each replica in the 
> given shard has the same document count (and perhaps other checks TBD?). Send 
> some kind of notification if a problem was detected.
> Issues:
> 1> this will require some way to deal with the differing commit times. 
> 1a> If we require a timestamp on each document we could check the config file 
> to see the autocommit interval and, say, check NOW-(2 x opensearcher 
> interval). In that case the config would just require the field to use be 
> specified.
> 1b> we could require that part of the configuration is a query to use to 
> check document counts. I kind of like this one.
> 2> How to let the admins know a discrepancy was found? e-mail? ERROR level 
> log message? Other?
> 3> How does this fit into the autoscaling initiative? This is a "monitor the 
> system and do something" item. If we go forward with this we should do it 
> with an eye toward fitting it in that framework.
> 3a> Is there anything we can do to auto-correct this situation? 
> Auto-correction could be tricky. Heuristics like "make the replica with the 
> most documents the leader and force full index replication on all the 
> replicas that don't agree" seem dangerous. 
> 4> How to keep the impact minimal? The simple approach would be for each 
> replica to check all other replicas in the shard. So say there are 10 
> replicas on a single shard, that would be 90 queries. It would suffice for 
> just one of those to check the other 9, not have all 10 check the other nine. 
> Maybe restrict the checker to be the leader? Or otherwise just make it one 
> replica/shard that does the checking?
> 5> It's probably useful to add a collections API call to fire this off 
> manually. Or maybe as part of CHECKSTATUS?
> What do people think?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10873) Explore a utility for periodically checking the document counts for replicas of a shard

2017-06-12 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047280#comment-16047280
 ] 

Pushkar Raste commented on SOLR-10873:
--

There are other advantages too:
* You can compute the index fingerprint up to any arbitrary version. Depending
on the tolerance, you can check whether the fingerprint matches up to the last
version in the second-from-last tlog. No need to worry about differing commits
in this case.

* The index fingerprint is cached in the SolrCore class, hence even if the
frequency of the sync check is high you may not have to recompute the
fingerprint every single time.

`RealTimeGetComponent` already supports a `processGetFingerprint` call, added
while working on SOLR-9446.

 

> Explore a utility for periodically checking the document counts for replicas 
> of a shard
> ---
>
> Key: SOLR-10873
> URL: https://issues.apache.org/jira/browse/SOLR-10873
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>
> We've had several situations "in the field" and on the user's list where the 
> number of documents on different replicas of the same shard differ. I've also 
> seen situations where the numbers are wildly different (two orders of 
> magnitude). I can force this situation by, say, taking down nodes, adding 
> replicas that become the leader then starting the nodes back up. But it 
> doesn't matter whether the discrepancy is a result of "pilot error" or a 
> problem with the code, in either case it would be useful to flag it.
> Straw-man proposal:
> We create a processor (modeled on DocExpirationUpdateProcessorFactory 
> perhaps?) that periodically wakes up and checks that each replica in the 
> given shard has the same document count (and perhaps other checks TBD?). Send 
> some kind of notification if a problem was detected.
> Issues:
> 1> this will require some way to deal with the differing commit times. 
> 1a> If we require a timestamp on each document we could check the config file 
> to see the autocommit interval and, say, check NOW-(2 x opensearcher 
> interval). In that case the config would just require the field to use be 
> specified.
> 1b> we could require that part of the configuration is a query to use to 
> check document counts. I kind of like this one.
> 2> How to let the admins know a discrepancy was found? e-mail? ERROR level 
> log message? Other?
> 3> How does this fit into the autoscaling initiative? This is a "monitor the 
> system and do something" item. If we go forward with this we should do it 
> with an eye toward fitting it in that framework.
> 3a> Is there anything we can do to auto-correct this situation? 
> Auto-correction could be tricky. Heuristics like "make the replica with the 
> most documents the leader and force full index replication on all the 
> replicas that don't agree" seem dangerous. 
> 4> How to keep the impact minimal? The simple approach would be for each 
> replica to check all other replicas in the shard. So say there are 10 
> replicas on a single shard, that would be 90 queries. It would suffice for 
> just one of those to check the other 9, not have all 10 check the other nine. 
> Maybe restrict the checker to be the leader? Or otherwise just make it one 
> replica/shard that does the checking?
> 5> It's probably useful to add a collections API call to fire this off 
> manually. Or maybe as part of CHECKSTATUS?
> What do people think?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-10873) Explore a utility for periodically checking the document counts for replicas of a shard

2017-06-12 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047218#comment-16047218
 ] 

Pushkar Raste commented on SOLR-10873:
--

What if the count is the same but the actual data is different?
Can we use the index fingerprint instead to verify that replicas are in sync?

> Explore a utility for periodically checking the document counts for replicas 
> of a shard
> ---
>
> Key: SOLR-10873
> URL: https://issues.apache.org/jira/browse/SOLR-10873
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>
> We've had several situations "in the field" and on the user's list where the 
> number of documents on different replicas of the same shard differ. I've also 
> seen situations where the numbers are wildly different (two orders of 
> magnitude). I can force this situation by, say, taking down nodes, adding 
> replicas that become the leader then starting the nodes back up. But it 
> doesn't matter whether the discrepancy is a result of "pilot error" or a 
> problem with the code, in either case it would be useful to flag it.
> Straw-man proposal:
> We create a processor (modeled on DocExpirationUpdateProcessorFactory 
> perhaps?) that periodically wakes up and checks that each replica in the 
> given shard has the same document count (and perhaps other checks TBD?). Send 
> some kind of notification if a problem was detected.
> Issues:
> 1> this will require some way to deal with the differing commit times. 
> 1a> If we require a timestamp on each document we could check the config file 
> to see the autocommit interval and, say, check NOW-(2 x opensearcher 
> interval). In that case the config would just require the field to use be 
> specified.
> 1b> we could require that part of the configuration is a query to use to 
> check document counts. I kind of like this one.
> 2> How to let the admins know a discrepancy was found? e-mail? ERROR level 
> log message? Other?
> 3> How does this fit into the autoscaling initiative? This is a "monitor the 
> system and do something" item. If we go forward with this we should do it 
> with an eye toward fitting it in that framework.
> 3a> Is there anything we can do to auto-correct this situation? 
> Auto-correction could be tricky. Heuristics like "make the replica with the 
> most documents the leader and force full index replication on all the 
> replicas that don't agree" seem dangerous. 
> 4> How to keep the impact minimal? The simple approach would be for each 
> replica to check all other replicas in the shard. So say there are 10 
> replicas on a single shard, that would be 90 queries. It would suffice for 
> just one of those to check the other 9, not have all 10 check the other nine. 
> Maybe restrict the checker to be the leader? Or otherwise just make it one 
> replica/shard that does the checking?
> 5> It's probably useful to add a collections API call to fire this off 
> manually. Or maybe as part of CHECKSTATUS?
> What do people think?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7750) Findbugs bug fixes

2017-03-25 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15941971#comment-15941971
 ] 

Pushkar Raste commented on LUCENE-7750:
---

Did you get a chance to look at my patch for SOLR-10080 and, as mentioned in one
of the mails, at the work I had done in my fork?
https://github.com/praste/lucene-solr/tree/findbugs-lucene

> Findbugs bug fixes
> --
>
> Key: LUCENE-7750
> URL: https://issues.apache.org/jira/browse/LUCENE-7750
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Daniel Jelinski
>Priority: Minor
>
> Holder issue to keep track of Findbugs-related fixes



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] (SOLR-10080) Fix some issues reported by findbugs

2017-01-31 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-10080:
-
Attachment: SOLR-10080.patch

I have fixed some of the issues reported by findbugs (and added some
descriptive comments in some cases).

In some cases, I wasn't sure whether the code findbugs reported as an issue was
intentionally written that way or not.

Once I get feedback, I will polish the patch to remove unwanted comments (or
remove code that I have just commented out for now).

> Fix some issues reported by findbugs 
> -
>
> Key: SOLR-10080
> URL: https://issues.apache.org/jira/browse/SOLR-10080
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
> Attachments: SOLR-10080.patch
>
>
> I ran a findbugs analysis on the code and found a lot of issues, in both
> lucene and solr; however, I am not sure whether Solr developers should fix
> issues like this in the lucene code or not.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] (SOLR-10080) Fix some issues reported by findbugs

2017-01-31 Thread Pushkar Raste (JIRA)
Pushkar Raste created SOLR-10080:


 Summary: Fix some issues reported by findbugs 
 Key: SOLR-10080
 URL: https://issues.apache.org/jira/browse/SOLR-10080
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Pushkar Raste
Priority: Minor


I ran a findbugs analysis on the code and found a lot of issues, in both lucene
and solr; however, I am not sure whether Solr developers should fix issues like
this in the lucene code or not.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9555) Recovery can hang if a node is put into LIR as it is starting up

2017-01-18 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828380#comment-15828380
 ] 

Pushkar Raste commented on SOLR-9555:
-

Can't we just set the timeout in the test to, let's say, 200?

> Recovery can hang if a node is put into LIR as it is starting up
> 
>
> Key: SOLR-9555
> URL: https://issues.apache.org/jira/browse/SOLR-9555
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Alan Woodward
>
> See 
> https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/17888/consoleFull 
> for an example



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9555) Recovery can hang if a node is put into LIR as it is starting up

2017-01-17 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15827013#comment-15827013
 ] 

Pushkar Raste commented on SOLR-9555:
-

I assume this would happen in a prod deployment (outside of the test itself)
as well.

Does the {{PeerSyncReplicationTest.waitTillNodesActive}} method look good, or
does the method itself have a bug?

> Recovery can hang if a node is put into LIR as it is starting up
> 
>
> Key: SOLR-9555
> URL: https://issues.apache.org/jira/browse/SOLR-9555
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Alan Woodward
>
> See 
> https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/17888/consoleFull 
> for an example



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication

2017-01-16 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824089#comment-15824089
 ] 

Pushkar Raste edited comment on SOLR-9906 at 1/16/17 2:48 PM:
--

[~romseygeek] - Thank you for catching the bug. I think the check can be fixed
by changing {{slice.getState() == State.ACTIVE}} to
{{slice.getLeader().getState() == Replica.State.ACTIVE}}.

Let me know if that is correct and I will attach a patch to fix it (not sure
whether I should attach a patch for this issue in its entirety or just the
patch to fix the slice vs replica state).

By "the log message is badly set up", do you mean the line {{log.debug("Old
leader {}, new leader. New leader got elected in {} ms", oldLeader,
slice.getLeader(), timeOut.timeElapsed(MILLISECONDS));}} is missing a {}
placeholder for the new leader?
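
For reference, a corrected form of that statement, assuming the intent was to
log both the old and the new leader, would be:

{code}
log.debug("Old leader {}, new leader {}. New leader got elected in {} ms",
    oldLeader, slice.getLeader(), timeOut.timeElapsed(MILLISECONDS));
{code}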


was (Author: praste):
[~romseygeek] - Thank you for catching the bug. I think the check can be fixed
by changing {{slice.getState() == State.ACTIVE}} to
{{slice.getLeader().getState() == Replica.State.ACTIVE}}.

Let me know if that is correct and I will attach a patch to fix it (not sure
whether I should attach a patch for this issue in its entirety or just the
patch to fix the slice vs replica state).

What do you mean by the log message being badly set up?

> Use better check to validate if node recovered via PeerSync or Replication
> --
>
> Key: SOLR-9906
> URL: https://issues.apache.org/jira/browse/SOLR-9906
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Assignee: Noble Paul
>Priority: Minor
> Fix For: 6.4
>
> Attachments: SOLR-9906.patch, SOLR-9906.patch, 
> SOLR-PeerSyncVsReplicationTest.diff
>
>
> Tests {{LeaderFailureAfterFreshStartTest}} and {{PeerSyncReplicationTest}}
> currently rely on the number of requests made to the leader's replication
> handler to check whether a node recovered via PeerSync or replication. This
> check is not very reliable and we have seen failures in the past.
> While tinkering with different ways to write a better test I found
> [SOLR-9859|SOLR-9859]. Now that SOLR-9859 is fixed, here is an idea for a
> better way to distinguish recovery via PeerSync vs Replication.
> * For {{PeerSyncReplicationTest}}, if the node successfully recovers via
> PeerSync, then the file {{replication.properties}} should not exist
> * For {{LeaderFailureAfterFreshStartTest}}, if the freshly replicated node does
> not go into replication recovery after the leader failure, the contents of
> {{replication.properties}} should not change



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication

2017-01-16 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824089#comment-15824089
 ] 

Pushkar Raste commented on SOLR-9906:
-

[~romseygeek] - Thank you for catching the bug. I think the check can be fixed
by changing {{slice.getState() == State.ACTIVE}} to
{{slice.getLeader().getState() == Replica.State.ACTIVE}}.

Let me know if that is correct and I will attach a patch to fix it (not sure
whether I should attach a patch for this issue in its entirety or just the
patch to fix the slice vs replica state).

What do you mean by the log message being badly set up?

> Use better check to validate if node recovered via PeerSync or Replication
> --
>
> Key: SOLR-9906
> URL: https://issues.apache.org/jira/browse/SOLR-9906
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Assignee: Noble Paul
>Priority: Minor
> Fix For: 6.4
>
> Attachments: SOLR-9906.patch, SOLR-9906.patch, 
> SOLR-PeerSyncVsReplicationTest.diff
>
>
> Tests {{LeaderFailureAfterFreshStartTest}} and {{PeerSyncReplicationTest}}
> currently rely on the number of requests made to the leader's replication
> handler to check whether a node recovered via PeerSync or replication. This
> check is not very reliable and we have seen failures in the past.
> While tinkering with different ways to write a better test I found
> [SOLR-9859|SOLR-9859]. Now that SOLR-9859 is fixed, here is an idea for a
> better way to distinguish recovery via PeerSync vs Replication.
> * For {{PeerSyncReplicationTest}}, if the node successfully recovers via
> PeerSync, then the file {{replication.properties}} should not exist
> * For {{LeaderFailureAfterFreshStartTest}}, if the freshly replicated node does
> not go into replication recovery after the leader failure, the contents of
> {{replication.properties}} should not change



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9453) NullPointerException on PeerSync recovery

2017-01-09 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812322#comment-15812322
 ] 

Pushkar Raste commented on SOLR-9453:
-

Try switching to 6.3

> NullPointerException on PeerSync recovery
> -
>
> Key: SOLR-9453
> URL: https://issues.apache.org/jira/browse/SOLR-9453
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.2
>Reporter: Michael Braun
>Assignee: Shalin Shekhar Mangar
>
> Just updated to 6.2.0 (previously using 6.1.0) and we restarted the cluster a 
> few times - for one replica trying to sync on a shard, we got this on a 
> bootup and it's seemingly stuck. Cluster has 96 shards, 2 replicas per shard. 
> Shard 51 is where this issue occurred for us. It looks like the replica 
> eventually recovers, but we probably shouldn't see a NullPointerException.
> {code}
> java.lang.NullPointerException
>   at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:605)
>   at org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:344)
>   at org.apache.solr.update.PeerSync.sync(PeerSync.java:257)
>   at 
> org.apache.solr.handler.component.RealTimeGetComponent.processSync(RealTimeGetComponent.java:658)
>   at 
> org.apache.solr.handler.component.RealTimeGetComponent.processGetVersions(RealTimeGetComponent.java:623)
>   at 
> org.apache.solr.handler.component.RealTimeGetComponent.process(RealTimeGetComponent.java:117)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.eclipse.jetty.server.Server.handle(Server.java:518)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>   at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Before it in the log , pasting some relevant lines with full IPs redacted:
> {code}ERROR - 2016-08-29 15:10:28.940; org.apache.solr.common.SolrException; 
> Error while trying to recover. 
> core=ourcollection_shard51_replica2:org.apache.solr.common.SolrException: No 
> registered leader was found after waiting for 4000ms , collection: 
> ourcollection slice: shard51
> at 
> 

[jira] [Commented] (SOLR-9453) NullPointerException on PeerSync recovery

2017-01-09 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812164#comment-15812164
 ] 

Pushkar Raste commented on SOLR-9453:
-

Looks like the NPE is coming from a log statement:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.2.0/solr/core/src/java/org/apache/solr/update/PeerSync.java#L605

> NullPointerException on PeerSync recovery
> -
>
> Key: SOLR-9453
> URL: https://issues.apache.org/jira/browse/SOLR-9453
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 6.2
>Reporter: Michael Braun
>Assignee: Shalin Shekhar Mangar
>
> Just updated to 6.2.0 (previously using 6.1.0) and we restarted the cluster a 
> few times - for one replica trying to sync on a shard, we got this on a 
> bootup and it's seemingly stuck. Cluster has 96 shards, 2 replicas per shard. 
> Shard 51 is where this issue occurred for us. It looks like the replica 
> eventually recovers, but we probably shouldn't see a NullPointerException.
> {code}
> java.lang.NullPointerException
>   at org.apache.solr.update.PeerSync.handleUpdates(PeerSync.java:605)
>   at org.apache.solr.update.PeerSync.handleResponse(PeerSync.java:344)
>   at org.apache.solr.update.PeerSync.sync(PeerSync.java:257)
>   at 
> org.apache.solr.handler.component.RealTimeGetComponent.processSync(RealTimeGetComponent.java:658)
>   at 
> org.apache.solr.handler.component.RealTimeGetComponent.processGetVersions(RealTimeGetComponent.java:623)
>   at 
> org.apache.solr.handler.component.RealTimeGetComponent.process(RealTimeGetComponent.java:117)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2089)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.eclipse.jetty.server.Server.handle(Server.java:518)
>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
>   at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
>   at 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>   at 
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
>   at 
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
>   at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Before it in the log , pasting some relevant lines with full IPs redacted:
> {code}ERROR - 2016-08-29 15:10:28.940; org.apache.solr.common.SolrException; 
> Error while trying to recover. 
> core=ourcollection_shard51_replica2:org.apache.solr.common.SolrException: No 
> registered leader was found after 

[jira] [Updated] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication

2017-01-02 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9906:

Attachment: SOLR-9906.patch

> Use better check to validate if node recovered via PeerSync or Replication
> --
>
> Key: SOLR-9906
> URL: https://issues.apache.org/jira/browse/SOLR-9906
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
> Attachments: SOLR-9906.patch, SOLR-PeerSyncVsReplicationTest.diff
>
>
> Tests {{LeaderFailureAfterFreshStartTest}} and {{PeerSyncReplicationTest}} 
> currently rely on the number of requests made to the leader's replication 
> handler to check whether a node recovered via PeerSync or replication. This 
> check is not very reliable and we have seen failures in the past. 
> While tinkering with different ways to write a better test I found 
> [SOLR-9859|SOLR-9859]. Now that SOLR-9859 is fixed, here is an idea for a 
> better way to distinguish recovery via PeerSync vs Replication. 
> * For {{PeerSyncReplicationTest}}, if a node successfully recovers via 
> PeerSync, then the file {{replication.properties}} should not exist
> * For {{LeaderFailureAfterFreshStartTest}}, if the freshly replicated node does 
> not go into replication recovery after the leader failure, the contents of 
> {{replication.properties}} should not change 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-30 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15788716#comment-15788716
 ] 

Pushkar Raste commented on SOLR-9835:
-

How are we handling leader failure here? If replicas are somewhat out of sync 
with the original leader, how would we elect a new leader? 

When the leader fails and a new leader gets elected, the new leader asks all 
the replicas to sync with it. My understanding is, "since we are 
replicating the index by fetching segments from the leader, most of the segments on 
all the replicas should look the same, hence the replicas will not go into full 
index copying". Is that correct?

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is a state machine: replicas 
> start in the same initial state and, for each input, the input is distributed 
> across replicas so all replicas end up in the same next state. But this type 
> of replication has some drawbacks
> - The commit (which is costly) has to run on all replicas
> - Slow recovery, because if a replica misses more than N updates during its 
> downtime, it has to download the entire index from its leader.
> So we propose another replication mode for SolrCloud called state transfer, 
> which acts like master/slave replication. Basically
> - The leader distributes updates to the other replicas, but only the leader 
> applies the update to the IndexWriter; the other replicas just store the 
> update in the UpdateLog (acting like replication).
> - Replicas frequently poll the latest segments from the leader.
> Pros:
> - Lightweight indexing, because only the leader runs commits and applies 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9886) Add ability to turn off/on caches

2016-12-29 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786323#comment-15786323
 ] 

Pushkar Raste commented on SOLR-9886:
-

[~noble.paul] My only concern with adding a legend to 
{{EditableSolrConfigAttributes.json}} is that if we ever parse this file with a 
JSON parser, we will have to move the legend somewhere else.

> Add ability to turn off/on caches 
> --
>
> Key: SOLR-9886
> URL: https://issues.apache.org/jira/browse/SOLR-9886
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Assignee: Noble Paul
>Priority: Minor
> Attachments: EnableDisableCacheAttribute.patch, SOLR-9886.patch, 
> SOLR-9886.patch
>
>
> There is no elegant way to turn off caches (filterCache, queryResultCache, 
> etc.) from the solrconfig. When I tried setting size and initialSize to zero, 
> it resulted in caches of size 2. Here is the code that overrides a zero-sized 
> cache setting. 
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L61-L73
> The only way to disable a cache right now is by removing its config from the 
> solrConfig, but we could simply provide an attribute to disable a cache, so 
> that it can be overridden using a system property. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9886) Add ability to turn off/on caches

2016-12-29 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9886:

Attachment: SOLR-9886.patch

Updated patch with a test.

> Add ability to turn off/on caches 
> --
>
> Key: SOLR-9886
> URL: https://issues.apache.org/jira/browse/SOLR-9886
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Assignee: Noble Paul
>Priority: Minor
> Attachments: EnableDisableCacheAttribute.patch, SOLR-9886.patch, 
> SOLR-9886.patch
>
>
> There is no elegant way to turn off caches (filterCache, queryResultCache, 
> etc.) from the solrconfig. When I tried setting size and initialSize to zero, 
> it resulted in caches of size 2. Here is the code that overrides a zero-sized 
> cache setting. 
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L61-L73
> The only way to disable a cache right now is by removing its config from the 
> solrConfig, but we could simply provide an attribute to disable a cache, so 
> that it can be overridden using a system property. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication

2016-12-29 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9906:

Attachment: SOLR-PeerSyncVsReplicationTest.diff

Here is a patch. 

I have also fixed bugs in the tests I came across.

> Use better check to validate if node recovered via PeerSync or Replication
> --
>
> Key: SOLR-9906
> URL: https://issues.apache.org/jira/browse/SOLR-9906
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
> Attachments: SOLR-PeerSyncVsReplicationTest.diff
>
>
> Tests {{LeaderFailureAfterFreshStartTest}} and {{PeerSyncReplicationTest}} 
> currently rely on the number of requests made to the leader's replication 
> handler to check whether a node recovered via PeerSync or replication. This 
> check is not very reliable and we have seen failures in the past. 
> While tinkering with different ways to write a better test I found 
> [SOLR-9859|SOLR-9859]. Now that SOLR-9859 is fixed, here is an idea for a 
> better way to distinguish recovery via PeerSync vs Replication. 
> * For {{PeerSyncReplicationTest}}, if a node successfully recovers via 
> PeerSync, then the file {{replication.properties}} should not exist
> * For {{LeaderFailureAfterFreshStartTest}}, if the freshly replicated node does 
> not go into replication recovery after the leader failure, the contents of 
> {{replication.properties}} should not change 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9906) Use better check to validate if node recovered via PeerSync or Replication

2016-12-29 Thread Pushkar Raste (JIRA)
Pushkar Raste created SOLR-9906:
---

 Summary: Use better check to validate if node recovered via 
PeerSync or Replication
 Key: SOLR-9906
 URL: https://issues.apache.org/jira/browse/SOLR-9906
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Pushkar Raste
Priority: Minor


Tests {{LeaderFailureAfterFreshStartTest}} and {{PeerSyncReplicationTest}} 
currently rely on the number of requests made to the leader's replication handler 
to check whether a node recovered via PeerSync or replication. This check is not very 
reliable and we have seen failures in the past. 

While tinkering with different ways to write a better test I found 
[SOLR-9859|SOLR-9859]. Now that SOLR-9859 is fixed, here is an idea for a better way 
to distinguish recovery via PeerSync vs Replication. 

* For {{PeerSyncReplicationTest}}, if a node successfully recovers via PeerSync, 
then the file {{replication.properties}} should not exist

* For {{LeaderFailureAfterFreshStartTest}}, if the freshly replicated node does 
not go into replication recovery after the leader failure, the contents of 
{{replication.properties}} should not change 
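For illustration, the {{PeerSyncReplicationTest}} check could boil down to something 
like the sketch below (the data-directory lookup and the class name are placeholders, 
not part of the attached diff):

{code}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch only: after a restart that is expected to recover via
// PeerSync, the core's data directory should contain no replication.properties,
// because that file is only written by replication recovery.
public class RecoveryCheckSketch {
  static boolean recoveredViaPeerSync(String coreDataDir) {
    Path replProps = Paths.get(coreDataDir, "replication.properties");
    return !Files.exists(replProps);
  }
}
{code}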




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9859) replication.properties cannot be updated after being written and neither replication.properties or index.properties are durable in the face of a crash

2016-12-22 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15770591#comment-15770591
 ] 

Pushkar Raste commented on SOLR-9859:
-

Yeah. Looks like these are the same issue


> replication.properties cannot be updated after being written and neither 
> replication.properties or index.properties are durable in the face of a crash
> --
>
> Key: SOLR-9859
> URL: https://issues.apache.org/jira/browse/SOLR-9859
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.3, 6.3
>Reporter: Pushkar Raste
>Assignee: Mark Miller
>Priority: Minor
> Attachments: SOLR-9859.patch, SOLR-9859.patch, SOLR-9859.patch, 
> SOLR-9859.patch, SOLR-9859.patch
>
>
> If a shard recovers via replication (vs PeerSync), a file named 
> {{replication.properties}} gets created. If the same shard recovers once more 
> via replication, IndexFetcher fails to write the latest replication 
> information because it tries to create {{replication.properties}} but the file 
> already exists. Here is the stack trace I saw: 
> {code}
> java.nio.file.FileAlreadyExistsException: 
> \shard-3-001\cores\collection1\data\replication.properties
>   at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>   at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source)
>   at java.nio.file.spi.FileSystemProvider.newOutputStream(Unknown Source)
>   at java.nio.file.Files.newOutputStream(Unknown Source)
>   at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413)
>   at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409)
>   at 
> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
>   at 
> org.apache.solr.handler.IndexFetcher.logReplicationTimeAndConfFiles(IndexFetcher.java:689)
>   at 
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:501)
>   at 
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:265)
>   at 
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:157)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$0(ExecutorUtil.java:229)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9886) Add ability to turn off/on caches

2016-12-21 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15768331#comment-15768331
 ] 

Pushkar Raste commented on SOLR-9886:
-

Check the attached patch. I think we may need to make changes to 
{{EditableSolrConfigAttributes.json}} as well. I don't understand the mapping 
between the attribute names and the associated numbers. 

> Add ability to turn off/on caches 
> --
>
> Key: SOLR-9886
> URL: https://issues.apache.org/jira/browse/SOLR-9886
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
> Attachments: EnableDisableCacheAttribute.patch
>
>
> There is no elegant way to turn off caches (filterCache, queryResultCache, 
> etc.) from the solrconfig. When I tried setting size and initialSize to zero, 
> it resulted in caches of size 2. Here is the code that overrides a zero-sized 
> cache setting. 
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L61-L73
> The only way to disable a cache right now is by removing its config from the 
> solrConfig, but we could simply provide an attribute to disable a cache, so 
> that it can be overridden using a system property. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9886) Add ability to turn off/on caches

2016-12-21 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9886:

Attachment: EnableDisableCacheAttribute.patch

> Add ability to turn off/on caches 
> --
>
> Key: SOLR-9886
> URL: https://issues.apache.org/jira/browse/SOLR-9886
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
> Attachments: EnableDisableCacheAttribute.patch
>
>
> There is no elegant way to turn off caches (filterCache, queryResultCache, 
> etc.) from the solrconfig. When I tried setting size and initialSize to zero, 
> it resulted in caches of size 2. Here is the code that overrides a zero-sized 
> cache setting. 
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L61-L73
> The only way to disable a cache right now is by removing its config from the 
> solrConfig, but we could simply provide an attribute to disable a cache, so 
> that it can be overridden using a system property. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9886) Add ability to turn off/on caches

2016-12-21 Thread Pushkar Raste (JIRA)
Pushkar Raste created SOLR-9886:
---

 Summary: Add ability to turn off/on caches 
 Key: SOLR-9886
 URL: https://issues.apache.org/jira/browse/SOLR-9886
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Pushkar Raste
Priority: Minor


There is no elegant way to turn off caches (filterCache, queryResultCache, etc.) 
from the solrconfig. When I tried setting size and initialSize to zero, it 
resulted in caches of size 2. Here is the code that overrides a zero-sized 
cache setting. 

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/FastLRUCache.java#L61-L73

The only way to disable a cache right now is by removing its config from the 
solrConfig, but we could simply provide an attribute to disable a cache, so that 
it can be overridden using a system property. 
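For illustration only, such an attribute could be honored on the cache side roughly 
like the sketch below; the {{enabled}} name and the helper are assumptions for the 
sake of the example, not the attached patch:

{code}
import java.util.Map;

// Sketch: honor a hypothetical "enabled" init arg so a cache can be switched
// off from solrconfig (e.g. via a ${filterCache.enabled:true} style property)
// without removing its config section.
public class CacheEnabledSketch {
  static boolean cacheEnabled(Map<String, String> initArgs) {
    String enabled = initArgs.get("enabled");
    return enabled == null || Boolean.parseBoolean(enabled); // default: enabled
  }

  public static void main(String[] args) {
    System.out.println(cacheEnabled(Map.of("enabled", "false"))); // prints false
  }
}
{code}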



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9859) replication.properties cannot be updated after being written and neither replication.properties or index.properties are durable in the face of a crash

2016-12-19 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762069#comment-15762069
 ] 

Pushkar Raste commented on SOLR-9859:
-

Looks good to me
Can we write a test to validate the patch? 

> replication.properties cannot be updated after being written and neither 
> replication.properties or index.properties are durable in the face of a crash
> --
>
> Key: SOLR-9859
> URL: https://issues.apache.org/jira/browse/SOLR-9859
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.3, 6.3
>Reporter: Pushkar Raste
>Assignee: Mark Miller
>Priority: Minor
> Attachments: SOLR-9859.patch, SOLR-9859.patch, SOLR-9859.patch, 
> SOLR-9859.patch, SOLR-9859.patch
>
>
> If a shard recovers via replication (vs PeerSync), a file named 
> {{replication.properties}} gets created. If the same shard recovers once more 
> via replication, IndexFetcher fails to write the latest replication 
> information because it tries to create {{replication.properties}} but the file 
> already exists. Here is the stack trace I saw: 
> {code}
> java.nio.file.FileAlreadyExistsException: 
> \shard-3-001\cores\collection1\data\replication.properties
>   at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>   at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source)
>   at java.nio.file.spi.FileSystemProvider.newOutputStream(Unknown Source)
>   at java.nio.file.Files.newOutputStream(Unknown Source)
>   at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413)
>   at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409)
>   at 
> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
>   at 
> org.apache.solr.handler.IndexFetcher.logReplicationTimeAndConfFiles(IndexFetcher.java:689)
>   at 
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:501)
>   at 
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:265)
>   at 
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:157)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$0(ExecutorUtil.java:229)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9859) replication.properties does not get updated the second time around if index recovers via replication

2016-12-15 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751997#comment-15751997
 ] 

Pushkar Raste commented on SOLR-9859:
-

[~markrmil...@gmail.com] looks like in `atomicRename` you are deleting the 
existing file and then renaming the temp file. How is this better than just 
deleting the file and writing a new one, if we crash at the wrong time (as you 
have mentioned above)?

Would we need to manually rename the temp file in such a scenario?

> replication.properties does not get updated the second time around if index 
> recovers via replication
> 
>
> Key: SOLR-9859
> URL: https://issues.apache.org/jira/browse/SOLR-9859
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.3, 6.3
>Reporter: Pushkar Raste
>Assignee: Mark Miller
>Priority: Minor
> Attachments: SOLR-9859.patch, SOLR-9859.patch
>
>
> If a shard recovers via replication (vs PeerSync), a file named 
> {{replication.properties}} gets created. If the same shard recovers once more 
> via replication, IndexFetcher fails to write the latest replication 
> information because it tries to create {{replication.properties}} but the file 
> already exists. Here is the stack trace I saw: 
> {code}
> java.nio.file.FileAlreadyExistsException: 
> \shard-3-001\cores\collection1\data\replication.properties
>   at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>   at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source)
>   at java.nio.file.spi.FileSystemProvider.newOutputStream(Unknown Source)
>   at java.nio.file.Files.newOutputStream(Unknown Source)
>   at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413)
>   at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409)
>   at 
> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
>   at 
> org.apache.solr.handler.IndexFetcher.logReplicationTimeAndConfFiles(IndexFetcher.java:689)
>   at 
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:501)
>   at 
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:265)
>   at 
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:157)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$0(ExecutorUtil.java:229)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9859) replication.properties does not get updated the second time around if index recovers via replication

2016-12-14 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749175#comment-15749175
 ] 

Pushkar Raste commented on SOLR-9859:
-

Is there a way we can write a temp file and do a mv to rename/overwrite 
replication.properties?

An alternate solution would be to keep appending to the existing file and read the 
latest stats from it.
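Something along these lines is what I have in mind for the temp-file approach (a 
sketch only; whether the move atomically replaces an existing target is 
platform-dependent, and this is not the committed fix):

{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of write-temp-then-rename: write the new contents to a temp file in
// the same directory, then move it over replication.properties in one step.
public class AtomicWriteSketch {
  static void atomicWrite(Path target, byte[] contents) throws IOException {
    Path tmp = Files.createTempFile(target.getParent(), "replication", ".tmp");
    Files.write(tmp, contents);
    // ATOMIC_MOVE may throw where the file system does not support it.
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
  }
}
{code}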

> replication.properties does not get updated the second time around if index 
> recovers via replication
> 
>
> Key: SOLR-9859
> URL: https://issues.apache.org/jira/browse/SOLR-9859
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.3, 6.3
>Reporter: Pushkar Raste
>Assignee: Mark Miller
>Priority: Minor
>
> If a shard recovers via replication (vs PeerSync), a file named 
> {{replication.properties}} gets created. If the same shard recovers once more 
> via replication, IndexFetcher fails to write the latest replication 
> information because it tries to create {{replication.properties}} but the file 
> already exists. Here is the stack trace I saw: 
> {code}
> java.nio.file.FileAlreadyExistsException: 
> \shard-3-001\cores\collection1\data\replication.properties
>   at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>   at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source)
>   at java.nio.file.spi.FileSystemProvider.newOutputStream(Unknown Source)
>   at java.nio.file.Files.newOutputStream(Unknown Source)
>   at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413)
>   at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409)
>   at 
> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
>   at 
> org.apache.solr.handler.IndexFetcher.logReplicationTimeAndConfFiles(IndexFetcher.java:689)
>   at 
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:501)
>   at 
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:265)
>   at 
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:157)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$0(ExecutorUtil.java:229)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9859) replication.properties does not get updated the second time around if index recovers via replication

2016-12-12 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9859:

Summary: replication.properties does not get updated the second time around 
if index recovers via replication  (was: replication.properties does get 
updated the second time around if index recovers via replication)

> replication.properties does not get updated the second time around if index 
> recovers via replication
> 
>
> Key: SOLR-9859
> URL: https://issues.apache.org/jira/browse/SOLR-9859
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.3, 6.3
>Reporter: Pushkar Raste
>Priority: Minor
>
> If a shard recovers via replication (vs PeerSync), a file named 
> {{replication.properties}} gets created. If the same shard recovers once more 
> via replication, IndexFetcher fails to write the latest replication 
> information because it tries to create {{replication.properties}} but the file 
> already exists. Here is the stack trace I saw: 
> {code}
> java.nio.file.FileAlreadyExistsException: 
> \shard-3-001\cores\collection1\data\replication.properties
>   at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>   at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source)
>   at java.nio.file.spi.FileSystemProvider.newOutputStream(Unknown Source)
>   at java.nio.file.Files.newOutputStream(Unknown Source)
>   at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413)
>   at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409)
>   at 
> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
>   at 
> org.apache.solr.handler.IndexFetcher.logReplicationTimeAndConfFiles(IndexFetcher.java:689)
>   at 
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:501)
>   at 
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:265)
>   at 
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:157)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$0(ExecutorUtil.java:229)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9859) replication.properties does get updated the second time around if index recovers via replication

2016-12-12 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742967#comment-15742967
 ] 

Pushkar Raste commented on SOLR-9859:
-

Proposed solution: 'Delete the existing file and create a new one'?

> replication.properties does get updated the second time around if index 
> recovers via replication
> 
>
> Key: SOLR-9859
> URL: https://issues.apache.org/jira/browse/SOLR-9859
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.3, 6.3
>Reporter: Pushkar Raste
>Priority: Minor
>
> If a shard recovers via replication (vs PeerSync), a file named 
> {{replication.properties}} gets created. If the same shard recovers once more 
> via replication, IndexFetcher fails to write the latest replication 
> information because it tries to create {{replication.properties}} but the file 
> already exists. Here is the stack trace I saw: 
> {code}
> java.nio.file.FileAlreadyExistsException: 
> \shard-3-001\cores\collection1\data\replication.properties
>   at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
>   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>   at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
>   at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source)
>   at java.nio.file.spi.FileSystemProvider.newOutputStream(Unknown Source)
>   at java.nio.file.Files.newOutputStream(Unknown Source)
>   at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413)
>   at 
> org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409)
>   at 
> org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
>   at 
> org.apache.solr.handler.IndexFetcher.logReplicationTimeAndConfFiles(IndexFetcher.java:689)
>   at 
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:501)
>   at 
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:265)
>   at 
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:157)
>   at 
> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
>   at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$0(ExecutorUtil.java:229)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9859) replication.properties does get updated the second time around if index recovers via replication

2016-12-12 Thread Pushkar Raste (JIRA)
Pushkar Raste created SOLR-9859:
---

 Summary: replication.properties does get updated the second time 
around if index recovers via replication
 Key: SOLR-9859
 URL: https://issues.apache.org/jira/browse/SOLR-9859
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Affects Versions: 6.3, 5.5.3
Reporter: Pushkar Raste
Priority: Minor


If a shard recovers via replication (vs PeerSync), a file named 
{{replication.properties}} gets created. If the same shard recovers once more 
via replication, IndexFetcher fails to write the latest replication information 
because it tries to create {{replication.properties}} but the file already 
exists. Here is the stack trace I saw: 
{code}
java.nio.file.FileAlreadyExistsException: 
\shard-3-001\cores\collection1\data\replication.properties
at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source)
at java.nio.file.spi.FileSystemProvider.newOutputStream(Unknown Source)
at java.nio.file.Files.newOutputStream(Unknown Source)
at 
org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:413)
at 
org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:409)
at 
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:253)
at 
org.apache.solr.handler.IndexFetcher.logReplicationTimeAndConfFiles(IndexFetcher.java:689)
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:501)
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:265)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:157)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:409)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$0(ExecutorUtil.java:229)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-09 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15736051#comment-15736051
 ] 

Pushkar Raste commented on SOLR-9835:
-

Instead of periodic polling, can the leader, upon receiving and processing a commit 
command, send a notification to the replicas asking them to sync up?

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>
> The current replication mechanism of SolrCloud is a state machine: replicas 
> start in the same initial state and, for each input, the input is distributed 
> across replicas so all replicas end up in the same next state. But this type 
> of replication has some drawbacks
> - The commit (which is costly) has to run on all replicas
> - Slow recovery, because if a replica misses more than N updates during its 
> downtime, it has to download the entire index from its leader.
> So we propose another replication mode for SolrCloud called state transfer, 
> which acts like master/slave replication. Basically
> - The leader distributes updates to the other replicas, but only the leader 
> applies the update to the IndexWriter; the other replicas just store the 
> update in the UpdateLog (acting like replication).
> - Replicas frequently poll the latest segments from the leader.
> Pros:
> - Lightweight indexing, because only the leader runs commits and applies 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud

2016-12-08 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15732174#comment-15732174
 ] 

Pushkar Raste commented on SOLR-9835:
-

I am curious to know how soft commits (in memory segments) would be handled.

> Create another replication mode for SolrCloud
> -
>
> Key: SOLR-9835
> URL: https://issues.apache.org/jira/browse/SOLR-9835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Cao Manh Dat
>
> The current replication mechanism of SolrCloud is a state machine: replicas 
> start in the same initial state and, for each input, the input is distributed 
> across replicas so all replicas end up in the same next state. But this type 
> of replication has some drawbacks
> - The commit (which is costly) has to run on all replicas
> - Slow recovery, because if a replica misses more than N updates during its 
> downtime, it has to download the entire index from its leader.
> So we propose another replication mode for SolrCloud called state transfer, 
> which acts like master/slave replication. Basically
> - The leader distributes updates to the other replicas, but only the leader 
> applies the update to the IndexWriter; the other replicas just store the 
> update in the UpdateLog (acting like replication).
> - Replicas frequently poll the latest segments from the leader.
> Pros:
> - Lightweight indexing, because only the leader runs commits and applies 
> updates.
> - Very fast recovery, replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-11-28 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15702936#comment-15702936
 ] 

Pushkar Raste commented on SOLR-9546:
-

Looks like we stepped on each other's toes when I was fixing the 
{{CloudMLTQParser}} class. Please check the updated patch.

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Assignee: Noble Paul
>Priority: Minor
> Attachments: SOLR-9546.patch, SOLR-9546_CloudMLTQParser.patch
>
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive, but since the method is expected to 
> return a {{Long}}, the result needs to be boxed. There are many more methods 
> like that. We might be creating a lot of unnecessary objects here.
> I am not sure whether the JVM catches up to this and somehow optimizes it if 
> these methods are called enough times (or maybe the compiler does some 
> modifications at compile time).
> Let me know if I am thinking of some premature optimization.
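For illustration, a primitive-returning variant along the lines below would avoid the 
wrapper entirely when the caller supplies a primitive default (the method name is 
made up here and is not part of the attached patch):

{code}
  // Hypothetical overload: stays on primitive long, so no boxing happens
  // when a primitive default is passed in.
  public long getPrimitiveLong(String param, long def) {
    String val = get(param);
    try {
      return val == null ? def : Long.parseLong(val);
    } catch (Exception ex) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, ex.getMessage(), ex);
    }
  }
{code}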



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-11-28 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9546:

Attachment: SOLR-9546_CloudMLTQParser.patch

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Assignee: Noble Paul
>Priority: Minor
> Attachments: SOLR-9546.patch, SOLR-9546_CloudMLTQParser.patch
>
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive, but since the method is expected to 
> return a {{Long}}, the result needs to be boxed. There are many more methods 
> like that. We might be creating a lot of unnecessary objects here.
> I am not sure whether the JVM catches up to this and somehow optimizes it if 
> these methods are called enough times (or maybe the compiler does some 
> modifications at compile time).
> Let me know if I am thinking of some premature optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-11-28 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9546:

Attachment: (was: SOLR-9546_CloudMLTQParser.patch)

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Assignee: Noble Paul
>Priority: Minor
> Attachments: SOLR-9546.patch
>
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive, but since the method is expected to 
> return a {{Long}}, the result needs to be boxed. There are many more methods 
> like that. We might be creating a lot of unnecessary objects here.
> I am not sure whether the JVM catches up to this and somehow optimizes it if 
> these methods are called enough times (or maybe the compiler does some 
> modifications at compile time).
> Let me know if I am thinking of some premature optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-11-27 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1566#comment-1566
 ] 

Pushkar Raste commented on SOLR-9546:
-

Are you still talking about the {{CloudMLTQParser}} patch? If it was applied, how 
come I still see code that uses objects?

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/mlt/CloudMLTQParser.java#L72-L91

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Assignee: Noble Paul
>Priority: Minor
> Attachments: SOLR-9546.patch, SOLR-9546_CloudMLTQParser.patch
>
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive, but since the method is expected to 
> return a {{Long}}, the result needs to be boxed. There are many more methods 
> like that. We might be creating a lot of unnecessary objects here.
> I am not sure whether the JVM catches up to this and somehow optimizes it if 
> these methods are called enough times (or maybe the compiler does some 
> modifications at compile time).
> Let me know if I am thinking of some premature optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9511) Retire using individual versions to request updates during PeerSync

2016-11-25 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696682#comment-15696682
 ] 

Pushkar Raste commented on SOLR-9511:
-

I don't think the individual-versions API is being used anywhere. 

I agree with keeping the old API around for maybe another major version (8.X), but I 
don't see much harm in getting rid of it in 8.X itself, as the old API is there in 6.X 
and 7.X.

> Retire using individual versions to request updates during PeerSync
> ---
>
> Key: SOLR-9511
> URL: https://issues.apache.org/jira/browse/SOLR-9511
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Pushkar Raste
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: SOLR-9511.patch
>
>
> We started using version ranges to request updates during PeerSync in 
> [SOLR-9207| https://issues.apache.org/jira/browse/SOLR-9207]. Using version 
> ranges was also made default.
> There is no need to keep the code that uses individual versions starting with 
> Solr 7. Decommission it (remove the unnecessary code).
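For context, a simplified illustration of why version ranges are so much more compact 
than individual versions (the exact wire encoding is not shown here):

{code}
import java.util.List;

// Simplified illustration: five missed updates as individual versions vs. as
// a single low...high range token.
public class VersionRangeSketch {
  public static void main(String[] args) {
    List<Long> missed = List.of(101L, 102L, 103L, 104L, 105L);

    String individual = missed.toString();                                // one token per version
    String range = missed.get(0) + "..." + missed.get(missed.size() - 1); // one token per range

    System.out.println(individual); // [101, 102, 103, 104, 105]
    System.out.println(range);      // 101...105
  }
}
{code}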



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-11-25 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696663#comment-15696663
 ] 

Pushkar Raste commented on SOLR-9546:
-

I think you reverted the changes in the {{CloudMLTQParser}} class as some tests 
were failing. I added a patch, {{SOLR-9546_CloudMLTQParser.patch}}, only for the 
{{CloudMLTQParser}} class.

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Assignee: Noble Paul
>Priority: Minor
> Attachments: SOLR-9546.patch, SOLR-9546_CloudMLTQParser.patch
>
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive, but since the method is expected to 
> return a {{Long}}, the result needs to be boxed. There are many more methods 
> like that. We might be creating a lot of unnecessary objects here.
> I am not sure whether the JVM catches up to this and somehow optimizes it if 
> these methods are called enough times (or maybe the compiler does some 
> modifications at compile time).
> Let me know if I am thinking of some premature optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-11-15 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Attachment: parallelize-peersync.patch

> Process updates concurrently during PeerSync
> 
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch, SOLR-9689.patch2, 
> parallelize-peersync.patch
>
>
> This came up during discussion with [~shalinmangar]
> During {{PeerSync}}, updates are applied one at a time by looping through the 
> updates received from the leader. This is slow and could keep a node in 
> recovery for a long time if the number of updates to apply is large. 
> We can apply updates concurrently; this should be no different than what 
> could happen during normal indexing (we can't really ensure that a replica 
> will process updates in the same order as the leader or other replicas).
> There are a few corner cases around DBQ we should be careful about. 
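A bare-bones sketch of the batched, concurrent application described above is shown 
below; the {{applyOne}} hook and the batch size are placeholders, not the attached 
patch:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Sketch: submit each update of a batch to a pool, wait for the whole batch to
// finish, then move on to the next batch; a failure surfaces from Future.get().
public class ConcurrentApplySketch {
  static void applyAll(List<Object> updates, int batchSize,
                       ExecutorService pool, Consumer<Object> applyOne) throws Exception {
    for (int from = 0; from < updates.size(); from += batchSize) {
      List<Object> batch = updates.subList(from, Math.min(from + batchSize, updates.size()));
      List<Future<?>> pending = new ArrayList<>();
      for (Object update : batch) {
        pending.add(pool.submit(() -> applyOne.accept(update)));
      }
      for (Future<?> f : pending) {
        f.get(); // block until the batch is fully applied before starting the next one
      }
    }
  }
}
{code}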



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-11-15 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Attachment: (was: parallelize-peersync.patch)

> Process updates concurrently during PeerSync
> 
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch, SOLR-9689.patch2, 
> parallelize-peersync.patch
>
>
> This came up during discussion with [~shalinmangar]
> During {{PeerSync}}, updates are applied one at a time by looping through the 
> updates received from the leader. This is slow and could keep a node in 
> recovery for a long time if the number of updates to apply is large. 
> We can apply updates concurrently; this should be no different than what 
> could happen during normal indexing (we can't really ensure that a replica 
> will process updates in the same order as the leader or other replicas).
> There are a few corner cases around DBQ we should be careful about. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-11-15 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Attachment: parallelize-peersync.patch

Attached a working patch. 
For my tests I didn't see much improvement (in fact, in some cases performance 
degraded) with parallelization. I could not find any hotspot in the profile.

My theory is that the documents in the test are so short and simple that, although 
parallelizing works functionally, we need to test this with more complex 
documents to verify the performance gains. 

Most of the parallelization parameters would be subjective, and people will need to 
verify which ones work better for them.

It also seems performance would suffer if there are relatively many DBQs to be 
applied, since updates are applied out of order.

> Process updates concurrently during PeerSync
> 
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch, SOLR-9689.patch2, 
> parallelize-peersync.patch
>
>
> This came up during discussion with [~shalinmangar]
> During {{PeerSync}}, updates are applied one at a time by looping through the 
> updates received from the leader. This is slow and could keep a node in 
> recovery for a long time if the number of updates to apply is large. 
> We can apply updates concurrently; this should be no different than what 
> could happen during normal indexing (we can't really ensure that a replica 
> will process updates in the same order as the leader or other replicas).
> There are a few corner cases around DBQ we should be careful about. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9689) Process updates concurrently during PeerSync

2016-11-06 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15641712#comment-15641712
 ] 

Pushkar Raste edited comment on SOLR-9689 at 11/6/16 12:29 PM:
---

[~ichattopadhyaya] - 

*  Even during normal operations, updates from the leader can arrive at the replica in a different order, and we already have a way to handle it. We currently store 100 DBQs to handle reordered updates; if a reordered DBQ is detected, the DBQs are applied along with the add, 
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java#L201

* I think even for partial updates, the corresponding full update is stored in the tlog. I don't think the tlog ever stores partial updates like "inc" or "set" on a field; it always contains the entire document with the updated values.

* I am creating batches of 100 updates, and only the 100 updates in a batch are applied concurrently (see the sketch below). I don't think there will be any issues. We can make the size of the DBQ list in DirectUpdateHandler2 configurable as well.
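
A minimal sketch of that batching idea (hypothetical helper, not the attached patch; the thread-pool size and error handling are assumptions):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedUpdateApplier {
  private static final int BATCH_SIZE = 100;
  private final ExecutorService pool = Executors.newFixedThreadPool(8);

  /** Applies updates in batches of 100; each batch runs concurrently, batches run one after another. */
  public void applyAll(List<Runnable> updates) throws Exception {
    for (int start = 0; start < updates.size(); start += BATCH_SIZE) {
      List<Runnable> batch = updates.subList(start, Math.min(start + BATCH_SIZE, updates.size()));
      List<Future<?>> inFlight = new ArrayList<>();
      for (Runnable update : batch) {
        inFlight.add(pool.submit(update));
      }
      for (Future<?> f : inFlight) {
        f.get(); // wait for the whole batch (and surface failures) before starting the next one
      }
    }
  }
}
{code}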


was (Author: praste):
[~ichattopadhyaya] - 

*  Even for normal in normal operations, updates for the leader can arrive at 
the replica in a different order and we already have a way to handle it.  We 
currently store 100 DBQs, to handle reordered updates. If reordered DBQs are 
detected, DBQs are applied along with a add, 
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java#L201

* I think even for partial updates, corresponding full update is stored in the 
tlog. I don't think tlog ever stores partial updates like inc value of a field 
or set value of a field. It always contains entire document with updated values.

* I am creating a batch of only 100 updates and only 100 updates in the batch 
will be applied concurrently. I don't think there will be any issues. We can 
make size of  DBQ list in the DirectUpdateHandler2 configurable as well

> Process updates concurrently during PeerSync
> 
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch, SOLR-9689.patch2
>
>
> This came up during discussion with [~shalinmangar]
> During {{PeerSync}}, updates are applied one a time by looping through the 
> updates received from the leader. This is slow and could keep node in 
> recovery for a long time if number of updates to apply were large. 
> We can apply updates concurrently, this should be no different than what 
> could happen during normal indexing (we can't really ensure that a replica 
> will process updates in the same order as the leader or other replicas).
> There are few corner cases around dbq we should be careful about. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9689) Process updates concurrently during PeerSync

2016-11-06 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15641712#comment-15641712
 ] 

Pushkar Raste commented on SOLR-9689:
-

[~ichattopadhyaya] - 

*  Even during normal operations, updates from the leader can arrive at the replica in a different order and we already have a way to handle it. We currently store 100 DBQs to handle reordered updates. If reordered DBQs are detected, the DBQs are applied along with the add, 
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java#L201

* I think even for partial updates, corresponding full update is stored in the 
tlog. I don't think tlog ever stores partial updates like inc value of a field 
or set value of a field. It always contains entire document with updated values.

* I am creating a batch of only 100 updates and only 100 updates in the batch 
will be applied concurrently. I don't think there will be any issues. We can 
make size of  DBQ list in the DirectUpdateHandler2 configurable as well

> Process updates concurrently during PeerSync
> 
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch, SOLR-9689.patch2
>
>
> This came up during discussion with [~shalinmangar]
> During {{PeerSync}}, updates are applied one a time by looping through the 
> updates received from the leader. This is slow and could keep node in 
> recovery for a long time if number of updates to apply were large. 
> We can apply updates concurrently, this should be no different than what 
> could happen during normal indexing (we can't really ensure that a replica 
> will process updates in the same order as the leader or other replicas).
> There are few corner cases around dbq we should be careful about. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-11-04 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Attachment: SOLR-9689.patch2

A new patch with configurable threshold for parallelism  

> Process updates concurrently during PeerSync
> 
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch, SOLR-9689.patch2
>
>
> This came up during discussion with [~shalinmangar]
> During {{PeerSync}}, updates are applied one a time by looping through the 
> updates received from the leader. This is slow and could keep node in 
> recovery for a long time if number of updates to apply were large. 
> We can apply updates concurrently, this should be no different than what 
> could happen during normal indexing (we can't really ensure that a replica 
> will process updates in the same order as the leader or other replicas).
> There are few corner cases around dbq we should be careful about. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9689) Process updates concurrently during PeerSync

2016-10-24 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15603424#comment-15603424
 ] 

Pushkar Raste edited comment on SOLR-9689 at 10/24/16 10:28 PM:


POC for applying updates concurrently. 
Please review it and let me know if there are any glaring issues. 

I would also appreciate suggestions on handling out-of-order {{DBQ}}s (I think by default we keep a few {{DBQs}} around to account for out-of-order updates); maybe we can increase the number of {{DBQs}} we keep around when a {{DBQ}} has the {{PEER_SYNC}} flag set on it, roughly as sketched below.
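
A rough illustration of that idea (entirely hypothetical names and sizes, not DirectUpdateHandler2's actual code):

{code}
import java.util.ArrayDeque;
import java.util.Deque;

class RecentDbqBuffer {
  private static final int DEFAULT_KEEP = 100;    // assumption: today's window of recent DBQs
  private static final int PEER_SYNC_KEEP = 1000; // assumption: larger window while PeerSync updates are applied

  private final Deque<String> recentDbqs = new ArrayDeque<>();

  synchronized void record(String dbq, boolean fromPeerSync) {
    recentDbqs.addFirst(dbq);
    int keep = fromPeerSync ? PEER_SYNC_KEEP : DEFAULT_KEEP;
    while (recentDbqs.size() > keep) {
      recentDbqs.removeLast(); // drop the oldest DBQs beyond the window
    }
  }
}
{code}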


was (Author: praste):
POC for applying updates concurrently. 
Please review it and let me know if there are gaping issues. 

I would also appreciate any suggestions to handle out of order {{DBQ} (I think 
by default we keep a few {{DBQs}} around to account for out of order upates), 
may be we can increase the number of {{DBQs}} we keep around if {{DBQs}} have 
{{PEER_SYNC}} flag set on it.

> Process updates concurrently during PeerSync
> 
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch
>
>
> This came up during discussion with [~shalinmangar]
> During {{PeerSync}}, updates are applied one a time by looping through the 
> updates received from the leader. This is slow and could keep node in 
> recovery for a long time if number of updates to apply were large. 
> We can apply updates concurrently, this should be no different than what 
> could happen during normal indexing (we can't really ensure that a replica 
> will process updates in the same order as the leader or other replicas).
> There are few corner cases around dbq we should be careful about. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-10-24 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Attachment: SOLR-9689.patch

POC for applying updates concurrently. 
Please review it and let me know if there are any glaring issues. 

I would also appreciate suggestions on handling out-of-order {{DBQ}}s (I think by default we keep a few {{DBQs}} around to account for out-of-order updates); maybe we can increase the number of {{DBQs}} we keep around when a {{DBQ}} has the {{PEER_SYNC}} flag set on it.

> Process updates concurrently during PeerSync
> 
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
> Attachments: SOLR-9689.patch
>
>
> This came up during discussion with [~shalinmangar]
> During {{PeerSync}}, updates are applied one a time by looping through the 
> updates received from the leader. This is slow and could keep node in 
> recovery for a long time if number of updates to apply were large. 
> We can apply updates concurrently, this should be no different than what 
> could happen during normal indexing (we can't really ensure that a replica 
> will process updates in the same order as the leader or other replicas).
> There are few corner cases around dbq we should be careful about. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9689) Process updates concurrently during PeerSync

2016-10-24 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9689:

Summary: Process updates concurrently during PeerSync  (was: Process 
updates concurrently during {{PeerSync}})

> Process updates concurrently during PeerSync
> 
>
> Key: SOLR-9689
> URL: https://issues.apache.org/jira/browse/SOLR-9689
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>
> This came up during discussion with [~shalinmangar]
> During {{PeerSync}}, updates are applied one a time by looping through the 
> updates received from the leader. This is slow and could keep node in 
> recovery for a long time if number of updates to apply were large. 
> We can apply updates concurrently, this should be no different than what 
> could happen during normal indexing (we can't really ensure that a replica 
> will process updates in the same order as the leader or other replicas).
> There are few corner cases around dbq we should be careful about. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-9689) Process updates concurrently during {{PeerSync}}

2016-10-24 Thread Pushkar Raste (JIRA)
Pushkar Raste created SOLR-9689:
---

 Summary: Process updates concurrently during {{PeerSync}}
 Key: SOLR-9689
 URL: https://issues.apache.org/jira/browse/SOLR-9689
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Pushkar Raste


This came up during discussion with [~shalinmangar]

During {{PeerSync}}, updates are applied one at a time by looping through the updates received from the leader. This is slow and could keep a node in recovery for a long time if the number of updates to apply is large. 

We can apply updates concurrently; this should be no different from what can happen during normal indexing (we can't really ensure that a replica will process updates in the same order as the leader or other replicas).

There are a few corner cases around dbq we should be careful about. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-23 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15599826#comment-15599826
 ] 

Pushkar Raste commented on SOLR-9506:
-

Yeah, I looked into it. I will try that approach, if I can get to it before 
[~noble.paul] applies the patch. 


> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, 
> SOLR-9506.patch, SOLR-9506_POC.patch, SOLR-9506_final.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-21 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596148#comment-15596148
 ] 

Pushkar Raste commented on SOLR-9506:
-

Don't use the patch with the parallelized computation. Parallel streams in Java use a shared fork-join pool, and a bad actor can wreak havoc there (see the sketch below).
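
To illustrate the concern and one common workaround (a sketch, not part of the patch): parallel streams run on {{ForkJoinPool.commonPool()}} by default, which is shared JVM-wide; submitting the stream from a dedicated pool keeps that work isolated.

{code}
import java.util.List;
import java.util.concurrent.ForkJoinPool;

public class IsolatedParallelStream {
  // Runs the parallel stream inside a dedicated pool instead of the shared common pool.
  // (Relies on the well-known but not formally guaranteed behavior that the stream's
  // tasks execute in the pool that submitted them.)
  public static long totalLength(List<String> values) throws Exception {
    ForkJoinPool pool = new ForkJoinPool(4); // explicit parallelism, isolated from other callers
    try {
      return pool.submit(() -> values.parallelStream().mapToLong(String::length).sum()).get();
    } finally {
      pool.shutdown();
    }
  }
}
{code}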

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, 
> SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-10-21 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9546:

Attachment: SOLR-9546_CloudMLTQParser.patch

Patch for CloudMLTQParser

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
> Attachments: SOLR-9546.patch, SOLR-9546_CloudMLTQParser.patch
>
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive type but since method expect to 
> return a {{Long}}, it needs to be wrapped. There are many more method like 
> that. We might be creating a lot of unnecessary objects here.
> I am not sure if JVM catches upto it and somehow optimizes it if these 
> methods are called enough times (or may be compiler does some modifications 
> at compile time)
> Let me know if I am thinking of some premature optimization



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-20 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9506:

Attachment: SOLR-9506.patch

Patch with parallelized computation 

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, 
> SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-19 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9506:

Attachment: SOLR-9506.patch

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506.patch, 
> SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-19 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589649#comment-15589649
 ] 

Pushkar Raste commented on SOLR-9506:
-

[~noble.paul] and [~yo...@apache.org] I was able to put together a test showing that the current implementation is broken. 
I will update the patch with the test and a fix by EOD today.

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585736#comment-15585736
 ] 

Pushkar Raste commented on SOLR-9506:
-

There is a lot of confusion going on here. Would the above test fail if we did not cache the per-segment index fingerprint?
If yes, then we should revert the commit; if not, we should open a new issue to fix the index fingerprint computation altogether.


> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585707#comment-15585707
 ] 

Pushkar Raste commented on SOLR-9506:
-

I think what Yonik is implying is that if, for some reason, a replica does not apply a delete properly, the index fingerprint would still check out, and that would be a problem.

Considering the issues with {{PeerSync}}, should we add that {{recoverWithReplicationOnly}} option? For most setups I doubt people would have hundreds of thousands of records in the updateLog, in which case almost no one is using {{PeerSync}} anyway.

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585688#comment-15585688
 ] 

Pushkar Raste commented on SOLR-9506:
-

i.e. we really need to fix the IndexFingerprint computation whether or not we cache it. 
I will open a separate issue to fix it in that case.

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585677#comment-15585677
 ] 

Pushkar Raste commented on SOLR-9506:
-

I don't see why caching the index fingerprint per segment and using it later would be any different from computing the index fingerprint over the entire index by going through one segment at a time. 

I tried to come up with scenarios where the caching solution would fail and the original solution would not, but could not think of any. 


> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-18 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585612#comment-15585612
 ] 

Pushkar Raste commented on SOLR-9506:
-

I did not upload the patch with parallelStream. In SolrIndexSearcher, where we compute and cache the per-segment index fingerprint, try switching from {{stream()}} to {{parallelStream()}} and you will see that {{PeerSyncTest}} fails. 

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-07 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556139#comment-15556139
 ] 

Pushkar Raste commented on SOLR-9506:
-

I computed the hash without regard to deleted docs and cached it. All the tests pass even without doing steps #2 and #3. I also verified that the index fingerprint computed on the entire index matches the fingerprint computed from the individual segments (even after deletions).

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-07 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556132#comment-15556132
 ] 

Pushkar Raste commented on SOLR-9506:
-

I also found some weird behavior: if I use {{parallelStream}} to compute segment fingerprints in parallel and then reduce them into the index fingerprint on the index searcher, the test fails. Why should the order of computation and reduction matter in this case? (One possibility is sketched below.)
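
One way order can matter (a generic sketch with a made-up {{Fingerprint}} class, not Solr's {{IndexFingerprint}}): a parallel reduction only produces a stable result when the combiner is associative and order-insensitive. If any part of the combine step depends on encounter order, {{parallelStream()}} can legitimately return something different from {{stream()}}.

{code}
import java.util.List;

public class FingerprintReduce {
  // Made-up per-segment summary, for illustration only.
  static final class Fingerprint {
    final long maxVersion;
    final long numDocs;
    final long versionsHash;
    Fingerprint(long maxVersion, long numDocs, long versionsHash) {
      this.maxVersion = maxVersion;
      this.numDocs = numDocs;
      this.versionsHash = versionsHash;
    }
    // Safe under parallel reduction: max and sum are associative and commutative.
    static Fingerprint combine(Fingerprint a, Fingerprint b) {
      return new Fingerprint(Math.max(a.maxVersion, b.maxVersion),
                             a.numDocs + b.numDocs,
                             a.versionsHash + b.versionsHash);
    }
  }

  static Fingerprint total(List<Fingerprint> perSegment) {
    // Same result for stream() and parallelStream() because combine() is associative;
    // an order-dependent combiner (e.g. "keep the last value seen") would not be.
    return perSegment.parallelStream()
        .reduce(new Fingerprint(0, 0, 0), Fingerprint::combine);
  }
}
{code}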

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9591) Shards and replicas go down when indexing large number of files

2016-10-06 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553195#comment-15553195
 ] 

Pushkar Raste commented on SOLR-9591:
-

Are you using MMapDirectory? MMapDirectory keeps the index off heap and reduces pressure on the garbage collector.

In my experience, G1GC with {{ParallelRefProcEnabled}} helps a lot in keeping GC pauses short; a configuration sketch follows.
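
For reference, the usual way to wire these up (a sketch only; the exact values are assumptions and should be tuned per installation):

{code}
<!-- solrconfig.xml: memory-map the index so it lives off the Java heap -->
<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>
{code}

{code}
# solr.in.sh: G1 with parallel reference processing tends to keep pauses short
GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250"
{code}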

> Shards and replicas go down when indexing large number of files
> ---
>
> Key: SOLR-9591
> URL: https://issues.apache.org/jira/browse/SOLR-9591
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 5.5.2
>Reporter: Khalid Alharbi
> Attachments: solr_log_20161002_1504
>
>
> Solr shards and replicas go down when indexing a large number of text files 
> using the default [extracting request 
> handler|https://cwiki.apache.org/confluence/x/c4DxAQ].
> {code}
> curl 
> 'http://localhost:8983/solr/myCollection/update/extract?literal.id=someId' -F 
> "myfile=/data/file1.txt"
> {code}
> and committing after indexing 5,000 files using:
> {code}
> curl 'http://localhost:8983/solr/myCollection/update?commit=true=json'
> {code}
> This was on Solr (SolrCloud) version 5.5.2 with an external zookeeper cluster 
> of five nodes. I also tried this on a single node SolrCloud with the embedded 
> ZooKeeper but the collection went down as well. In both cases the error 
> message is always "ERROR null DistributedUpdateProcessor ClusterState says we 
> are the leader,​ but locally we don't think so"
> I managed to come up with a work around that helped me index over 400K files 
> without getting replicas down with that error message. The work around is to 
> index 5K files, restart Solr, wait for shards and replicas to get active, 
> then index the next 5K files, and repeat the previous steps.
> If this is not enough to investigate this issue, I will be happy to provide 
> more details regarding this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9591) Shards and replicas go down when indexing large number of files

2016-10-06 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552791#comment-15552791
 ] 

Pushkar Raste commented on SOLR-9591:
-

Have you looked into the GC logs to see if there are any long GC pauses?

> Shards and replicas go down when indexing large number of files
> ---
>
> Key: SOLR-9591
> URL: https://issues.apache.org/jira/browse/SOLR-9591
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 5.5.2
>Reporter: Khalid Alharbi
> Attachments: solr_log_20161002_1504
>
>
> Solr shards and replicas go down when indexing a large number of text files 
> using the default [extracting request 
> handler|https://cwiki.apache.org/confluence/x/c4DxAQ].
> {code}
> curl 
> 'http://localhost:8983/solr/myCollection/update/extract?literal.id=someId' -F 
> "myfile=/data/file1.txt"
> {code}
> and committing after indexing 5,000 files using:
> {code}
> curl 'http://localhost:8983/solr/myCollection/update?commit=true=json'
> {code}
> This was on Solr (SolrCloud) version 5.5.2 with an external zookeeper cluster 
> of five nodes. I also tried this on a single node SolrCloud with the embedded 
> ZooKeeper but the collection went down as well. In both cases the error 
> message is always "ERROR null DistributedUpdateProcessor ClusterState says we 
> are the leader,​ but locally we don't think so"
> I managed to come up with a work around that helped me index over 400K files 
> without getting replicas down with that error message. The work around is to 
> index 5K files, restart Solr, wait for shards and replicas to get active, 
> then index the next 5K files, and repeat the previous steps.
> If this is not enough to investigate this issue, I will be happy to provide 
> more details regarding this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-06 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552125#comment-15552125
 ] 

Pushkar Raste commented on SOLR-9506:
-

Updated patch; added a scenario in {{PeerSyncTest}} where a replica misses an update.
Looks like we don't need to remove the live-docs check {{if (liveDocs != null && !liveDocs.get(doc)) continue;}}

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-06 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9506:

Attachment: SOLR-9506.patch

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9511) Retire using individual versions to request updates during PeerSync

2016-10-05 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9511:

Attachment: SOLR-9511.patch

> Retire using individual versions to request updates during PeerSync
> ---
>
> Key: SOLR-9511
> URL: https://issues.apache.org/jira/browse/SOLR-9511
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Pushkar Raste
>Priority: Minor
> Fix For: master (7.0)
>
> Attachments: SOLR-9511.patch
>
>
> We started using version ranges to request updates during PeerSync in 
> [SOLR-9207| https://issues.apache.org/jira/browse/SOLR-9207]. Using version 
> ranges was also made default.
> There is no need to have code that uses individual versions start Solr 7. 
> Decommission (remove unnecessary code)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-10-05 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9546:

Attachment: SOLR-9546.patch

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
> Attachments: SOLR-9546.patch
>
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive type but since method expect to 
> return a {{Long}}, it needs to be wrapped. There are many more method like 
> that. We might be creating a lot of unnecessary objects here.
> I am not sure if JVM catches upto it and somehow optimizes it if these 
> methods are called enough times (or may be compiler does some modifications 
> at compile time)
> Let me know if I am thinking of some premature optimization



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-05 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9506:

Attachment: SOLR-9506.patch

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-05 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9506:

Attachment: (was: SOLR-9506.patch)

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-9506) cache IndexFingerprint for each segment

2016-10-05 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9506:

Attachment: SOLR-9506.patch

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506.patch, SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9036) Solr slave is doing full replication (entire index) of index after master restart

2016-10-03 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15544142#comment-15544142
 ] 

Pushkar Raste commented on SOLR-9036:
-

Does the fix for [SOLR-9446|https://issues.apache.org/jira/browse/SOLR-9446] help in this situation?

> Solr slave is doing full replication (entire index) of index after master 
> restart
> -
>
> Key: SOLR-9036
> URL: https://issues.apache.org/jira/browse/SOLR-9036
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 5.3.1, 6.0
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Critical
>  Labels: impact-high
> Fix For: 5.5.2, 5.6, 6.0.1, 6.1, master (7.0)
>
> Attachments: SOLR-9036.patch, SOLR-9036.patch, SOLR-9036.patch
>
>
> This was first described in the following email:
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3ccafgnfoyn+xmpxwzwbjuzddeuz7tjqhqktek6q7u8xgstqy3...@mail.gmail.com%3E
> I tried Solr 5.3.1 and Solr 6 and I can reproduce the problem. If the master 
> comes back online before the next polling interval then the slave finds 
> itself in sync with the master but if the master is down for at least one 
> polling interval then the slave pulls the entire full index from the master 
> even if the index has not changed on the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9036) Solr slave is doing full replication (entire index) of index after master restart

2016-10-03 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15544142#comment-15544142
 ] 

Pushkar Raste edited comment on SOLR-9036 at 10/4/16 2:45 AM:
--

Does the fix for [SOLR-9446|https://issues.apache.org/jira/browse/SOLR-9446] help in this situation as well?


was (Author: praste):
Does fix for [SOLR-9446|https://issues.apache.org/jira/browse/SOLR-9446] helps 
in this situation?

> Solr slave is doing full replication (entire index) of index after master 
> restart
> -
>
> Key: SOLR-9036
> URL: https://issues.apache.org/jira/browse/SOLR-9036
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 5.3.1, 6.0
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Critical
>  Labels: impact-high
> Fix For: 5.5.2, 5.6, 6.0.1, 6.1, master (7.0)
>
> Attachments: SOLR-9036.patch, SOLR-9036.patch, SOLR-9036.patch
>
>
> This was first described in the following email:
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3ccafgnfoyn+xmpxwzwbjuzddeuz7tjqhqktek6q7u8xgstqy3...@mail.gmail.com%3E
> I tried Solr 5.3.1 and Solr 6 and I can reproduce the problem. If the master 
> comes back online before the next polling interval then the slave finds 
> itself in sync with the master but if the master is down for at least one 
> polling interval then the slave pulls the entire full index from the master 
> even if the index has not changed on the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-10-03 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543018#comment-15543018
 ] 

Pushkar Raste commented on SOLR-9546:
-

This is not a critical issue, and I might be doing premature optimization.

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive type but since method expect to 
> return a {{Long}}, it needs to be wrapped. There are many more method like 
> that. We might be creating a lot of unnecessary objects here.
> I am not sure if JVM catches upto it and somehow optimizes it if these 
> methods are called enough times (or may be compiler does some modifications 
> at compile time)
> Let me know if I am thinking of some premature optimization



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-10-03 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543017#comment-15543017
 ] 

Pushkar Raste commented on SOLR-9546:
-

This is not a critical issue, and I might be doing premature optimization.

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive type but since method expect to 
> return a {{Long}}, it needs to be wrapped. There are many more method like 
> that. We might be creating a lot of unnecessary objects here.
> I am not sure if JVM catches upto it and somehow optimizes it if these 
> methods are called enough times (or may be compiler does some modifications 
> at compile time)
> Let me know if I am thinking of some premature optimization



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-10-03 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511196#comment-15511196
 ] 

Pushkar Raste edited comment on SOLR-9546 at 10/3/16 3:29 PM:
--

Got it.
I will fix the {{Long getLong(String param, Long def)}} method only. It is not as bad as I initially thought.

I don't even think that method is needed. Calling {{Long getLong(String param)}} would do the same thing, wouldn't it? (A possible primitive overload is sketched below.)


was (Author: praste):
Got you.
I will fix the  {{Long getLong(String param, Long def)}} method only. It is not 
as bad as initially thought.

I don't even think that method is needed. Calling {{Long getLong(String 
param)}} would do the same thing, won't it?

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive type but since method expect to 
> return a {{Long}}, it needs to be wrapped. There are many more method like 
> that. We might be creating a lot of unnecessary objects here.
> I am not sure if JVM catches upto it and somehow optimizes it if these 
> methods are called enough times (or may be compiler does some modifications 
> at compile time)
> Let me know if I am thinking of some premature optimization



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9511) Retire using individual versions to request updates during PeerSync

2016-10-03 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15542481#comment-15542481
 ] 

Pushkar Raste commented on SOLR-9511:
-

We are planning to set the number of records in the ulog to a very high number. If that number is too high, the leader may run into issues (throw OOM) when a replica asks for a large number of updates. In such a case we will have to request updates in chunks/batches (see the sketch below). In preparation for that, we should keep the {{PeerSync.requestVersions()}} logic simple. 

This ticket is to track the effort of removing the old way of using individual versions to request updates.
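
A toy sketch of what chunked requests could look like (hypothetical helper; real version numbers are sparse, so a production version would likely chunk by update count rather than by value):

{code}
import java.util.ArrayList;
import java.util.List;

public class VersionRangeChunker {
  /** Splits the inclusive range low..high into sub-ranges spanning at most chunkSize values each. */
  public static List<String> chunk(long low, long high, long chunkSize) {
    List<String> ranges = new ArrayList<>();
    for (long start = low; start <= high; start += chunkSize) {
      long end = Math.min(start + chunkSize - 1, high);
      ranges.add(start + "..." + end); // "low...high" style range string (an assumption about the request format)
    }
    return ranges;
  }
}
{code}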

> Retire using individual versions to request updates during PeerSync
> ---
>
> Key: SOLR-9511
> URL: https://issues.apache.org/jira/browse/SOLR-9511
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: master (7.0)
>Reporter: Pushkar Raste
>Priority: Minor
> Fix For: master (7.0)
>
>
> We started using version ranges to request updates during PeerSync in 
> [SOLR-9207| https://issues.apache.org/jira/browse/SOLR-9207]. Using version 
> ranges was also made default.
> There is no need to have code that uses individual versions start Solr 7. 
> Decommission (remove unnecessary code)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-29 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532555#comment-15532555
 ] 

Pushkar Raste commented on SOLR-9506:
-

Discussed with [~noble.paul]. 
We should cache the fingerprint for a segment only if *maxVersion specified* > *max version in the segment*; a sketch follows.
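
In pseudo-Java, the guard would look roughly like this (hypothetical names, not SolrIndexSearcher's actual code); the point is that a segment's fingerprint is only safe to reuse when the requested maxVersion already covers every version in that segment:

{code}
// Hypothetical sketch:
if (maxVersionSpecified > maxVersionInSegment) {
  // every version in this segment is below the requested cap, so the cached
  // fingerprint stays identical for any later request with a cap this high or higher
  perSegmentFingerprintCache.put(segmentCacheKey, fingerprint);
}
{code}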

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9369) SolrCloud should not compare commitTimeMSec to see if replicas are in sync

2016-09-28 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15530492#comment-15530492
 ] 

Pushkar Raste commented on SOLR-9369:
-

[SOLR-9446|https://issues.apache.org/jira/browse/SOLR-9446] might help, as it provides an alternate way to check whether replicas are in sync. 

> SolrCloud should not compare commitTimeMSec to see if replicas are in sync
> --
>
> Key: SOLR-9369
> URL: https://issues.apache.org/jira/browse/SOLR-9369
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Varun Thacker
>
> Today the replication code we compare if two replicas are in sync by checking 
> the commit timestamp ( "commitTimeMSec" ) 
> This made sense for master slave but I don't think is useful for SolrCloud 
> since different replicas will commit at different times. We should not check 
> for this in SolrCloud mode.
> Ramkumar noted this on SOLR-7859 as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-26 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523564#comment-15523564
 ] 

Pushkar Raste commented on SOLR-9506:
-

I think what [~ichattopadhyaya] is hinting at is that if {{numDocs}} accounts only for live (active) docs, then once documents are deleted in a segment, {{numDocs}} in the cached fingerprint might be wrong. 

Surprisingly, the following test cases passed with my POC:
1. {{PeerSyncTest}}
2. {{PeerSyncReplicationTest}}
3. {{SyncSliceTest}}

In the worst case, we can at least parallelize the fingerprint computation. 

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. it is quite useless during 
> high throughput indexing. If the fingerprint is cached per segment it will 
> make it vastly more efficient to compute the fingerprint



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-26 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523566#comment-15523566
 ] 

Pushkar Raste commented on SOLR-9506:
-

Adding [~ysee...@gmail.com] to the loop.

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high-throughput indexing. If the fingerprint is cached per segment, it will 
> make it vastly more efficient to compute the fingerprint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Deleted] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-26 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9506:

Comment: was deleted

(was: In short you are suggesting that when we cache fingerprint for individual 
segments, we keep a list of version numbers in those segments around? That 
would be billions of {{Long}} values cached, which might be counter-productive,)

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high-throughput indexing. If the fingerprint is cached per segment, it will 
> make it vastly more efficient to compute the fingerprint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-26 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523273#comment-15523273
 ] 

Pushkar Raste commented on SOLR-9506:
-

In short, you are suggesting that when we cache the fingerprint for individual 
segments, we keep a list of the version numbers in those segments around? That 
would be billions of {{Long}} values cached, which might be counter-productive 
(see the back-of-envelope estimate below).
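
A rough back-of-envelope estimate of what that would cost (approximate figures, 
assuming a 64-bit HotSpot JVM with compressed oops):

{code}
// Rough estimate only; object sizes are approximate and JVM-dependent.
long versions   = 1_000_000_000L;        // "billions" of cached version numbers
long boxedBytes = versions * (24 + 4);   // ~24 B per java.lang.Long + ~4 B reference
long primBytes  = versions * 8;          // plain long[] storage
// boxedBytes is roughly 28 GB and primBytes roughly 8 GB, far too much to keep cached.
{code}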

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
> Attachments: SOLR-9506_POC.patch
>
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high-throughput indexing. If the fingerprint is cached per segment, it will 
> make it vastly more efficient to compute the fingerprint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-25 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15521130#comment-15521130
 ] 

Pushkar Raste edited comment on SOLR-9506 at 9/25/16 5:14 PM:
--

POC/Initial commit - 
https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13

There are two issues we still need to solve. 
* How to compute the overall {{versionsHash}} from the {{versionsHash}} of 
individual segments. We cannot use the current {{versionsHash}} (unless we cache 
all the individual version numbers), as it is not additive. Consider the 
following scenario (a toy sketch of it follows after this list):
*Leader segments, versions and versionsHash*
*seg1* : 
 versions: 100, 101, 102  
  versionHash: hash(100) + hash(101) + hash(102)
*seg2*: 
 versions: 103, 104, 105
  versionHash: hash(103) + hash(104) + hash(105) 
\\ \\ *Replica segments, versions and hash*
*seg1*: 
 versions: 100, 101
  versionHash: hash(100) + hash(101) 
*seg2*: 
 versions: 102, 103, 104, 105
  versionHash: hash(102) + hash(103) + hash(104) + hash(105)
\\ \\The leader and replica are essentially in sync; however, using the current 
method there is no way to compute and ensure that the cumulative {{versionHash}} 
of the leader and replica would match. 
\\ \\Even if we decide not to cache the {{IndexFingerprint}} per segment but just 
to parallelize the computation, I think we would still run into the issue 
mentioned above.

* I still need to figure out how to keep a cache in {{DefaultSolrCoreState}}, 
so that we can reuse the {{IndexFingerprint}} of individual segments when a new 
Searcher is opened.  
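
For reference, a toy version of the scenario above; {{hash()}} here is an 
arbitrary stand-in (not Solr's actual version hash), just to make the 
per-segment values concrete:

{code}
import java.util.Arrays;

// Toy illustration of the leader/replica scenario described above.
public class VersionHashSketch {

  // Stand-in hash; NOT Solr's real version hash function.
  static long hash(long version) {
    return version * 0x9E3779B97F4A7C15L;
  }

  // Per-segment value: sum of hash(version) over the versions in that segment.
  static long segmentHash(long... versions) {
    return Arrays.stream(versions).map(VersionHashSketch::hash).sum();
  }

  public static void main(String[] args) {
    // Leader:  seg1 = {100, 101, 102}, seg2 = {103, 104, 105}
    long leaderSeg1  = segmentHash(100, 101, 102);
    long leaderSeg2  = segmentHash(103, 104, 105);
    // Replica: seg1 = {100, 101},      seg2 = {102, 103, 104, 105}
    long replicaSeg1 = segmentHash(100, 101);
    long replicaSeg2 = segmentHash(102, 103, 104, 105);

    // The per-segment values differ between leader and replica even though both
    // cores hold exactly the same versions overall, so cached per-segment hashes
    // cannot simply be compared segment by segment.
    System.out.println("leader : " + leaderSeg1 + ", " + leaderSeg2);
    System.out.println("replica: " + replicaSeg1 + ", " + replicaSeg2);
  }
}
{code}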


was (Author: praste):
POC/Initial commit - 
https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13

There are two issues we still need to solve. 
* How to compute `versionsInHash` from `versionsInHash` of individual segments. 
We can not use current `versionsHash` (unless we cache all the individual 
version numbers), as it is not additive.  Consider following scenario
*Leader segments, versions and hash*
*seg1* : 
 versions: 100, 101, 102  
  versionHash: hash(100) + hash(101) + hash(102)
*seg2*: 
 versions: 103, 104, 105
  versionHash: hash(103) + hash(104) + hash(105) 
\\ \\ *Replica segments, versions and hash*
*seg1*: 
 versions: 100, 101
  versionHash: hash(100) + hash(101) 
*seg2*: 
 versions: 102, 103, 104, 105
  versionHash: hash(102) + hash(103) + hash(104) + hash(105)
\\ \\Leader and Replica are essentially in sync, however using current method 
there is no way to compute and ensure cumulative `versionHash` of leader and 
replica would match

* I still need to figure out how to keep cache in   `DefaultSolrCoreState`, so 
that we can reuse `IndexFingerprint` of individual segments when a new Searcher 
is opened.  

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high-throughput indexing. If the fingerprint is cached per segment, it will 
> make it vastly more efficient to compute the fingerprint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9506) cache IndexFingerprint for each segment

2016-09-25 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15521130#comment-15521130
 ] 

Pushkar Raste commented on SOLR-9506:
-

POC/Initial commit - 
https://github.com/praste/lucene-solr/commit/ca55daa9ea1eb23232173b50111b9068f1817c13

There are two issues we still need to solve. 
* How to compute `versionsInHash` from `versionsInHash` of individual segments. 
We can not use current `versionsHash` (unless we cache all the individual 
version numbers), as it is not additive.  Consider following scenario
*Leader segments, versions and hash*
*seg1* : 
 versions: 100, 101, 102  
  versionHash: hash(100) + hash(101) + hash(102)
*seg2*: 
 versions: 103, 104, 105
  versionHash: hash(103) + hash(104) + hash(105) 
\\ \\ *Replica segments, versions and hash*
*seg1*: 
 versions: 100, 101
  versionHash: hash(100) + hash(101) 
*seg2*: 
 versions: 102, 103, 104, 105
  versionHash: hash(102) + hash(103) + hash(104) + hash(105)
\\ \\Leader and Replica are essentially in sync, however using current method 
there is no way to compute and ensure cumulative `versionHash` of leader and 
replica would match

* I still need to figure out how to keep cache in   `DefaultSolrCoreState`, so 
that we can reuse `IndexFingerprint` of individual segments when a new Searcher 
is opened.  

> cache IndexFingerprint for each segment
> ---
>
> Key: SOLR-9506
> URL: https://issues.apache.org/jira/browse/SOLR-9506
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>
> The IndexFingerprint is cached per index searcher. It is quite useless during 
> high-throughput indexing. If the fingerprint is cached per segment, it will 
> make it vastly more efficient to compute the fingerprint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9310) PeerSync fails on a node restart due to IndexFingerPrint mismatch

2016-09-25 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520747#comment-15520747
 ] 

Pushkar Raste edited comment on SOLR-9310 at 9/25/16 12:51 PM:
---

I went through the logs at 
https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-MacOSX/429/consoleFull 
If PeerSync was unsuccessful, I would expect to see a line like 
{{o.a.s.u.PeerSync Fingerprint comparison: -1}} 

However, I don't see such a line. I can think of two scenarios that could break 
the test: 
* The data directory could get deleted while a node is brought down, since the 
data directory is created in {{temp}}. Upon restart the replica would have no 
frame of reference and would have to fall back on replication.
* We need a better check than relying on the number of requests made to 
{{ReplicationHandler}}.



was (Author: praste):
I went through logs in the failed test email notification but those are 
truncated. Where can I look at the entire build.log for the test. 

Only thing I could think of at this point is data directory could get deleted 
while a node is brought down, since data directory is created in {{temp}}. Upon 
restart replica would have no frame of reference and will have to fall back on 
replication.



> PeerSync fails on a node restart due to IndexFingerPrint mismatch
> -
>
> Key: SOLR-9310
> URL: https://issues.apache.org/jira/browse/SOLR-9310
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Assignee: Noble Paul
> Fix For: 5.5.3, 6.3, trunk
>
> Attachments: PeerSync_3Node_Setup.jpg, PeerSync_Experiment.patch, 
> SOLR-9310.patch, SOLR-9310.patch, SOLR-9310.patch, SOLR-9310.patch, 
> SOLR-9310.patch, SOLR-9310.patch, SOLR-9310_3ReplicaTest.patch, 
> SOLR-9310_5x.patch, SOLR-9310_final.patch
>
>
> I found that PeerSync fails if a node restarts and documents were indexed 
> while the node was down. The IndexFingerPrint check fails after the recovering 
> node applies updates. 
> This happens only when the node restarts, and not if the node just misses 
> updates for a reason other than it being down.
> Please check the attached patch for the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9310) PeerSync fails on a node restart due to IndexFingerPrint mismatch

2016-09-25 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15520747#comment-15520747
 ] 

Pushkar Raste commented on SOLR-9310:
-

I went through the logs in the failed-test email notification, but those are 
truncated. Where can I look at the entire build.log for the test? 

The only thing I can think of at this point is that the data directory could get 
deleted while a node is brought down, since the data directory is created in 
{{temp}}. Upon restart the replica would have no frame of reference and would 
have to fall back on replication.



> PeerSync fails on a node restart due to IndexFingerPrint mismatch
> -
>
> Key: SOLR-9310
> URL: https://issues.apache.org/jira/browse/SOLR-9310
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Assignee: Noble Paul
> Fix For: 5.5.3, 6.3, trunk
>
> Attachments: PeerSync_3Node_Setup.jpg, PeerSync_Experiment.patch, 
> SOLR-9310.patch, SOLR-9310.patch, SOLR-9310.patch, SOLR-9310.patch, 
> SOLR-9310.patch, SOLR-9310.patch, SOLR-9310_3ReplicaTest.patch, 
> SOLR-9310_5x.patch, SOLR-9310_final.patch
>
>
> I found that PeerSync fails if a node restarts and documents were indexed 
> while the node was down. The IndexFingerPrint check fails after the recovering 
> node applies updates. 
> This happens only when the node restarts, and not if the node just misses 
> updates for a reason other than it being down.
> Please check the attached patch for the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-09-23 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516844#comment-15516844
 ] 

Pushkar Raste edited comment on SOLR-9546 at 9/23/16 4:08 PM:
--

I went through the usage of most of the methods that return wrapper types. I 
think the `SolrParams` class is encouraging the use of wrapper types (or people 
might just be missing the fact that they may end up creating a lot of wrapper 
objects). Here are some places which could use primitive types by passing a 
default value: 

{{SolrParams.getInt()}}
* {{HashQParser.parse()}}
* {{TextLogisticRegressionQParser.parse()}}
* {{CloudMLTQParser.parse()}}
* {{SimpleMLTQParser.parse()}}

{{getBool()}}
* {{ZkController.rejoinShardElection()}}
* {{DumpRequestHandler.handleRequestBody()}}
* {{PingRequestHandler.handleRequestBody()}}
* {{MoreLikeThisComponent.process()}}
* {{BinaryResponseWriter.write()}}
* {{JSONResponseWriter.write()}}
* {{PHPResponseWriter.write()}}
* {{XMLResponseWriter.write()}}
The JVM might do something smart for the `Boolean` type, since there are only 
two possible values.


There are some *test* classes as well.

There are some other classes that do depend upon values being `null`. 

* I can modify all the places mentioned above to call the get(param, def) 
version, or 
* We can simply add `getPrimitive()` methods that return a default value in the 
absence of a param, to make it clear that these methods return a primitive (a 
rough sketch follows below). 


Another possibility: I am overthinking here :-), and this ticket can be closed.
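
A minimal sketch of what such a primitive-returning accessor could look like. 
The method name and signature are hypothetical (it mirrors the {{getLong}} 
excerpt quoted in the issue description and would live inside {{SolrParams}}); 
it is not an existing method:

{code}
// Hypothetical addition to SolrParams: takes a primitive default and returns a
// primitive, so no Long is ever boxed on the common path.
public long getPrimitiveLong(String param, long def) {
  String val = get(param);
  try {
    return val == null ? def : Long.parseLong(val);   // stays a primitive long
  } catch (Exception ex) {
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, ex.getMessage(), ex);
  }
}
{code}

A call site could then read, for example, {{long rows = params.getPrimitiveLong("rows", 10)}} 
without creating a wrapper object when the param is present.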


was (Author: praste):
I went through usage of most of the methods that return Wrapper types. I think 
`SolrParams` class is encouraging usage of wrapper types (or people might are 
just missing the fact that they might end up creating lot of wrapper objects). 
Here are few are some places which can use primitive types by passing a default 
value 

{{SolrParams.getInt()}}
* {{HashQParser.parse()}}
* {{TextLogisticRegressionQParser.parse()}}
* {{CloudMLTQParser.parse()}}
* {{SimpleMLTQParser.parse()}}

{{getBool()}}
* {{ZkController.rejoinShardElection()}}
* {{DumpRequestHandler.handleRequestBody()}}
* {{PingRequestHandler.handleRequestBody()}}
* {{MoreLikeThisComponent.process()}}
* {{BinaryResponseWriter.write()}}
* {{JSONResponseWriter.write()}}
* {{PHPResponseWriter.write()}}
* {{XMLResponseWriter.write()}}

There are some *test* classes as well.

There are some other classes that do depend upon values being `null`. 

* I can modify all the places mentioned above to call get(param, df) 
version,  or 
* We can simply add `getPrimitive()` methods that return default value  
in absence of a param, to make it clear that these methods would return a 
primitive 


Another possibility, I am overthinking here :-), and this ticket can be closed.

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive type, but since the method is declared 
> to return a {{Long}}, it needs to be wrapped. There are many more methods like 
> that. We might be creating a lot of unnecessary objects here.
> I am not sure if the JVM catches on to it and somehow optimizes it if these 
> methods are called enough times (or maybe the compiler does some modifications 
> at compile time).
> Let me know if I am thinking of some premature optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-09-23 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15516844#comment-15516844
 ] 

Pushkar Raste commented on SOLR-9546:
-

I went through the usage of most of the methods that return wrapper types. I 
think the `SolrParams` class is encouraging the use of wrapper types (or people 
might just be missing the fact that they may end up creating a lot of wrapper 
objects). Here are some places which could use primitive types by passing a 
default value: 

{{SolrParams.getInt()}}
* {{HashQParser.parse()}}
* {{TextLogisticRegressionQParser.parse()}}
* {{CloudMLTQParser.parse()}}
* {{SimpleMLTQParser.parse()}}

{{getBool()}}
* {{ZkController.rejoinShardElection()}}
* {{DumpRequestHandler.handleRequestBody()}}
* {{PingRequestHandler.handleRequestBody()}}
* {{MoreLikeThisComponent.process()}}
* {{BinaryResponseWriter.write()}}
* {{JSONResponseWriter.write()}}
* {{PHPResponseWriter.write()}}
* {{XMLResponseWriter.write()}}

There are some *test* classes as well.

There are some other classes that do depend upon values being `null`. 

* I can modify all the places mentioned above to call the get(param, def) 
version, or 
* We can simply add `getPrimitive()` methods that return a default value in the 
absence of a param, to make it clear that these methods return a primitive. 


Another possibility: I am overthinking here :-), and this ticket can be closed.

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive type, but since the method is declared 
> to return a {{Long}}, it needs to be wrapped. There are many more methods like 
> that. We might be creating a lot of unnecessary objects here.
> I am not sure if the JVM catches on to it and somehow optimizes it if these 
> methods are called enough times (or maybe the compiler does some modifications 
> at compile time).
> Let me know if I am thinking of some premature optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-09-21 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511196#comment-15511196
 ] 

Pushkar Raste edited comment on SOLR-9546 at 9/21/16 9:16 PM:
--

Got you.
I will fix the {{Long getLong(String param, Long def)}} method only. It is not 
as bad as I initially thought.

I don't even think that method is needed. Calling {{Long getLong(String 
param)}} would do the same thing, won't it?
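
For reference, a small sketch of how the two overloads in the excerpt relate, 
assuming the single-arg {{getLong}} simply returns {{null}} for a missing param 
(the parameter name below is made up):

{code}
// With the excerpt's implementation, the two-arg overload only differs from the
// one-arg form when a non-null default is supplied:
Long noDefault   = params.getLong("someMissingParam");        // -> null
Long withDefault = params.getLong("someMissingParam", 42L);   // -> 42 (boxed)
// getLong(param, null) behaves the same as getLong(param).
{code}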


was (Author: praste):
Got you.
I will fix the  {{Long getLong(String param, Long def)}} method only. It is not 
as bad as initially thought

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive type, but since the method is declared 
> to return a {{Long}}, it needs to be wrapped. There are many more methods like 
> that. We might be creating a lot of unnecessary objects here.
> I am not sure if the JVM catches on to it and somehow optimizes it if these 
> methods are called enough times (or maybe the compiler does some modifications 
> at compile time).
> Let me know if I am thinking of some premature optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-09-21 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511196#comment-15511196
 ] 

Pushkar Raste commented on SOLR-9546:
-

Got you.
I will fix the {{Long getLong(String param, Long def)}} method only. It is not 
as bad as I initially thought.

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive type, but since the method is declared 
> to return a {{Long}}, it needs to be wrapped. There are many more methods like 
> that. We might be creating a lot of unnecessary objects here.
> I am not sure if the JVM catches on to it and somehow optimizes it if these 
> methods are called enough times (or maybe the compiler does some modifications 
> at compile time).
> Let me know if I am thinking of some premature optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Deleted] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-09-21 Thread Pushkar Raste (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pushkar Raste updated SOLR-9546:

Comment: was deleted

(was: That was just one example. Check {{getBool()}}, {{getFieldBool()}} 
methods those have the exact same problem, and there are many more.

I am not sure which way we should go (primitive vs Wrapped types) but I am 
inclined towards primitive types.)

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive type, but since the method is declared 
> to return a {{Long}}, it needs to be wrapped. There are many more methods like 
> that. We might be creating a lot of unnecessary objects here.
> I am not sure if the JVM catches on to it and somehow optimizes it if these 
> methods are called enough times (or maybe the compiler does some modifications 
> at compile time).
> Let me know if I am thinking of some premature optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-09-21 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511132#comment-15511132
 ] 

Pushkar Raste edited comment on SOLR-9546 at 9/21/16 8:49 PM:
--

That was just one example. Check the {{getBool()}} and {{getFieldBool()}} 
methods; those have the exact same problem, and there are many more.

I am not sure which way we should go (primitive vs. wrapped types), but I am 
inclined towards primitive types.


was (Author: praste):
That was just one example check {{getBool()}}, {{getFieldBool()}} methods those 
have the exact same problem, and there are many more.

I am not sure which way we should go (primitive vs Wrapped types) but I am 
inclined towards primitive types.

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive type, but since the method is declared 
> to return a {{Long}}, it needs to be wrapped. There are many more methods like 
> that. We might be creating a lot of unnecessary objects here.
> I am not sure if the JVM catches on to it and somehow optimizes it if these 
> methods are called enough times (or maybe the compiler does some modifications 
> at compile time).
> Let me know if I am thinking of some premature optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9546) There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class

2016-09-21 Thread Pushkar Raste (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511132#comment-15511132
 ] 

Pushkar Raste commented on SOLR-9546:
-

That was just one example. Check the {{getBool()}} and {{getFieldBool()}} 
methods; those have the exact same problem, and there are many more.

I am not sure which way we should go (primitive vs. wrapped types), but I am 
inclined towards primitive types.

> There is a lot of unnecessary boxing/unboxing going on in {{SolrParams}} class
> --
>
> Key: SOLR-9546
> URL: https://issues.apache.org/jira/browse/SOLR-9546
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pushkar Raste
>Priority: Minor
>
> Here is an excerpt 
> {code}
>   public Long getLong(String param, Long def) {
> String val = get(param);
> try {
>   return val== null ? def : Long.parseLong(val);
> }
> catch( Exception ex ) {
>   throw new SolrException( SolrException.ErrorCode.BAD_REQUEST, 
> ex.getMessage(), ex );
> }
>   }
> {code}
> {{Long.parseLong()}} returns a primitive type, but since the method is declared 
> to return a {{Long}}, it needs to be wrapped. There are many more methods like 
> that. We might be creating a lot of unnecessary objects here.
> I am not sure if the JVM catches on to it and somehow optimizes it if these 
> methods are called enough times (or maybe the compiler does some modifications 
> at compile time).
> Let me know if I am thinking of some premature optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


