[jira] [Updated] (SOLR-6920) During replication use checksums to verify if files are the same
[ https://issues.apache.org/jira/browse/SOLR-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-6920:
    Attachment: SOLR-6920-5x.patch

New patch - cleaned it up a bit, handled Varun's comment.

During replication use checksums to verify if files are the same

Key: SOLR-6920
URL: https://issues.apache.org/jira/browse/SOLR-6920
Project: Solr
Issue Type: Bug
Components: replication (java)
Reporter: Varun Thacker
Assignee: Mark Miller
Priority: Critical
Attachments: SOLR-6920-5x.patch, SOLR-6920-5x.patch, SOLR-6920-5x.patch, SOLR-6920.patch, SOLR-6920.patch, SOLR-6920.patch, SOLR-6920.patch, SOLR-6920.patch, SOLR-6920.patch

Currently we check if an index file on the master and slave is the same by checking if its name and file length match.
With LUCENE-2446 we now have checksums for each index file in the segment. We should leverage this to verify if two files are the same.
Places like SnapPuller.isIndexStale and SnapPuller.downloadIndexFiles should check against the checksum also.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
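The check described in the issue can be sketched roughly as follows. This is only an illustration, not code from the attached patches: `FileIdentity`, `of` and `sameAs` are hypothetical names, and the checksum is computed here with the JDK's `CRC32` over the whole content rather than read from a Lucene file footer. Treating a missing checksum as "not proven equal" is also an assumption.

```java
import java.util.zip.CRC32;

/** Hypothetical value object: what one side knows about an index file. */
final class FileIdentity {
    final String name;
    final long length;
    final Long checksum; // null when no checksum is available for the file

    FileIdentity(String name, long length, Long checksum) {
        this.name = name;
        this.length = length;
        this.checksum = checksum;
    }

    /** Builds an identity from raw bytes, using CRC32 as the checksum. */
    static FileIdentity of(String name, byte[] contents) {
        CRC32 crc = new CRC32();
        crc.update(contents, 0, contents.length);
        return new FileIdentity(name, contents.length, crc.getValue());
    }

    /**
     * Old rule: name and length match. Added rule: checksums must match too.
     * If either side has no checksum, equality cannot be proven, so this
     * sketch reports "not the same" and leaves the fallback to the caller.
     */
    boolean sameAs(FileIdentity other) {
        if (!name.equals(other.name) || length != other.length) {
            return false;
        }
        if (checksum == null || other.checksum == null) {
            return false;
        }
        return checksum.equals(other.checksum);
    }

    public static void main(String[] args) {
        FileIdentity master = FileIdentity.of("_0.cfs", new byte[] {1, 2, 3});
        FileIdentity slave = FileIdentity.of("_0.cfs", new byte[] {1, 2, 4});
        // Same name and same length, but different content.
        System.out.println("name+length match: "
                + (master.name.equals(slave.name) && master.length == slave.length));
        System.out.println("checksum-aware match: " + master.sameAs(slave));
    }
}
```

The example in `main` is the case the issue is about: two files that agree on name and length but differ in content, which a name-and-length check alone would treat as identical.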
[jira] [Updated] (SOLR-6920) During replication use checksums to verify if files are the same
[ https://issues.apache.org/jira/browse/SOLR-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-6920:
    Attachment: SOLR-6920-5x.patch

Been thinking about adding this since last night - should go in the trunk version too - this patch will also copy any files under 100kb if there is no checksum - more future proof for new small files, alternate codec proof, etc.
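A minimal sketch of the small-file rule, assuming the intent is that a file with no checksum and a size under 100 KB is simply fetched again instead of being trusted on name and length alone. The class and method names are hypothetical, and reading "100kb" as 100 * 1024 bytes is an assumption; the patch may define the limit differently.

```java
/** Hypothetical policy for files that carry no checksum. */
final class NoChecksumPolicy {
    // Assumption: "100kb" is read as 100 * 1024 bytes.
    static final long SMALL_FILE_BYTES = 100L * 1024L;

    /**
     * Returns true when the file should be fetched again because it has no
     * checksum to compare and is small enough that the transfer is cheap.
     */
    static boolean fetchAgain(boolean hasChecksum, long sizeInBytes) {
        return !hasChecksum && sizeInBytes < SMALL_FILE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(fetchAgain(false, 8));           // tiny file, no checksum
        System.out.println(fetchAgain(false, 5_000_000L));  // large file, no checksum
        System.out.println(fetchAgain(true, 8));            // tiny file with a checksum
    }
}
```

The appeal of a size-based rule is that it does not depend on knowing file extensions in advance, which fits the "future proof" and "alternate codec proof" remarks above.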
[jira] [Updated] (SOLR-6920) During replication use checksums to verify if files are the same
[ https://issues.apache.org/jira/browse/SOLR-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-6920:
    Attachment: SOLR-6920.patch

Another patch incorporating Varun's feedback and minor tweaks.
[jira] [Updated] (SOLR-6920) During replication use checksums to verify if files are the same
[ https://issues.apache.org/jira/browse/SOLR-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated SOLR-6920:
    Attachment: SOLR-6920.patch

1. Changed the log message in {{SnapPuller.compareFile()}} since the checksum won't be present in case of an exception.
From - {{LOG.warn("File {} did not match. expected checksum is {} and actual is checksum {}. " + "expected length is {} and actual length is {}", filename, backupIndexFileChecksum, indexFileChecksum, backupIndexFileLen, indexFileLen);}}
To - {{LOG.warn("File {} did not match. expected length is {} and actual length is {}", filename, backupIndexFileLen, indexFileLen);}}

2. In {{SnapPuller.downloadIndexFiles}} made {{(String) file.get(NAME)}} into a variable for better readability.

3. The if condition still needs tweaking? We could still have a .si/.liv/segments_n file that is not checksummed (or whose checksum read threw an error) and is equal in length, and we wouldn't re-download it?
Maybe the condition could be the check you initially proposed -
  1. If the file is a .si/.liv/segments_n file, download regardless
  2. else if (!compareResult.equal || downloadCompleteIndex) then re-download?
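The two-step condition proposed in point 3 could look roughly like this. It is a sketch of the comment's wording only, not the code in any attached patch; `DownloadDecision`, `isAlwaysDownloaded` and `shouldDownload` are hypothetical names, and `filesEqual` stands in for `compareResult.equal`.

```java
/** Hypothetical decision helper mirroring the two proposed steps. */
final class DownloadDecision {

    /** Step 1: .si, .liv and segments_N files are fetched regardless. */
    static boolean isAlwaysDownloaded(String fileName) {
        return fileName.endsWith(".si")
                || fileName.endsWith(".liv")
                || fileName.startsWith("segments_");
    }

    /** Step 2: otherwise fetch when the files differ or a full copy was requested. */
    static boolean shouldDownload(String fileName, boolean filesEqual,
                                  boolean downloadCompleteIndex) {
        if (isAlwaysDownloaded(fileName)) {
            return true;
        }
        return !filesEqual || downloadCompleteIndex;
    }

    public static void main(String[] args) {
        // A segments file is fetched even when the comparison says "equal".
        System.out.println(shouldDownload("segments_3", true, false));
        // A regular file that compares equal is skipped.
        System.out.println(shouldDownload("_0.fdt", true, false));
        // A regular file that differs is fetched.
        System.out.println(shouldDownload("_0.fdt", false, false));
    }
}
```

Putting the name-based rule first means an unreliable comparison result for those files never gets a chance to suppress the download.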
[jira] [Updated] (SOLR-6920) During replication use checksums to verify if files are the same
[ https://issues.apache.org/jira/browse/SOLR-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-6920:
    Attachment: SOLR-6920-5x.patch

First pass at the 5x version.
[jira] [Updated] (SOLR-6920) During replication use checksums to verify if files are the same
[ https://issues.apache.org/jira/browse/SOLR-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-6920:
    Attachment: SOLR-6920-5x.patch

New 5x patch to fix where Version is parsed in SnapPuller.
[jira] [Updated] (SOLR-6920) During replication use checksums to verify if files are the same
[ https://issues.apache.org/jira/browse/SOLR-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-6920:
    Priority: Critical (was: Major)
[jira] [Updated] (SOLR-6920) During replication use checksums to verify if files are the same
[ https://issues.apache.org/jira/browse/SOLR-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-6920:
    Attachment: SOLR-6920.patch

Here is a first pass at making the patch suitable for trunk.
[jira] [Updated] (SOLR-6920) During replication use checksums to verify if files are the same
[ https://issues.apache.org/jira/browse/SOLR-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated SOLR-6920:
    Attachment: SOLR-6920.patch

Updated patch. This handles the back compat check correctly.

While running the tests I got a failure which can be reproduced with
{noformat}ant test -Dtestcase=SyncSliceTest -Dtests.method=test -Dtests.seed=588DD6F3A8F57A44 -Dtests.slow=true -Dtests.locale=no_NO_NY -Dtests.timezone=America/Bahia -Dtests.asserts=true -Dtests.file.encoding=UTF-8{noformat}

The exception thrown is -
{code}
131990 T79 C17 P63349 oasc.SolrException.log ERROR java.lang.ArrayIndexOutOfBoundsException: -8
    at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:73)
    at org.apache.lucene.store.DataInput.readInt(DataInput.java:98)
    at org.apache.lucene.store.MockIndexInputWrapper.readInt(MockIndexInputWrapper.java:159)
    at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:414)
    at org.apache.lucene.codecs.CodecUtil.retrieveChecksum(CodecUtil.java:401)
    at org.apache.solr.handler.ReplicationHandler.getFileList(ReplicationHandler.java:445)
    at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:212)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:142)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1077)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)
{code}

On debugging I found the file which was causing it - {{_0_MockRandom_0.sd}}. This is a MockRandomPostingsFormat.SEED_EXT file.

Adding this to SyncSliceTest fixed the failure -
{noformat}@LuceneTestCase.SuppressCodecs({"MockRandom"}){noformat}
but any other test could end up using it, causing a failure. Any idea on how to tackle it?
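One way to keep a single unreadable file from failing the whole file-list request is to treat a failed checksum read as "no checksum" and let the caller fall back to another rule. The sketch below only illustrates that idea and is not taken from the patches: `ChecksumReader` is a hypothetical stand-in for whatever reads the checksum (in the trace above, `CodecUtil.retrieveChecksum`), and catching every `Exception` is a deliberate simplification.

```java
import java.util.OptionalLong;

/** Hypothetical wrapper that turns a failed checksum read into "absent". */
final class SafeChecksum {

    /** Stand-in for the real checksum lookup; may throw for odd files. */
    interface ChecksumReader {
        long read(String fileName) throws Exception;
    }

    static OptionalLong tryRead(ChecksumReader reader, String fileName) {
        try {
            return OptionalLong.of(reader.read(fileName));
        } catch (Exception e) {
            // Covers checked exceptions and runtime ones such as
            // ArrayIndexOutOfBoundsException. A real implementation would
            // probably log the file name and narrow the catch.
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        // Fake reader: files ending in ".sd" fail the way the seed file did.
        ChecksumReader reader = name -> {
            if (name.endsWith(".sd")) {
                throw new ArrayIndexOutOfBoundsException(-8);
            }
            return 42L;
        };
        System.out.println(tryRead(reader, "_0.fdt").isPresent());
        System.out.println(tryRead(reader, "_0_MockRandom_0.sd").isPresent());
    }
}
```

The trade-off is that a broad catch can also hide genuine corruption, so the fallback for files without a checksum needs to be conservative.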
[jira] [Updated] (SOLR-6920) During replication use checksums to verify if files are the same
[ https://issues.apache.org/jira/browse/SOLR-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated SOLR-6920:
    Attachment: SOLR-6920.patch

Updated patch. Tests pass. The difference from the earlier patch is the API usage of SegmentInfos.

For reference, here is the link to the question I asked on the lucene user mailing list about the SegmentInfos API usage - http://mail-archives.apache.org/mod_mbox/lucene-java-user/201501.mbox/%3CCAEH2wZDm%2BEXEhWEyp9RoQDVffb7jJSG31A3WVGxV_TNCE%3D12zA%40mail.gmail.com%3E
[jira] [Updated] (SOLR-6920) During replication use checksums to verify if files are the same
[ https://issues.apache.org/jira/browse/SOLR-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated SOLR-6920:
    Attachment: SOLR-6920.patch

A few tests fail with this patch -
{code}
[junit4] Tests with failures:
[junit4]   - org.apache.solr.cloud.BasicDistributedZk2Test.testDistribSearch
[junit4]   - org.apache.solr.cloud.ShardSplitTest.testDistribSearch
[junit4]   - org.apache.solr.cloud.SyncSliceTest.testDistribSearch
[junit4]   - org.apache.solr.cloud.RecoveryZkTest.testDistribSearch
{code}

I picked ShardSplitTest.testDistribSearch and started investigating why it always fails with the patch - I was seeing the following stack trace -
{code}
53340 T166 C65 P58357 oasc.SolrException.log ERROR SnapPull failed :org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1603)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1715)
    at org.apache.solr.handler.SnapPuller.openNewSearcherAndUpdateCommitPoint(SnapPuller.java:680)
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:496)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:340)
    at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:163)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:447)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
Caused by: java.io.FileNotFoundException: _1_2.liv in dir=RAMDirectory@2fdfb9bd lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@91b3fd9
    at org.apache.lucene.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:655)
    at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:110)
    at org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat.readLiveDocs(Lucene50LiveDocsFormat.java:84)
    at org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat.readLiveDocs(AssertingLiveDocsFormat.java:64)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:101)
    at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:134)
    at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:186)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:94)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:430)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:268)
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:203)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1522)
    ... 7 more

53344 T166 C65 P58357 oasc.SolrException.log ERROR Error while trying to recover:org.apache.solr.common.SolrException: Replication for recovery failed.
    at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:166)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:447)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:235)
{code}

This looked very similar to the problem we were trying to solve in SOLR-6640. So I applied the patch from SOLR-6640 along with this patch and now I am seeing these -
{code}
392008 T11 oasc.Diagnostics.logThreadDumps ERROR Gave up waiting for recovery to finish. THREAD DUMP:
qtp2004060070-174 Id=174 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@42da41bf
    at sun.misc.Unsafe.park(Native Method)
    - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@42da41bf
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:342)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:526)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:44)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
    at java.lang.Thread.run(Thread.java:745)

RecoveryThread-collection1_shard1_1_replica2 Id=168 TIMED_WAITING on java.lang.Object@1398321c
    at java.lang.Object.wait(Native Method)
    - waiting on