[jira] [Commented] (TEPHRA-253) TransactionProcessorTest is sometimes flaky
[ https://issues.apache.org/jira/browse/TEPHRA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160097#comment-16160097 ]

Andreas Neumann commented on TEPHRA-253:

The fix is to delay the flush until the transaction state is loaded. PR: https://github.com/apache/incubator-tephra/pull/54

> TransactionProcessorTest is sometimes flaky
> ---
>
> Key: TEPHRA-253
> URL: https://issues.apache.org/jira/browse/TEPHRA-253
> Project: Tephra
> Issue Type: Bug
> Affects Versions: 0.12.0-incubating
> Reporter: Andreas Neumann
> Assignee: Andreas Neumann
> Fix For: 0.13.0-incubating
>
> The test sometimes fails as follows:
> {noformat}
> Running org.apache.tephra.hbase.coprocessor.TransactionProcessorTest
> Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.741 sec <<< FAILURE!
> testFamilyDeleteTimestamp(org.apache.tephra.hbase.coprocessor.TransactionProcessorTest) Time elapsed: 1.526 sec
> testTransactionStateCache(org.apache.tephra.hbase.coprocessor.TransactionProcessorTest) Time elapsed: 0.053 sec
> testDataJanitorRegionScanner(org.apache.tephra.hbase.coprocessor.TransactionProcessorTest) Time elapsed: 0.288 sec <<< FAILURE!
> org.junit.internal.ArrayComparisonFailure: arrays first differed at element [3]; expected:<4> but was:<1>
> at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:50)
> at org.junit.Assert.internalArrayEquals(Assert.java:473)
> at org.junit.Assert.assertArrayEquals(Assert.java:294)
> at org.junit.Assert.assertArrayEquals(Assert.java:305)
> at org.apache.tephra.hbase.coprocessor.TransactionProcessorTest.assertKeyValueMatches(TransactionProcessorTest.java:593)
> at org.apache.tephra.hbase.coprocessor.TransactionProcessorTest.assertKeyValueMatches(TransactionProcessorTest.java:585)
> at org.apache.tephra.hbase.coprocessor.TransactionProcessorTest.testDataJanitorRegionScanner(TransactionProcessorTest.java:190)
> {noformat}
> It is not clear what is causing this. Most likely the region server did not have an up-to-date transaction state snapshot at the time of the flush (that might be due to TEPHRA-239 or TEPHRA-249, or it might be a condition where flush() has no effect because the region is already flushing).
> Let's observe this and gather more information when/if it happens again.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
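The comment above describes the fix only in words: hold the flush back until the transaction state is loaded. A minimal sketch of that idea as a generic polling helper follows. It is not taken from the PR; the class `WaitUntil`, the method `waitFor`, and the `AtomicBoolean` that stands in for "transaction state is loaded" are all hypothetical names chosen for illustration, and the real test would have to supply its own check against the coprocessor.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper; not part of Tephra or HBase. */
final class WaitUntil {
  private WaitUntil() { }

  /**
   * Polls the condition until it holds or the timeout elapses.
   * Returns true if the condition was met, false on timeout or interruption.
   */
  static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out before the condition became true
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Stand-in for "the coprocessor has loaded the transaction state" (assumption).
    AtomicBoolean stateLoaded = new AtomicBoolean(false);
    Thread loader = new Thread(() -> {
      try {
        Thread.sleep(100); // simulate the asynchronous snapshot load
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      stateLoaded.set(true);
    });
    loader.start();

    if (waitFor(stateLoaded::get, 5_000, 10)) {
      System.out.println("state loaded, safe to flush");
    } else {
      System.out.println("timed out waiting for transaction state");
    }
  }
}
```

In a test, the flush call would go where the "safe to flush" message is printed, and a timeout would be treated as a test failure rather than silently flushing anyway.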
[jira] [Updated] (TEPHRA-253) TransactionProcessorTest is sometimes flaky
[ https://issues.apache.org/jira/browse/TEPHRA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Neumann updated TEPHRA-253:

Fix Version/s: 0.13.0-incubating
[jira] [Commented] (TEPHRA-253) TransactionProcessorTest is sometimes flaky
[ https://issues.apache.org/jira/browse/TEPHRA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160061#comment-16160061 ]

Andreas Neumann commented on TEPHRA-253:

Suspicion confirmed. After changing travis to dump the standard output of the test case, I see:
{noformat}
2017-09-09 19:18:16,851 - INFO [main:o.a.h.h.r.RegionCoprocessorHost@196] - Load coprocessor org.apache.tephra.hbase.coprocessor.TransactionProcessor from HTD of TestRegionScanner successfully.
2017-09-09 19:18:16,868 - INFO [StoreOpener-fc704aec719b675f06e5d7bd12da85f0-1:o.a.h.h.r.c.CompactionConfiguration@85] - size [134217728, 9223372036854775807); files [3, 10); ratio 1.20; off-peak ratio 5.00; throttle point 2684354560; delete expired; major period 60480, major jitter 0.50
2017-09-09 19:18:16,883 - INFO [main:o.a.h.h.r.HRegion@644] - Onlined fc704aec719b675f06e5d7bd12da85f0; next sequenceid=1
2017-09-09 19:18:16,883 - INFO [main:o.a.t.h.c.TransactionProcessorTest@178] - Coprocessor is using transaction state: null
2017-09-09 19:18:16,926 - INFO [main:o.a.t.h.c.TransactionProcessorTest@192] - Flushing region TestRegionScanner,,1504984696824.fc704aec719b675f06e5d7bd12da85f0.
2017-09-09 19:18:16,960 - INFO [HDFSTransactionStateStorage STARTING:o.a.t.p.HDFSTransactionStateStorage@109] - Using snapshot dir /home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit8165179254738335598
2017-09-09 19:18:16,981 - INFO [TransactionStateCache STARTING:o.a.t.p.HDFSTransactionStateStorage@185] - Read encoded transaction snapshot of 84 bytes
2017-09-09 19:18:16,984 - INFO [TransactionStateCache STARTING:o.a.t.c.TransactionStateCache@166] - Transaction state reloaded with snapshot from 1504984695267
2017-09-09 19:18:17,393 - INFO [main:o.a.h.h.r.DefaultStoreFlusher@88] - Flushed, sequenceid=37, memsize=5.9 K, hasBloomFilter=true, into tmp file hdfs://localhost:53322/home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit7077794411994061305/hbase/data/default/TestRegionScanner/fc704aec719b675f06e5d7bd12da85f0/.tmp/6e813e3b7af94e13afc9dc1303dda3f8
2017-09-09 19:18:17,415 - INFO [main:o.a.h.h.r.HStore@770] - Added hdfs://localhost:53322/home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit7077794411994061305/hbase/data/default/TestRegionScanner/fc704aec719b675f06e5d7bd12da85f0/f/6e813e3b7af94e13afc9dc1303dda3f8, entries=36, sequenceid=37, filesize=2.2 K
2017-09-09 19:18:17,416 - INFO [main:o.a.h.h.r.HRegion@1708] - Finished memstore flush of ~5.9 K/6048, currentsize=0/0 for region TestRegionScanner,,1504984696824.fc704aec719b675f06e5d7bd12da85f0. in 489ms, sequenceid=37, compaction requested=false
{noformat}
Clearly, the flush begins before the transaction state is loaded.
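The ordering claim can be checked directly from two timestamps in the log above: the "Flushing region" line (19:18:16,926) and the "Transaction state reloaded" line (19:18:16,984). The following sketch computes the gap between them; the class and method names are hypothetical, and the timestamp pattern is an assumption inferred from how the log lines are printed.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for comparing timestamps copied from the test log. */
final class LogGap {
  // Pattern inferred from the log lines, e.g. "2017-09-09 19:18:16,926".
  private static final DateTimeFormatter LOG_TIME =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  private LogGap() { }

  /** Milliseconds from the first timestamp to the second; negative if the second is earlier. */
  static long millisBetween(String first, String second) {
    LocalDateTime a = LocalDateTime.parse(first, LOG_TIME);
    LocalDateTime b = LocalDateTime.parse(second, LOG_TIME);
    return Duration.between(a, b).toMillis();
  }

  public static void main(String[] args) {
    String flushStarted = "2017-09-09 19:18:16,926";  // "Flushing region ..."
    String stateReloaded = "2017-09-09 19:18:16,984"; // "Transaction state reloaded ..."
    // A positive value means the flush started before the state was reloaded.
    System.out.println(millisBetween(flushStarted, stateReloaded) + " ms"); // prints "58 ms"
  }
}
```

A positive result here, together with the "Coprocessor is using transaction state: null" line at 19:18:16,883, matches the conclusion that the flush started while no transaction state was available.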
[jira] [Assigned] (TEPHRA-253) TransactionProcessorTest is sometimes flaky
[ https://issues.apache.org/jira/browse/TEPHRA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Neumann reassigned TEPHRA-253:

Assignee: Andreas Neumann (was: Poorna Chandra)
[jira] [Resolved] (TEPHRA-241) Introduce a way to limit the size of a transaction
[ https://issues.apache.org/jira/browse/TEPHRA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Neumann resolved TEPHRA-241.

Resolution: Fixed

> Introduce a way to limit the size of a transaction
> --
>
> Key: TEPHRA-241
> URL: https://issues.apache.org/jira/browse/TEPHRA-241
> Project: Tephra
> Issue Type: Improvement
> Components: api, manager
> Affects Versions: 0.12.0-incubating
> Reporter: Andreas Neumann
> Assignee: Andreas Neumann
> Fix For: 0.13.0-incubating
>
> When clients perform a huge number of writes in a short transaction, that can result in huge change sets. For example, if a client performs 10M writes and sends that change set over, that can easily be 1GB large. The transaction manager will keep this in memory. It will also write this as an edit to the transaction log.
> Assume it runs out of memory because the change set is too large. It crashes and when it restarts, it will replay the log, load that huge change set again, and crash again.
> To prevent this kind of systemic failure, and to encourage developers to use long transactions when performing many writes, we can introduce two new properties in the configuration:
> - change set warn threshold: if a change set exceeds this size, a warning is logged.
> - change set reject threshold: if a change set exceeds this size, it is rejected (canCommit will throw an exception) and that will fail the transaction.
> Both thresholds should be Long.MAX_VALUE by default, to preserve existing behavior after upgrade.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
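The two thresholds proposed in the issue above can be sketched as a small decision function. This only illustrates the described behavior and is not the code merged for TEPHRA-241: the class `ChangeSetLimits`, the `Outcome` enum, and the method `check` are hypothetical names, and the unit of "size" is left open because the issue text does not state it.

```java
/** Hypothetical illustration of the warn/reject thresholds; not Tephra's actual API. */
final class ChangeSetLimits {
  enum Outcome { ACCEPT, WARN, REJECT }

  private final long warnThreshold;
  private final long rejectThreshold;

  /** Defaults to Long.MAX_VALUE for both thresholds, which preserves the old behavior. */
  ChangeSetLimits() {
    this(Long.MAX_VALUE, Long.MAX_VALUE);
  }

  ChangeSetLimits(long warnThreshold, long rejectThreshold) {
    this.warnThreshold = warnThreshold;
    this.rejectThreshold = rejectThreshold;
  }

  /**
   * Classifies a change set by size. In the issue's description, REJECT corresponds to
   * canCommit throwing an exception and WARN to logging a warning.
   */
  Outcome check(long changeSetSize) {
    if (changeSetSize > rejectThreshold) {
      return Outcome.REJECT;
    }
    if (changeSetSize > warnThreshold) {
      return Outcome.WARN;
    }
    return Outcome.ACCEPT;
  }

  public static void main(String[] args) {
    ChangeSetLimits limits = new ChangeSetLimits(1_000, 10_000); // illustrative values only
    System.out.println(limits.check(500));     // ACCEPT
    System.out.println(limits.check(5_000));   // WARN
    System.out.println(limits.check(50_000));  // REJECT
    // With the defaults, nothing can exceed Long.MAX_VALUE, so every size is accepted.
    System.out.println(new ChangeSetLimits().check(Long.MAX_VALUE)); // ACCEPT
  }
}
```

Checking the reject threshold first means a change set that exceeds both limits fails the transaction rather than only producing a warning.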
[jira] [Commented] (TEPHRA-241) Introduce a way to limit the size of a transaction
[ https://issues.apache.org/jira/browse/TEPHRA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159800#comment-16159800 ]

ASF GitHub Bot commented on TEPHRA-241:

Github user asfgit closed the pull request at: https://github.com/apache/incubator-tephra/pull/48
[GitHub] incubator-tephra pull request #48: [TEPHRA-241] Introduce a way to limit the...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-tephra/pull/48 ---
[jira] [Commented] (TEPHRA-241) Introduce a way to limit the size of a transaction
[ https://issues.apache.org/jira/browse/TEPHRA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159796#comment-16159796 ]

ASF GitHub Bot commented on TEPHRA-241:

Github user anew commented on the issue: https://github.com/apache/incubator-tephra/pull/48

Same travis failure. This appears to happen a lot more frequently with Java 8. I will commit this now and try to fix the flaky test before 0.13 release.
[GitHub] incubator-tephra issue #48: [TEPHRA-241] Introduce a way to limit the size o...
Github user anew commented on the issue: https://github.com/apache/incubator-tephra/pull/48 Same travis failure. This appears to happen a lot more frequently with Java 8. I will commit this now and try to fix the flaky test before 0.13 release. ---
[jira] [Commented] (TEPHRA-253) TransactionProcessorTest is sometimes flaky
[ https://issues.apache.org/jira/browse/TEPHRA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159795#comment-16159795 ]

Andreas Neumann commented on TEPHRA-253:

I suspect that this test is flaky because we call flush on the region very quickly after creating it. The coprocessor may not have read the transaction state at that time. Hence it would not remove invalid transactions recorded in that transaction state, and that fails the test.
[GitHub] incubator-tephra pull request #54: wip
GitHub user anew opened a pull request:

https://github.com/apache/incubator-tephra/pull/54

wip

on hold

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anew/incubator-tephra tephra-253

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-tephra/pull/54.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #54

commit 140c1f0aa621fb7ab7a80c087b5be293d3b68035
Author: anew
Date: 2017-09-09T06:43:55Z

wip

---