[jira] [Commented] (TEPHRA-253) TransactionProcessorTest is sometimes flaky

2017-09-09 Thread Andreas Neumann (JIRA)

[ 
https://issues.apache.org/jira/browse/TEPHRA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160097#comment-16160097
 ] 

Andreas Neumann commented on TEPHRA-253:


The fix is to delay the flush until the transaction state has been loaded.
PR: https://github.com/apache/incubator-tephra/pull/54
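
A minimal sketch of the idea of waiting for the transaction state before flushing. This is not the code from the PR: the class `WaitForState`, the method `awaitNonNull`, and the `AtomicReference` standing in for the coprocessor's asynchronously loaded state are all hypothetical names used only for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical test helper; not part of Tephra or HBase. */
final class WaitForState {

  /** Polls the supplier until it returns a non-null value or the timeout elapses. */
  static <T> T awaitNonNull(Supplier<T> supplier, long timeout, TimeUnit unit, long pollMillis)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (true) {
      T value = supplier.get();
      if (value != null) {
        return value;
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("state not loaded within " + timeout + " " + unit);
      }
      Thread.sleep(pollMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    // Stand-in (assumption) for state that a background thread loads after region creation.
    AtomicReference<String> state = new AtomicReference<>();
    Thread loader = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      state.set("snapshot-loaded");
    });
    loader.start();

    // Only proceed (for example, to a flush) once the state is visible.
    String loaded = awaitNonNull(state::get, 5, TimeUnit.SECONDS, 10);
    System.out.println("state available, safe to flush: " + loaded);
  }
}
```

In a test, the supplier would read whatever accessor exposes the coprocessor's current transaction state; a bounded timeout keeps a genuinely broken setup from hanging the build.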

> TransactionProcessorTest is sometimes flaky
> ---
>
> Key: TEPHRA-253
> URL: https://issues.apache.org/jira/browse/TEPHRA-253
> Project: Tephra
> Issue Type: Bug
> Affects Versions: 0.12.0-incubating
> Reporter: Andreas Neumann
> Assignee: Andreas Neumann
> Fix For: 0.13.0-incubating
>
>
> The test sometimes fails as follows:
> {noformat}
> Running org.apache.tephra.hbase.coprocessor.TransactionProcessorTest
> Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.741 sec <<< 
> FAILURE!
> testFamilyDeleteTimestamp(org.apache.tephra.hbase.coprocessor.TransactionProcessorTest)
>   Time elapsed: 1.526 sec
> testTransactionStateCache(org.apache.tephra.hbase.coprocessor.TransactionProcessorTest)
>   Time elapsed: 0.053 sec
> testDataJanitorRegionScanner(org.apache.tephra.hbase.coprocessor.TransactionProcessorTest)
>   Time elapsed: 0.288 sec  <<< FAILURE!
> org.junit.internal.ArrayComparisonFailure: arrays first differed at element 
> [3]; expected:<4> but was:<1>
>   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:50)
>   at org.junit.Assert.internalArrayEquals(Assert.java:473)
>   at org.junit.Assert.assertArrayEquals(Assert.java:294)
>   at org.junit.Assert.assertArrayEquals(Assert.java:305)
>   at 
> org.apache.tephra.hbase.coprocessor.TransactionProcessorTest.assertKeyValueMatches(TransactionProcessorTest.java:593)
>   at 
> org.apache.tephra.hbase.coprocessor.TransactionProcessorTest.assertKeyValueMatches(TransactionProcessorTest.java:585)
>   at 
> org.apache.tephra.hbase.coprocessor.TransactionProcessorTest.testDataJanitorRegionScanner(TransactionProcessorTest.java:190)
> {noformat}
> It is not clear what is causing this. Most likely the region server did not 
> have an up-to-date transaction state snapshot at the time of the flush (that 
> might be due to TEPHRA-239 or TEPHRA-249, or it might be a condition where 
> flush() has no effect because the region is already flushing). 
> Let's observe this and gather more information when/if it happens again. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (TEPHRA-253) TransactionProcessorTest is sometimes flaky

2017-09-09 Thread Andreas Neumann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEPHRA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Neumann updated TEPHRA-253:
---
Fix Version/s: 0.13.0-incubating






[jira] [Commented] (TEPHRA-253) TransactionProcessorTest is sometimes flaky

2017-09-09 Thread Andreas Neumann (JIRA)

[ 
https://issues.apache.org/jira/browse/TEPHRA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160061#comment-16160061
 ] 

Andreas Neumann commented on TEPHRA-253:


Suspicion confirmed. After changing Travis to dump the standard output of the 
test case, I see:
{noformat}
2017-09-09 19:18:16,851 - INFO  [main:o.a.h.h.r.RegionCoprocessorHost@196] - 
Load coprocessor org.apache.tephra.hbase.coprocessor.TransactionProcessor from 
HTD of TestRegionScanner successfully.
2017-09-09 19:18:16,868 - INFO  
[StoreOpener-fc704aec719b675f06e5d7bd12da85f0-1:o.a.h.h.r.c.CompactionConfiguration@85]
 - size [134217728, 9223372036854775807); files [3, 10); ratio 1.20; 
off-peak ratio 5.00; throttle point 2684354560; delete expired; major 
period 60480, major jitter 0.50
2017-09-09 19:18:16,883 - INFO  [main:o.a.h.h.r.HRegion@644] - Onlined 
fc704aec719b675f06e5d7bd12da85f0; next sequenceid=1
2017-09-09 19:18:16,883 - INFO  [main:o.a.t.h.c.TransactionProcessorTest@178] - 
Coprocessor is using transaction state: null
2017-09-09 19:18:16,926 - INFO  [main:o.a.t.h.c.TransactionProcessorTest@192] - 
Flushing region 
TestRegionScanner,,1504984696824.fc704aec719b675f06e5d7bd12da85f0.
2017-09-09 19:18:16,960 - INFO  [HDFSTransactionStateStorage 
STARTING:o.a.t.p.HDFSTransactionStateStorage@109] - Using snapshot dir 
/home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit8165179254738335598
2017-09-09 19:18:16,981 - INFO  [TransactionStateCache 
STARTING:o.a.t.p.HDFSTransactionStateStorage@185] - Read encoded transaction 
snapshot of 84 bytes
2017-09-09 19:18:16,984 - INFO  [TransactionStateCache 
STARTING:o.a.t.c.TransactionStateCache@166] - Transaction state reloaded with 
snapshot from 1504984695267
2017-09-09 19:18:17,393 - INFO  [main:o.a.h.h.r.DefaultStoreFlusher@88] - 
Flushed, sequenceid=37, memsize=5.9 K, hasBloomFilter=true, into tmp file 
hdfs://localhost:53322/home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit7077794411994061305/hbase/data/default/TestRegionScanner/fc704aec719b675f06e5d7bd12da85f0/.tmp/6e813e3b7af94e13afc9dc1303dda3f8
2017-09-09 19:18:17,415 - INFO  [main:o.a.h.h.r.HStore@770] - Added 
hdfs://localhost:53322/home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit7077794411994061305/hbase/data/default/TestRegionScanner/fc704aec719b675f06e5d7bd12da85f0/f/6e813e3b7af94e13afc9dc1303dda3f8,
 entries=36, sequenceid=37, filesize=2.2 K
2017-09-09 19:18:17,416 - INFO  [main:o.a.h.h.r.HRegion@1708] - Finished 
memstore flush of ~5.9 K/6048, currentsize=0/0 for region 
TestRegionScanner,,1504984696824.fc704aec719b675f06e5d7bd12da85f0. in 489ms, 
sequenceid=37, compaction requested=false
{noformat}
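Clearly, the flush begins (19:18:16,926) before the transaction state is loaded 
(19:18:16,984). Just before the flush, the test logs that the coprocessor's 
transaction state is still null. 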
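
To make the ordering concrete, the two relevant timestamps from the log above can be compared directly. The following is only an illustrative sketch: `LogOrdering` and `millisBetween` are hypothetical names, and the timestamp pattern is inferred from the log lines shown.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for reading the log excerpt; not part of Tephra. */
final class LogOrdering {

  // Timestamp layout at the start of each log line in the excerpt.
  private static final DateTimeFormatter LOG_TIME =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

  /** Milliseconds from the first timestamp to the second; positive if the first is earlier. */
  static long millisBetween(String first, String second) {
    return Duration.between(
        LocalDateTime.parse(first, LOG_TIME),
        LocalDateTime.parse(second, LOG_TIME)).toMillis();
  }

  public static void main(String[] args) {
    String flushStarted = "2017-09-09 19:18:16,926";   // "Flushing region ..."
    String stateReloaded = "2017-09-09 19:18:16,984";  // "Transaction state reloaded ..."
    System.out.println("flush started " + millisBetween(flushStarted, stateReloaded)
        + " ms before the transaction state was reloaded");
  }
}
```

A positive result means the flush was requested first, which matches the suspected race.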


[jira] [Assigned] (TEPHRA-253) TransactionProcessorTest is sometimes flaky

2017-09-09 Thread Andreas Neumann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEPHRA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Neumann reassigned TEPHRA-253:
--

Assignee: Andreas Neumann  (was: Poorna Chandra)

> TransactionProcessorTest is sometimes flaky
> ---
>
> Key: TEPHRA-253
> URL: https://issues.apache.org/jira/browse/TEPHRA-253
> Project: Tephra
> Issue Type: Bug
> Affects Versions: 0.12.0-incubating
> Reporter: Andreas Neumann
> Assignee: Andreas Neumann
>





[jira] [Resolved] (TEPHRA-241) Introduce a way to limit the size of a transaction

2017-09-09 Thread Andreas Neumann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEPHRA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Neumann resolved TEPHRA-241.

Resolution: Fixed

> Introduce a way to limit the size of a transaction
> --
>
> Key: TEPHRA-241
> URL: https://issues.apache.org/jira/browse/TEPHRA-241
> Project: Tephra
> Issue Type: Improvement
> Components: api, manager
> Affects Versions: 0.12.0-incubating
> Reporter: Andreas Neumann
> Assignee: Andreas Neumann
> Fix For: 0.13.0-incubating
>
>
> When clients perform a huge number of writes in a short transaction, that can 
> result in huge change sets. For example, if a client performs 10M writes and 
> sends that change set over, it can easily be 1 GB in size. The transaction 
> manager will keep this in memory. It will also write this as an edit to the 
> transaction log.
> Assume the transaction manager runs out of memory because the change set is 
> too large. It crashes and when it restarts, it will replay the log, load that 
> huge change set again, and crash again. 
> To prevent this kind of systemic failure, and to encourage developers to use 
> long transactions when performing many writes, we can introduce two new 
> properties in the configuration:
> - change set warn threshold: if a change set exceeds this size, a warning is 
> logged. 
> - change set reject threshold: if a change set exceeds this size, it is 
> rejected (canCommit will throw an exception) and that will fail the 
> transaction.
> Both thresholds should be Long.MAX_VALUE by default, to preserve existing 
> behavior after upgrade. 
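
The two thresholds described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation merged for TEPHRA-241: the class `ChangeSetSizeCheck`, the `Outcome` enum, the choice of bytes as the unit, and the use of `IllegalStateException` are all hypothetical.

```java
import java.util.logging.Logger;

/** Hypothetical sketch of a warn/reject size check; not the actual Tephra code. */
final class ChangeSetSizeCheck {

  enum Outcome { OK, WARN }

  private static final Logger LOG = Logger.getLogger(ChangeSetSizeCheck.class.getName());

  private final long warnThreshold;
  private final long rejectThreshold;

  /** Defaults of Long.MAX_VALUE mean nothing is ever warned about or rejected. */
  ChangeSetSizeCheck() {
    this(Long.MAX_VALUE, Long.MAX_VALUE);
  }

  ChangeSetSizeCheck(long warnThreshold, long rejectThreshold) {
    this.warnThreshold = warnThreshold;
    this.rejectThreshold = rejectThreshold;
  }

  /**
   * Checks a change set size (assumed here to be in bytes).
   * Throws if the reject threshold is exceeded; logs and reports WARN if only
   * the warn threshold is exceeded.
   */
  Outcome check(long changeSetSize) {
    if (changeSetSize > rejectThreshold) {
      // The exception type is an assumption; the issue only says canCommit throws.
      throw new IllegalStateException(
          "change set size " + changeSetSize + " exceeds reject threshold " + rejectThreshold);
    }
    if (changeSetSize > warnThreshold) {
      LOG.warning("change set size " + changeSetSize + " exceeds warn threshold " + warnThreshold);
      return Outcome.WARN;
    }
    return Outcome.OK;
  }

  public static void main(String[] args) {
    ChangeSetSizeCheck limited = new ChangeSetSizeCheck(100, 1000);
    System.out.println(limited.check(50));   // below both thresholds
    System.out.println(limited.check(500));  // above warn, below reject
  }
}
```

Because "exceeds" is read as strictly greater than, the Long.MAX_VALUE defaults can never trigger, which is what preserves the existing behavior after an upgrade.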





[jira] [Commented] (TEPHRA-241) Introduce a way to limit the size of a transaction

2017-09-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEPHRA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159800#comment-16159800
 ] 

ASF GitHub Bot commented on TEPHRA-241:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-tephra/pull/48







[GitHub] incubator-tephra pull request #48: [TEPHRA-241] Introduce a way to limit the...

2017-09-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-tephra/pull/48


---


[jira] [Commented] (TEPHRA-241) Introduce a way to limit the size of a transaction

2017-09-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEPHRA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159796#comment-16159796
 ] 

ASF GitHub Bot commented on TEPHRA-241:
---

Github user anew commented on the issue:

https://github.com/apache/incubator-tephra/pull/48
  
Same Travis failure. This appears to happen a lot more frequently with Java 
8. I will commit this now and try to fix the flaky test before the 0.13 release.







[GitHub] incubator-tephra issue #48: [TEPHRA-241] Introduce a way to limit the size o...

2017-09-09 Thread anew
Github user anew commented on the issue:

https://github.com/apache/incubator-tephra/pull/48
  
Same Travis failure. This appears to happen a lot more frequently with Java 
8. I will commit this now and try to fix the flaky test before the 0.13 release.


---


[jira] [Commented] (TEPHRA-253) TransactionProcessorTest is sometimes flaky

2017-09-09 Thread Andreas Neumann (JIRA)

[ 
https://issues.apache.org/jira/browse/TEPHRA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159795#comment-16159795
 ] 

Andreas Neumann commented on TEPHRA-253:


I suspect that this test is flaky because we call flush on the region very 
quickly after creating it. The coprocessor may not have read the transaction 
state at that time. Hence it would not remove invalid transactions recorded in 
that transaction state, and that fails the test. 

> TransactionProcessorTest is sometimes flaky
> ---
>
> Key: TEPHRA-253
> URL: https://issues.apache.org/jira/browse/TEPHRA-253
> Project: Tephra
> Issue Type: Bug
> Affects Versions: 0.12.0-incubating
> Reporter: Andreas Neumann
> Assignee: Poorna Chandra
>





[GitHub] incubator-tephra pull request #54: wip

2017-09-09 Thread anew
GitHub user anew opened a pull request:

https://github.com/apache/incubator-tephra/pull/54

wip

on hold

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anew/incubator-tephra tephra-253

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-tephra/pull/54.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #54


commit 140c1f0aa621fb7ab7a80c087b5be293d3b68035
Author: anew 
Date:   2017-09-09T06:43:55Z

wip




---