[jira] [Assigned] (KUDU-2618) Factor the amount of data into time-based flush decisions

2019-07-25 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2618:
---

Assignee: (was: Will Berkeley)

> Factor the amount of data into time-based flush decisions
> -
>
> Key: KUDU-2618
> URL: https://issues.apache.org/jira/browse/KUDU-2618
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Will Berkeley
>Priority: Major
>
> Pure time-based flush can cause small rowset problems when the rate of 
> inserts is so low that hardly any data accumulates before it is flushed.
> On the other hand, cribbing an example from Todd from the KUDU-1400 design 
> doc:
> bq. if you configure your TS to allow 100G of heap, and insert 30G of data 
> spread across 30 tablets (1G each tablet being lower than the default 
> size-based threshold), would you want it to ever flush to disk? or just sit 
> there in RAM? The restart could be relatively slow if it never flushed, and 
> also scans of MRS are slower than DRS.
> As Todd goes on to say
> bq. That said, we could probably make the "time-based flush" somehow related 
> to the amount of data, so that we wait a long time to flush if it's only 
> 10kb, but still flush relatively quickly if it's many MB.
> We should tune time-based flush so that, on average, it waits a shorter time 
> when the accumulated data is enough for one or more "full-sized" diskrowsets 
> than when it is less than a full diskrowset.
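
For illustration only, a minimal sketch of the idea, with hypothetical names and 
thresholds rather than Kudu's actual maintenance-manager code: the effective 
time-based flush deadline shrinks as the in-memory data approaches a full-sized 
diskrowset, so a 10kb MRS waits hours while a many-MB MRS flushes within minutes.

{code:java}
// Hypothetical sketch: not Kudu's real flush policy. It only illustrates letting the
// effective time-based flush deadline shrink as data accumulates.
public class FlushDeadlineSketch {
  // Assumed knobs; real defaults would come from tserver flags.
  static final long TARGET_DISKROWSET_BYTES = 32L * 1024 * 1024;  // "full-sized" diskrowset
  static final long MAX_AGE_MS = 2 * 60 * 60 * 1000L;             // wait up to 2h when tiny
  static final long MIN_AGE_MS = 2 * 60 * 1000L;                  // flush within 2m when full

  /** Returns how old in-memory data may get before a time-based flush is triggered. */
  static long flushDeadlineMs(long accumulatedBytes) {
    // Fraction of a full diskrowset that has accumulated, capped at 1.0.
    double fullness = Math.min(1.0, (double) accumulatedBytes / TARGET_DISKROWSET_BYTES);
    // Interpolate: nearly-empty MRS -> MAX_AGE_MS, full-sized MRS -> MIN_AGE_MS.
    return (long) (MAX_AGE_MS - fullness * (MAX_AGE_MS - MIN_AGE_MS));
  }

  public static void main(String[] args) {
    System.out.println(flushDeadlineMs(10 * 1024));          // ~2h for 10 KB
    System.out.println(flushDeadlineMs(64L * 1024 * 1024));  // ~2m for 64 MB
  }
}
{code}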



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (KUDU-2629) TestHybridTime is flaky

2019-06-13 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-2629.
-
   Resolution: Fixed
 Assignee: Will Berkeley
Fix Version/s: 1.10.0

I fixed this test with eb9cc53095a53a9dabc985b2fab91f76be013d47.

> TestHybridTime is flaky
> ---
>
> Key: KUDU-2629
> URL: https://issues.apache.org/jira/browse/KUDU-2629
> Project: Kudu
>  Issue Type: Bug
>  Components: java, test
>Reporter: Andrew Wong
>Assignee: Will Berkeley
>Priority: Major
> Fix For: 1.10.0
>
> Attachments: TEST-org.apache.kudu.client.TestHybridTime.xml
>
>
> I saw three back-to-back failures of TestHybridTime in which a scan returned 
> an unexpected number of rows. I've attached the XML for the test and its 
> retries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-1521) Flakiness in TestAsyncKuduSession

2019-06-13 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-1521.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

I think I've cleaned up most of the flakiness here.

> Flakiness in TestAsyncKuduSession
> -
>
> Key: KUDU-1521
> URL: https://issues.apache.org/jira/browse/KUDU-1521
> Project: Kudu
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.9.1
>Reporter: Adar Dembo
>Assignee: Will Berkeley
>Priority: Major
> Fix For: 1.10.0
>
> Attachments: 
> org.apache.kudu.client.TestAsyncKuduSession-TableIsDeleted-output.txt, 
> org.apache.kudu.client.TestAsyncKuduSession-output.txt, 
> org.apache.kudu.client.TestAsyncKuduSession.test.log.xz
>
>
>  I've been trying to parse the various failures in 
> http://104.196.14.100/job/kudu-gerrit/2270/BUILD_TYPE=RELEASE. Here's what I 
> see in the test:
> The way test() tests AUTO_FLUSH_BACKGROUND is inherently flaky; a delay while 
> running test code will give the background flush task a chance to fire when 
> the test code doesn't expect it. I've seen this lead to no 
> PleaseThrottleException being thrown, but I suspect the first block of test code dealing 
> with background flushes is flaky too (since it's testing elapsed time).
> There's also some test failures that I can't figure out. I've pasted them 
> below for posterity:
> {noformat}
> 03:52:14 
> testGetTableLocationsErrorCauseSessionStuck(org.kududb.client.TestAsyncKuduSession)
>   Time elapsed: 100.009 sec  <<< ERROR!
> 03:52:14 java.lang.Exception: test timed out after 10 milliseconds
> 03:52:14  at java.lang.Object.wait(Native Method)
> 03:52:14  at java.lang.Object.wait(Object.java:503)
> 03:52:14  at com.stumbleupon.async.Deferred.doJoin(Deferred.java:1136)
> 03:52:14  at com.stumbleupon.async.Deferred.join(Deferred.java:1019)
> 03:52:14  at 
> org.kududb.client.TestAsyncKuduSession.testGetTableLocationsErrorCauseSessionStuck(TestAsyncKuduSession.java:133)
> 03:52:14 
> 03:52:14 
> testBatchErrorCauseSessionStuck(org.kududb.client.TestAsyncKuduSession)  Time 
> elapsed: 0.199 sec  <<< ERROR!
> 03:52:14 org.kududb.client.MasterErrorException: Server[Kudu Master - 
> 127.13.215.1:64030] NOT_FOUND[code 1]: The table was deleted: Table deleted 
> at 2016-07-09 03:50:24 UTC
> 03:52:14  at 
> org.kududb.client.TabletClient.dispatchMasterErrorOrReturnException(TabletClient.java:533)
> 03:52:14  at org.kududb.client.TabletClient.decode(TabletClient.java:463)
> 03:52:14  at org.kududb.client.TabletClient.decode(TabletClient.java:83)
> 03:52:14  at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500)
> 03:52:14  at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
> 03:52:14  at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> 03:52:14  at 
> org.kududb.client.TabletClient.handleUpstream(TabletClient.java:638)
> 03:52:14  at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> 03:52:14  at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
> 03:52:14  at 
> org.jboss.netty.handler.timeout.ReadTimeoutHandler.messageReceived(ReadTimeoutHandler.java:184)
> 03:52:14  at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> 03:52:14  at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
> 03:52:14  at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
> 03:52:14  at 
> org.kududb.client.AsyncKuduClient$TabletClientPipeline.sendUpstream(AsyncKuduClient.java:1877)
> 03:52:14  at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
> 03:52:14  at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
> 03:52:14  at 
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
> 03:52:14  at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
> 03:52:14  at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
> 03:52:14  at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
> 03:52:14  at 
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> 03:52:14  at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> 03:52:14  at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker

[jira] [Resolved] (KUDU-975) Review Java API for alter schema

2019-06-13 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-975.

   Resolution: Fixed
 Assignee: Will Berkeley
Fix Version/s: 1.8.0

Seems like this was done a long time ago with KUDU-861. New issues can be 
opened for alter API feature requests.

> Review Java API for alter schema
> 
>
> Key: KUDU-975
> URL: https://issues.apache.org/jira/browse/KUDU-975
> Project: Kudu
>  Issue Type: Improvement
>  Components: api, client
>Affects Versions: Private Beta
>Reporter: Todd Lipcon
>Assignee: Will Berkeley
>Priority: Major
> Fix For: 1.8.0
>
>
> Should review these APIs and make sure they are reasonable (and support 
> things like changing column encoding/compression).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-1915) reduce exception spew in common Java client error conditions

2019-06-13 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-1915.
-
   Resolution: Fixed
 Assignee: Todd Lipcon
Fix Version/s: 1.8.0

Looks like the commits ead756844ce9ada904fcc3666df25692f63e76b8 
ce0db915787b58a79109e6faecc6f1daef9f2850 
cdc73b900608000ff2d2f8bc74c05893338454ba helped out a lot with this. I'd say 
this is fixed for now.

> reduce exception spew in common Java client error conditions
> 
>
> Key: KUDU-1915
> URL: https://issues.apache.org/jira/browse/KUDU-1915
> Project: Kudu
>  Issue Type: Improvement
>  Components: client, java
>Affects Versions: 1.3.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
>  Labels: usability
> Fix For: 1.8.0
>
>
> Currently in somewhat common conditions (eg trying to use the client without 
> krb5 credentials on a secure cluster) the Java client spews pages and pages 
> of exceptions. We should make a concerted effort to clean this up so that 
> users have a good experience with the client.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2659) java kudu-client session can be used again after close

2019-06-13 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-2659.
-
   Resolution: Fixed
Fix Version/s: 1.9.0

9443a69b0d81c65a174874661e90a13e5792d55b addressed this. It concluded that it 
would be a breaking change to throw if an operation is applied to a closed 
session, so it logs at the warning level instead.
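
As a hedged illustration of the behavior described above (hypothetical class, not 
the real KuduSession code), the apply path warns rather than throws when the 
session is already closed:

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch of the warn-instead-of-throw behavior; not the actual KuduSession.
class ClosableSessionSketch {
  private static final Logger LOG = LoggerFactory.getLogger(ClosableSessionSketch.class);
  private volatile boolean closed = false;

  void close() { closed = true; }

  void apply(Object operation) {
    if (closed) {
      // Throwing here would break existing callers, so only log a warning.
      LOG.warn("Applying an operation to a closed session; this may become an error later");
    }
    // ... buffer or send the operation as usual ...
  }
}
{code}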

> java kudu-client session can be used again after close
> --
>
> Key: KUDU-2659
> URL: https://issues.apache.org/jira/browse/KUDU-2659
> Project: Kudu
>  Issue Type: Bug
>Reporter: KarlManong
>Priority: Major
> Fix For: 1.9.0
>
>
> A Java kudu-client session can still be used after it is closed, even though 
> the client has already removed it from its set of sessions.
> {code:java}
> // Apply an insert, close the session, then apply again: the second apply should fail.
>   Insert insert = createBasicSchemaInsert(table, 0);
>   session.apply(insert);
>   session.close();
>   assertTrue(session.isClosed());
>   insert = createBasicSchemaInsert(table, 1);
>   session.apply(insert);
>   fail(); // expected the second apply to throw
> {code}
> see [github pull request 15|https://github.com/apache/kudu/pull/15]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2061) Java Client Not Honoring setIgnoreAllDuplicateRows When Inserting Duplicate Values

2019-06-13 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-2061.
-
   Resolution: Fixed
 Assignee: Will Berkeley
Fix Version/s: 1.7.0

The examples were improved a while ago when they were moved into the main repo. 
The examples now check for row errors after flushing.
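
A minimal sketch of that pattern, assuming the standard synchronous KuduSession 
API (the real examples' error handling may differ in detail):

{code:java}
import java.util.List;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.OperationResponse;
import org.apache.kudu.client.RowError;

// Sketch: after flushing, inspect per-row results instead of relying on an exception.
class FlushAndCheckSketch {
  static void flushAndCheck(KuduSession session) throws Exception {
    session.setIgnoreAllDuplicateRows(false);  // duplicates should surface as row errors
    List<OperationResponse> responses = session.flush();
    for (OperationResponse resp : responses) {
      if (resp.hasRowError()) {
        RowError error = resp.getRowError();
        throw new RuntimeException("row error: " + error.getErrorStatus());
      }
    }
  }
}
{code}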

> Java Client Not Honoring setIgnoreAllDuplicateRows When Inserting Duplicate 
> Values
> --
>
> Key: KUDU-2061
> URL: https://issues.apache.org/jira/browse/KUDU-2061
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 1.3.1, 1.5.0
>Reporter: Scott Black
>Assignee: Will Berkeley
>Priority: Major
> Fix For: 1.7.0
>
> Attachments: Sample.java
>
>
> Duplicate values on insert are not causing warning/error to be returned when 
> setIgnoreAllDuplicateRows is set to false. This is silently causing data loss.
> Test case. Use the example code from 
> [https://github.com/cloudera/kudu-examples/blob/master/java/java-sample/src/main/java/org/kududb/examples/sample/Sample.java].
>  Change line 43 to insert a constant. Three inserts will execute, but only a 
> single row results and no error is reported. See KUDU-1563, as it seems all 
> inserts are now treated as ignore-inserts.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2728) A TabletService queue overflow on a write causes a GetTableLocations call in the Java client

2019-06-13 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-2728.
-
   Resolution: Fixed
 Assignee: Will Berkeley
Fix Version/s: 1.10.0

I later re-discovered this and then actually fixed it with 
2b5b7372f27a1009209bd30f86b2a725e9ba58eb.

> A TabletService queue overflow on a write causes a GetTableLocations call in 
> the Java client
> 
>
> Key: KUDU-2728
> URL: https://issues.apache.org/jira/browse/KUDU-2728
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Assignee: Will Berkeley
>Priority: Major
> Fix For: 1.10.0
>
>
> If the Java client receives a ServiceUnavailable from the RPC layer (as 
> opposed to as a tablet server error), it treats that error like a "Tablet not 
> found error". See 
> https://github.com/apache/kudu/blob/branch-1.9.x/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java#L237
>  and 
> https://github.com/apache/kudu/blob/branch-1.9.x/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java#L410.
>  When a write operation sent to the tablet leader is rejected from the 
> service queue, this logic causes the Java client to look up the locations for 
> the table again. This is wasteful, and can result in hundreds or thousands of 
> GTL calls to the master. Usually, this isn't a problem for the master, but 
> I've seen a case where floods of GTL calls for a table with 1000+ tablets 
> caused master service queue overflows, and triggered KUDU-2710. It's 
> wasteful, in any case.
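
For illustration only, a hedged sketch of the desired behavior with hypothetical 
names that do not exist in the real RpcProxy code: an RPC-level ServiceUnavailable 
should mean "retry the same server after a backoff", and only a genuine tablet 
error should trigger a location refresh.

{code:java}
// Hypothetical sketch of the retry decision; these names do not correspond to real code.
class WriteRetrySketch {
  enum RetryAction { RETRY_SAME_SERVER_AFTER_BACKOFF, REFRESH_TABLET_LOCATIONS }

  static RetryAction onWriteError(boolean rpcLayerServiceUnavailable, boolean tabletNotFound) {
    if (rpcLayerServiceUnavailable) {
      // Service queue overflow: the leader is still the leader, just busy, so backing off
      // and retrying the same server avoids an unnecessary GetTableLocations call.
      return RetryAction.RETRY_SAME_SERVER_AFTER_BACKOFF;
    }
    if (tabletNotFound) {
      // Only a genuine tablet error should invalidate the cached locations.
      return RetryAction.REFRESH_TABLET_LOCATIONS;
    }
    return RetryAction.RETRY_SAME_SERVER_AFTER_BACKOFF;
  }
}
{code}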



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-1270) java client: Confusing semantics for writers

2019-06-13 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863361#comment-16863361
 ] 

Will Berkeley commented on KUDU-1270:
-

[~mpercy] Do you have a suggestion for what should be done? I think the docs 
are clear on the different semantics. The confusing part is that one method 
changes its semantics based on the flush mode, but we are stuck with that 
because the API must remain compatible.
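
To make the semantic difference concrete, here is a hedged sketch of how a caller 
handles throttling with the asynchronous session, which the synchronous apply() 
hides from the user; the getDeferred() accessor on PleaseThrottleException is an 
assumption:

{code:java}
import com.stumbleupon.async.Deferred;
import org.apache.kudu.client.AsyncKuduSession;
import org.apache.kudu.client.Operation;
import org.apache.kudu.client.OperationResponse;
import org.apache.kudu.client.PleaseThrottleException;

// Sketch only: with AsyncKuduSession, PleaseThrottleException is the caller's problem,
// while the synchronous KuduSession.apply() handles it internally.
class ThrottleAwareApply {
  Deferred<OperationResponse> applyWithThrottle(AsyncKuduSession session, Operation op)
      throws Exception {
    while (true) {
      try {
        return session.apply(op);
      } catch (PleaseThrottleException e) {
        // Assumption: the exception exposes a Deferred that completes when a buffer flushes.
        e.getDeferred().join();  // wait for room, then retry the apply
      }
    }
  }
}
{code}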

> java client: Confusing semantics for writers
> 
>
> Key: KUDU-1270
> URL: https://issues.apache.org/jira/browse/KUDU-1270
> Project: Kudu
>  Issue Type: Bug
>  Components: client
>Affects Versions: Public beta
>Reporter: Mike Percy
>Priority: Trivial
>
> The behavior of the Java client is pretty confusing when it comes to errors. 
> The javadoc for {{KuduSession.Flush()}} indicates that "if any errors 
> occurred" then an Exception will be thrown. However if there is a key error 
> then you have to look at the results that come back for errors.
> At the time of writing, this is what the API docs say:
> public OperationResponse apply(Operation operation) throws Exception
> Blocking call with a different behavior based on the flush mode. 
> PleaseThrottleException is managed by this method and will not be thrown, 
> unlike AsyncKuduSession.apply(org.kududb.client.Operation).
> AUTO_FLUSH_SYNC: the call returns when the operation is persisted, else it 
> throws an exception.
> AUTO_FLUSH_BACKGROUND: the call returns when the operation has been added to 
> the buffer. The operation's state is then unreachable, meaning that there's 
> no way to know if the operation is persisted. This call should normally 
> perform only fast in-memory operations but it may have to wait when the 
> buffer is full and there's another buffer being flushed.
> MANUAL_FLUSH: the call returns when the operation has been added to the 
> buffer, else it throws an exception such as a NonRecoverableException if the 
> buffer is full.
> Parameters:
> operation - operation to apply
> Returns:
> an OperationResponse for the applied Operation
> Throws:
> Exception - if anything went wrong
> public List<OperationResponse> flush() throws Exception
> Blocking call that force flushes this session's buffers. Data is persisted 
> when this call returns, else it will throw an exception.
> Returns:
> a list of OperationResponse, one per operation that was flushed
> Throws:
> Exception - if anything went wrong. If it's an issue with some or all 
> batches, it will be of type DeferredGroupException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-1017) Simpler Java synchronous API for iteration

2019-06-13 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-1017.
-
   Resolution: Implemented
 Assignee: Grant Henke
Fix Version/s: 1.10.0

Grant made the KuduScanner iterable with 
5735f07f8b6aaf5134c66863519052936aa7487f. I think that improves ease-of-use 
enough to call this done.

> Simpler Java synchronous API for iteration
> --
>
> Key: KUDU-1017
> URL: https://issues.apache.org/jira/browse/KUDU-1017
> Project: Kudu
>  Issue Type: New Feature
>  Components: client
>Affects Versions: M5
>Reporter: Erick Tryzelaar
>Assignee: Grant Henke
>Priority: Major
>  Labels: hackathon-feedback
> Fix For: 1.10.0
>
>
> The Java API for synchronous iteration is currently a little more complicated 
> than it might need to be, at least for the simple use case. Right now a 
> {{KuduScanner}} is an iterator over {{RowResultIterator}} iterators, which 
> then need to be iterated over to produce {{RowResult}} objects. I expect many 
> users of this synchronous API would prefer an API that hides the intermediate 
> {{RowResultIterator}}, like 
> [here|http://github.mtv.cloudera.com/erickt/titan/blob/ea2683c92fd2dd79df0b6359f5d5520a78aea637/titan-kudu/src/main/java/com/cloudera/titan/diskstorage/kudu/KuduKeyValueStore.java#L340-L376].
>  Could we get something like this in the API?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2767) Java test TestAuthTokenReacquire is flaky

2019-06-13 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-2767.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Fixed by 5533478b0b50bddd648334e69bad35cda321e9da.

> Java test TestAuthTokenReacquire is flaky
> -
>
> Key: KUDU-2767
> URL: https://issues.apache.org/jira/browse/KUDU-2767
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Reporter: Hao Hao
>Assignee: Will Berkeley
>Priority: Major
> Fix For: 1.10.0
>
> Attachments: test-output.txt
>
>
> I saw TestAuthTokenReacquire failed with the following error:
> {noformat}
> Time: 23.362
> There was 1 failure:
> 1) testBasicMasterOperations(org.apache.kudu.client.TestAuthTokenReacquire)
> java.lang.AssertionError: test failed: unexpected errors
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.kudu.client.TestAuthTokenReacquire.testBasicMasterOperations(TestAuthTokenReacquire.java:153)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at 
> org.apache.kudu.test.junit.RetryRule$RetryStatement.doOneAttempt(RetryRule.java:195)
>   at 
> org.apache.kudu.test.junit.RetryRule$RetryStatement.evaluate(RetryRule.java:212)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>   at org.junit.runner.JUnitCore.runMain(JUnitCore.java:77)
>   at org.junit.runner.JUnitCore.main(JUnitCore.java:36)
> FAILURES!!!
> Tests run: 2,  Failures: 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2767) Java test TestAuthTokenReacquire is flaky

2019-06-13 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2767:
---

Assignee: Will Berkeley

> Java test TestAuthTokenReacquire is flaky
> -
>
> Key: KUDU-2767
> URL: https://issues.apache.org/jira/browse/KUDU-2767
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Reporter: Hao Hao
>Assignee: Will Berkeley
>Priority: Major
> Attachments: test-output.txt
>
>
> I saw TestAuthTokenReacquire failed with the following error:
> {noformat}
> Time: 23.362
> There was 1 failure:
> 1) testBasicMasterOperations(org.apache.kudu.client.TestAuthTokenReacquire)
> java.lang.AssertionError: test failed: unexpected errors
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.kudu.client.TestAuthTokenReacquire.testBasicMasterOperations(TestAuthTokenReacquire.java:153)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at 
> org.apache.kudu.test.junit.RetryRule$RetryStatement.doOneAttempt(RetryRule.java:195)
>   at 
> org.apache.kudu.test.junit.RetryRule$RetryStatement.evaluate(RetryRule.java:212)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runners.Suite.runChild(Suite.java:128)
>   at org.junit.runners.Suite.runChild(Suite.java:27)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
>   at org.junit.runner.JUnitCore.runMain(JUnitCore.java:77)
>   at org.junit.runner.JUnitCore.main(JUnitCore.java:36)
> FAILURES!!!
> Tests run: 2,  Failures: 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2831) DistributedDataGeneratorTest.testGenerateRandomData is flaky

2019-05-30 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2831:
---

Assignee: Will Berkeley

> DistributedDataGeneratorTest.testGenerateRandomData is flaky
> 
>
> Key: KUDU-2831
> URL: https://issues.apache.org/jira/browse/KUDU-2831
> Project: Kudu
>  Issue Type: Bug
>  Components: spark, test
>Affects Versions: 1.10.0
>Reporter: Adar Dembo
>Assignee: Will Berkeley
>Priority: Major
>
> Saw this once last month and again today, so not super flaky but still worth 
> fixing:
> {noformat}
> 1) 
> testGenerateRandomData(org.apache.kudu.spark.tools.DistributedDataGeneratorTest)
> java.lang.AssertionError: expected:<100> but was:<99>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> org.apache.kudu.spark.tools.DistributedDataGeneratorTest.testGenerateRandomData(DistributedDataGeneratorTest.scala:58)
> {noformat}
> I talked about this with [~granthenke] when it last happened. The issue 
> appears to be in the LongAccumulator used to track collisions in the data 
> generator. Before the failure, the test logged this:
> {noformat}
> 02:22:39.533 [INFO - main] (DistributedDataGenerator.scala:134) Rows written: 
> 99
> 02:22:39.533 [INFO - main] (DistributedDataGenerator.scala:135) Collisions: 1
> {noformat}
> The assert code looks like this:
> {noformat}
> val collisions = ss.sparkContext.longAccumulator("row_collisions").value
> // Collisions could cause the number of rows to be less than the number 
> set.
> assertEquals(numRows - collisions, rdd.collect.length)
> {noformat}
> So the value of this LongAccumulator was zero even though there was one 
> collision. Our thinking was that accumulators like these were updated 
> asynchronously and so if we don't wait for the entire job to finish, we may 
> not be getting their up-to-date values at assertion time.
> We publish other LongAccumulators in kudu-spark, but AFAICT this is the only 
> one that is asserted on. Nevertheless, it would be great if we could solve 
> this in some generic way so that if someone wrote a test that used a 
> different LongAccumulator, the race could be avoided.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (KUDU-2602) testRandomBackupAndRestore is flaky

2019-05-30 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reopened KUDU-2602:
-
  Assignee: Will Berkeley  (was: Grant Henke)

This test is also flaky because random generation may upsert duplicate rows, 
causing a discrepancy between the number of generated rows and the number of 
unique rows seen in Kudu.

> testRandomBackupAndRestore is flaky
> ---
>
> Key: KUDU-2602
> URL: https://issues.apache.org/jira/browse/KUDU-2602
> Project: Kudu
>  Issue Type: Bug
>Reporter: Hao Hao
>Assignee: Will Berkeley
>Priority: Major
> Fix For: NA
>
> Attachments: TEST-org.apache.kudu.backup.TestKuduBackup.xml
>
>
> Saw the following failure with testRandomBackupAndRestore:
> {noformat}
> java.lang.AssertionError: 
> expected:<21> but was:<20>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at org.junit.Assert.assertEquals(Assert.java:631)
> at 
> org.apache.kudu.backup.TestKuduBackup.testRandomBackupAndRestore(TestKuduBackup.scala:99)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> at org.apache.kudu.junit.RetryRule$RetryStatement.evaluate(RetryRule.java:72)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
> at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
> at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
> at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
> at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
> at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
> at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
> at 
> org.gradle.internal

[jira] [Created] (KUDU-2832) Clean up after a failed restore job

2019-05-29 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2832:
---

 Summary: Clean up after a failed restore job
 Key: KUDU-2832
 URL: https://issues.apache.org/jira/browse/KUDU-2832
 Project: Kudu
  Issue Type: Improvement
Reporter: Will Berkeley


If a restore job fails, it may leave a partially-restored table on the 
destination cluster. This will prevent a naive retry from succeeding. We should 
make more effort to clean up if a restore job fails, so that a simple retry of 
the same job might be able to succeed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2786) Parallelize tables for backup and restore

2019-05-24 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2786:
---

Assignee: Will Berkeley

> Parallelize tables for backup and restore 
> --
>
> Key: KUDU-2786
> URL: https://issues.apache.org/jira/browse/KUDU-2786
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Grant Henke
>Assignee: Will Berkeley
>Priority: Major
>  Labels: backup
>
> Currently the backup and restore jobs process tables serially. This works 
> well to ensure resources aren't over allocated upfront, but could be less 
> performant for cases where there are many small tables. Instead we could 
> parallelize the Spark jobs for each table. 
> It should be straightforward to use Scala futures to run multiple jobs in 
> parallel and check their status. We could add a configuration to cap the 
> maximum number of tables run at the same time, though maybe that isn't really 
> needed. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2787) Allow single table failures for backup and restore

2019-05-23 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2787:
---

Assignee: Will Berkeley

> Allow single table failures for backup and restore
> --
>
> Key: KUDU-2787
> URL: https://issues.apache.org/jira/browse/KUDU-2787
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Grant Henke
>Assignee: Will Berkeley
>Priority: Major
>  Labels: backup
>
> Currently the backup and restore jobs will fail if a single table backup or 
> restore fails. Instead we should capture this failure and let the other 
> tables continue to run. 
> In order to allow users to continue to fail fast, we may want this to be a 
> command line option. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2807) Possible crash when flush or compaction overlaps with another compaction

2019-05-13 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-2807.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

f6f8bbf35aa33e668f9cc5ce9b1e80d202a7f736

> Possible crash when flush or compaction overlaps with another compaction
> 
>
> Key: KUDU-2807
> URL: https://issues.apache.org/jira/browse/KUDU-2807
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet
>Affects Versions: 1.9.0
>Reporter: Adar Dembo
>Assignee: Will Berkeley
>Priority: Blocker
> Fix For: 1.10.0
>
> Attachments: kudu-tserver.INFO.gz
>
>
> Manuel Sopena reported a crash like this in Slack:
> {noformat}
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> F0429 07:26:56.918041 34043 tablet.cc:2268] Check failed: lock.owns_lock() 
> RowSet(24130) unable to lock compact_flush_lock
> {noformat}
> It's hard to say exactly what's going on without more logging, but after 
> looking at the code in more detail, I think the culprit is [this 
> commit|https://github.com/apache/kudu/commit/d3684a7b2add8f06b7189adb9ce9222b8ae1eff5],
>  new in Kudu 1.9.0. To understand why it's problematic, we first need to 
> understand the locking invariant in play:
> # A thread must acquire the tablet's compact_select_lock_ in order to select 
> rowsets to compact.
> # Because of #1, it's safe to assume that, if a thread successfully acquired 
> a rowset's compact_flush_lock_ in the act of selecting it for compaction, it 
> can release and reacquire the lock without contention. More precisely, it can 
> release the compact_flush_lock_, then try-lock it, and the try-lock is 
> guaranteed to succeed. All compacting MM ops use a CHECK to enforce this 
> invariant.
> With that in mind, here's the problem: at the time that the call to 
> {{RowSetInfo::ComputeCdfAndCollectOrdered}} is made from 
> {{Tablet::AtomicSwapRowSetsUnlocked}}, the tablet's compact_select_lock_ is 
> not held. {{ComputeCdfAndCollectOrdered}} calls 
> {{RowSet::IsAvailableForCompaction}}, which try-locks the per-rowset 
> compact_flush_lock_. As a result, it's possible for a racing MM operation to 
> also call {{IsAvailableForCompaction}}, successfully try-lock the 
> compact_flush_lock_, release it, try-lock it again (as per the invariant 
> above), fail, and crash in the aforementioned CHECK.
> I don't think this can result in corruption as we crash rather than allowing 
> the MM op to proceed. But it's a bad race and a bad crash, so we should fix 
> it. Possibly producing a 1.9.1 release in the process.
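
For illustration only, a hedged Java sketch (hypothetical stand-ins, not Kudu's C++ 
tablet code) of the invariant and the race: a compacting op selects a rowset under 
the selection lock and later expects its try-lock of the flush lock to succeed, 
while a concurrent availability check that try-locks the same flush lock without 
holding the selection lock can make that expectation fail.

{code:java}
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical Java stand-ins for the C++ locks; this only models the invariant and the race.
class CompactionLockSketch {
  static final ReentrantLock selectLock = new ReentrantLock();  // compact_select_lock_
  static final ReentrantLock flushLock = new ReentrantLock();   // a rowset's compact_flush_lock_

  // Compacting op: selects the rowset under selectLock, releases its flush lock,
  // and later re-try-locks it, expecting (CHECKing) success.
  static void compactionOp() {
    selectLock.lock();
    try {
      if (!flushLock.tryLock()) {
        return;  // rowset busy; not selected
      }
      flushLock.unlock();  // selected; will reacquire after selection completes
    } finally {
      selectLock.unlock();
    }
    // Invariant: selection happened under selectLock, so this try-lock "must" succeed.
    if (!flushLock.tryLock()) {
      throw new IllegalStateException("unable to lock compact_flush_lock");  // the crash
    }
    flushLock.unlock();
  }

  // Racing availability check, mimicking IsAvailableForCompaction() called without
  // holding selectLock: it can hold flushLock at the instant compactionOp() re-tries it.
  static void isAvailableForCompaction() {
    if (flushLock.tryLock()) {
      flushLock.unlock();
    }
  }
}
{code}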



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2807) Possible crash when flush or compaction overlaps with another compaction

2019-05-06 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2807:
---

Assignee: Will Berkeley

> Possible crash when flush or compaction overlaps with another compaction
> 
>
> Key: KUDU-2807
> URL: https://issues.apache.org/jira/browse/KUDU-2807
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet
>Affects Versions: 1.9.0
>Reporter: Adar Dembo
>Assignee: Will Berkeley
>Priority: Blocker
>
> Manuel Sopena reported a crash like this in Slack:
> {noformat}
> Log line format: [IWEF]mmdd hh:mm:ss.uu threadid file:line] msg
> F0429 07:26:56.918041 34043 tablet.cc:2268] Check failed: lock.owns_lock() 
> RowSet(24130) unable to lock compact_flush_lock
> {noformat}
> It's hard to say exactly what's going on without more logging, but after 
> looking at the code in more detail, I think the culprit is [this 
> commit|https://github.com/apache/kudu/commit/d3684a7b2add8f06b7189adb9ce9222b8ae1eff5],
>  new in Kudu 1.9.0. To understand why it's problematic, we first need to 
> understand the locking invariant in play:
> # A thread must acquire the tablet's compact_select_lock_ in order to select 
> rowsets to compact.
> # Because of #1, it's safe to assume that, if a thread successfully acquired 
> a rowset's compact_flush_lock_ in the act of selecting it for compaction, it 
> can release and reacquire the lock without contention. More precisely, it can 
> release the compact_flush_lock_, then try-lock it, and the try-lock is 
> guaranteed to succeed. All compacting MM ops use a CHECK to enforce this 
> invariant.
> With that in mind, here's the problem: at the time that the call to 
> {{RowSetInfo::ComputeCdfAndCollectOrdered}} is made from 
> {{Tablet::AtomicSwapRowSetsUnlocked}}, the tablet's compact_select_lock_ is 
> not held. {{ComputeCdfAndCollectOrdered}} calls 
> {{RowSet::IsAvailableForCompaction}}, which try-locks the per-rowset 
> compact_flush_lock_. As a result, it's possible for a racing MM operation to 
> also call {{IsAvailableForCompaction}}, successfully try-lock the 
> compact_flush_lock_, release it, try-lock it again (as per the invariant 
> above), fail, and crash in the aforementioned CHECK.
> I don't think this can result in corruption as we crash rather than allowing 
> the MM op to proceed. But it's a bad race and a bad crash, so we should fix 
> it. Possibly producing a 1.9.1 release in the process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2816) Failure due to column already present in HmsSentryConfigurations.AlterTableRandomized

2019-05-06 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2816:
---

 Summary: Failure due to column already present in 
HmsSentryConfigurations.AlterTableRandomized
 Key: KUDU-2816
 URL: https://issues.apache.org/jira/browse/KUDU-2816
 Project: Kudu
  Issue Type: Bug
Reporter: Will Berkeley
 Attachments: alter_table-randomized-test.1.txt

{noformat}
F0504 12:41:37.638859   231 alter_table-randomized-test.cc:499] Check failed: 
_s.ok() Bad status: Already present: The column already exists: c310
*** Check failure stack trace: ***
*** Aborted at 1556973697 (unix time) try "date -d @1556973697" if you are 
using GNU date ***
PC: @ 0x7f698597bc37 gsignal
*** SIGABRT (@0x3e800e7) received by PID 231 (TID 0x7f69a02ef900) from PID 
231; stack trace: ***
@ 0x7f698d6c0330 (unknown) at ??:0
@ 0x7f698597bc37 gsignal at ??:0
@ 0x7f698597f028 abort at ??:0
@ 0x7f6988cbfa29 google::logging_fail() at ??:0
@ 0x7f6988cc131d google::LogMessage::Fail() at ??:0
@ 0x7f6988cc31dd google::LogMessage::SendToLog() at ??:0
@ 0x7f6988cc0e59 google::LogMessage::Flush() at ??:0
@ 0x7f6988cc3c7f google::LogMessageFatal::~LogMessageFatal() at ??:0
@   0x586325 kudu::MirrorTable::RandomAlterTable() at 
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/alter_table-randomized-test.cc:499
@   0x5805b4 
kudu::AlterTableRandomized_TestRandomSequence_Test::TestBody() at 
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/alter_table-randomized-test.cc:749
@ 0x7f698ada0b98 
testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0
@ 0x7f698ad8e1b2 testing::Test::Run() at ??:0
@ 0x7f698ad8e2f8 testing::TestInfo::Run() at ??:0
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2815) RaftConsensusNonVoterITest.PromoteAndDemote fails if manually-run election fails.

2019-05-06 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley updated KUDU-2815:

Attachment: raft_consensus_nonvoter-itest.txt

> RaftConsensusNonVoterITest.PromoteAndDemote fails if manually-run election 
> fails.
> -
>
> Key: KUDU-2815
> URL: https://issues.apache.org/jira/browse/KUDU-2815
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Priority: Major
> Attachments: raft_consensus_nonvoter-itest.txt
>
>
> RaftConsensusNonVoterITest.PromoteAndDemote disables normal leader elections 
> and runs an election manually, to avoid some previous flakiness. 
> Unfortunately, this introduces flakiness, because, rarely, the manual 
> election fails when the vote requests time out. The candidate concludes it 
> has lost the election, and then after that the two other voters vote yes.
> The timeout for vote requests is 170ms, which is pretty short. If it were 
> raised to, say, 5s, the test would probably not be flaky anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2815) RaftConsensusNonVoterITest.PromoteAndDemote fails if manually-run election fails.

2019-05-06 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2815:
---

 Summary: RaftConsensusNonVoterITest.PromoteAndDemote fails if 
manually-run election fails.
 Key: KUDU-2815
 URL: https://issues.apache.org/jira/browse/KUDU-2815
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Will Berkeley


RaftConsensusNonVoterITest.PromoteAndDemote disables normal leader elections 
and runs an election manually, to avoid some previous flakiness. Unfortunately, 
this introduces flakiness, because, rarely, the manual election fails when the 
vote requests time out. The candidate concludes it has lost the election, and 
then after that the two other voters vote yes.

The timeout for vote requests is 170ms, which is pretty short. If it were 
raised to, say, 5s, the test would probably not be flaky anymore.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2812) Problem with error reporting in kudu-spark and kudu-backup

2019-05-02 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley updated KUDU-2812:

Summary: Problem with error reporting in kudu-spark and kudu-backup  (was: 
TOCTOU problem with error reporting in kudu-spark and kudu-backup)

> Problem with error reporting in kudu-spark and kudu-backup
> --
>
> Key: KUDU-2812
> URL: https://issues.apache.org/jira/browse/KUDU-2812
> Project: Kudu
>  Issue Type: Bug
>  Components: backup, spark
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Priority: Major
> Fix For: 1.10.0
>
>
> In KuduRestore.scala we have code like
> {noformat}
>   // Fail the task if there are any errors.
>   val errorCount = session.getPendingErrors.getRowErrors.length
>   if (errorCount > 0) {
> val errors =
>   
> session.getPendingErrors.getRowErrors.take(5).map(_.getErrorStatus).mkString
> throw new RuntimeException(
>   s"failed to write $errorCount rows from DataFrame to Kudu; 
> sample errors: $errors")
>   }
> {noformat}
> There's similar code in KuduContext.scala:
> {noformat}
>   val errorCount = pendingErrors.getRowErrors.length
>   if (errorCount > 0) {
> val errors =
>   pendingErrors.getRowErrors.take(5).map(_.getErrorStatus).mkString
> throw new RuntimeException(
>   s"failed to write $errorCount rows from DataFrame to Kudu; sample 
> errors: $errors")
>   }
> {noformat}
> I've seen the former fail to print any sample errors. Taking a reference to 
> {{session.getPendingErrors.getRowErrors}} and using that throughout fixes this, 
> so it seems like there's some TOCTOU problem that can occur, probably because 
> multiple batches can be in flight at once.
> The latter is most likely vulnerable to this as well.
> This issue made diagnosing KUDU-2809 harder.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2812) TOCTOU problem with error reporting in kudu-spark and kudu-backup

2019-05-02 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2812:
---

 Summary: TOCTOU problem with error reporting in kudu-spark and 
kudu-backup
 Key: KUDU-2812
 URL: https://issues.apache.org/jira/browse/KUDU-2812
 Project: Kudu
  Issue Type: Bug
  Components: backup, spark
Affects Versions: 1.9.0
Reporter: Will Berkeley
 Fix For: 1.10.0


In KuduRestore.scala we have code like

{noformat}
  // Fail the task if there are any errors.
  val errorCount = session.getPendingErrors.getRowErrors.length
  if (errorCount > 0) {
val errors =
  
session.getPendingErrors.getRowErrors.take(5).map(_.getErrorStatus).mkString
throw new RuntimeException(
  s"failed to write $errorCount rows from DataFrame to Kudu; sample 
errors: $errors")
  }
{noformat}

There's similar code in KuduContext.scala:

{noformat}
  val errorCount = pendingErrors.getRowErrors.length
  if (errorCount > 0) {
val errors =
  pendingErrors.getRowErrors.take(5).map(_.getErrorStatus).mkString
throw new RuntimeException(
  s"failed to write $errorCount rows from DataFrame to Kudu; sample 
errors: $errors")
  }
{noformat}

I've seen the former fail to print any sample errors. Taking a reference to 
{{session.getPendingErrors.getRowErrors}} and using that throughout fixes this, so 
it seems like there's some TOCTOU problem that can occur, probably because 
multiple batches can be in flight at once.

The latter is most likely vulnerable to this as well.

This issue made diagnosing KUDU-2809 harder.
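
A hedged sketch of the described workaround using the Java client types that back 
the Scala code above: read getPendingErrors() once and use that snapshot for both 
the count and the sampled errors.

{code:java}
import java.util.Arrays;
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.RowError;
import org.apache.kudu.client.RowErrorsAndOverflowStatus;

// Sketch: snapshot the pending errors once instead of calling getPendingErrors() twice,
// so the count and the sampled errors come from the same moment in time.
class ErrorReportSketch {
  static void failOnRowErrors(KuduSession session) {
    RowErrorsAndOverflowStatus pending = session.getPendingErrors();
    RowError[] errors = pending.getRowErrors();
    if (errors.length > 0) {
      String sample = Arrays.toString(Arrays.copyOf(errors, Math.min(5, errors.length)));
      throw new RuntimeException(
          "failed to write " + errors.length + " rows to Kudu; sample errors: " + sample);
    }
  }
}
{code}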



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2811) Fuzz test needed for backup-restore

2019-05-02 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2811:
---

 Summary: Fuzz test needed for backup-restore
 Key: KUDU-2811
 URL: https://issues.apache.org/jira/browse/KUDU-2811
 Project: Kudu
  Issue Type: Bug
  Components: backup
Affects Versions: 1.9.0
Reporter: Will Berkeley
 Fix For: 1.10.0


We need to fuzz test backup-restore by having a test that creates a table 
through a random sequence of operations while also randomly doing incremental 
backups. We should then check the restored table against the original table.

This would have caught KUDU-2809.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2810) Restore needs DELETE_IGNORE

2019-05-02 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2810:
---

 Summary: Restore needs DELETE_IGNORE
 Key: KUDU-2810
 URL: https://issues.apache.org/jira/browse/KUDU-2810
 Project: Kudu
  Issue Type: Bug
  Components: backup
Affects Versions: 1.9.0
Reporter: Will Berkeley
 Fix For: 1.10.0


If a restore task fails for any reason, and it's restoring an incremental with 
DELETE row actions, when the task is retried it will fail any deletes that 
happened on the previous task run. We need a DELETE_IGNORE write operation to 
handle this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2809) Incremental backup / diff scan does not handle rows that are inserted and deleted between two incrementals correctly

2019-05-02 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2809:
---

 Summary: Incremental backup / diff scan does not handle rows that 
are inserted and deleted between two incrementals correctly
 Key: KUDU-2809
 URL: https://issues.apache.org/jira/browse/KUDU-2809
 Project: Kudu
  Issue Type: Bug
  Components: backup
Affects Versions: 1.9.0
Reporter: Will Berkeley


I did the following sequence of operations:

# Insert 100 million rows
# Update 1 out of every 11 rows
# Make a full backup
# Insert 100 million more rows, after the original rows in keyspace
# Delete 1 out of every 23 rows
# Make an incremental backup

Restore failed to apply the incremental backup, failing with an error like

{noformat}
java.lang.RuntimeException: failed to write 1000 rows from DataFrame to Kudu; 
sample errors:
{noformat}

Due to another bug, there are no sample errors, but after hacking around that 
bug, I found that the incremental contained a row with a DELETE action for a 
key that is not present in the full backup. That's because the row was inserted 
in step 4 and deleted in step 5, between backups.

We could fix this by
# Making diff scan not return a DELETE for such a row
# Implementing and using DELETE IGNORE in the restore job



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2806) Row count mismatch in testForceIncrementalBackup

2019-04-29 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2806:
---

 Summary: Row count mismatch in testForceIncrementalBackup
 Key: KUDU-2806
 URL: https://issues.apache.org/jira/browse/KUDU-2806
 Project: Kudu
  Issue Type: Bug
  Components: backup
Affects Versions: 1.9.0
Reporter: Will Berkeley
 Attachments: test-output.txt

Full log attached.

{noformat}
testForceIncrementalBackup(org.apache.kudu.backup.TestKuduBackup)
java.lang.AssertionError: expected:<100> but was:<99>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at 
org.apache.kudu.backup.TestKuduBackup.validateBackup(TestKuduBackup.scala:384)
at 
org.apache.kudu.backup.TestKuduBackup.testForceIncrementalBackup(TestKuduBackup.scala:139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
...
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2805) ClientTest.TestServerTooBusyRetry fails due to TSAN thread limit

2019-04-29 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley updated KUDU-2805:

Attachment: client-test.tsanlimit.txt

> ClientTest.TestServerTooBusyRetry fails due to TSAN thread limit
> 
>
> Key: KUDU-2805
> URL: https://issues.apache.org/jira/browse/KUDU-2805
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Priority: Major
> Attachments: client-test.tsanlimit.txt
>
>
> I've seen a couple instances where ClientTest.TestServerTooBusyRetry fails 
> after hitting the TSAN thread limit, after seemingly being stuck for 10 
> minutes or so. The end of the logs look like
> {noformat}
> W0428 12:20:07.406752 10297 debug-util.cc:397] Leaking SignalData structure 
> 0x7b08000c2ba0 after lost signal to thread 8435
> W0428 12:20:07.412693 10297 debug-util.cc:397] Leaking SignalData structure 
> 0x7b080019f2a0 after lost signal to thread 10185
> W0428 12:20:07.418191 10297 debug-util.cc:397] Leaking SignalData structure 
> 0x7b080018f060 after lost signal to thread 10361
> W0428 12:20:23.873589 10139 debug-util.cc:397] Leaking SignalData structure 
> 0x7b08000fc360 after lost signal to thread 8435
> W0428 12:20:23.878401 10139 debug-util.cc:397] Leaking SignalData structure 
> 0x7b08000ccf20 after lost signal to thread 10185
> W0428 12:20:23.884522 10139 debug-util.cc:397] Leaking SignalData structure 
> 0x7b0800051ae0 after lost signal to thread 10361
> W0428 12:22:03.715726 10297 debug-util.cc:397] Leaking SignalData structure 
> 0x7b08000f9280 after lost signal to thread 8435
> W0428 12:22:03.721261 10297 debug-util.cc:397] Leaking SignalData structure 
> 0x7b08001b0e40 after lost signal to thread 10185
> W0428 12:22:03.727725 10297 debug-util.cc:397] Leaking SignalData structure 
> 0x7b08000b7460 after lost signal to thread 10361
> W0428 12:22:11.928373 10139 debug-util.cc:397] Leaking SignalData structure 
> 0x7b0800044be0 after lost signal to thread 8435
> W0428 12:22:11.933187 10139 debug-util.cc:397] Leaking SignalData structure 
> 0x7b080018f3c0 after lost signal to thread 10185
> W0428 12:22:11.939275 10139 debug-util.cc:397] Leaking SignalData structure 
> 0x7b08001b3480 after lost signal to thread 10361
> ==8432==ThreadSanitizer: Thread limit (8128 threads) exceeded. Dying.
> {noformat}
> Some threads are unresponsive, even to the signals sent by the stack trace 
> collector thread. Unfortunately, there's nothing in the logs about those 
> threads.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2805) ClientTest.TestServerTooBusyRetry fails due to TSAN thread limit

2019-04-29 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2805:
---

 Summary: ClientTest.TestServerTooBusyRetry fails due to TSAN 
thread limit
 Key: KUDU-2805
 URL: https://issues.apache.org/jira/browse/KUDU-2805
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Will Berkeley


I've seen a couple instances where ClientTest.TestServerTooBusyRetry fails 
after hitting the TSAN thread limit, after seemingly being stuck for 10 minutes 
or so. The end of the logs look like

{noformat}
W0428 12:20:07.406752 10297 debug-util.cc:397] Leaking SignalData structure 
0x7b08000c2ba0 after lost signal to thread 8435
W0428 12:20:07.412693 10297 debug-util.cc:397] Leaking SignalData structure 
0x7b080019f2a0 after lost signal to thread 10185
W0428 12:20:07.418191 10297 debug-util.cc:397] Leaking SignalData structure 
0x7b080018f060 after lost signal to thread 10361
W0428 12:20:23.873589 10139 debug-util.cc:397] Leaking SignalData structure 
0x7b08000fc360 after lost signal to thread 8435
W0428 12:20:23.878401 10139 debug-util.cc:397] Leaking SignalData structure 
0x7b08000ccf20 after lost signal to thread 10185
W0428 12:20:23.884522 10139 debug-util.cc:397] Leaking SignalData structure 
0x7b0800051ae0 after lost signal to thread 10361
W0428 12:22:03.715726 10297 debug-util.cc:397] Leaking SignalData structure 
0x7b08000f9280 after lost signal to thread 8435
W0428 12:22:03.721261 10297 debug-util.cc:397] Leaking SignalData structure 
0x7b08001b0e40 after lost signal to thread 10185
W0428 12:22:03.727725 10297 debug-util.cc:397] Leaking SignalData structure 
0x7b08000b7460 after lost signal to thread 10361
W0428 12:22:11.928373 10139 debug-util.cc:397] Leaking SignalData structure 
0x7b0800044be0 after lost signal to thread 8435
W0428 12:22:11.933187 10139 debug-util.cc:397] Leaking SignalData structure 
0x7b080018f3c0 after lost signal to thread 10185
W0428 12:22:11.939275 10139 debug-util.cc:397] Leaking SignalData structure 
0x7b08001b3480 after lost signal to thread 10361
==8432==ThreadSanitizer: Thread limit (8128 threads) exceeded. Dying.
{noformat}

Some threads are unresponsive, even to the signals sent by the stack trace 
collector thread. Unfortunately, there's nothing in the logs about those 
threads.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2776) Java AUTO_FLUSH_BACKGROUND behaves like "AUTO_FLUSH_FOREGROUND" when tablet locations are cached

2019-04-16 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2776:
---

 Summary: Java AUTO_FLUSH_BACKGROUND behaves like 
"AUTO_FLUSH_FOREGROUND" when tablet locations are cached
 Key: KUDU-2776
 URL: https://issues.apache.org/jira/browse/KUDU-2776
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Will Berkeley
 Attachments: image-2019-04-16-14-37-07-908.png

!image-2019-04-16-14-37-07-908.png!

The above piece of a Java flamegraph shows the main application thread 
{{apply}}ing operations to a {{KuduSession}} in {{AUTO_FLUSH_BACKGROUND}} mode. 
The {{doFlush}} call is meant to set up callbacks that actually send the rows 
to Kudu and that are triggered by the fulfillment of a Deferred for the tablet 
locations. However, when the tablet locations are cached, this Deferred can be 
fulfilled basically instantly, so by the time the {{apply}}ing thread 
calls {{addCallbacks}}, the Deferred is complete, and as an optimization the 
async library executes the callbacks on the thread that is adding them to the 
fulfilled Deferred. This means the {{apply}}ing thread executes the code that 
sends rows to Kudu, which is the opposite of how {{AUTO_FLUSH_BACKGROUND}} is 
meant to work.

We lose out on the ability to have multiple batches in flight at once, and we 
serialize the application logic, {{apply}}, and actually sending rows to Kudu.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2762) Improve write transaction tracing

2019-04-16 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley updated KUDU-2762:

   Resolution: Fixed
Fix Version/s: 1.10.0
   Status: Resolved  (was: In Review)

> Improve write transaction tracing
> -
>
> Key: KUDU-2762
> URL: https://issues.apache.org/jira/browse/KUDU-2762
> Project: Kudu
>  Issue Type: Improvement
>Reporter: Will Berkeley
>Assignee: Mitch Barnett
>Priority: Major
> Fix For: 1.10.0
>
>
> Here's a write transaction trace:
> {noformat}
> W0331 02:16:27.859648 27337 rpcz_store.cc:244] Call 
> kudu.tserver.TabletServerService.Write from 10.80.34.74:58250 (ReqId={client: 
> 6a904be8604b482989e3d1592f8824f2, seq_no=66108, attempt_no=1}) took 7855ms 
> (client timeout 1).
> W0331 02:16:27.859728 27337 rpcz_store.cc:248] Trace:
> 0331 02:16:20.003835 (+ 0us) service_pool.cc:159] Inserting onto call 
> queue
> 0331 02:16:20.003844 (+ 9us) service_pool.cc:218] Handling call
> 0331 02:16:27.859641 (+7855797us) inbound_call.cc:157] Queueing success 
> response
> Related trace 'txn':
> 0331 02:16:20.003988 (+ 0us) write_transaction.cc:101] PREPARE: Starting
> 0331 02:16:20.004087 (+99us) write_transaction.cc:268] Acquiring schema 
> lock in shared mode
> 0331 02:16:20.004088 (+ 1us) write_transaction.cc:271] Acquired schema 
> lock
> 0331 02:16:20.004088 (+ 0us) tablet.cc:400] PREPARE: Decoding operations
> 0331 02:16:20.004130 (+42us) tablet.cc:422] PREPARE: Acquiring locks for 
> 1 operations
> 0331 02:16:20.004137 (+ 7us) tablet.cc:426] PREPARE: locks acquired
> 0331 02:16:20.004138 (+ 1us) write_transaction.cc:126] PREPARE: finished.
> 0331 02:16:20.004154 (+16us) write_transaction.cc:136] Start()
> 0331 02:16:20.004157 (+ 3us) write_transaction.cc:141] Timestamp: P: 
> 1554016580004153 usec, L: 0
> 0331 02:16:20.004192 (+35us) log.cc:582] Serialized 3741 byte log entry
> 0331 02:16:27.859496 (+7855304us) write_transaction.cc:149] APPLY: Starting
> 0331 02:16:27.859608 (+   112us) tablet_metrics.cc:365] ProbeStats: 
> bloom_lookups=2,key_file_lookups=2,delta_file_lookups=4,mrs_lookups=0
> 0331 02:16:27.859614 (+ 6us) log.cc:582] Serialized 28 byte log entry
> 0331 02:16:27.859622 (+ 8us) write_transaction.cc:309] Releasing row and 
> schema locks
> 0331 02:16:27.859623 (+ 1us) write_transaction.cc:277] Released schema 
> lock
> 0331 02:16:27.859625 (+ 2us) write_transaction.cc:196] FINISH: updating 
> metrics
> Metrics: 
> {"tcmalloc_contention_cycles":7552,"child_traces":[["txn",{"apply.queue_time_us":7854429,"cfile_cache_hit":12,"cfile_cache_hit_bytes":72423,"delta_iterators_relevant":2,"num_ops":1,"prepare.queue_time_us":11,"prepare.run_cpu_time_us":226,"prepare.run_wall_time_us":225,"raft.queue_time_us":12,"raft.run_cpu_time_us":92,"raft.run_wall_time_us":91,"replication_time_us":898,"spinlock_wait_cycles":18688}]]}
> {noformat}
> It could use some polish. Here are a few things to fix:
> 1. Nit: the casing and punctuation of {{PREPARE: finished.}} and {{APPLY: 
> Starting}} is inconsistent.
> 2. {{PREPARE}} has a start and end; {{APPLY}} has a start but no end; 
> replication has neither. We should add the missing trace events.
> 3. Nit: There's no need for the {{PREPARE:}} preamble on trace events between 
> {{PREPARE: Starting}} and {{PREPARE: finished.}}.
> This trace is potentially misleading to read. Almost all of the time is 
> spent between the trace event reporting the local WAL was written and the 
> trace event reporting apply started. Naively, this would suggest the operation 
> took a long time to replicate, as that's the most obvious thing that could 
> take a long time. However, the metrics show the time was spent waiting in the 
> apply queue ({{apply.queue_time_us}} is 7854429, essentially the entire 
> 7855ms), held up by some other slow activity. This operation was fast to 
> perform; it took a long time because it was held up, idling, by something 
> else.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2762) Improve write transaction tracing

2019-04-03 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2762:
---

 Summary: Improve write transaction tracing
 Key: KUDU-2762
 URL: https://issues.apache.org/jira/browse/KUDU-2762
 Project: Kudu
  Issue Type: Improvement
Reporter: Will Berkeley


Here's a write transaction trace:

{noformat}
W0331 02:16:27.859648 27337 rpcz_store.cc:244] Call 
kudu.tserver.TabletServerService.Write from 10.80.34.74:58250 (ReqId={client: 
6a904be8604b482989e3d1592f8824f2, seq_no=66108, attempt_no=1}) took 7855ms 
(client timeout 1).
W0331 02:16:27.859728 27337 rpcz_store.cc:248] Trace:
0331 02:16:20.003835 (+ 0us) service_pool.cc:159] Inserting onto call queue
0331 02:16:20.003844 (+ 9us) service_pool.cc:218] Handling call
0331 02:16:27.859641 (+7855797us) inbound_call.cc:157] Queueing success response
Related trace 'txn':
0331 02:16:20.003988 (+ 0us) write_transaction.cc:101] PREPARE: Starting
0331 02:16:20.004087 (+99us) write_transaction.cc:268] Acquiring schema 
lock in shared mode
0331 02:16:20.004088 (+ 1us) write_transaction.cc:271] Acquired schema lock
0331 02:16:20.004088 (+ 0us) tablet.cc:400] PREPARE: Decoding operations
0331 02:16:20.004130 (+42us) tablet.cc:422] PREPARE: Acquiring locks for 1 
operations
0331 02:16:20.004137 (+ 7us) tablet.cc:426] PREPARE: locks acquired
0331 02:16:20.004138 (+ 1us) write_transaction.cc:126] PREPARE: finished.
0331 02:16:20.004154 (+16us) write_transaction.cc:136] Start()
0331 02:16:20.004157 (+ 3us) write_transaction.cc:141] Timestamp: P: 
1554016580004153 usec, L: 0
0331 02:16:20.004192 (+35us) log.cc:582] Serialized 3741 byte log entry
0331 02:16:27.859496 (+7855304us) write_transaction.cc:149] APPLY: Starting
0331 02:16:27.859608 (+   112us) tablet_metrics.cc:365] ProbeStats: 
bloom_lookups=2,key_file_lookups=2,delta_file_lookups=4,mrs_lookups=0
0331 02:16:27.859614 (+ 6us) log.cc:582] Serialized 28 byte log entry
0331 02:16:27.859622 (+ 8us) write_transaction.cc:309] Releasing row and 
schema locks
0331 02:16:27.859623 (+ 1us) write_transaction.cc:277] Released schema lock
0331 02:16:27.859625 (+ 2us) write_transaction.cc:196] FINISH: updating 
metrics
Metrics: 
{"tcmalloc_contention_cycles":7552,"child_traces":[["txn",{"apply.queue_time_us":7854429,"cfile_cache_hit":12,"cfile_cache_hit_bytes":72423,"delta_iterators_relevant":2,"num_ops":1,"prepare.queue_time_us":11,"prepare.run_cpu_time_us":226,"prepare.run_wall_time_us":225,"raft.queue_time_us":12,"raft.run_cpu_time_us":92,"raft.run_wall_time_us":91,"replication_time_us":898,"spinlock_wait_cycles":18688}]]}
{noformat}

It could use some polish. Here are a few things to fix:
1. Nit: the casing and punctuation of {{PREPARE: finished.}} and {{APPLY: 
Starting}} is inconsistent.
2. {{PREPARE}} has a start and end; {{APPLY}} has a start but no end; 
replication has neither. We should add the missing trace events.
3. Nit: There's no need for the {{PREPARE:}} preamble on trace events between 
{{PREPARE: Starting}} and {{PREPARE: finished.}}.

This trace is potentially misleading to read. Almost all of the time is spent 
between the trace event reporting the local WAL was written and the trace event 
reporting apply started. Naively, this would suggest the operation took a long 
time to replicate, as that's the most obvious thing that could take a long 
time. However, the metrics show the time was spent waiting in the apply queue 
({{apply.queue_time_us}} is 7854429, essentially the entire 7855ms), held up by 
some other slow activity. This operation was fast to perform; it took a long 
time because it was held up, idling, by something else.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2753) kudu cluster rebalance crashes with core dump

2019-03-29 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16805435#comment-16805435
 ] 

Will Berkeley commented on KUDU-2753:
-

For others who might run into this issue, a response from Cloudera can be found 
[on their community 
forums|https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Kudu-rebalance-crash/m-p/88462#M5486].

> kudu cluster rebalance crashes with core dump
> -
>
> Key: KUDU-2753
> URL: https://issues.apache.org/jira/browse/KUDU-2753
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.7.0
> Environment: kudu-master-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
> kudu-client-devel-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
> kudu-tserver-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
> kudu-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
> kudu-client0-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
>Reporter: Arseniy Tashoyan
>Assignee: Will Berkeley
>Priority: Major
> Fix For: n/a
>
>
> The utility crashes:
> {code}
> -bash-4.2$ kudu cluster rebalance host1,host2,host3
> terminate called after throwing an instance of 'std::regex_error'
>   what():  regex_error
> *** Aborted at 1553854510 (unix time) try "date -d @1553854510" if you are 
> using GNU date ***
> PC: @ 0x7f9287fd6207 __GI_raise
> *** SIGABRT (@0x3ca0006ab69) received by PID 437097 (TID 0x7f928a61ea00) from 
> PID 437097; stack trace: ***
> @ 0x7f9289fe1680 (unknown)
> @ 0x7f9287fd6207 __GI_raise
> @ 0x7f9287fd78f8 __GI_abort
> @ 0x7f92888e57d5 __gnu_cxx::__verbose_terminate_handler()
> @ 0x7f92888e3746 (unknown)
> @ 0x7f92888e3773 std::terminate()
> @ 0x7f92888e3993 __cxa_throw
> @ 0x7f9288938dd5 std::__throw_regex_error()
> @   0x931c32 std::__detail::_Compiler<>::_M_bracket_expression()
> @   0x931e3a std::__detail::_Compiler<>::_M_atom()
> @   0x932469 std::__detail::_Compiler<>::_M_alternative()
> @   0x9324c4 std::__detail::_Compiler<>::_M_alternative()
> @   0x932649 std::__detail::_Compiler<>::_M_disjunction()
> @   0x93297b std::__detail::_Compiler<>::_Compiler()
> @   0x932cb7 std::__detail::__compile<>()
> @   0x92bfc6 (unknown)
> @   0x92c664 std::_Function_handler<>::_M_invoke()
> @   0xde6672 kudu::tools::Action::Run()
> @   0x9957d7 kudu::tools::DispatchCommand()
> @   0x99619b kudu::tools::RunTool()
> @   0x8dee4d main
> @ 0x7f9287fc23d5 __libc_start_main
> @   0x9284b5 (unknown)
> Aborted (core dumped)
> {code}
> The same behavior when ports are specified: 
> 'host1:7150,host2:7150,host3:7150'. I cannot attach the core dump due to file 
> size limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2753) kudu cluster rebalance crashes with core dump

2019-03-29 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-2753.
-
   Resolution: Won't Fix
Fix Version/s: n/a

This is a problem in the downstream Kudu vendor's code, specifically in how it 
uses the std::regex library to do version detection. On some platforms the 
regex library implementation is broken, so compiling the regex crashes the 
tool. You should approach Cloudera about a fix or workaround.
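
To illustrate the failure mode only (this is not the vendor's actual code): on 
a platform whose C++ standard library ships a broken {{<regex>}} 
implementation, compiling even a valid pattern can throw {{std::regex_error}} 
(the quoted stack shows the failure inside a bracket expression), and if 
nothing catches the exception the process aborts with "terminate called after 
throwing an instance of 'std::regex_error'", exactly as reported.

{code}
// Hedged sketch: the pattern and the version-detection context are
// assumptions for illustration, not the downstream vendor's code.
#include <iostream>
#include <regex>

int main() {
  try {
    // On a broken <regex> implementation this constructor can throw
    // std::regex_error even though the pattern itself is valid.
    std::regex version_re("[0-9]+\\.[0-9]+\\.[0-9]+");
    std::cout << "regex compiled" << std::endl;
  } catch (const std::regex_error& e) {
    // Catching the exception turns the hard crash into a reportable error.
    std::cerr << "regex_error: " << e.what() << std::endl;
    return 1;
  }
  return 0;
}
{code}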

> kudu cluster rebalance crashes with core dump
> -
>
> Key: KUDU-2753
> URL: https://issues.apache.org/jira/browse/KUDU-2753
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.7.0
> Environment: kudu-master-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
> kudu-client-devel-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
> kudu-tserver-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
> kudu-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
> kudu-client0-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
>Reporter: Arseniy Tashoyan
>Assignee: Will Berkeley
>Priority: Major
> Fix For: n/a
>
>
> The utility crashes:
> {code}
> -bash-4.2$ kudu cluster rebalance host1,host2,host3
> terminate called after throwing an instance of 'std::regex_error'
>   what():  regex_error
> *** Aborted at 1553854510 (unix time) try "date -d @1553854510" if you are 
> using GNU date ***
> PC: @ 0x7f9287fd6207 __GI_raise
> *** SIGABRT (@0x3ca0006ab69) received by PID 437097 (TID 0x7f928a61ea00) from 
> PID 437097; stack trace: ***
> @ 0x7f9289fe1680 (unknown)
> @ 0x7f9287fd6207 __GI_raise
> @ 0x7f9287fd78f8 __GI_abort
> @ 0x7f92888e57d5 __gnu_cxx::__verbose_terminate_handler()
> @ 0x7f92888e3746 (unknown)
> @ 0x7f92888e3773 std::terminate()
> @ 0x7f92888e3993 __cxa_throw
> @ 0x7f9288938dd5 std::__throw_regex_error()
> @   0x931c32 std::__detail::_Compiler<>::_M_bracket_expression()
> @   0x931e3a std::__detail::_Compiler<>::_M_atom()
> @   0x932469 std::__detail::_Compiler<>::_M_alternative()
> @   0x9324c4 std::__detail::_Compiler<>::_M_alternative()
> @   0x932649 std::__detail::_Compiler<>::_M_disjunction()
> @   0x93297b std::__detail::_Compiler<>::_Compiler()
> @   0x932cb7 std::__detail::__compile<>()
> @   0x92bfc6 (unknown)
> @   0x92c664 std::_Function_handler<>::_M_invoke()
> @   0xde6672 kudu::tools::Action::Run()
> @   0x9957d7 kudu::tools::DispatchCommand()
> @   0x99619b kudu::tools::RunTool()
> @   0x8dee4d main
> @ 0x7f9287fc23d5 __libc_start_main
> @   0x9284b5 (unknown)
> Aborted (core dumped)
> {code}
> The same behavior when ports are specified: 
> 'host1:7150,host2:7150,host3:7150'. I cannot attach the core dump due to file 
> size limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2753) kudu cluster rebalance crashes with core dump

2019-03-29 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2753:
---

Assignee: Will Berkeley

> kudu cluster rebalance crashes with core dump
> -
>
> Key: KUDU-2753
> URL: https://issues.apache.org/jira/browse/KUDU-2753
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.7.0
> Environment: kudu-master-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
> kudu-client-devel-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
> kudu-tserver-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
> kudu-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
> kudu-client0-1.7.0+cdh5.16.1+0-1.cdh5.16.1.p0.3.el7.x86_64
>Reporter: Arseniy Tashoyan
>Assignee: Will Berkeley
>Priority: Major
>
> The utility crashes:
> {code}
> -bash-4.2$ kudu cluster rebalance host1,host2,host3
> terminate called after throwing an instance of 'std::regex_error'
>   what():  regex_error
> *** Aborted at 1553854510 (unix time) try "date -d @1553854510" if you are 
> using GNU date ***
> PC: @ 0x7f9287fd6207 __GI_raise
> *** SIGABRT (@0x3ca0006ab69) received by PID 437097 (TID 0x7f928a61ea00) from 
> PID 437097; stack trace: ***
> @ 0x7f9289fe1680 (unknown)
> @ 0x7f9287fd6207 __GI_raise
> @ 0x7f9287fd78f8 __GI_abort
> @ 0x7f92888e57d5 __gnu_cxx::__verbose_terminate_handler()
> @ 0x7f92888e3746 (unknown)
> @ 0x7f92888e3773 std::terminate()
> @ 0x7f92888e3993 __cxa_throw
> @ 0x7f9288938dd5 std::__throw_regex_error()
> @   0x931c32 std::__detail::_Compiler<>::_M_bracket_expression()
> @   0x931e3a std::__detail::_Compiler<>::_M_atom()
> @   0x932469 std::__detail::_Compiler<>::_M_alternative()
> @   0x9324c4 std::__detail::_Compiler<>::_M_alternative()
> @   0x932649 std::__detail::_Compiler<>::_M_disjunction()
> @   0x93297b std::__detail::_Compiler<>::_Compiler()
> @   0x932cb7 std::__detail::__compile<>()
> @   0x92bfc6 (unknown)
> @   0x92c664 std::_Function_handler<>::_M_invoke()
> @   0xde6672 kudu::tools::Action::Run()
> @   0x9957d7 kudu::tools::DispatchCommand()
> @   0x99619b kudu::tools::RunTool()
> @   0x8dee4d main
> @ 0x7f9287fc23d5 __libc_start_main
> @   0x9284b5 (unknown)
> Aborted (core dumped)
> {code}
> The same behavior when ports are specified: 
> 'host1:7150,host2:7150,host3:7150'. I cannot attach the core dump due to file 
> size limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2734) RemoteKsckTest.TestClusterWithLocation is flaky

2019-03-15 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley updated KUDU-2734:

Code Review: https://gerrit.cloudera.org/#/c/12770/

The root cause of most of the flakiness is KUDU-2748.

> RemoteKsckTest.TestClusterWithLocation is flaky
> ---
>
> Key: KUDU-2734
> URL: https://issues.apache.org/jira/browse/KUDU-2734
> Project: Kudu
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 1.9.0
>Reporter: Mike Percy
>Assignee: Will Berkeley
>Priority: Major
>
> RemoteKsckTest.TestClusterWithLocation is flaky
> Alexey took a look at it and here is the analysis:
> In essence, due to the slowness of TSAN builds, connection negotiation from the 
> kudu CLI to one of the master servers timed out, so one of the preconditions of 
> the test wasn't met. The error output by the test was:
> {code:java}
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/ksck_remote-test.cc:523:
>  Failure
> Failed                                                                        
>   
> Bad status: Network error: failed to gather info from all masters: 1 of 3 had 
> errors
> {code}
> The corresponding error in the master's log was:
> {code:java}
> W0221 12:38:27.119146 31380 negotiation.cc:313] Failed RPC negotiation. 
> Trace:  
> 0221 12:38:23.949428 (+     0us) reactor.cc:583] Submitting negotiation task 
> for client connection to 127.25.42.190:51799
> 0221 12:38:25.362220 (+1412792us) negotiation.cc:98] Waiting for socket to 
> connect
> 0221 12:38:25.363489 (+  1269us) client_negotiation.cc:167] Beginning 
> negotiation
> 0221 12:38:25.369976 (+  6487us) client_negotiation.cc:244] Sending NEGOTIATE 
> NegotiatePB request
> 0221 12:38:25.431582 (+ 61606us) client_negotiation.cc:261] Received 
> NEGOTIATE NegotiatePB response
> 0221 12:38:25.431610 (+    28us) client_negotiation.cc:355] Received 
> NEGOTIATE response from server
> 0221 12:38:25.432659 (+  1049us) client_negotiation.cc:182] Negotiated 
> authn=SASL
> 0221 12:38:27.051125 (+1618466us) client_negotiation.cc:483] Received 
> TLS_HANDSHAKE response from server
> 0221 12:38:27.062085 (+ 10960us) client_negotiation.cc:471] Sending 
> TLS_HANDSHAKE message to server
> 0221 12:38:27.062132 (+    47us) client_negotiation.cc:244] Sending 
> TLS_HANDSHAKE NegotiatePB request
> 0221 12:38:27.064391 (+  2259us) negotiation.cc:304] Negotiation complete: 
> Timed out: Client connection negotiation failed: client connection to 
> 127.25.42.190:51799: BlockingWrite timed out
> {code}
> We are seeing this on the flaky test dashboard for both TSAN and ASAN builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2748) Leader master erroneously tries to tablet copy to a follower master due to race at startup

2019-03-15 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2748:
---

Assignee: Will Berkeley

> Leader master erroneously tries to tablet copy to a follower master due to 
> race at startup
> --
>
> Key: KUDU-2748
> URL: https://issues.apache.org/jira/browse/KUDU-2748
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Assignee: Will Berkeley
>Priority: Major
>
> I was investigating KUDU-2734 and ran into a weird situation. The test runs 
> with 3 masters and changes the value of a flag on the masters. To effect the 
> change, it restarts the masters. Suppose the masters are labelled A, B, and 
> C. Somewhat rarely (e.g. 8% of the time when run in TSAN with 8 stress 
> threads), the following happens:
> 1. A and B are restarted successfully. They form a quorum and elect a leader 
> (say A).
> 2. C is in the process of restarting. The ConsensusService is registered and 
> C is accepting RPCs.
> 3. A sends C an UpdateConsensus RPC. However, C is still in the process of 
> starting and has not yet initialized the systable. When C receives the 
> UpdateConsensus call, it therefore responds with TABLET_NOT_FOUND, even 
> though the proper response should be SERVICE_UNAVAILABLE.
> 4. A interprets TABLET_NOT_FOUND to mean that C needs to be copied to, and it 
> tries forever to tablet copy to C. The copies never start because tablet copy 
> is not implemented for masters.
> 5. C finishes its startup but does not receive UpdateConsensus from A because 
> A is sending StartTabletCopy requests. C calls pre-elections endlessly.
> This effectively means the cluster is running with two masters until there is 
> a leadership change. This caused the flakiness of 
> KsckRemoteTest.TestClusterWithLocation because C never recognizes the 
> leadership of A, so Ksck master consensus checks fail.
> A regular tablet on a tablet server is not vulnerable to this. It's specific 
> to how the master starts up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2748) Leader master erroneously tries to tablet copy to a follower master due to race at startup

2019-03-15 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley updated KUDU-2748:

Code Review: https://gerrit.cloudera.org/#/c/12770/

> Leader master erroneously tries to tablet copy to a follower master due to 
> race at startup
> --
>
> Key: KUDU-2748
> URL: https://issues.apache.org/jira/browse/KUDU-2748
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Assignee: Will Berkeley
>Priority: Major
>
> I was investigating KUDU-2734 and ran into a weird situation. The test runs 
> with 3 masters and changes the value of a flag on the masters. To effect the 
> change, it restarts the masters. Suppose the masters are labelled A, B, and 
> C. Somewhat rarely (e.g. 8% of the time when run in TSAN with 8 stress 
> threads), the following happens:
> 1. A and B are restarted successfully. They form a quorum and elect a leader 
> (say A).
> 2. C is in the process of restarting. The ConsensusService is registered and 
> C is accepting RPCs.
> 3. A sends C an UpdateConsensus RPC. However, C is still in the process of 
> starting and has not yet initialized the systable. When C receives the 
> UpdateConsensus call, it therefore responds with TABLET_NOT_FOUND, even 
> though the proper response should be SERVICE_UNAVAILABLE.
> 4. A interprets TABLET_NOT_FOUND to mean that C needs to be copied to, and it 
> tries forever to tablet copy to C. The copies never start because tablet copy 
> is not implemented for masters.
> 5. C finishes its startup but does not receive UpdateConsensus from A because 
> A is sending StartTabletCopy requests. C calls pre-elections endlessly.
> This effectively means the cluster is running with two masters until there is 
> a leadership change. This caused the flakiness of 
> KsckRemoteTest.TestClusterWithLocation because C never recognizes the 
> leadership of A, so Ksck master consensus checks fail.
> A regular tablet on a tablet server is not vulnerable to this. It's specific 
> to how the master starts up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2748) Leader master erroneously tries to tablet copy to a follower master due to race at startup

2019-03-15 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2748:
---

 Summary: Leader master erroneously tries to tablet copy to a 
follower master due to race at startup
 Key: KUDU-2748
 URL: https://issues.apache.org/jira/browse/KUDU-2748
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Will Berkeley


I was investigating KUDU-2734 and ran into a weird situation. The test runs 
with 3 masters and changes the value of a flag on the masters. To effect the 
change, it restarts the masters. Suppose the masters are labelled A, B, and C. 
Somewhat rarely (e.g. 8% of the time when run in TSAN with 8 stress threads), 
the following happens:

1. A and B are restarted successfully. They form a quorum and elect a leader 
(say A).
2. C is in the process of restarting. The ConsensusService is registered and C 
is accepting RPCs.
3. A sends C an UpdateConsensus RPC. However, C is still in the process of 
starting and has not yet initialized the systable. When C receives the 
UpdateConsensus call, it therefore responds with TABLET_NOT_FOUND, even 
though the proper response should be SERVICE_UNAVAILABLE.
4. A interprets TABLET_NOT_FOUND to mean that C needs to be copied to, and it 
tries forever to tablet copy to C. The copies never start because tablet copy 
is not implemented for masters.
5. C finishes its startup but does not receive UpdateConsensus from A because A 
is sending StartTabletCopy requests. C calls pre-elections endlessly.

This effectively means the cluster is running with two masters until there is a 
leadership change. This caused the flakiness of 
KsckRemoteTest.TestClusterWithLocation because C never recognizes the 
leadership of A, so Ksck master consensus checks fail.

A regular tablet on a tablet server is not vulnerable to this. It's specific to 
how the master starts up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2735) RemoteKsckTest.TestClusterWithLocation is flaky

2019-03-15 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2735:
---

Assignee: Will Berkeley

> RemoteKsckTest.TestClusterWithLocation is flaky
> ---
>
> Key: KUDU-2735
> URL: https://issues.apache.org/jira/browse/KUDU-2735
> Project: Kudu
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 1.9.0
>Reporter: Mike Percy
>Assignee: Will Berkeley
>Priority: Major
> Fix For: n/a
>
>
> RemoteKsckTest.TestClusterWithLocation is flaky
> Alexey took a look at it and here is the analysis:
> In essence, due to the slowness of TSAN builds, connection negotiation from the 
> kudu CLI to one of the master servers timed out, so one of the preconditions of 
> the test wasn't met. The error output by the test was:
> {code:java}
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/ksck_remote-test.cc:523:
>  Failure
> Failed                                                                        
>   
> Bad status: Network error: failed to gather info from all masters: 1 of 3 had 
> errors
> {code}
> The corresponding error in the master's log was:
> {code:java}
> W0221 12:38:27.119146 31380 negotiation.cc:313] Failed RPC negotiation. 
> Trace:  
> 0221 12:38:23.949428 (+     0us) reactor.cc:583] Submitting negotiation task 
> for client connection to 127.25.42.190:51799
> 0221 12:38:25.362220 (+1412792us) negotiation.cc:98] Waiting for socket to 
> connect
> 0221 12:38:25.363489 (+  1269us) client_negotiation.cc:167] Beginning 
> negotiation
> 0221 12:38:25.369976 (+  6487us) client_negotiation.cc:244] Sending NEGOTIATE 
> NegotiatePB request
> 0221 12:38:25.431582 (+ 61606us) client_negotiation.cc:261] Received 
> NEGOTIATE NegotiatePB response
> 0221 12:38:25.431610 (+    28us) client_negotiation.cc:355] Received 
> NEGOTIATE response from server
> 0221 12:38:25.432659 (+  1049us) client_negotiation.cc:182] Negotiated 
> authn=SASL
> 0221 12:38:27.051125 (+1618466us) client_negotiation.cc:483] Received 
> TLS_HANDSHAKE response from server
> 0221 12:38:27.062085 (+ 10960us) client_negotiation.cc:471] Sending 
> TLS_HANDSHAKE message to server
> 0221 12:38:27.062132 (+    47us) client_negotiation.cc:244] Sending 
> TLS_HANDSHAKE NegotiatePB request
> 0221 12:38:27.064391 (+  2259us) negotiation.cc:304] Negotiation complete: 
> Timed out: Client connection negotiation failed: client connection to 
> 127.25.42.190:51799: BlockingWrite timed out
> {code}
> We are seeing this on the flaky test dashboard for both TSAN and ASAN builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2736) RemoteKsckTest.TestClusterWithLocation is flaky

2019-03-15 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2736:
---

Assignee: Will Berkeley

> RemoteKsckTest.TestClusterWithLocation is flaky
> ---
>
> Key: KUDU-2736
> URL: https://issues.apache.org/jira/browse/KUDU-2736
> Project: Kudu
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 1.9.0
>Reporter: Mike Percy
>Assignee: Will Berkeley
>Priority: Major
> Fix For: n/a
>
>
> RemoteKsckTest.TestClusterWithLocation is flaky
> Alexey took a look at it and here is the analysis:
> In essence, due to the slowness of TSAN builds, connection negotiation from the 
> kudu CLI to one of the master servers timed out, so one of the preconditions of 
> the test wasn't met. The error output by the test was:
> {code:java}
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/ksck_remote-test.cc:523:
>  Failure
> Failed                                                                        
>   
> Bad status: Network error: failed to gather info from all masters: 1 of 3 had 
> errors
> {code}
> The corresponding error in the master's log was:
> {code:java}
> W0221 12:38:27.119146 31380 negotiation.cc:313] Failed RPC negotiation. 
> Trace:  
> 0221 12:38:23.949428 (+     0us) reactor.cc:583] Submitting negotiation task 
> for client connection to 127.25.42.190:51799
> 0221 12:38:25.362220 (+1412792us) negotiation.cc:98] Waiting for socket to 
> connect
> 0221 12:38:25.363489 (+  1269us) client_negotiation.cc:167] Beginning 
> negotiation
> 0221 12:38:25.369976 (+  6487us) client_negotiation.cc:244] Sending NEGOTIATE 
> NegotiatePB request
> 0221 12:38:25.431582 (+ 61606us) client_negotiation.cc:261] Received 
> NEGOTIATE NegotiatePB response
> 0221 12:38:25.431610 (+    28us) client_negotiation.cc:355] Received 
> NEGOTIATE response from server
> 0221 12:38:25.432659 (+  1049us) client_negotiation.cc:182] Negotiated 
> authn=SASL
> 0221 12:38:27.051125 (+1618466us) client_negotiation.cc:483] Received 
> TLS_HANDSHAKE response from server
> 0221 12:38:27.062085 (+ 10960us) client_negotiation.cc:471] Sending 
> TLS_HANDSHAKE message to server
> 0221 12:38:27.062132 (+    47us) client_negotiation.cc:244] Sending 
> TLS_HANDSHAKE NegotiatePB request
> 0221 12:38:27.064391 (+  2259us) negotiation.cc:304] Negotiation complete: 
> Timed out: Client connection negotiation failed: client connection to 
> 127.25.42.190:51799: BlockingWrite timed out
> {code}
> We are seeing this on the flaky test dashboard for both TSAN and ASAN builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2734) RemoteKsckTest.TestClusterWithLocation is flaky

2019-03-15 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2734:
---

Assignee: Will Berkeley

> RemoteKsckTest.TestClusterWithLocation is flaky
> ---
>
> Key: KUDU-2734
> URL: https://issues.apache.org/jira/browse/KUDU-2734
> Project: Kudu
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 1.9.0
>Reporter: Mike Percy
>Assignee: Will Berkeley
>Priority: Major
>
> RemoteKsckTest.TestClusterWithLocation is flaky
> Alexey took a look at it and here is the analysis:
> In essence, due to the slowness of TSAN builds, connection negotiation from the 
> kudu CLI to one of the master servers timed out, so one of the preconditions of 
> the test wasn't met. The error output by the test was:
> {code:java}
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/ksck_remote-test.cc:523:
>  Failure
> Failed                                                                        
>   
> Bad status: Network error: failed to gather info from all masters: 1 of 3 had 
> errors
> {code}
> The corresponding error in the master's log was:
> {code:java}
> W0221 12:38:27.119146 31380 negotiation.cc:313] Failed RPC negotiation. 
> Trace:  
> 0221 12:38:23.949428 (+     0us) reactor.cc:583] Submitting negotiation task 
> for client connection to 127.25.42.190:51799
> 0221 12:38:25.362220 (+1412792us) negotiation.cc:98] Waiting for socket to 
> connect
> 0221 12:38:25.363489 (+  1269us) client_negotiation.cc:167] Beginning 
> negotiation
> 0221 12:38:25.369976 (+  6487us) client_negotiation.cc:244] Sending NEGOTIATE 
> NegotiatePB request
> 0221 12:38:25.431582 (+ 61606us) client_negotiation.cc:261] Received 
> NEGOTIATE NegotiatePB response
> 0221 12:38:25.431610 (+    28us) client_negotiation.cc:355] Received 
> NEGOTIATE response from server
> 0221 12:38:25.432659 (+  1049us) client_negotiation.cc:182] Negotiated 
> authn=SASL
> 0221 12:38:27.051125 (+1618466us) client_negotiation.cc:483] Received 
> TLS_HANDSHAKE response from server
> 0221 12:38:27.062085 (+ 10960us) client_negotiation.cc:471] Sending 
> TLS_HANDSHAKE message to server
> 0221 12:38:27.062132 (+    47us) client_negotiation.cc:244] Sending 
> TLS_HANDSHAKE NegotiatePB request
> 0221 12:38:27.064391 (+  2259us) negotiation.cc:304] Negotiation complete: 
> Timed out: Client connection negotiation failed: client connection to 
> 127.25.42.190:51799: BlockingWrite timed out
> {code}
> We are seeing this on the flaky test dashboard for both TSAN and ASAN builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KUDU-2576) TlsSocketTest.TestRecvFailure is flaky

2019-03-15 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16793811#comment-16793811
 ] 

Will Berkeley edited comment on KUDU-2576 at 3/15/19 5:49 PM:
--

Various issues resolved by

{noformat}
04e584c62 Increase timeout in tls_socket-test
bedc701f2 Check result status of Socket::GetPeerAddress in TlsSocket::Recv
df4fd91fd KUDU-2576: TlsSocketTest.TestRecvFailure is flaky
{noformat}


was (Author: wdberkeley):
Various issues resolved by

{{noformat}}
04e584c62 Increase timeout in tls_socket-test
bedc701f2 Check result status of Socket::GetPeerAddress in TlsSocket::Recv
df4fd91fd KUDU-2576: TlsSocketTest.TestRecvFailure is flaky
{{noformat}}

> TlsSocketTest.TestRecvFailure is flaky
> --
>
> Key: KUDU-2576
> URL: https://issues.apache.org/jira/browse/KUDU-2576
> Project: Kudu
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.8.0
>Reporter: Adar Dembo
>Assignee: Will Berkeley
>Priority: Major
> Fix For: 1.10.0
>
>
> This test seems destined to be flaky in TSAN environments.
> The initial sleep is there so that the stop signal to EchoServer is sent 
> while it's blocked inside the echo loop. That appears to be how we can safely 
> assert that one write and one recv both succeed, while the second recv fails.
> However, it's possible for EchoServer to be so slow to start that 100 ms 
> isn't enough, and the stop signal reaches it before it enters the loop. Then 
> the first write will fail like this:
> {noformat}
> /home/jenkins-slave/workspace/kudu-master/3/src/kudu/security/tls_socket-test.cc:230
> Failed
> Bad status: Network error: BlockingWrite error: failed to write to TLS 
> socket: Connection reset by peer
> {noformat}
> Alexey said he'd take a look at this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2576) TlsSocketTest.TestRecvFailure is flaky

2019-03-15 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-2576.
-
   Resolution: Fixed
Fix Version/s: 1.10.0

Various issues resolved by

{noformat}
04e584c62 Increase timeout in tls_socket-test
bedc701f2 Check result status of Socket::GetPeerAddress in TlsSocket::Recv
df4fd91fd KUDU-2576: TlsSocketTest.TestRecvFailure is flaky
{noformat}

> TlsSocketTest.TestRecvFailure is flaky
> --
>
> Key: KUDU-2576
> URL: https://issues.apache.org/jira/browse/KUDU-2576
> Project: Kudu
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.8.0
>Reporter: Adar Dembo
>Assignee: Will Berkeley
>Priority: Major
> Fix For: 1.10.0
>
>
> This test seems destined to be flaky in TSAN environments.
> The initial sleep is there so that the stop signal to EchoServer is sent 
> while it's blocked inside the echo loop. That appears to be how we can safely 
> assert that one write and one recv both succeed, while the second recv fails.
> However, it's possible for EchoServer to be so slow to start that 100 ms 
> isn't enough, and the stop signal reaches it before it enters the loop. Then 
> the first write will fail like this:
> {noformat}
> /home/jenkins-slave/workspace/kudu-master/3/src/kudu/security/tls_socket-test.cc:230
> Failed
> Bad status: Network error: BlockingWrite error: failed to write to TLS 
> socket: Connection reset by peer
> {noformat}
> Alexey said he'd take a look at this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2576) TlsSocketTest.TestRecvFailure is flaky

2019-03-14 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley updated KUDU-2576:

Code Review: http://gerrit.cloudera.org:8080/12758

> TlsSocketTest.TestRecvFailure is flaky
> --
>
> Key: KUDU-2576
> URL: https://issues.apache.org/jira/browse/KUDU-2576
> Project: Kudu
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.8.0
>Reporter: Adar Dembo
>Assignee: Will Berkeley
>Priority: Major
>
> This test seems destined to be flaky in TSAN environments.
> The initial sleep is there so that the stop signal to EchoServer is sent 
> while it's blocked inside the echo loop. That appears to be how we can safely 
> assert that one write and one recv both succeed, while the second recv fails.
> However, it's possible for EchoServer to be so slow to start that 100 ms 
> isn't enough, and the stop signal reaches it before it enters the loop. Then 
> the first write will fail like this:
> {noformat}
> /home/jenkins-slave/workspace/kudu-master/3/src/kudu/security/tls_socket-test.cc:230
> Failed
> Bad status: Network error: BlockingWrite error: failed to write to TLS 
> socket: Connection reset by peer
> {noformat}
> Alexey said he'd take a look at this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2576) TlsSocketTest.TestRecvFailure is flaky

2019-03-14 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2576:
---

Assignee: Will Berkeley  (was: Alexey Serbin)

> TlsSocketTest.TestRecvFailure is flaky
> --
>
> Key: KUDU-2576
> URL: https://issues.apache.org/jira/browse/KUDU-2576
> Project: Kudu
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.8.0
>Reporter: Adar Dembo
>Assignee: Will Berkeley
>Priority: Major
>
> This test seems destined to be flaky in TSAN environments.
> The initial sleep is there so that the stop signal to EchoServer is sent 
> while it's blocked inside the echo loop. That appears to be how we can safely 
> assert that one write and one recv both succeed, while the second recv fails.
> However, it's possible for EchoServer to be so slow to start that 100 ms 
> isn't enough, and the stop signal reaches it before it enters the loop. Then 
> the first write will fail like this:
> {noformat}
> /home/jenkins-slave/workspace/kudu-master/3/src/kudu/security/tls_socket-test.cc:230
> Failed
> Bad status: Network error: BlockingWrite error: failed to write to TLS 
> socket: Connection reset by peer
> {noformat}
> Alexey said he'd take a look at this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2741) Failure in TestMergeIterator.TestDeDupGhostRows

2019-03-11 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2741:
---

 Summary: Failure in TestMergeIterator.TestDeDupGhostRows
 Key: KUDU-2741
 URL: https://issues.apache.org/jira/browse/KUDU-2741
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Will Berkeley


Test log of reproducible failure below:

{noformat}
$ bin/generic_iterators-test --gtest_filter="*DeDup*" 
--gtest_random_seed=1615295598
Note: Google Test filter = *DeDup*
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from TestMergeIterator
[ RUN  ] TestMergeIterator.TestDeDupGhostRows
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0311 13:16:42.837129 199316928 test_util.cc:212] Using random seed: 1078076534
I0311 13:16:42.839583 199316928 generic_iterators-test.cc:317] Time spent 
sorting the expected results: real 0.000s user 0.000s sys 0.000s
I0311 13:16:42.839709 199316928 generic_iterators-test.cc:321] Time spent 
shuffling the inputs: real 0.000s user 0.000s sys 0.000s
I0311 13:16:42.839901 199316928 generic_iterators-test.cc:346] Predicate: val 
>=  AND val < 
../../src/kudu/common/generic_iterators-test.cc:366: Failure
  Expected: expected[total_idx]
  Which is: 10264066
To be equal to: row_val
  Which is: 10282492
Yielded out of order at idx 1823
I0311 13:16:42.848778 199316928 generic_iterators-test.cc:348] Time spent 
iterating merged lists: real 0.009s   user 0.009s sys 0.000s
../../src/kudu/common/generic_iterators-test.cc:414: Failure
Expected: TestMerge(kIntSchemaWithVCol, match_all_pred, true, true) doesn't 
generate new fatal failures in the current thread.
  Actual: it does.
[  FAILED  ] TestMergeIterator.TestDeDupGhostRows (11 ms)
[--] 1 test from TestMergeIterator (11 ms total)

[--] Global test environment tear-down
[==] 1 test from 1 test case ran. (12 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] TestMergeIterator.TestDeDupGhostRows
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2731) Getting column schema information from KuduSchema requires copying a KuduColumnSchema object

2019-03-05 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2731:
---

 Summary: Getting column schema information from KuduSchema 
requires copying a KuduColumnSchema object
 Key: KUDU-2731
 URL: https://issues.apache.org/jira/browse/KUDU-2731
 Project: Kudu
  Issue Type: Improvement
Affects Versions: 1.9.0
Reporter: Will Berkeley


I'm looking at a CPU profile of Impala inserting into Kudu. 
{{KuduTableSink::Send}} has code that schematically does the following:

{noformat}
for each row in the batch {
  for each column {
    if (schema.Column(col_idx).isNullable()) {
      write->mutable_row()->SetNull(col);
    }
  }
}
{noformat}

See 
[kudu-table-sink.cc|https://github.com/apache/impala/blob/branch-3.1.0/be/src/exec/kudu-table-sink.cc#L236].
 However, {{KuduSchema::Column}} copies the column schema and returns it by 
value, so the if statement constructs and destroys a column schema object just 
to check if the column is nullable.

This is by far the biggest user of CPU in the Impala process (35% or so). The 
workload might be I/O bound writing to Kudu anyway, though. Nevertheless, we 
should provide a way to avoid this copying in the API, either by adding a 
method like

{noformat}
class KuduSchema {
  const KuduColumnSchema& get_column(int idx);
}
{noformat}

or a method like

{noformat}
class KuduSchema {
  bool is_column_nullable(int idx);
}
{noformat}

The former is more flexible, while the latter frees the client from worrying 
about holding the reference longer than the underlying KuduColumnSchema object 
lives. We might need to add a number of methods like the latter to cover other 
potentially useful things like checking encoding, type, etc.
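
In the meantime, a caller can sidestep most of the per-row cost with the 
existing API by hoisting the copies out of the row loop. A minimal sketch (the 
helper name is made up; the accessors assumed are the existing 
{{KuduSchema::Column()}} and {{KuduColumnSchema::is_nullable()}}):

{code}
#include <cstddef>
#include <vector>

#include "kudu/client/schema.h"

using kudu::client::KuduSchema;

// Compute per-column nullability once, so KuduSchema::Column() -- which
// returns a KuduColumnSchema by value -- is invoked once per column rather
// than once per row in the hot write path.
std::vector<bool> ColumnNullability(const KuduSchema& schema) {
  std::vector<bool> nullable(schema.num_columns());
  for (size_t i = 0; i < schema.num_columns(); i++) {
    nullable[i] = schema.Column(i).is_nullable();
  }
  return nullable;
}
{code}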



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2731) Getting column schema information from KuduSchema requires copying a KuduColumnSchema object

2019-03-05 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785000#comment-16785000
 ] 

Will Berkeley commented on KUDU-2731:
-

See also IMPALA-8284.

Andrew pointed out that for a public API it might be better to return a 
{{KuduColumnSchema*}}.

> Getting column schema information from KuduSchema requires copying a 
> KuduColumnSchema object
> 
>
> Key: KUDU-2731
> URL: https://issues.apache.org/jira/browse/KUDU-2731
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Priority: Major
>
> I'm looking at a CPU profile of Impala inserting into Kudu. 
> {{KuduTableSink::Send}} has code that schematically does the following:
> {noformat}
> for each row in the batch {
>   for each column {
>     if (schema.Column(col_idx).isNullable()) {
>       write->mutable_row()->SetNull(col);
>     }
>   }
> }
> {noformat}
> See 
> [kudu-table-sink.cc|https://github.com/apache/impala/blob/branch-3.1.0/be/src/exec/kudu-table-sink.cc#L236].
>  However, {{KuduSchema::Column}} copies the column schema and returns it by 
> value, so the if statement constructs and destroys a column schema object 
> just to check if the column is nullable.
> This is by far the biggest user of CPU in the Impala process (35% or so). The 
> workload might be I/O bound writing to Kudu anyway, though. Nevertheless, we 
> should provide a way to avoid this copying in the API, either by adding a 
> method like
> {noformat}
> class KuduSchema {
>   const KuduColumnSchema& get_column(int idx);
> }
> {noformat}
> or a method like
> {noformat}
> class KuduSchema {
>   bool is_column_nullable(int idx);
> }
> {noformat}
> The former is more flexible, while the latter frees the client from 
> worrying about holding the reference longer than the underlying 
> KuduColumnSchema object lives. We might need to add a number of methods like 
> the latter to cover other potentially useful things like checking encoding, 
> type, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2728) A TabletService queue overflow on a write causes a GetTableLocations call in the Java client

2019-03-04 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2728:
---

 Summary: A TabletService queue overflow on a write causes a 
GetTableLocations call in the Java client
 Key: KUDU-2728
 URL: https://issues.apache.org/jira/browse/KUDU-2728
 Project: Kudu
  Issue Type: Improvement
Affects Versions: 1.9.0
Reporter: Will Berkeley


If the Java client receives a ServiceUnavailable from the RPC layer (as opposed 
to a tablet server error), it treats that error like a "tablet not found" 
error. See 
https://github.com/apache/kudu/blob/branch-1.9.x/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java#L237
 and 
https://github.com/apache/kudu/blob/branch-1.9.x/java/kudu-client/src/main/java/org/apache/kudu/client/RpcProxy.java#L410.
 When a write operation sent to the tablet leader is rejected from the service 
queue, this logic causes the Java client to look up the locations for the table 
again. This is wasteful, and can result in hundreds or thousands of 
GetTableLocations calls to the master. Usually that isn't a problem for the 
master, but I've seen a case where floods of these calls for a table with 1000+ 
tablets caused master service queue overflows and triggered KUDU-2710.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2727) Contention on the Raft consensus lock can cause tablet service queue overflows

2019-03-04 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2727:
---

 Summary: Contention on the Raft consensus lock can cause tablet 
service queue overflows
 Key: KUDU-2727
 URL: https://issues.apache.org/jira/browse/KUDU-2727
 Project: Kudu
  Issue Type: Improvement
Reporter: Will Berkeley


Here are stacks illustrating the phenomenon:

{noformat}
  tids=[2201]
0x379ba0f710 
   0x1fb951a base::internal::SpinLockDelay()
   0x1fb93b7 base::SpinLock::SlowLock()
0xb4e68e kudu::consensus::Peer::SignalRequest()
0xb9c0df kudu::consensus::PeerManager::SignalRequest()
0xb8c178 kudu::consensus::RaftConsensus::Replicate()
0xaab816 kudu::tablet::TransactionDriver::Prepare()
0xaac0ed kudu::tablet::TransactionDriver::PrepareTask()
   0x1fa37ed kudu::ThreadPool::DispatchThread()
   0x1f9c2a1 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
0x379b6e88fd clone
  tids=[4515]
0x379ba0f710 
   0x1fb951a base::internal::SpinLockDelay()
   0x1fb93b7 base::SpinLock::SlowLock()
0xb74c60 kudu::consensus::RaftConsensus::NotifyCommitIndex()
0xb59307 kudu::consensus::PeerMessageQueue::NotifyObserversTask()
0xb54058 
_ZN4kudu8internal7InvokerILi2ENS0_9BindStateINS0_15RunnableAdapterIMNS_9consensus16PeerMessageQueueEFvRKSt8functionIFvPNS4_24PeerMessageQueueObserverEEFvPS5_SC_EFvNS0_17UnretainedWrapperIS5_EEZNS5_34NotifyObserversOfCommitIndexChangeElEUlS8_E_EEESH_E3RunEPNS0_13BindStateBaseE
   0x1fa37ed kudu::ThreadPool::DispatchThread()
   0x1f9c2a1 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
0x379b6e88fd clone
  tids=[22185,22194,22193,22188,22187,22186]
0x379ba0f710 
   0x1fb951a base::internal::SpinLockDelay()
   0x1fb93b7 base::SpinLock::SlowLock()
0xb8bff8 
kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()
0xaaaef9 kudu::tablet::TransactionDriver::ExecuteAsync()
0xaa3742 kudu::tablet::TabletReplica::SubmitWrite()
0x92812d kudu::tserver::TabletServiceImpl::Write()
   0x1e28f3c kudu::rpc::GeneratedServiceIf::Handle()
   0x1e2986a kudu::rpc::ServicePool::RunThread()
   0x1f9c2a1 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
0x379b6e88fd clone
  tids=[22192,22191]
0x379ba0f710 
   0x1fb951a base::internal::SpinLockDelay()
   0x1fb93b7 base::SpinLock::SlowLock()
   0x1e13dec kudu::rpc::ResultTracker::TrackRpc()
   0x1e28ef5 kudu::rpc::GeneratedServiceIf::Handle()
   0x1e2986a kudu::rpc::ServicePool::RunThread()
   0x1f9c2a1 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
0x379b6e88fd clone
  tids=[4426]
0x379ba0f710 
   0x206d3d0 
   0x212fd25 google::protobuf::Message::SpaceUsedLong()
   0x211dee4 
google::protobuf::internal::GeneratedMessageReflection::SpaceUsedLong()
0xb6658e kudu::consensus::LogCache::AppendOperations()
0xb5c539 kudu::consensus::PeerMessageQueue::AppendOperations()
0xb5c7c7 kudu::consensus::PeerMessageQueue::AppendOperation()
0xb7c675 
kudu::consensus::RaftConsensus::AppendNewRoundToQueueUnlocked()
0xb8c147 kudu::consensus::RaftConsensus::Replicate()
0xaab816 kudu::tablet::TransactionDriver::Prepare()
0xaac0ed kudu::tablet::TransactionDriver::PrepareTask()
   0x1fa37ed kudu::ThreadPool::DispatchThread()
   0x1f9c2a1 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
0x379b6e88fd clone
{noformat}

{{kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()}} needs to take 
the lock to check the term and the Raft role. When many RPCs come in for the 
same tablet, the contention can hog service threads and cause queue overflows 
on busy systems.

Yugabyte switched their equivalent lock to be an atomic that allows them to 
read the term and role wait-free.
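
A sketch of that general idea (an illustration only, not Kudu's or Yugabyte's 
actual implementation): pack the role and term into a single atomic word so 
hot-path readers never take the consensus lock.

{code}
#include <atomic>
#include <cstdint>

enum class RaftRole : uint8_t { kFollower = 0, kCandidate = 1, kLeader = 2 };

// Writers (role or term changes) still run under the consensus lock; they
// additionally publish the new state through the atomic. Readers such as a
// leadership-and-term check are then wait-free.
class RoleAndTerm {
 public:
  void Set(RaftRole role, uint64_t term) {
    // The term is truncated to 56 bits in this toy encoding.
    state_.store((term << 8) | static_cast<uint64_t>(role),
                 std::memory_order_release);
  }

  // Returns true iff the replica is leader, and reports the term it saw.
  bool IsLeader(uint64_t* term) const {
    const uint64_t s = state_.load(std::memory_order_acquire);
    *term = s >> 8;
    return static_cast<RaftRole>(s & 0xff) == RaftRole::kLeader;
  }

 private:
  std::atomic<uint64_t> state_{0};
};
{code}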



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2720) Improve concurrency of ResultTracker

2019-03-04 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783880#comment-16783880
 ] 

Will Berkeley commented on KUDU-2720:
-

A different stack illustrating a service queue overflow caused by contention in 
the ResultTracker:

{noformat}
Stacks at 0301 13:48:00.422678 (service queue overflowed for 
kudu.tserver.TabletServerService):
  tids=[3063]
0x379ba0f710 
   0x1fb951a base::internal::SpinLockDelay()
   0x1fb93b7 base::SpinLock::SlowLock()
   0x1e12070 kudu::rpc::ResultTracker::IsCurrentDriver()
0xaab426 kudu::tablet::TransactionDriver::Prepare()
0xaac0ed kudu::tablet::TransactionDriver::PrepareTask()
   0x1fa37ed kudu::ThreadPool::DispatchThread()
   0x1f9c2a1 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
0x379b6e88fd clone
  tids=[22185,22194,22193,22192,22191,22190,22186,22187,22189]
0x379ba0f710 
   0x1fb951a base::internal::SpinLockDelay()
   0x1fb93b7 base::SpinLock::SlowLock()
   0x1e13dec kudu::rpc::ResultTracker::TrackRpc()
   0x1e28ef5 kudu::rpc::GeneratedServiceIf::Handle()
   0x1e2986a kudu::rpc::ServicePool::RunThread()
   0x1f9c2a1 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
0x379b6e88fd clone
{noformat}

> Improve concurrency of ResultTracker
> 
>
> Key: KUDU-2720
> URL: https://issues.apache.org/jira/browse/KUDU-2720
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.10.0
>Reporter: Will Berkeley
>Priority: Major
>
> Running a workload that's pushing many small batches from many clients, I see 
> a lot of contention on the spinlock in the ResultTracker:
> {noformat}
> Stacks at 0228 14:19:29.339088 (service queue overflowed for 
> kudu.tserver.TabletServerService):
>   tids=[17223]
> 0x379ba0f710 
> 0x89ee80 
>0x1fb8f72 base::internal::SpinLockDelay()
>0x1fb8ea7 base::SpinLock::SlowLock()
>0x1e138dc kudu::rpc::ResultTracker::TrackRpc()
>0x1e289e5 kudu::rpc::GeneratedServiceIf::Handle()
>0x1e2935a kudu::rpc::ServicePool::RunThread()
>0x1f9bd91 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
> ...
>   tids=[5695,5673]
> 0x379ba0f710 
>0x1fb900a base::internal::SpinLockDelay()
>0x1fb8ea7 base::SpinLock::SlowLock()
>0x1e11b60 kudu::rpc::ResultTracker::IsCurrentDriver()
> 0xaaaf16 kudu::tablet::TransactionDriver::Prepare()
> 0xaabbdd kudu::tablet::TransactionDriver::PrepareTask()
>0x1fa32dd kudu::ThreadPool::DispatchThread()
>0x1f9bd91 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
>   
> tids=[5689,5696,5693,5692,5691,5690,5698,5688,5681,5682,5683,5685,5686,5687,5700,5669,5668,5667,5714,5704,5703,5702,5701,5697,5670,5665,5699,5664,5671,5672,5680]
> 0x379ba0f710 
>0x1fb900a base::internal::SpinLockDelay()
>0x1fb8ea7 base::SpinLock::SlowLock()
>0x1e11bcc kudu::rpc::ResultTracker::RecordCompletionAndRespond()
>0x1e15e6c kudu::rpc::RpcContext::RespondSuccess()
> 0xaad024 kudu::tablet::TransactionDriver::Finalize()
> 0xaad531 kudu::tablet::TransactionDriver::ApplyTask()
>0x1fa32dd kudu::ThreadPool::DispatchThread()
>0x1f9bd91 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
> {noformat}
> The lock in this case is being held by
> {noformat}
>   tids=[5679]
> 0x379ba0f710 
>0x212f81b google::protobuf::Message::SpaceUsedLong()
>0x1e11f2f kudu::rpc::ResultTracker::RecordCompletionAndRespond()
>0x1e15e6c kudu::rpc::RpcContext::RespondSuccess()
> 0xaad024 kudu::tablet::TransactionDriver::Finalize()
> 0xaad531 kudu::tablet::TransactionDriver::ApplyTask()
>0x1fa32dd kudu::ThreadPool::DispatchThread()
>0x1f9bd91 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
> {noformat}
> KUDU-1622 contained some suggestions for improving the ResultTracker. Some 
> were implemented, but maybe we should consider implementing other suggestions 
> there.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2726) Very large tablets defeat budgeted compaction

2019-03-04 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783840#comment-16783840
 ] 

Will Berkeley commented on KUDU-2726:
-

This can be worked around by increasing the compaction budget 
{{--tablet_compaction_budget_mb}}.
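
For example (the value below is only illustrative; 128MB is the default, and the right setting depends on rowset sizes and available I/O):

{noformat}
--tablet_compaction_budget_mb=256
{noformat}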

It's also possible to work around it by decreasing the minimum improvement 
score, but I don't recommend that because KUDU-1400 compaction depends on a 
careful balance between the minimum score and the coefficient balancing the 
KUDU-1400 component of the score and the "regular" rowset height reduction 
score.

> Very large tablets defeat budgeted compaction
> -
>
> Key: KUDU-2726
> URL: https://issues.apache.org/jira/browse/KUDU-2726
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Priority: Major
>
> On very large tablets (50GB+), despite being very uncompacted with a large 
> average rowset height, a default budget (128MB) worth of compaction may not 
> reduce average rowset height enough to pass the minimum threshold. Thus the 
> tablet stays uncompacted forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2726) Very large tablets defeat budgeted compaction

2019-03-04 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2726:
---

 Summary: Very large tablets defeat budgeted compaction
 Key: KUDU-2726
 URL: https://issues.apache.org/jira/browse/KUDU-2726
 Project: Kudu
  Issue Type: Improvement
Affects Versions: 1.9.0
Reporter: Will Berkeley


On very large tablets (50GB+), despite being very uncompacted with a large 
average rowset height, a default budget (128MB) worth of compaction may not 
reduce average rowset height enough to pass the minimum threshold. Thus the 
tablet stays uncompacted forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2725) RollingDiskRowSetWriter create rowsets that are bigger than the target rowset size

2019-03-04 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783828#comment-16783828
 ] 

Will Berkeley commented on KUDU-2725:
-

One can work around this problem by increasing the target rowset size 
{{--budgeted_compaction_target_rowset_size}}.

> RollingDiskRowSetWriter create rowsets that are bigger than the target rowset 
> size
> --
>
> Key: KUDU-2725
> URL: https://issues.apache.org/jira/browse/KUDU-2725
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Priority: Major
>
> The diskrowset writer creates rowsets that are bigger than the target rowset 
> size, with the excess proportional to the number of columns that compress 
> poorly. For example, modifying loadgen to create a table with 280 columns and 
> then using the {{--use_random}} flag, I saw rowsets that were in excess of 
> 80MB. This is a problem because the budget for compactions is 128MB, so 
> rowsets that are that big can never participate in a compaction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KUDU-2725) RollingDiskRowSetWriter create rowsets that are bigger than the target rowset size

2019-03-04 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783828#comment-16783828
 ] 

Will Berkeley edited comment on KUDU-2725 at 3/4/19 10:14 PM:
--

One can work around this problem by increasing the compaction budget 
{{--tablet_compaction_budget_mb}}.


was (Author: wdberkeley):
One can work around this problem by increasing the target rowset size 
{{--budgeted_compaction_target_rowset_size}}.

> RollingDiskRowSetWriter create rowsets that are bigger than the target rowset 
> size
> --
>
> Key: KUDU-2725
> URL: https://issues.apache.org/jira/browse/KUDU-2725
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Priority: Major
>
> The diskrowset writer creates rowsets that are bigger than the target rowset 
> size, with the excess proportional to the number of columns that compress 
> poorly. For example, modifying loadgen to create a table with 280 columns and 
> then using the {{--use_random}} flag, I saw rowsets that were in excess of 
> 80MB. This is a problem because the budget for compactions is 128MB, so 
> rowsets that are that big can never participate in a compaction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2725) RollingDiskRowSetWriter create rowsets that are bigger than the target rowset size

2019-03-04 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2725:
---

 Summary: RollingDiskRowSetWriter create rowsets that are bigger 
than the target rowset size
 Key: KUDU-2725
 URL: https://issues.apache.org/jira/browse/KUDU-2725
 Project: Kudu
  Issue Type: Improvement
Affects Versions: 1.9.0
Reporter: Will Berkeley


The diskrowset writer creates rowsets that are bigger than the target rowset 
size, with the excess proportional to the number of columns that compress 
poorly. For example, modifying loadgen to create a table with 280 columns and 
then using the {{--use_random}} flag, I saw rowsets that were in excess of 
80MB. This is a problem because the budget for compactions is 128MB, so rowsets 
that are that big can never participate in a compaction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KUDU-1865) Create fast path for RespondSuccess() in KRPC

2019-02-28 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781146#comment-16781146
 ] 

Will Berkeley edited comment on KUDU-1865 at 3/1/19 1:02 AM:
-

Just wanted to bump this JIRA to indicate that this is still an issue, 
especially (unsurprisingly) in workloads that send many small RPCs. I see it 
holding up many threads executing transactions during many TabletServerService 
queue overflows in some workloads I am running.


was (Author: wdberkeley):
Just wanted to bump this JIRA to indicate that this is still an issue, 
especially (unsurprisingly) in workloads that send many small RPCs. I see it 
being the main contributor to many TabletServerService queue overflows in some 
workloads I am running.

> Create fast path for RespondSuccess() in KRPC
> -
>
> Key: KUDU-1865
> URL: https://issues.apache.org/jira/browse/KUDU-1865
> Project: Kudu
>  Issue Type: Improvement
>  Components: rpc
>Reporter: Sailesh Mukil
>Priority: Major
>  Labels: perfomance, rpc
> Attachments: alloc-pattern.py, cross-thread.txt
>
>
> A lot of RPCs just respond with RespondSuccess() which returns the exact 
> payload every time. This takes the same path as any other response by 
> ultimately calling Connection::QueueResponseForCall() which has a few small 
> allocations. These small allocations (and their corresponding deallocations) 
> are called quite frequently (once for every IncomingCall) and end up taking 
> quite some time in the kernel (traversing the free list, spin locks etc.)
> This was found when [~mmokhtar] ran some profiles on Impala over KRPC on a 20 
> node cluster and found the following:
> The exact % of time spent is hard to quantify from the profiles, but these 
> were the among the top 5 of the slowest stacks:
> {code:java}
> impalad ! tcmalloc::CentralFreeList::ReleaseToSpans - [unknown source file]
> impalad ! tcmalloc::CentralFreeList::ReleaseListToSpans + 0x1a - [unknown 
> source file]
> impalad ! tcmalloc::CentralFreeList::InsertRange + 0x3b - [unknown source 
> file]
> impalad ! tcmalloc::ThreadCache::ReleaseToCentralCache + 0x103 - [unknown 
> source file]
> impalad ! tcmalloc::ThreadCache::Scavenge + 0x3e - [unknown source file]
> impalad ! operator delete + 0x329 - [unknown source file]
> impalad ! __gnu_cxx::new_allocator::deallocate + 0x4 - 
> new_allocator.h:110
> impalad ! std::_Vector_base std::allocator>::_M_deallocate + 0x5 - stl_vector.h:178
> impalad ! ~_Vector_base + 0x4 - stl_vector.h:160
> impalad ! ~vector - stl_vector.h:425    'slices' vector
> impalad ! kudu::rpc::Connection::QueueResponseForCall + 0xac - 
> connection.cc:433
> impalad ! kudu::rpc::InboundCall::Respond + 0xfa - inbound_call.cc:133
> impalad ! kudu::rpc::InboundCall::RespondSuccess + 0x43 - inbound_call.cc:77
> impalad ! kudu::rpc::RpcContext::RespondSuccess + 0x1f7 - rpc_context.cc:66
> ..
> {code}
> {code:java}
> impalad ! tcmalloc::CentralFreeList::FetchFromOneSpans - [unknown source file]
> impalad ! tcmalloc::CentralFreeList::RemoveRange + 0xc0 - [unknown source 
> file]
> impalad ! tcmalloc::ThreadCache::FetchFromCentralCache + 0x62 - [unknown 
> source file]
> impalad ! operator new + 0x297 - [unknown source file]<--- Creating 
> new 'OutboundTransferTask' object.
> impalad ! kudu::rpc::Connection::QueueResponseForCall + 0x76 - 
> connection.cc:432
> impalad ! kudu::rpc::InboundCall::Respond + 0xfa - inbound_call.cc:133
> impalad ! kudu::rpc::InboundCall::RespondSuccess + 0x43 - inbound_call.cc:77
> impalad ! kudu::rpc::RpcContext::RespondSuccess + 0x1f7 - rpc_context.cc:66
> ...
> {code}
> Even creating and deleting the 'RpcContext' takes a lot of time:
> {code:java}
> impalad ! tcmalloc::CentralFreeList::ReleaseToSpans - [unknown source file]
> impalad ! tcmalloc::CentralFreeList::ReleaseListToSpans + 0x1a - [unknown 
> source file]
> impalad ! tcmalloc::CentralFreeList::InsertRange + 0x3b - [unknown source 
> file]
> impalad ! tcmalloc::ThreadCache::ReleaseToCentralCache + 0x103 - [unknown 
> source file]
> impalad ! tcmalloc::ThreadCache::Scavenge + 0x3e - [unknown source file]
> impalad ! operator delete + 0x329 - [unknown source file]
> impalad ! impala::TransmitDataResponsePb::~TransmitDataResponsePb + 0x16 - 
> impala_internal_service.pb.cc:1221
> impalad ! impala::TransmitDataResponsePb::~TransmitDataResponsePb + 0x8 - 
> impala_internal_service.pb.cc:1222
> impalad ! kudu::DefaultDeleter::operator() + 0x5 - 
> gscoped_ptr.h:145
> impalad ! ~gscoped_ptr_impl + 0x9 - gscoped_ptr.h:228
> impalad ! ~gscoped_ptr - gscoped_ptr.h:318
> impalad ! kudu::rpc::RpcContext::~RpcContext + 0x1e - rpc_context.cc:53   
> <-
> impalad ! kudu::rpc::RpcContext::RespondSuccess + 0x1ff

[jira] [Commented] (KUDU-1865) Create fast path for RespondSuccess() in KRPC

2019-02-28 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781149#comment-16781149
 ] 

Will Berkeley commented on KUDU-1865:
-

Here's some of the relevant stacks:

This thread is holding the lock for the central cache:
{noformat}
tids=[5700]
0x379ba0f710 
0x9bc86d tcmalloc::ThreadCache::ReleaseToCentralCache()
0x9bcb4f tcmalloc::ThreadCache::Scavenge()
0xa86534 
_ZNSt6vectorISt4pairIPN4kudu6tablet6RowSetEiESaIS5_EE19_M_emplace_back_auxIJRS4_RiEEEvDpOT_
0xa742cd 
_ZNSt17_Function_handlerIFvPN4kudu6tablet6RowSetEiEZNS1_6Tablet17BulkCheckPresenceEPKNS0_2fs9IOContextEPNS1_21WriteTransactionStateEEUlS3_iE2_E9_M_invokeERKSt9_Any_dataS3_i
0xaee074 
_ZNK4kudu22interval_tree_internal6ITNodeINS_6tablet20RowSetIntervalTraitsEE31ForEachIntervalContainingPointsIZNKS2_10RowSetTree27ForEachRowSetContainingKeysERKSt6vectorINS_5SliceESa
IS8_EERKSt8functionIFvPNS2_6RowSetEiEEEUlRKNS2_12_GLOBAL__N_111QueryStructEPNS2_16RowSetWithBoundsEE_N9__gnu_cxx17__normal_iteratorIPSM_S7_ISL_SaISL_EEvT0_SX_RKT_
0xaedd03 
_ZNK4kudu22interval_tree_internal6ITNodeINS_6tablet20RowSetIntervalTraitsEE31ForEachIntervalContainingPointsIZNKS2_10RowSetTree27ForEachRowSetContainingKeysERKSt6vectorINS_5SliceESa
IS8_EERKSt8functionIFvPNS2_6RowSetEiEEEUlRKNS2_12_GLOBAL__N_111QueryStructEPNS2_16RowSetWithBoundsEE_N9__gnu_cxx17__normal_iteratorIPSM_S7_ISL_SaISL_EEvT0_SX_RKT_
0xaedd03 
_ZNK4kudu22interval_tree_internal6ITNodeINS_6tablet20RowSetIntervalTraitsEE31ForEachIntervalContainingPointsIZNKS2_10RowSetTree27ForEachRowSetContainingKeysERKSt6vectorINS_5SliceESa
IS8_EERKSt8functionIFvPNS2_6RowSetEiEEEUlRKNS2_12_GLOBAL__N_111QueryStructEPNS2_16RowSetWithBoundsEE_N9__gnu_cxx17__normal_iteratorIPSM_S7_ISL_SaISL_EEvT0_SX_RKT_
0xaee1b3 
_ZNK4kudu22interval_tree_internal6ITNodeINS_6tablet20RowSetIntervalTraitsEE31ForEachIntervalContainingPointsIZNKS2_10RowSetTree27ForEachRowSetContainingKeysERKSt6vectorINS_5SliceESa
IS8_EERKSt8functionIFvPNS2_6RowSetEiEEEUlRKNS2_12_GLOBAL__N_111QueryStructEPNS2_16RowSetWithBoundsEE_N9__gnu_cxx17__normal_iteratorIPSM_S7_ISL_SaISL_EEvT0_SX_RKT_
0xaee3a3 kudu::tablet::RowSetTree::ForEachRowSetContainingKeys()
0xa80c17 kudu::tablet::Tablet::BulkCheckPresence()
0xa8108a kudu::tablet::Tablet::ApplyRowOperations()
0xab4e7a kudu::tablet::WriteTransaction::Apply()
0xaad2b5 kudu::tablet::TransactionDriver::ApplyTask()
   0x1fa32dd kudu::ThreadPool::DispatchThread()
   0x1f9bd91 kudu::Thread::SuperviseThread()
{noformat}

Here's a bunch of threads backed up waiting on it:
{noformat}
tids=[1391]
0x379ba0f710 
0x9c9dad tcmalloc::internal::SpinLockDelay()
0x9c1929 SpinLock::SlowLock()
0x9c1ee8 tcmalloc::CentralFreeList::InsertRange()
0x9bc834 tcmalloc::ThreadCache::ReleaseToCentralCache()
0x9bcbd5 tcmalloc::ThreadCache::ListTooLong()
   0x1e6dbd8 
google::protobuf::internal::RepeatedPtrFieldBase::Destroy<>()
   0x1e6dcae kudu::tablet::TxResultPB::~TxResultPB()
   0x1db60da kudu::consensus::CommitMsg::~CommitMsg()
0xbc9feb 
google::protobuf::internal::RepeatedPtrFieldBase::Destroy<>()
0xbca0de kudu::log::LogEntryBatchPB::~LogEntryBatchPB()
0xbb300c kudu::log::Log::AppendThread::HandleGroup()
0xbb395e kudu::log::Log::AppendThread::DoWork()
   0x1fa32dd kudu::ThreadPool::DispatchThread()
   0x1f9bd91 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
  tids=[5695]
0x379ba0f710 
0x9c9dad tcmalloc::internal::SpinLockDelay()
0x9c1929 SpinLock::SlowLock()
0x9c2391 tcmalloc::CentralFreeList::RemoveRange()
0x9bc675 tcmalloc::ThreadCache::FetchFromCentralCache()
   0x1ec4f81 std::vector<>::_M_emplace_back_aux<>()
   0x1ec4b27 kudu::RowOperationsPBDecoder::DecodeOperations()
0xa78388 kudu::tablet::Tablet::DecodeWriteOperations()
0xab3d39 kudu::tablet::WriteTransaction::Prepare()
0xaaae45 kudu::tablet::TransactionDriver::Prepare()
0xaabbdd kudu::tablet::TransactionDriver::PrepareTask()
   0x1fa32dd kudu::ThreadPool::DispatchThread()
   0x1f9bd91 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
0x379b6e88fd clone
tids=[5661]
0x379ba0f710 
0x9c9e07 tcmalloc::internal::SpinLockWake()
0x9c2373 tcmalloc::CentralFreeList::RemoveRange()
0x9bc675 tcmalloc::ThreadCache::FetchFromCentralCache()
   0x1e94853 kudu::EncodedKeyBuilder::BuildEncodedKey()
   0x1e9535c kudu::EncodedKey::FromContiguousRow()
0xa6b77e kudu::tablet::Tablet::AcquireLockForOp()

[jira] [Commented] (KUDU-1865) Create fast path for RespondSuccess() in KRPC

2019-02-28 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781146#comment-16781146
 ] 

Will Berkeley commented on KUDU-1865:
-

Just wanted to bump this JIRA to indicate that this is still an issue, 
especially (unsurprisingly) in workloads that send many small RPCs. I see it 
being the main contributor to many TabletServerService queue overflows in some 
workloads I am running.

> Create fast path for RespondSuccess() in KRPC
> -
>
> Key: KUDU-1865
> URL: https://issues.apache.org/jira/browse/KUDU-1865
> Project: Kudu
>  Issue Type: Improvement
>  Components: rpc
>Reporter: Sailesh Mukil
>Priority: Major
>  Labels: perfomance, rpc
> Attachments: alloc-pattern.py, cross-thread.txt
>
>
> A lot of RPCs just respond with RespondSuccess() which returns the exact 
> payload every time. This takes the same path as any other response by 
> ultimately calling Connection::QueueResponseForCall() which has a few small 
> allocations. These small allocations (and their corresponding deallocations) 
> are called quite frequently (once for every IncomingCall) and end up taking 
> quite some time in the kernel (traversing the free list, spin locks etc.)
> This was found when [~mmokhtar] ran some profiles on Impala over KRPC on a 20 
> node cluster and found the following:
> The exact % of time spent is hard to quantify from the profiles, but these 
> were the among the top 5 of the slowest stacks:
> {code:java}
> impalad ! tcmalloc::CentralFreeList::ReleaseToSpans - [unknown source file]
> impalad ! tcmalloc::CentralFreeList::ReleaseListToSpans + 0x1a - [unknown 
> source file]
> impalad ! tcmalloc::CentralFreeList::InsertRange + 0x3b - [unknown source 
> file]
> impalad ! tcmalloc::ThreadCache::ReleaseToCentralCache + 0x103 - [unknown 
> source file]
> impalad ! tcmalloc::ThreadCache::Scavenge + 0x3e - [unknown source file]
> impalad ! operator delete + 0x329 - [unknown source file]
> impalad ! __gnu_cxx::new_allocator::deallocate + 0x4 - 
> new_allocator.h:110
> impalad ! std::_Vector_base std::allocator>::_M_deallocate + 0x5 - stl_vector.h:178
> impalad ! ~_Vector_base + 0x4 - stl_vector.h:160
> impalad ! ~vector - stl_vector.h:425    'slices' vector
> impalad ! kudu::rpc::Connection::QueueResponseForCall + 0xac - 
> connection.cc:433
> impalad ! kudu::rpc::InboundCall::Respond + 0xfa - inbound_call.cc:133
> impalad ! kudu::rpc::InboundCall::RespondSuccess + 0x43 - inbound_call.cc:77
> impalad ! kudu::rpc::RpcContext::RespondSuccess + 0x1f7 - rpc_context.cc:66
> ..
> {code}
> {code:java}
> impalad ! tcmalloc::CentralFreeList::FetchFromOneSpans - [unknown source file]
> impalad ! tcmalloc::CentralFreeList::RemoveRange + 0xc0 - [unknown source 
> file]
> impalad ! tcmalloc::ThreadCache::FetchFromCentralCache + 0x62 - [unknown 
> source file]
> impalad ! operator new + 0x297 - [unknown source file]<--- Creating 
> new 'OutboundTransferTask' object.
> impalad ! kudu::rpc::Connection::QueueResponseForCall + 0x76 - 
> connection.cc:432
> impalad ! kudu::rpc::InboundCall::Respond + 0xfa - inbound_call.cc:133
> impalad ! kudu::rpc::InboundCall::RespondSuccess + 0x43 - inbound_call.cc:77
> impalad ! kudu::rpc::RpcContext::RespondSuccess + 0x1f7 - rpc_context.cc:66
> ...
> {code}
> Even creating and deleting the 'RpcContext' takes a lot of time:
> {code:java}
> impalad ! tcmalloc::CentralFreeList::ReleaseToSpans - [unknown source file]
> impalad ! tcmalloc::CentralFreeList::ReleaseListToSpans + 0x1a - [unknown 
> source file]
> impalad ! tcmalloc::CentralFreeList::InsertRange + 0x3b - [unknown source 
> file]
> impalad ! tcmalloc::ThreadCache::ReleaseToCentralCache + 0x103 - [unknown 
> source file]
> impalad ! tcmalloc::ThreadCache::Scavenge + 0x3e - [unknown source file]
> impalad ! operator delete + 0x329 - [unknown source file]
> impalad ! impala::TransmitDataResponsePb::~TransmitDataResponsePb + 0x16 - 
> impala_internal_service.pb.cc:1221
> impalad ! impala::TransmitDataResponsePb::~TransmitDataResponsePb + 0x8 - 
> impala_internal_service.pb.cc:1222
> impalad ! kudu::DefaultDeleter::operator() + 0x5 - 
> gscoped_ptr.h:145
> impalad ! ~gscoped_ptr_impl + 0x9 - gscoped_ptr.h:228
> impalad ! ~gscoped_ptr - gscoped_ptr.h:318
> impalad ! kudu::rpc::RpcContext::~RpcContext + 0x1e - rpc_context.cc:53   
> <-
> impalad ! kudu::rpc::RpcContext::RespondSuccess + 0x1ff - rpc_context.cc:67
> {code}
> The above show that creating these small objects under moderately heavy load 
> results in heavy contention in the kernel. We will benefit a lot if we create 
> a fast path for 'RespondSuccess'.
> My suggestion is to create all these small objects at once along with the 
> 'InboundCall' object while it is being created

[jira] [Created] (KUDU-2720) Improve concurrency of ResultTracker

2019-02-28 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2720:
---

 Summary: Improve concurrency of ResultTracker
 Key: KUDU-2720
 URL: https://issues.apache.org/jira/browse/KUDU-2720
 Project: Kudu
  Issue Type: Improvement
Affects Versions: 1.10.0
Reporter: Will Berkeley


Running a workload that's pushing many small batches from many clients, I see a 
lot of contention on the spinlock in the ResultTracker:

{noformat}
Stacks at 0228 14:19:29.339088 (service queue overflowed for 
kudu.tserver.TabletServerService):
  tids=[17223]
0x379ba0f710 
0x89ee80 
   0x1fb8f72 base::internal::SpinLockDelay()
   0x1fb8ea7 base::SpinLock::SlowLock()
   0x1e138dc kudu::rpc::ResultTracker::TrackRpc()
   0x1e289e5 kudu::rpc::GeneratedServiceIf::Handle()
   0x1e2935a kudu::rpc::ServicePool::RunThread()
   0x1f9bd91 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
0x379b6e88fd clone
...
  tids=[5695,5673]
0x379ba0f710 
   0x1fb900a base::internal::SpinLockDelay()
   0x1fb8ea7 base::SpinLock::SlowLock()
   0x1e11b60 kudu::rpc::ResultTracker::IsCurrentDriver()
0xaaaf16 kudu::tablet::TransactionDriver::Prepare()
0xaabbdd kudu::tablet::TransactionDriver::PrepareTask()
   0x1fa32dd kudu::ThreadPool::DispatchThread()
   0x1f9bd91 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
0x379b6e88fd clone
  
tids=[5689,5696,5693,5692,5691,5690,5698,5688,5681,5682,5683,5685,5686,5687,5700,5669,5668,5667,5714,5704,5703,5702,5701,5697,5670,5665,5699,5664,5671,5672,5680]
0x379ba0f710 
   0x1fb900a base::internal::SpinLockDelay()
   0x1fb8ea7 base::SpinLock::SlowLock()
   0x1e11bcc kudu::rpc::ResultTracker::RecordCompletionAndRespond()
   0x1e15e6c kudu::rpc::RpcContext::RespondSuccess()
0xaad024 kudu::tablet::TransactionDriver::Finalize()
0xaad531 kudu::tablet::TransactionDriver::ApplyTask()
   0x1fa32dd kudu::ThreadPool::DispatchThread()
   0x1f9bd91 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
0x379b6e88fd clone
{noformat}

The lock in this case is being held by

{noformat}
  tids=[5679]
0x379ba0f710 
   0x212f81b google::protobuf::Message::SpaceUsedLong()
   0x1e11f2f kudu::rpc::ResultTracker::RecordCompletionAndRespond()
   0x1e15e6c kudu::rpc::RpcContext::RespondSuccess()
0xaad024 kudu::tablet::TransactionDriver::Finalize()
0xaad531 kudu::tablet::TransactionDriver::ApplyTask()
   0x1fa32dd kudu::ThreadPool::DispatchThread()
   0x1f9bd91 kudu::Thread::SuperviseThread()
0x379ba079d1 start_thread
0x379b6e88fd clone
{noformat}

KUDU-1622 contained some suggestions for improving the ResultTracker. Some were 
implemented, but maybe we should consider implementing other suggestions there.
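
For illustration only (not one of the KUDU-1622 proposals specifically, and not Kudu's actual code), one generic way to reduce this kind of contention is to stripe the tracker's state across several locks keyed by client id, so RPCs from different clients rarely serialize on the same spinlock:

{noformat}
#include <array>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

// Sketch of a lock-striped tracker: each client id hashes to one of N shards,
// so only RPCs from the same client contend on the same lock.
class StripedTracker {
 public:
  static constexpr int kNumShards = 16;

  void Track(const std::string& client_id, int64_t seq_no) {
    Shard& s = shard(client_id);
    std::lock_guard<std::mutex> l(s.lock);
    s.latest_seq_no[client_id] = seq_no;
  }

 private:
  struct Shard {
    std::mutex lock;
    std::unordered_map<std::string, int64_t> latest_seq_no;
  };

  Shard& shard(const std::string& client_id) {
    return shards_[std::hash<std::string>()(client_id) % kNumShards];
  }

  std::array<Shard, kNumShards> shards_;
};
{noformat}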



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2705) /scans should show more useful time metrics

2019-02-26 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2705:
---

Assignee: Will Berkeley

> /scans should show more useful time metrics
> ---
>
> Key: KUDU-2705
> URL: https://issues.apache.org/jira/browse/KUDU-2705
> Project: Kudu
>  Issue Type: Bug
>  Components: supportability, ui
>Affects Versions: 1.9.0
>Reporter: Adar Dembo
>Assignee: Will Berkeley
>Priority: Major
>  Labels: newbie
>
> Currently /scans shows the monotonic time since a scan started as well as its 
> "duration", which is the total amount of wall time for which a scanner was 
> live.
> We should also publish other useful time metrics, like:
>  * cumulative CPU time
>  * cumulative I/O time
>  * a second wall time, including all time spent by the tserver servicing a 
> scan (maybe that's just a sum of CPU and I/O time above?)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2710) Retries of scanner keep alive requests are broken in the Java client

2019-02-25 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777453#comment-16777453
 ] 

Will Berkeley commented on KUDU-2710:
-

I should also mention that when the invariant check fails, it throws an 
exception in the HashedWheelTimer's thread, which the timer swallows. This 
means the keep alive retry doesn't ever get a response, so it will get stuck. 
I've seen this cause Spark tasks to become completely, permanently stuck.

> Retries of scanner keep alive requests are broken in the Java client
> 
>
> Key: KUDU-2710
> URL: https://issues.apache.org/jira/browse/KUDU-2710
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Assignee: Grant Henke
>Priority: Major
>
> KuduRpc implements a default `partitionKey` method:
> {noformat}
> /**
>* Returns the partition key this RPC is for, or {@code null} if the RPC is
>* not tablet specific.
>* 
>* DO NOT MODIFY THE CONTENTS OF THE RETURNED ARRAY.
>*/
>   byte[] partitionKey() {
> return null;
>   }
> {noformat}
> Subclasses override this method to indicate the start key of the tablet they 
> should be sent to, and the Java client uses this, in part, to select which 
> tserver to send retries to. The default implementation returns {{null}}, 
> which is a special value that is only valid as a partition key for the master 
> table. The keep alive RPC does not override this method, so it uses the 
> default implementation.
> When {{KuduScanner#keepAlive}} is called, the initial keep alive RPC does not 
> use {{partitionKey}}, so it works OK. However, retries go through standard 
> retry logic, which calls {{delayedSendRpcToTablet}}, which calls 
> {{sendRpcToTablet}} after a delay and on a timer thread. In 
> {{sendRpcToTablet}} we call {{getTableLocationEntry}} with a null 
> {{partitionkey}}, because the RPC never set one. That results in 
> {{cache.get(partitionKey)}} throwing an exception (usually) because there are 
> multiple entries in the cache for the table, but the {{null}} partition key 
> makes the lookup act like it is looking up the master table, so the invariant 
> check for the master table {{Preconditions.checkState(entries.size() <= 1)}} 
> fails.
> As a workaround, users can set {{keepAlivePeriodMs}} on {{KuduReadOptions}} 
> to something very large like {{Long.MAX_VALUE}}; or, if using the default 
> source, pass the {{kudu.keepAlivePeriodMs}} spark config with a very large 
> value. Note that there also has to be something causing keep alive requests 
> to fail and retry, and this is relatively rare (in my experience).
> To fix, we'll need to make sure that keep alive RPCs act like scan RPCs, and 
> are always retried on the same server as the one currently open for scanning 
> (or no op if there is no such server).
> Also, it's not wise to keep the default implementation in KuduRpc-- 
> subclasses ought to have to make an explicit choice about the default 
> partition key, which is a proxy for which tablet they will go to.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KUDU-2710) Retries of scanner keep alive requests are broken in the Java client

2019-02-25 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777453#comment-16777453
 ] 

Will Berkeley edited comment on KUDU-2710 at 2/26/19 1:07 AM:
--

I should also mention that when the invariant check fails, it throws an 
exception in the HashedWheelTimer's thread, which the timer swallows (but also 
logs). This means the keep alive retry doesn't ever get a response, so it will 
get stuck. I've seen this cause Spark tasks to become completely, permanently 
stuck.

The exception is:

{noformat}
2019-02-15 18:53:02,797 [Hashed wheel timer #1] WARN  
org.apache.kudu.shaded.org.jboss.netty.util.HashedWheelTimer  - An exception 
was thrown by TimerTask.
java.lang.IllegalStateException
at 
org.apache.kudu.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:494)
at 
org.apache.kudu.client.TableLocationsCache.get(TableLocationsCache.java:64)
at 
org.apache.kudu.client.AsyncKuduClient.getTableLocationEntry(AsyncKuduClient.java:2016)
at 
org.apache.kudu.client.AsyncKuduClient.sendRpcToTablet(AsyncKuduClient.java:1075)
at 
org.apache.kudu.client.AsyncKuduClient$3RetryTimer.run(AsyncKuduClient.java:1809)
at 
org.apache.kudu.shaded.org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:556)
at 
org.apache.kudu.shaded.org.jboss.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:632)
at 
org.apache.kudu.shaded.org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:369)
at 
org.apache.kudu.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at java.lang.Thread.run(Thread.java:748)
{noformat}


was (Author: wdberkeley):
I should also mention that when the invariant check fails, it throws an 
exception in the HashedWheelTimer's thread, which the timer swallows (but also 
logs). This means the keep alive retry doesn't ever get a response, so it will 
get stuck. I've seen this cause Spark tasks to become completely, permanently 
stuck.

> Retries of scanner keep alive requests are broken in the Java client
> 
>
> Key: KUDU-2710
> URL: https://issues.apache.org/jira/browse/KUDU-2710
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Assignee: Grant Henke
>Priority: Critical
>
> KuduRpc implements a default `partitionKey` method:
> {noformat}
> /**
>* Returns the partition key this RPC is for, or {@code null} if the RPC is
>* not tablet specific.
>* 
>* DO NOT MODIFY THE CONTENTS OF THE RETURNED ARRAY.
>*/
>   byte[] partitionKey() {
> return null;
>   }
> {noformat}
> Subclasses override this method to indicate the start key of the tablet they 
> should be sent to, and the Java client uses this, in part, to select which 
> tserver to send retries to. The default implementation returns {{null}}, 
> which is a special value that is only valid as a partition key for the master 
> table. The keep alive RPC does not override this method, so it uses the 
> default implementation.
> When {{KuduScanner#keepAlive}} is called, the initial keep alive RPC does not 
> use {{partitionKey}}, so it works OK. However, retries go through standard 
> retry logic, which calls {{delayedSendRpcToTablet}}, which calls 
> {{sendRpcToTablet}} after a delay and on a timer thread. In 
> {{sendRpcToTablet}} we call {{getTableLocationEntry}} with a null 
> {{partitionkey}}, because the RPC never set one. That results in 
> {{cache.get(partitionKey)}} throwing an exception (usually) because there are 
> multiple entries in the cache for the table, but the {{null}} partition key 
> makes the lookup act like it is looking up the master table, so the invariant 
> check for the master table {{Preconditions.checkState(entries.size() <= 1)}} 
> fails.
> As a workaround, users can set {{keepAlivePeriodMs}} on {{KuduReadOptions}} 
> to something very large like {{Long.MAX_VALUE}}; or, if using the default 
> source, pass the {{kudu.keepAlivePeriodMs}} spark config with a very large 
> value. Note that there also has to be something causing keep alive requests 
> to fail and retry, and this is relatively rare (in my experience).
> To fix, we'll need to make sure that keep alive RPCs act like scan RPCs, and 
> are always retried on the same server as the one currently open for scanning 
> (or no op if there is no such server).
> Also, it's not wise to keep the default implementation in KuduRpc-- 
> subclasses ought to have to make an explicit choice about the default 
> partition key, which is a proxy for which tablet they will go to.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (KUDU-2710) Retries of scanner keep alive requests are broken in the Java client

2019-02-25 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777453#comment-16777453
 ] 

Will Berkeley edited comment on KUDU-2710 at 2/26/19 1:07 AM:
--

I should also mention that when the invariant check fails, it throws an 
exception in the HashedWheelTimer's thread, which the timer swallows (but also 
logs). This means the keep alive retry doesn't ever get a response, so it will 
get stuck. I've seen this cause Spark tasks to become completely, permanently 
stuck.


was (Author: wdberkeley):
I should also mention that when the invariant check fails, it throws an 
exception in the HashedWheelTimer's thread, which the timer swallows. This 
means the keep alive retry doesn't ever get a response, so it will get stuck. 
I've seen this cause Spark tasks to become completely, permanently stuck.

> Retries of scanner keep alive requests are broken in the Java client
> 
>
> Key: KUDU-2710
> URL: https://issues.apache.org/jira/browse/KUDU-2710
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Assignee: Grant Henke
>Priority: Major
>
> KuduRpc implements a default `partitionKey` method:
> {noformat}
> /**
>* Returns the partition key this RPC is for, or {@code null} if the RPC is
>* not tablet specific.
>* 
>* DO NOT MODIFY THE CONTENTS OF THE RETURNED ARRAY.
>*/
>   byte[] partitionKey() {
> return null;
>   }
> {noformat}
> Subclasses override this method to indicate the start key of the tablet they 
> should be sent to, and the Java client uses this, in part, to select which 
> tserver to send retries to. The default implementation returns {{null}}, 
> which is a special value that is only valid as a partition key for the master 
> table. The keep alive RPC does not override this method, so it uses the 
> default implementation.
> When {{KuduScanner#keepAlive}} is called, the initial keep alive RPC does not 
> use {{partitionKey}}, so it works OK. However, retries go through standard 
> retry logic, which calls {{delayedSendRpcToTablet}}, which calls 
> {{sendRpcToTablet}} after a delay and on a timer thread. In 
> {{sendRpcToTablet}} we call {{getTableLocationEntry}} with a null 
> {{partitionkey}}, because the RPC never set one. That results in 
> {{cache.get(partitionKey)}} throwing an exception (usually) because there are 
> multiple entries in the cache for the table, but the {{null}} partition key 
> makes the lookup act like it is looking up the master table, so the invariant 
> check for the master table {{Preconditions.checkState(entries.size() <= 1)}} 
> fails.
> As a workaround, users can set {{keepAlivePeriodMs}} on {{KuduReadOptions}} 
> to something very large like {{Long.MAX_VALUE}}; or, if using the default 
> source, pass the {{kudu.keepAlivePeriodMs}} spark config with a very large 
> value. Note that there also has to be something causing keep alive requests 
> to fail and retry, and this is relatively rare (in my experience).
> To fix, we'll need to make sure that keep alive RPCs act like scan RPCs, and 
> are always retried on the same server as the one currently open for scanning 
> (or no op if there is no such server).
> Also, it's not wise to keep the default implementation in KuduRpc-- 
> subclasses ought to have to make an explicit choice about the default 
> partition key, which is a proxy for which tablet they will go to.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2710) Retries of scanner keep alive requests are broken in the Java client

2019-02-25 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2710:
---

Assignee: (was: Will Berkeley)

Thanks to [~granthenke] for figuring out this bug.

> Retries of scanner keep alive requests are broken in the Java client
> 
>
> Key: KUDU-2710
> URL: https://issues.apache.org/jira/browse/KUDU-2710
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Priority: Major
>
> KuduRpc implements a default `partitionKey` method:
> {noformat}
> /**
>* Returns the partition key this RPC is for, or {@code null} if the RPC is
>* not tablet specific.
>* 
>* DO NOT MODIFY THE CONTENTS OF THE RETURNED ARRAY.
>*/
>   byte[] partitionKey() {
> return null;
>   }
> {noformat}
> Subclasses override this method to indicate the start key of the tablet they 
> should be sent to, and the Java client uses this, in part, to select which 
> tserver to send retries to. The default implementation returns {{null}}, 
> which is a special value that is only valid as a partition key for the master 
> table. The keep alive RPC does not override this method, so it uses the 
> default implementation.
> When {{KuduScanner#keepAlive}} is called, the initial keep alive RPC does not 
> use {{partitionKey}}, so it works OK. However, retries go through standard 
> retry logic, which calls {{delayedSendRpcToTablet}}, which calls 
> {{sendRpcToTablet}} after a delay and on a timer thread. In 
> {{sendRpcToTablet}} we call {{getTableLocationEntry}} with a null 
> {{partitionkey}}, because the RPC never set one. That results in 
> {{cache.get(partitionKey)}} throwing an exception (usually) because there are 
> multiple entries in the cache for the table, but the {{null}} partition key 
> makes the lookup act like it is looking up the master table, so the invariant 
> check for the master table {{Preconditions.checkState(entries.size() <= 1)}} 
> fails.
> As a workaround, users can set {{keepAlivePeriodMs}} on {{KuduReadOptions}} 
> to something very large like {{Long.MAX_VALUE}}; or, if using the default 
> source, pass the {{kudu.keepAlivePeriodMs}} spark config with a very large 
> value. Note that there also has to be something causing keep alive requests 
> to fail and retry, and this is relatively rare (in my experience).
> To fix, we'll need to make sure that keep alive RPCs act like scan RPCs, and 
> are always retried on the same server as the one currently open for scanning 
> (or no op if there is no such server).
> Also, it's not wise to keep the default implementation in KuduRpc-- 
> subclasses ought to have to make an explicit choice about the default 
> partition key, which is a proxy for which tablet they will go to.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2710) Retries of scanner keep alive requests are broken in the Java client

2019-02-25 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2710:
---

 Summary: Retries of scanner keep alive requests are broken in the 
Java client
 Key: KUDU-2710
 URL: https://issues.apache.org/jira/browse/KUDU-2710
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Will Berkeley
Assignee: Will Berkeley


KuduRpc implements a default `partitionKey` method:

{noformat}
/**
   * Returns the partition key this RPC is for, or {@code null} if the RPC is
   * not tablet specific.
   * 
   * DO NOT MODIFY THE CONTENTS OF THE RETURNED ARRAY.
   */
  byte[] partitionKey() {
return null;
  }
{noformat}

Subclasses override this method to indicate the start key of the tablet they 
should be sent to, and the Java client uses this, in part, to select which 
tserver to send retries to. The default implementation returns {{null}}, which 
is a special value that is only valid as a partition key for the master table. 
The keep alive RPC does not override this method, so it uses the default 
implementation.

When {{KuduScanner#keepAlive}} is called, the initial keep alive RPC does not 
use {{partitionKey}}, so it works OK. However, retries go through standard 
retry logic, which calls {{delayedSendRpcToTablet}}, which calls 
{{sendRpcToTablet}} after a delay and on a timer thread. In {{sendRpcToTablet}} 
we call {{getTableLocationEntry}} with a null {{partitionkey}}, because the RPC 
never set one. That results in {{cache.get(partitionKey)}} throwing an 
exception (usually) because there are multiple entries in the cache for the 
table, but the {{null}} partition key makes the lookup act like it is looking 
up the master table, so the invariant check for the master table 
{{Preconditions.checkState(entries.size() <= 1)}} fails.

As a workaround, users can set {{keepAlivePeriodMs}} on {{KuduReadOptions}} to 
something very large like {{Long.MAX_VALUE}}; or, if using the default source, 
pass the {{kudu.keepAlivePeriodMs}} spark config with a very large value. Note 
that there also has to be something causing keep alive requests to fail and 
retry, and this is relatively rare (in my experience).

To fix, we'll need to make sure that keep alive RPCs act like scan RPCs, and 
are always retried on the same server as the one currently open for scanning 
(or no op if there is no such server).

Also, it's not wise to keep the default implementation in KuduRpc-- subclasses 
ought to have to make an explicit choice about the default partition key, which 
is a proxy for which tablet they will go to.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2708) Possible contention creating temporary files while flushing cmeta during an election storm

2019-02-21 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774594#comment-16774594
 ] 

Will Berkeley commented on KUDU-2708:
-

Alexey's probably right that it's mostly about doing I/O under the lock.

From assorted investigations, this appears to be the biggest culprit in 
service queue overflows.

What's not clear is how the initial elections start-- this phenomenon relies on 
voting in a (non-pre-) election, so that cmeta must be flushed to record the 
vote. So what triggered the original pre-elections, and what caused their 
candidates to win, triggering a regular election?

> Possible contention creating temporary files while flushing cmeta during an 
> election storm
> --
>
> Key: KUDU-2708
> URL: https://issues.apache.org/jira/browse/KUDU-2708
> Project: Kudu
>  Issue Type: Improvement
>Reporter: Will Berkeley
>Priority: Major
>
> Doing investigation into consensus queue overflows that happen under heavy 
> write load, I noticed 6/10 service threads at the time of overflow have 
> stacks like
> {noformat}
> 0x3b6720f710 
>0x1fb900a base::internal::SpinLockDelay()
>0x1fb8ea7 base::SpinLock::SlowLock()
> 0xb82e25 kudu::consensus::RaftConsensus::RequestVote()
> 0x931555 
> kudu::tserver::ConsensusServiceImpl::RequestConsensusVote()
>0x1e28a2c kudu::rpc::GeneratedServiceIf::Handle()
>0x1e2935a kudu::rpc::ServicePool::RunThread()
>0x1f9bd91 kudu::Thread::SuperviseThread()
> 0x3b672079d1 start_thread
> 0x3b66ee88fd clone
> {noformat}
> They are waiting on some tablet's Raft consensus instance's {{lock_}} in 
> order to vote. Looking into what might be holding that lock, I see stacks like
> {noformat}
> 0x3b6720f710 
> 0x3b66edb2ed __GI_open64
> 0x3b66e63caa __gen_tempname
>0x1f1cf35 kudu::(anonymous namespace)::PosixEnv::MkTmpFile()
>0x1f1f662 kudu::(anonymous namespace)::PosixEnv::NewTempRWFile()
>0x1f8305e kudu::pb_util::WritePBContainerToPath()
> 0xb47932 kudu::consensus::ConsensusMetadata::Flush()
> 0xb74164 
> kudu::consensus::RaftConsensus::SetVotedForCurrentTermUnlocked()
> 0xb783aa 
> kudu::consensus::RaftConsensus::RequestVoteRespondVoteGranted()
> 0xb836a1 kudu::consensus::RaftConsensus::RequestVote()
> 0x931555 
> kudu::tserver::ConsensusServiceImpl::RequestConsensusVote()
>0x1e28a2c kudu::rpc::GeneratedServiceIf::Handle()
>0x1e2935a kudu::rpc::ServicePool::RunThread()
>0x1f9bd91 kudu::Thread::SuperviseThread()
> 0x3b672079d1 start_thread
> 0x3b66ee88fd clone
> {noformat}
> Doing some junior spelunking into glibc code, one hypothesis is that we are 
> generating lots of collisions of proposed temporary file names in the cmeta 
> folder because many threads are attempting to flush cmeta at once. The glibc 
> code looks like
> Maybe we could put the thread id into the temporary file name when a thread 
> does a cmeta flush.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2708) Possible contention creating temporary files while flushing cmeta during an election storm

2019-02-21 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2708:
---

 Summary: Possible contention creating temporary files while 
flushing cmeta during an election storm
 Key: KUDU-2708
 URL: https://issues.apache.org/jira/browse/KUDU-2708
 Project: Kudu
  Issue Type: Improvement
Reporter: Will Berkeley


Doing investigation into consensus queue overflows that happen under heavy 
write load, I noticed 6/10 service threads at the time of overflow have stacks 
like

{noformat}
0x3b6720f710 
   0x1fb900a base::internal::SpinLockDelay()
   0x1fb8ea7 base::SpinLock::SlowLock()
0xb82e25 kudu::consensus::RaftConsensus::RequestVote()
0x931555 kudu::tserver::ConsensusServiceImpl::RequestConsensusVote()
   0x1e28a2c kudu::rpc::GeneratedServiceIf::Handle()
   0x1e2935a kudu::rpc::ServicePool::RunThread()
   0x1f9bd91 kudu::Thread::SuperviseThread()
0x3b672079d1 start_thread
0x3b66ee88fd clone
{noformat}

They are waiting on some tablet's Raft consensus instance's {{lock_}} in order 
to vote. Looking into what might be holding that lock, I see stacks like

{noformat}
0x3b6720f710 
0x3b66edb2ed __GI_open64
0x3b66e63caa __gen_tempname
   0x1f1cf35 kudu::(anonymous namespace)::PosixEnv::MkTmpFile()
   0x1f1f662 kudu::(anonymous namespace)::PosixEnv::NewTempRWFile()
   0x1f8305e kudu::pb_util::WritePBContainerToPath()
0xb47932 kudu::consensus::ConsensusMetadata::Flush()
0xb74164 
kudu::consensus::RaftConsensus::SetVotedForCurrentTermUnlocked()
0xb783aa 
kudu::consensus::RaftConsensus::RequestVoteRespondVoteGranted()
0xb836a1 kudu::consensus::RaftConsensus::RequestVote()
0x931555 kudu::tserver::ConsensusServiceImpl::RequestConsensusVote()
   0x1e28a2c kudu::rpc::GeneratedServiceIf::Handle()
   0x1e2935a kudu::rpc::ServicePool::RunThread()
   0x1f9bd91 kudu::Thread::SuperviseThread()
0x3b672079d1 start_thread
0x3b66ee88fd clone
{noformat}

Doing some junior spelunking into glibc code, one hypothesis is that we are 
generating lots of collisions of proposed temporary file names in the cmeta 
folder because many threads are attempting to flush cmeta at once. The glibc 
code looks like

Maybe we could put the thread id into the temporary file name when a thread 
does a cmeta flush.
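
As a purely illustrative sketch of that suggestion (a hypothetical helper, not Kudu's actual code), the flushing thread's id could be folded into the mkstemp() template so concurrent cmeta flushes propose disjoint candidate names:

{noformat}
#include <string>
#include <sys/syscall.h>
#include <unistd.h>

// Builds a temp-file template like "<dir>/<base>.tid12345.XXXXXX". Because the
// thread id differs per flusher, concurrent flushes no longer race to claim
// the same randomly proposed names inside __gen_tempname().
std::string CmetaTmpTemplate(const std::string& dir, const std::string& base) {
  const long tid = syscall(SYS_gettid);  // gettid() via syscall for older glibc
  return dir + "/" + base + ".tid" + std::to_string(tid) + ".XXXXXX";
}
{noformat}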



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2707) Improve the performance of the block cache under contention

2019-02-21 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2707:
---

 Summary: Improve the performance of the block cache under 
contention
 Key: KUDU-2707
 URL: https://issues.apache.org/jira/browse/KUDU-2707
 Project: Kudu
  Issue Type: Improvement
Affects Versions: 1.10.0
Reporter: Will Berkeley


While looking at a random write workload where flushes outpace compactions 
(i.e. the typical case when inserting as fast as possible), there are 
occasional consensus service queue overflows. Analyzing the stacks of the 
service threads when this occurs (using the diagnostics log), I see many stacks 
like

{noformat}
0x3b6720f710 
   0x1fb900a base::internal::SpinLockDelay()
   0x1fb8ea7 base::SpinLock::SlowLock()
   0x1ef7394 kudu::(anonymous namespace)::ShardedLRUCache::Lookup()
   0x1ce379f kudu::cfile::BlockCache::Lookup()
   0x1cec948 kudu::cfile::CFileReader::ReadBlock()
   0x1ce5d36 kudu::cfile::BloomFileReader::CheckKeyPresent()
0xb311a1 kudu::tablet::CFileSet::CheckRowPresent()
0xac46c4 kudu::tablet::DiskRowSet::CheckRowPresent()
0xa6b017 
_ZZN4kudu6tablet6Tablet17BulkCheckPresenceEPKNS_2fs9IOContextEPNS0_21WriteTransactionStateEENKUlvE1_clEv
0xa7427e 
_ZNSt17_Function_handlerIFvPN4kudu6tablet6RowSetEiEZNS1_6Tablet17BulkCheckPresenceEPKNS0_2fs9IOContextEPNS1_21WriteTransactionStateEEUlS3_iE2_E9_M_invokeERKSt9_Any_dataS3_i
0xaee074 
_ZNK4kudu22interval_tree_internal6ITNodeINS_6tablet20RowSetIntervalTraitsEE31ForEachIntervalContainingPointsIZNKS2_10RowSetTree27ForEachRowSetContainingKeysERKSt6vectorINS_5SliceESaIS8_EERKSt8functionIFvPNS2_6RowSetEiEEEUlRKNS2_12_GLOBAL__N_111QueryStructEPNS2_16RowSetWithBoundsEE_N9__gnu_cxx17__normal_iteratorIPSM_S7_ISL_SaISL_EEvT0_SX_RKT_
0xaee1b3 
_ZNK4kudu22interval_tree_internal6ITNodeINS_6tablet20RowSetIntervalTraitsEE31ForEachIntervalContainingPointsIZNKS2_10RowSetTree27ForEachRowSetContainingKeysERKSt6vectorINS_5SliceESaIS8_EERKSt8functionIFvPNS2_6RowSetEiEEEUlRKNS2_12_GLOBAL__N_111QueryStructEPNS2_16RowSetWithBoundsEE_N9__gnu_cxx17__normal_iteratorIPSM_S7_ISL_SaISL_EEvT0_SX_RKT_
0xaee3a3 kudu::tablet::RowSetTree::ForEachRowSetContainingKeys()
0xa80c17 kudu::tablet::Tablet::BulkCheckPresence()
0xa8108a kudu::tablet::Tablet::ApplyRowOperations()
{noformat}

Note that the slow step in writes for these workloads is generally CPU usage in 
the apply phase, once they have been running for a while.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2704) Rowsets that are much bigger than the target size discourage compactions

2019-02-20 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2704:
---

 Summary: Rowsets that are much bigger than the target size 
discourage compactions
 Key: KUDU-2704
 URL: https://issues.apache.org/jira/browse/KUDU-2704
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Will Berkeley
Assignee: Will Berkeley


In KUDU-2701, I fixed a KUDU-1400-related compaction loop where the size used 
for compaction was only the base data and redos, so compacting rowsets that 
looked small but weren't was effectively a no-op, resulting in a compaction 
loop. Now, rowset count / KUDU-1400 compactions use 
the whole rowset size. While testing something on a table with 279 columns, I 
noticed that almost all rowsets were being flushed at a size of 80-90MB and, 
even though the tablet height was increasing rapidly and above 20, almost no 
compactions were happening. Looking into it, when the total size of the rowset 
is far above the target size, we assign a big negative score to including the 
rowset in a compaction, since the score is proportional to 1 - size/target 
size. This problem always existed, it just got worse because the size now 
includes more things.
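
To make the effect concrete (assuming, for illustration, a 32MB target rowset size), a single 85MB rowset contributes a term proportional to

{noformat}
1 - 85/32 ≈ -1.66
{noformat}

to the compaction score, a large negative contribution that can outweigh the benefit of reducing rowset count or height, so such rowsets are effectively never selected.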



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2701) Compaction loop resulting from small rowset compaction policy (KUDU-1400)

2019-02-13 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767697#comment-16767697
 ] 

Will Berkeley commented on KUDU-2701:
-

I confirmed it's possible to work around this by setting 
{{--compaction_small_rowset_tradeoff=0}}. This will disable all KUDU-1400 
"small rowsets" compaction.

> Compaction loop resulting from small rowset compaction policy (KUDU-1400)
> -
>
> Key: KUDU-2701
> URL: https://issues.apache.org/jira/browse/KUDU-2701
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet
>Affects Versions: 1.9.0
>Reporter: Will Berkeley
>Assignee: Will Berkeley
>Priority: Blocker
>
> I saw an instance of rowset compaction looping, accomplishing nothing but 
> wearing out the disk. The offending compaction's logging is
> {noformat}
> I0213 14:23:51.062127  9268 maintenance_manager.cc:306] P 
> 09d6bf7a02124145b43f43cb7a667b3d: Scheduling 
> CompactRowSetsOp(39aba63834b441e0b26d7aa5949f92ae): perf score=0.025948
> I0213 14:23:55.630163  8904 maintenance_manager.cc:501] P 
> 09d6bf7a02124145b43f43cb7a667b3d: 
> CompactRowSetsOp(39aba63834b441e0b26d7aa5949f92ae) complete. Timing: real 
> 12.968s   user 10.285ssys 0.578s Metrics: 
> {"bytes_written":235680573,"cfile_cache_hit":32,"cfile_cache_hit_bytes":44035,"cfile_cache_miss":5750,"cfile_cache_miss_bytes":246197139,"cfile_init":24,"data
>  dirs.queue_time_us":162819,"data dirs.run_cpu_time_us":1864,"data 
> dirs.run_wall_time_us":1461840,"drs_written":8,"fdatasync":25,"fdatasync_us":1217638,"lbm_read_time_us":79002,"lbm_reads_lt_1ms":5846,"lbm_write_time_us":1043597,"lbm_writes_1-10_ms":2,"lbm_writes_10-100_ms":10,"lbm_writes_gt_100_ms":1,"lbm_writes_lt_1ms":30276,"mutex_wait_us":1,"num_input_rowsets":8,"rows_written":46640210,"spinlock_wait_cycles":1664,"thread_start_us":453,"threads_started":14,"wal-append.queue_time_us":49}
> {noformat}
> The situation is that about 8 16MB rowsets are calculated to compact to 4 
> 32MB rowsets, but get written out as 8 16MB rowsets. The trick is that the 
> rowsets *aren't* 16MB...they are really 32MB-- 16MB is the size of the base 
> data and redos, which is what we use as the "size" for compaction purposes.
> Probably, we should use "everything" ({{Rowset::OnDiskSize()}}) for the 
> KUDU-1400 calculations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2701) Compaction loop resulting from small rowset compaction policy (KUDU-1400)

2019-02-13 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2701:
---

 Summary: Compaction loop resulting from small rowset compaction 
policy (KUDU-1400)
 Key: KUDU-2701
 URL: https://issues.apache.org/jira/browse/KUDU-2701
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Will Berkeley
Assignee: Will Berkeley


I saw an instance of rowset compaction looping, accomplishing nothing but 
wearing out the disk. The offending compaction's logging is

{noformat}
I0213 14:23:51.062127  9268 maintenance_manager.cc:306] P 
09d6bf7a02124145b43f43cb7a667b3d: Scheduling 
CompactRowSetsOp(39aba63834b441e0b26d7aa5949f92ae): perf score=0.025948
I0213 14:23:55.630163  8904 maintenance_manager.cc:501] P 
09d6bf7a02124145b43f43cb7a667b3d: 
CompactRowSetsOp(39aba63834b441e0b26d7aa5949f92ae) complete. Timing: real 
12.968s   user 10.285ssys 0.578s Metrics: 
{"bytes_written":235680573,"cfile_cache_hit":32,"cfile_cache_hit_bytes":44035,"cfile_cache_miss":5750,"cfile_cache_miss_bytes":246197139,"cfile_init":24,"data
 dirs.queue_time_us":162819,"data dirs.run_cpu_time_us":1864,"data 
dirs.run_wall_time_us":1461840,"drs_written":8,"fdatasync":25,"fdatasync_us":1217638,"lbm_read_time_us":79002,"lbm_reads_lt_1ms":5846,"lbm_write_time_us":1043597,"lbm_writes_1-10_ms":2,"lbm_writes_10-100_ms":10,"lbm_writes_gt_100_ms":1,"lbm_writes_lt_1ms":30276,"mutex_wait_us":1,"num_input_rowsets":8,"rows_written":46640210,"spinlock_wait_cycles":1664,"thread_start_us":453,"threads_started":14,"wal-append.queue_time_us":49}
{noformat}

The situation is that about 8 16MB rowsets are calculated to compact to 4 32MB 
rowsets, but get written out as 8 16MB rowsets. The trick is that the rowsets 
*aren't* 16MB...they are really 32MB-- 16MB is the size of the base data and 
redos, which is what we use as the "size" for compaction purposes.

Probably, we should use "everything" ({{Rowset::OnDiskSize()}}) for the 
KUDU-1400 calculations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2693) Buffer DiskRowSet flushes to more efficiently write many columns

2019-02-11 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley updated KUDU-2693:

Code Review: https://gerrit.cloudera.org/#/c/12425/

> Buffer DiskRowSet flushes to more efficiently write many columns
> 
>
> Key: KUDU-2693
> URL: https://issues.apache.org/jira/browse/KUDU-2693
> Project: Kudu
>  Issue Type: Improvement
>  Components: fs, tablet
>Affects Versions: 1.9.0
>Reporter: Mike Percy
>Assignee: Todd Lipcon
>Priority: Major
>
> When looking at a trace of some MRS flushes on a table with 280 columns, it 
> was observed that during the course of the flush some 695 fdatasync() calls 
> occurred.
> One possible way to minimize the number of fsync calls would be to flush 
> directly to memory buffers first, determine the ideal layout on disk for the 
> flushed blocks (possibly striped across one log block container per data 
> disk) and then potentially write the data out to the containers in parallel. 
> This would require some memory buffer space to be reserved per maintenance 
> manager thread, possibly 64MB since the DRS roll size is 32MB.
> According to Todd we could probably do it all in LogBlockManager by adding a 
> new flag to CreateBlockOptions that says whether to buffer or something like 
> that.
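
As a rough illustration of the buffering idea (Java for brevity; the actual 
block manager is C++ and every name below is hypothetical): accumulate the 
flushed column blocks in memory, pick a layout that stripes them across 
containers, then write and sync each container once instead of issuing an 
fdatasync per column block.

{code:java}
// Hypothetical illustration of "buffer first, then write and sync once per container".
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BufferedFlushSketch {
  // One buffered block per flushed column chunk; no I/O happens at append time.
  record BufferedBlock(String columnName, byte[] data) {}

  private final List<BufferedBlock> buffered = new ArrayList<>();

  void appendColumnBlock(String columnName, byte[] data) {
    buffered.add(new BufferedBlock(columnName, data));
  }

  // Stripe the buffered blocks across containers (say, one per data dir), then write each
  // container once and sync it once, instead of one fdatasync per column block.
  void flushTo(List<Path> containers) throws IOException {
    Map<Path, List<BufferedBlock>> layout = new HashMap<>();
    for (int i = 0; i < buffered.size(); i++) {
      layout.computeIfAbsent(containers.get(i % containers.size()), p -> new ArrayList<>())
            .add(buffered.get(i));
    }
    for (Map.Entry<Path, List<BufferedBlock>> e : layout.entrySet()) {
      try (FileChannel ch = FileChannel.open(
          e.getKey(), StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
        for (BufferedBlock b : e.getValue()) {
          ch.write(ByteBuffer.wrap(b.data()));
        }
        ch.force(false); // one data sync per container for the whole flush
      }
    }
    buffered.clear();
  }
}
{code}

With a 32MB DRS roll size, the 64MB buffer per maintenance manager thread 
suggested above would bound the extra memory this approach needs.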



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2693) Buffer DiskRowSet flushes to more efficiently write many columns

2019-02-11 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2693:
---

Assignee: Todd Lipcon

> Buffer DiskRowSet flushes to more efficiently write many columns
> 
>
> Key: KUDU-2693
> URL: https://issues.apache.org/jira/browse/KUDU-2693
> Project: Kudu
>  Issue Type: Improvement
>  Components: fs, tablet
>Affects Versions: 1.9.0
>Reporter: Mike Percy
>Assignee: Todd Lipcon
>Priority: Major
>
> When looking at a trace of some MRS flushes on a table with 280 columns, it 
> was observed that during the course of the flush some 695 fdatasync() calls 
> occurred.
> One possible way to minimize the number of fsync calls would be to flush 
> directly to memory buffers first, determine the ideal layout on disk for the 
> flushed blocks (possibly striped across one log block container per data 
> disk) and then potentially write the data out to the containers in parallel. 
> This would require some memory buffer space to be reserved per maintenance 
> manager thread, possibly 64MB since the DRS roll size is 32MB.
> According to Todd we could probably do it all in LogBlockManager by adding a 
> new flag to CreateBlockOptions that says whether to buffer or something like 
> that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2677) Implement new gflag for backup history retention

2019-01-31 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757725#comment-16757725
 ] 

Will Berkeley commented on KUDU-2677:
-

We decided it doesn't make sense to separate the latest time at which history is 
still scannable from the latest time before it can be GC'd. In other words, this 
JIRA just requests a flag so that {{AHM = now - max(old_flag, new_flag)}}.
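
For illustration, a tiny sketch of those semantics with the two flags as 
stand-in variables (values and names are placeholders, not the tserver's actual 
code):

{code:java}
import java.time.Duration;
import java.time.Instant;

public class HistoryRetentionSketch {
  // Stand-ins for --tablet_history_max_age_sec and the proposed backup retention flag.
  static long tabletHistoryMaxAgeSec = 900;
  static long backupMaxIncrementalAgeSec = 7 * 24 * 3600;

  // AHM = now - max(old_flag, new_flag): history older than this may be GC'd.
  static Instant ancientHistoryMark(Instant now) {
    return now.minus(Duration.ofSeconds(Math.max(tabletHistoryMaxAgeSec, backupMaxIncrementalAgeSec)));
  }

  // Regular (non-backup) scans may still only read back as far as the original flag allows.
  static Instant oldestRegularScanTime(Instant now) {
    return now.minus(Duration.ofSeconds(tabletHistoryMaxAgeSec));
  }

  public static void main(String[] args) {
    Instant now = Instant.now();
    System.out.println("AHM: " + ancientHistoryMark(now));
    System.out.println("Oldest regular snapshot scan: " + oldestRegularScanTime(now));
  }
}
{code}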

> Implement new gflag for backup history retention
> 
>
> Key: KUDU-2677
> URL: https://issues.apache.org/jira/browse/KUDU-2677
> Project: Kudu
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Will Berkeley
>Priority: Major
>  Labels: backup
>
> Implement separate gflag (perhaps --backup_max_incremental_age_sec) for 
> non-snapshot history retention separate from --tablet_history_max_age_sec; 
> Actual retention will be maximum of these two flags, but regular scans still 
> may only scan after --tablet_history_max_age_sec. This makes it possible to 
> implement snapshots in future releases by retiring or changing the semantics 
> of the --tablet_history_max_age_sec flag.
> Overall this helps support backup jobs that run for an extended period of 
> time. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2677) Implement new gflag for backup history retention

2019-01-29 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2677:
---

Assignee: Will Berkeley

> Implement new gflag for backup history retention
> 
>
> Key: KUDU-2677
> URL: https://issues.apache.org/jira/browse/KUDU-2677
> Project: Kudu
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Will Berkeley
>Priority: Major
>  Labels: backup
>
> Implement separate gflag (perhaps --backup_max_incremental_age_sec) for 
> non-snapshot history retention separate from --tablet_history_max_age_sec; 
> Actual retention will be maximum of these two flags, but regular scans still 
> may only scan after --tablet_history_max_age_sec. This makes it possible to 
> implement snapshots in future releases by retiring or changing the semantics 
> of the --tablet_history_max_age_sec flag.
> Overall this helps support backup jobs that run for an extended period of 
> time. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2662) TestKuduClient.testClientLocation is flaky

2019-01-28 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-2662.
-
   Resolution: Fixed
Fix Version/s: 1.9.0

Fixed in 1f1f875cfb6a26b55821edd236f81f28ce06759b

> TestKuduClient.testClientLocation is flaky
> --
>
> Key: KUDU-2662
> URL: https://issues.apache.org/jira/browse/KUDU-2662
> Project: Kudu
>  Issue Type: Bug
>  Components: java, test
>Reporter: Andrew Wong
>Assignee: Will Berkeley
>Priority: Major
> Fix For: 1.9.0
>
> Attachments: TEST-org.apache.kudu.client.TestKuduClient.xml
>
>
> I've seen a few failures of TestKuduClient.testClientLocation with retries 
> turned on. I've attached a log. Here is the error of interest:
>  
> org.junit.ComparisonFailure: expected:<[/L0]> but was:<[]>
>   at org.junit.Assert.assertEquals(Assert.java:115)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.kudu.client.TestKuduClient.testClientLocation(TestKuduClient.java:1174)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2676) [Backup] Support restoring tables over the maximum allowed replicas

2019-01-28 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754441#comment-16754441
 ] 

Will Berkeley commented on KUDU-2676:
-

We'll probably need to mimic the steps needed to create such a table. For 
example, if at most 60 tablets can be created at once, and a table has 12 hash 
partitions and 12 range partitions, the table must have been enlarged to its 
present number of tablets (144) by adding range partitions over time (84 tablets 
beyond the creation-time limit). So to restore it, we would start with 5 range 
partitions (5 * 12 = 60 tablets), then add the extra range partitions, as in the 
sketch below. This could be done all at once, when the restore table is created 
at the beginning of the restore process, or we could amortize the cost of 
creating the tablets by adding a new range partition as the work of restoring 
other range partitions finishes and we want to kick off tasks to restore more 
tablets.
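
A rough sketch of the "all at once" variant with the Java client (table name, 
schema, and range bounds below are made up; only the shape of the calls 
matters): create the table with 5 range partitions so the initial tablet count 
stays at the assumed 60-tablet creation limit, then add the remaining range 
partitions via alter before restoring data into them.

{code:java}
import java.util.Arrays;
import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Schema;
import org.apache.kudu.Type;
import org.apache.kudu.client.AlterTableOptions;
import org.apache.kudu.client.CreateTableOptions;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.PartialRow;

public class RestoreLargeTableSketch {
  public static void main(String[] args) throws Exception {
    KuduClient client = new KuduClient.KuduClientBuilder("master-1:7051").build();
    try {
      Schema schema = new Schema(Arrays.asList(
          new ColumnSchema.ColumnSchemaBuilder("key", Type.INT64).key(true).build(),
          new ColumnSchema.ColumnSchemaBuilder("val", Type.STRING).build()));

      // 12 hash buckets x 5 initial range partitions = 60 tablets, the assumed creation limit.
      CreateTableOptions cto = new CreateTableOptions()
          .addHashPartitions(Arrays.asList("key"), 12)
          .setRangePartitionColumns(Arrays.asList("key"));
      for (int r = 0; r < 5; r++) {
        cto.addRangePartition(bound(schema, r), bound(schema, r + 1));
      }
      client.createTable("restored_table", schema, cto);

      // Add the remaining 7 range partitions (84 more tablets) post-creation.
      for (int r = 5; r < 12; r++) {
        client.alterTable("restored_table",
            new AlterTableOptions().addRangePartition(bound(schema, r), bound(schema, r + 1)));
      }
    } finally {
      client.close();
    }
  }

  private static PartialRow bound(Schema schema, int rangeIndex) {
    PartialRow row = schema.newPartialRow();
    row.addLong("key", rangeIndex * 1_000_000L);
    return row;
  }
}
{code}

The amortized variant would interleave the alterTable calls with the per-range 
restore tasks instead of issuing them all up front.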

> [Backup] Support restoring tables over the maximum allowed replicas
> ---
>
> Key: KUDU-2676
> URL: https://issues.apache.org/jira/browse/KUDU-2676
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Grant Henke
>Priority: Major
>  Labels: backup
>
> Currently it is possible to backup a table that has more partitions than are 
> allowed at create time. 
> This results in the restore job failing with the following exception:
> {noformat}
> 19/01/24 08:17:14 INFO backup.KuduRestore$: Restoring from path: 
> hdfs:///user/ghenke/kudu-backup-tests/20190124-080741
> Exception in thread "main" org.apache.kudu.client.NonRecoverableException: 
> the requested number of tablet replicas is over the maximum permitted at 
> creation time (
> 450), additional tablets may be added by adding range partitions to the table 
> post-creation
> at 
> org.apache.kudu.client.KuduException.transformException(KuduException.java:110)
> at 
> org.apache.kudu.client.KuduClient.joinAndHandleException(KuduClient.java:365)
> at org.apache.kudu.client.KuduClient.createTable(KuduClient.java:109)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2675) RegexpKuduOperationsProducerParseErrorTest fails with OOM in testBadColumnValueThrowsExceptionDefaultConfig

2019-01-28 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2675:
---

 Summary: RegexpKuduOperationsProducerParseErrorTest fails with OOM 
in testBadColumnValueThrowsExceptionDefaultConfig
 Key: KUDU-2675
 URL: https://issues.apache.org/jira/browse/KUDU-2675
 Project: Kudu
  Issue Type: Bug
Reporter: Will Berkeley


There weren't any logs beyond the fact that the wrong exception was thrown:

{noformat}
Expected: (an instance of org.apache.flume.FlumeException and exception with 
message a string containing "Raw value '1,1000,string' couldn't be parsed 
to type Type: int8 for column 'byteFld'" and an instance of 
org.apache.flume.FlumeException and exception with message a string containing 
"Raw value '1,1000,string' couldn't be parsed to type Type: int8 for 
column 'byteFld'" and an instance of org.apache.flume.FlumeException and 
exception with message a string containing "Raw value '1,1000,string' 
couldn't be parsed to type Type: int8 for column 'byteFld'")
 but: an instance of org.apache.flume.FlumeException 
 is a 
java.lang.OutOfMemoryError
Stacktrace was: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3326)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at 
org.apache.kudu.test.CapturingLogAppender.append(CapturingLogAppender.java:53)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
at 
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:562)
at 
org.apache.kudu.test.junit.RetryRule$RetryStatement.evaluate(RetryRule.java:80)
at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.junit.Assert.assertThat(Assert.java:956)
at org.junit.Assert.assertThat(Assert.java:923)
at 
org.junit.rules.ExpectedException.handleException(ExpectedException.java:252)
at 
org.junit.rules.ExpectedException.access$000(ExpectedException.java:106)
at 
org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:241)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)

[jira] [Commented] (KUDU-1736) kudu crash in debug build: unordered undo delta

2019-01-28 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754245#comment-16754245
 ] 

Will Berkeley commented on KUDU-1736:
-

This still happens.

 [^mt-tablet-test.3.txt] 

> kudu crash in debug build: unordered undo delta
> ---
>
> Key: KUDU-1736
> URL: https://issues.apache.org/jira/browse/KUDU-1736
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet
>Reporter: zhangsong
>Priority: Critical
> Attachments: mt-tablet-test-20171123.txt.xz, mt-tablet-test.3.txt, 
> mt-tablet-test.txt, mt-tablet-test.txt.gz
>
>
> In the JD cluster we hit a kudu-tserver crash with the fatal message described 
> as follows:
> Check failed: last_key_.CompareTo(key) <= 0 must insert undo deltas in 
> sorted order (ascending key, then descending ts): got key (row 
> 1422@tx6052042821982183424) after (row 1422@tx6052042821953155072)
> This is a DCHECK which should not fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-1736) kudu crash in debug build: unordered undo delta

2019-01-28 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley updated KUDU-1736:

Attachment: mt-tablet-test.3.txt

> kudu crash in debug build: unordered undo delta
> ---
>
> Key: KUDU-1736
> URL: https://issues.apache.org/jira/browse/KUDU-1736
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet
>Reporter: zhangsong
>Priority: Critical
> Attachments: mt-tablet-test-20171123.txt.xz, mt-tablet-test.3.txt, 
> mt-tablet-test.txt, mt-tablet-test.txt.gz
>
>
> In the JD cluster we hit a kudu-tserver crash with the fatal message described 
> as follows:
> Check failed: last_key_.CompareTo(key) <= 0 must insert undo deltas in 
> sorted order (ascending key, then descending ts): got key (row 
> 1422@tx6052042821982183424) after (row 1422@tx6052042821953155072)
> This is a DCHECK which should not fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-1868) Java client mishandles socket read timeouts for scans

2019-01-25 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752770#comment-16752770
 ] 

Will Berkeley commented on KUDU-1868:
-

I've done a good amount of investigation into this now.

First of all, here's a changelist that adds two tests to the Java client 
exposing problems related to how its timeouts work.

{noformat}
commit 190aedeef663212cd0ce37d45e5cde35e0de39e8
Author: Will Berkeley 
Date:   Fri Jan 25 14:09:54 2019 -0800

KUDU-1868 tests

Change-Id: I8d823b63ac0a41cc5e42b63a7c19e0ef777e1dea

diff --git 
a/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java 
b/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
index b849e9f63..33e280021 100644
--- a/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
+++ b/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
@@ -61,6 +61,7 @@ import 
org.apache.kudu.test.KuduTestHarness.TabletServerConfig;
 import org.apache.kudu.test.ClientTestUtil;
 import org.apache.kudu.util.TimestampUtil;
 import org.junit.Before;
+import org.junit.Ignore;
 import org.junit.Rule;
 import org.junit.Test;
 
@@ -306,6 +307,51 @@ public class TestKuduClient {
 }
   }
 
+  /*
+   * This is a repro for KUDU-1868. When scanning, the server might be slow to 
respond to an
+   * individual TabletServerService.Scan request. If it takes longer than the 
socket read timeout
+   * to respond, and there isn't other traffic on the connection, the netty 
Channel may send a
+   * socket read timeout event. This will cause the scan request to be 
retried. If the scan request
+   * is not the first scan request, this will cause the server to see the same 
call id for the
+   * same scanner twice, causing the "Invalid call sequence ID in scan 
request" exception.
+   */
+  @Test(timeout = 10)
+  @Ignore
+  @TabletServerConfig(flags = { 
"--scanner_inject_latency_on_subsequent_batches=1000" })
+  public void testKUDU1868() throws Exception {
+// Create a basic table and load it with data.
+int numRows = 1000;
+KuduTable table = client.createTable(
+TABLE_NAME,
+basicSchema,
+new CreateTableOptions().addHashPartitions(ImmutableList.of("key"), 
2));
+KuduSession session = client.newSession();
+for (int i = 0; i < numRows; i++) {
+  Insert insert = createBasicSchemaInsert(table, i);
+  session.apply(insert);
+}
+
+// Make a new client with a socket read timeout much shorter than how long 
it will take the
+// server to respond to continue scan requests, and use it to scan the 
table created above.
+KuduClient shortRecvTimeoutClient =
+new KuduClient.KuduClientBuilder(client.getMasterAddressesAsString())
+.defaultSocketReadTimeoutMs(500)
+.build();
+
shortRecvTimeoutClient.updateLastPropagatedTimestamp(client.getLastPropagatedTimestamp());
+KuduTable shortRecvTimeoutTable = 
shortRecvTimeoutClient.openTable(TABLE_NAME);
+// Set a small batch size so there will be data for multiple roundtrips.
+KuduScanner scanner = shortRecvTimeoutClient
+.newScannerBuilder(shortRecvTimeoutTable)
+.batchSizeBytes(100)
+.build();
+
+// The first request that creates the scanner will not hang and will be 
fine.
+scanner.nextRows();
+
+// The second will result in "Invalid call sequence ID in scan request", 
but it should be fine.
+scanner.nextRows();
+  }
+
   /**
* Test creating a table with columns with different combinations of NOT 
NULL and
* default values, inserting rows, and checking the results are as expected.
diff --git 
a/java/kudu-client/src/test/java/org/apache/kudu/client/TestTimeouts.java 
b/java/kudu-client/src/test/java/org/apache/kudu/client/TestTimeouts.java
index db3dcbd56..57595f292 100644
--- a/java/kudu-client/src/test/java/org/apache/kudu/client/TestTimeouts.java
+++ b/java/kudu-client/src/test/java/org/apache/kudu/client/TestTimeouts.java
@@ -19,10 +19,12 @@ package org.apache.kudu.client;
 import static org.apache.kudu.test.ClientTestUtil.createBasicSchemaInsert;
 import static org.apache.kudu.test.ClientTestUtil.getBasicCreateTableOptions;
 import static org.apache.kudu.test.ClientTestUtil.getBasicSchema;
+import static org.junit.Assert.assertFalse;
 import static org.junit.Assert.assertTrue;
 import static org.junit.Assert.fail;
 
 import org.apache.kudu.test.KuduTestHarness;
+import org.apache.kudu.test.KuduTestHarness.TabletServerConfig;
 import org.junit.Rule;
 import org.junit.Test;
 
@@ -59,7 +61,7 @@ public class TestTimeouts {
 
 KuduSession lowTimeoutSession = lowTimeoutsClient.newSession();
 
-OperationResponse response = 
lowTimeoutSession.apply(createBasicSchemaInsert(table, 1));
+OperationResponse response = 
lowTimeoutSession.apply(createBasicSchemaInsert(table, 0

[jira] [Commented] (KUDU-1868) Java client mishandles socket read timeouts for scans

2019-01-25 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16752602#comment-16752602
 ] 

Will Berkeley commented on KUDU-1868:
-

Sort of. The socket read timeout is a property of the connection, so it applies 
to all RPCs that go over the connection. Ergo, if a client is using different 
timeouts for different operations, we can't match the socket read timeout to 
all the different timeouts. But if the same timeout is set on all RPCs then 
setting the socket read timeout to that timeout is reasonable.

The Java client depends on having a socket read timeout to handle timeouts when 
the server hangs and doesn't respond. Without it, if a server just doesn't 
respond to an RPC, that RPC will hang forever. I have a test showing this. I'm 
not 100% sure what happens if there is parallel traffic on one connection: if 
one RPC is hanging, but others are passing on the connection and resetting the 
last recv time, I assume the hanging RPC will not time out. I should enhance my 
test to show this.

We should instead have a mechanism to track each call's timeout and actively 
time it out when it times out, instead of relying on an event in the netty 
Channel. This might be tricky in practice because one operation that looks like 
one RPC and that has one timeout may actually be a series of RPCs of unknown 
length going to different servers.
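
A small sketch of what actively tracking each call's timeout could look like, 
using plain JDK scheduling rather than the client's real internals (all names 
below are hypothetical):

{code:java}
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical per-RPC timeout tracking: each call arms its own timer when it is sent,
// instead of relying on a per-connection socket read timeout event.
public class PerRpcTimeoutSketch {
  private final ScheduledExecutorService timer = new ScheduledThreadPoolExecutor(1);

  interface Rpc {
    void fail(Exception cause);  // complete the call exceptionally, without retrying
  }

  // Returned to the response path so a completed call can disarm its timer.
  static final class CallTimeout {
    final AtomicBoolean done = new AtomicBoolean(false);
    volatile ScheduledFuture<?> task;

    boolean markCompleted() {
      boolean won = done.compareAndSet(false, true);
      if (won && task != null) {
        task.cancel(false);
      }
      return won;
    }
  }

  CallTimeout trackCall(Rpc rpc, long timeoutMs) {
    CallTimeout handle = new CallTimeout();
    handle.task = timer.schedule(() -> {
      // Only one of {response path, timer} wins; the loser does nothing.
      if (handle.done.compareAndSet(false, true)) {
        rpc.fail(new Exception("RPC timed out after " + timeoutMs + "ms"));
      }
    }, timeoutMs, TimeUnit.MILLISECONDS);
    return handle;
  }
}
{code}

For a logical operation that fans out into several RPCs to different servers, 
the handle would have to cover the whole sequence rather than a single call, 
which is exactly the tricky part mentioned above.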

> Java client mishandles socket read timeouts for scans
> -
>
> Key: KUDU-1868
> URL: https://issues.apache.org/jira/browse/KUDU-1868
> Project: Kudu
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.2.0
>Reporter: Jean-Daniel Cryans
>Assignee: Will Berkeley
>Priority: Major
>
> Scan calls from the Java client that take more than the socket read timeout 
> get retried (unless the operation timeout has expired) instead of being 
> killed. Users will see this:
> {code}
> org.apache.kudu.client.NonRecoverableException: Invalid call sequence ID in 
> scan request
> {code}
> Note that the right behavior here would still end up killing the scanner, so 
> this is really a problem the user has to deal with! It's usually caused by 
> slow IO, combined with very selective scans.
> Workaround: set defaultSocketReadTimeoutMs higher, ideally equal to 
> defaultOperationTimeoutMs (the defaults are 10 and 30 seconds respectively). 
> But really the user should investigate why the single scans are so slow.
> One potentially easy fix to this is to handle retries differently for 
> scanners so that the user gets nicer exception. A harder fix is to handle 
> socket read timeouts completely differently, basically it should be per-RPC 
> and not per TabletClient like it is right now.
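
For reference, the workaround described above looks roughly like this with the 
client builder (the master address and timeout values are placeholders):

{code:java}
import org.apache.kudu.client.KuduClient;

// Sketch of the workaround: make the socket read timeout match the operation timeout so a
// slow-but-alive scan is not retried with a stale call sequence ID. Values are illustrative.
public class MatchedTimeoutsSketch {
  public static void main(String[] args) throws Exception {
    KuduClient client = new KuduClient.KuduClientBuilder("master-1:7051")
        .defaultOperationTimeoutMs(30000)
        .defaultSocketReadTimeoutMs(30000)
        .build();
    try {
      // ... scans whose individual round trips may take close to the operation timeout ...
    } finally {
      client.close();
    }
  }
}
{code}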



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-1868) Java client mishandles socket read timeouts for scans

2019-01-24 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-1868:
---

Assignee: Will Berkeley

> Java client mishandles socket read timeouts for scans
> -
>
> Key: KUDU-1868
> URL: https://issues.apache.org/jira/browse/KUDU-1868
> Project: Kudu
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.2.0
>Reporter: Jean-Daniel Cryans
>Assignee: Will Berkeley
>Priority: Major
>
> Scan calls from the Java client that take more than the socket read timeout 
> get retried (unless the operation timeout has expired) instead of being 
> killed. Users will see this:
> {code}
> org.apache.kudu.client.NonRecoverableException: Invalid call sequence ID in 
> scan request
> {code}
> Note that the right behavior here would still end up killing the scanner, so 
> this is really a problem the user has to deal with! It's usually caused by 
> slow IO, combined with very selective scans.
> Workaround: set defaultSocketReadTimeoutMs higher, ideally equal to 
> defaultOperationTimeoutMs (the defaults are 10 and 30 seconds respectively). 
> But really the user should investigate why the single scans are so slow.
> One potentially easy fix to this is to handle retries differently for 
> scanners so that the user gets nicer exception. A harder fix is to handle 
> socket read timeouts completely differently, basically it should be per-RPC 
> and not per TabletClient like it is right now.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KUDU-2664) Tablet server crashed when running kudu remote_replica unsafe_change

2019-01-18 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley updated KUDU-2664:

Description: 
While trying to reproduce a different issue, I ran the following command

{noformat}
for i in 0 1; do bin/kudu remote_replica unsafe_change_config 127.0.0.1:7250 
3ccbce6a3116487cbcc79ab4280a2ee5 6ca21fa7dcf54761a5ec7017ff101a68 
454b53ed77bd458a81a7710c892f214b; done
{noformat}

and encountered the following tablet server crash

{noformat}
F0118 10:45:42.696043 280514560 raft_consensus.cc:1286] T 
3ccbce6a3116487cbcc79ab4280a2ee5 P 6ca21fa7dcf54761a5ec7017ff101a68 [term 6 
FOLLOWER]: Unexpected new leader in same term! Existing leader UUID: 
kudu-tools, new leader UUID: 454b53ed77bd458a81a7710c892f214b
*** Check failure stack trace: ***
@0x10c91247f  google::LogMessageFatal::~LogMessageFatal()
@0x10c90f259  google::LogMessageFatal::~LogMessageFatal()
@0x108b74c05  
kudu::consensus::RaftConsensus::CheckLeaderRequestUnlocked()
@0x108b6c180  kudu::consensus::RaftConsensus::UpdateReplica()
@0x108b6b459  kudu::consensus::RaftConsensus::Update()
@0x107cf5106  kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
@0x10b53b87d  
kudu::consensus::ConsensusServiceIf::ConsensusServiceIf()::$_1::operator()()
@0x10b53b819  
_ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu9consensus18ConsensusServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_1PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextvDpOT_
@0x10b53b6a9  std::__1::__function::__func<>::operator()()
@0x10b843e07  std::__1::function<>::operator()()
@0x10b843a1a  kudu::rpc::GeneratedServiceIf::Handle()
@0x10b846cb6  kudu::rpc::ServicePool::RunThread()
@0x10b849aa9  boost::_mfi::mf0<>::operator()()
@0x10b849a10  boost::_bi::list1<>::operator()<>()
@0x10b8499ba  boost::_bi::bind_t<>::operator()()
@0x10b84979d  
boost::detail::function::void_function_obj_invoker0<>::invoke()
@0x10b7bb1fa  boost::function0<>::operator()()
@0x10c2cc2f5  kudu::Thread::SuperviseThread()
@ 0x7fff5dc09305  _pthread_body
@ 0x7fff5dc0c26f  _pthread_start
@ 0x7fff5dc08415  thread_start
{noformat}

The target of the config change was TS 6ca21fa7dcf54761a5ec7017ff101a68 at 
address 127.0.0.1:7250, and I was trying to kick out one of the three replicas 
while fishing for a repro of the other issue.

I couldn't get the crash to happen again and I wasn't able to capture a 
minidump or core dump...and I accidentally deleted the logs, so I'm afraid the 
above is all there is to go on.

It's expected that funny stuff could happen when using unsafe_change_config-- 
it's unsafe. But it shouldn't be possible to crash the tablet server with it.

  was:
While trying to reproduce a different issue, I ran the following command

{noformat}
for i in 0 1; do bin/kudu remote_replica unsafe_change_config 127.0.0.1:7250 
3ccbce6a3116487cbcc79ab4280a2ee5
{noformat}

and encountered the following tablet server crash

{noformat}
F0118 10:45:42.696043 280514560 raft_consensus.cc:1286] T 
3ccbce6a3116487cbcc79ab4280a2ee5 P 6ca21fa7dcf54761a5ec7017ff101a68 [term 6 
FOLLOWER]: Unexpected new leader in same term! Existing leader UUID: 
kudu-tools, new leader UUID: 454b53ed77bd458a81a7710c892f214b
*** Check failure stack trace: ***
@0x10c91247f  google::LogMessageFatal::~LogMessageFatal()
@0x10c90f259  google::LogMessageFatal::~LogMessageFatal()
@0x108b74c05  
kudu::consensus::RaftConsensus::CheckLeaderRequestUnlocked()
@0x108b6c180  kudu::consensus::RaftConsensus::UpdateReplica()
@0x108b6b459  kudu::consensus::RaftConsensus::Update()
@0x107cf5106  kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
@0x10b53b87d  
kudu::consensus::ConsensusServiceIf::ConsensusServiceIf()::$_1::operator()()
@0x10b53b819  
_ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu9consensus18ConsensusServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_1PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextvDpOT_
@0x10b53b6a9  std::__1::__function::__func<>::operator()()
@0x10b843e07  std::__1::function<>::operator()()
@0x10b843a1a  kudu::rpc::GeneratedServiceIf::Handle()
@0x10b846cb6  kudu::rpc::ServicePool::RunThread()
@0x10b849aa9  boost::_mfi::mf0<>::operator()()
@0x10b849a10  boost::_bi::list1<>::operator()<>()
@0x10b8499ba  boost::_bi::bind_t<>::operator()()
@0x10b84979d  
boost::detail::function::void_function_obj_invoker0<>::invoke()
@0x10b7bb1fa  boost::function0<>::operator()()
@0x10c2cc2f5  

[jira] [Created] (KUDU-2664) Tablet server crashed when running kudu remote_replica unsafe_change

2019-01-18 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2664:
---

 Summary: Tablet server crashed when running kudu remote_replica 
unsafe_change
 Key: KUDU-2664
 URL: https://issues.apache.org/jira/browse/KUDU-2664
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Will Berkeley


While trying to reproduce a different issue, I ran the following command

{noformat}
for i in 0 1; do bin/kudu remote_replica unsafe_change_config 127.0.0.1:7250 
3ccbce6a3116487cbcc79ab4280a2ee5
{noformat}

and encountered the following tablet server crash

{noformat}
F0118 10:45:42.696043 280514560 raft_consensus.cc:1286] T 
3ccbce6a3116487cbcc79ab4280a2ee5 P 6ca21fa7dcf54761a5ec7017ff101a68 [term 6 
FOLLOWER]: Unexpected new leader in same term! Existing leader UUID: 
kudu-tools, new leader UUID: 454b53ed77bd458a81a7710c892f214b
*** Check failure stack trace: ***
@0x10c91247f  google::LogMessageFatal::~LogMessageFatal()
@0x10c90f259  google::LogMessageFatal::~LogMessageFatal()
@0x108b74c05  
kudu::consensus::RaftConsensus::CheckLeaderRequestUnlocked()
@0x108b6c180  kudu::consensus::RaftConsensus::UpdateReplica()
@0x108b6b459  kudu::consensus::RaftConsensus::Update()
@0x107cf5106  kudu::tserver::ConsensusServiceImpl::UpdateConsensus()
@0x10b53b87d  
kudu::consensus::ConsensusServiceIf::ConsensusServiceIf()::$_1::operator()()
@0x10b53b819  
_ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN4kudu9consensus18ConsensusServiceIfC1ERK13scoped_refptrINS3_12MetricEntityEERKS6_INS3_3rpc13ResultTrackerEEE3$_1PKN6google8protobuf7MessageEPSK_PNSB_10RpcContextvDpOT_
@0x10b53b6a9  std::__1::__function::__func<>::operator()()
@0x10b843e07  std::__1::function<>::operator()()
@0x10b843a1a  kudu::rpc::GeneratedServiceIf::Handle()
@0x10b846cb6  kudu::rpc::ServicePool::RunThread()
@0x10b849aa9  boost::_mfi::mf0<>::operator()()
@0x10b849a10  boost::_bi::list1<>::operator()<>()
@0x10b8499ba  boost::_bi::bind_t<>::operator()()
@0x10b84979d  
boost::detail::function::void_function_obj_invoker0<>::invoke()
@0x10b7bb1fa  boost::function0<>::operator()()
@0x10c2cc2f5  kudu::Thread::SuperviseThread()
@ 0x7fff5dc09305  _pthread_body
@ 0x7fff5dc0c26f  _pthread_start
@ 0x7fff5dc08415  thread_start
{noformat}

The target of the config change was TS 6ca21fa7dcf54761a5ec7017ff101a68 at 
address 127.0.0.1:7250, and I was trying to kick out one of the three replicas 
while fishing for a repro of the other issue.

I couldn't get the crash to happen again and I wasn't able to capture a 
minidump or core dump...and I accidentally deleted the logs, so I'm afraid the 
above is all there is to go on.

It's expected that funny stuff could happen when using unsafe_change_config-- 
it's unsafe. But it shouldn't be possible to crash the tablet server with it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2662) TestKuduClient.testClientLocation is flaky

2019-01-17 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2662:
---

Assignee: Will Berkeley

> TestKuduClient.testClientLocation is flaky
> --
>
> Key: KUDU-2662
> URL: https://issues.apache.org/jira/browse/KUDU-2662
> Project: Kudu
>  Issue Type: Bug
>  Components: java, test
>Reporter: Andrew Wong
>Assignee: Will Berkeley
>Priority: Major
> Attachments: TEST-org.apache.kudu.client.TestKuduClient.xml
>
>
> I've seen a few failures of TestKuduClient.testClientLocation with retries 
> turned on. I've attached a log. Here is the error of interest:
>  
> org.junit.ComparisonFailure: expected:<[/L0]> but was:<[]>
>   at org.junit.Assert.assertEquals(Assert.java:115)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.kudu.client.TestKuduClient.testClientLocation(TestKuduClient.java:1174)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:483)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:745)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2047) Lazy cfile open and maintenance op stat caching cause fruitful delta compaction ops to never run

2019-01-10 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2047:
---

Assignee: Will Berkeley

> Lazy cfile open and maintenance op stat caching cause fruitful delta 
> compaction ops to never run
> 
>
> Key: KUDU-2047
> URL: https://issues.apache.org/jira/browse/KUDU-2047
> Project: Kudu
>  Issue Type: Bug
>  Components: perf, tablet
>Affects Versions: 1.4.0
>Reporter: Todd Lipcon
>Assignee: Will Berkeley
>Priority: Major
>
> I was just looking at a cluster which has a large amount of REDO data on some 
> of its tablets, and wasn't sure why it wasn't ever compacting it. The issue 
> appears to be the following:
> - in DiskRowSet::DeltaStoresCompactionPerfImprovementScore(), we call through 
> to GetColumnIdsWithUpdates() to see which columns may need compaction
> -- if the REDO delta block is not open (eg when the server has recently 
> started), this will skip the unopened delta file stats and not include them 
> in the result
> -- we thus determine that the compaction is not fruitful
> This was a conscious decision to avoid the MM from eagerly opening every 
> delta on its first pass through computing compaction stats. We figured that, 
> if it were worth compacting, then probably someone would scan the data, 
> forcing the deltas to get opened and thus made eligible for compaction.
> However, the MM tries to be smart about caching the statistics (see 
> e7fe0c1a94cac364522c09b8208c98480947d794). In particular, if it sees that the 
> tablet has not run any flushes or compactions, it won't bother to recalculate 
> the stats, assuming they haven't changed.
> So, if you have a completely read-only tablet with some uncompacted deltas, 
> the MM op will never run.
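
A condensed model of that interaction (illustrative Java with made-up names; the 
real logic lives in the C++ maintenance manager and DiskRowSet code):

{code:java}
// Hypothetical model of the starvation: unopened deltas contribute no stats, and the cached
// score is only recomputed after a flush or compaction, which a read-only tablet never does.
public class DeltaCompactionStarvationSketch {
  private boolean deltaFilesOpened = false;      // opened lazily, e.g. by a scan
  private long flushesPlusCompactions = 0;       // stays 0 on a read-only tablet
  private double cachedPerfScore = recompute();  // computed once when the stats are first cached

  private double recompute() {
    // Stats for unopened delta files are skipped, so the score starts out as 0.
    return deltaFilesOpened ? 0.8 /* fruitful */ : 0.0;
  }

  double perfImprovementScore() {
    // The cached score is reused unless the tablet has done some write-side work.
    if (flushesPlusCompactions > 0) {
      cachedPerfScore = recompute();
    }
    return cachedPerfScore;
  }

  // Even after a scan opens the delta files, the cached 0.0 keeps the op from being scheduled.
  void onScan() {
    deltaFilesOpened = true;
  }
}
{code}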



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-1535) Add rack awareness

2019-01-10 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-1535.
-
   Resolution: Done
Fix Version/s: 1.9.0

This is done, aside from some documentation.

The first iteration supports assigning locations to tablet servers and clients, 
enforcing a placement policy that no majority of replicas is placed in the same 
location/rack, rebalancing replicas to re-establish the placement policy, 
rebalancing the replica count in a way compatible with the placement policy, and 
allowing clients to choose to scan preferentially from tablet servers in the 
same location.
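
The core invariant of the placement policy can be sketched as a simple check 
(illustrative only; the master's real enforcement and the rebalancer are more 
involved):

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative check of the location placement policy: no single location should hold a
// majority of a tablet's replicas, so losing one rack cannot take out a Raft majority.
public class PlacementPolicySketch {
  static boolean violatesPolicy(List<String> replicaLocations) {
    Map<String, Integer> perLocation = new HashMap<>();
    for (String loc : replicaLocations) {
      perLocation.merge(loc, 1, Integer::sum);
    }
    int majority = replicaLocations.size() / 2 + 1;
    return perLocation.values().stream().anyMatch(count -> count >= majority);
  }

  public static void main(String[] args) {
    System.out.println(violatesPolicy(List.of("/rack0", "/rack0", "/rack1"))); // true: 2 of 3
    System.out.println(violatesPolicy(List.of("/rack0", "/rack1", "/rack2"))); // false
  }
}
{code}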

> Add rack awareness
> --
>
> Key: KUDU-1535
> URL: https://issues.apache.org/jira/browse/KUDU-1535
> Project: Kudu
>  Issue Type: New Feature
>  Components: master
>Reporter: Jean-Daniel Cryans
>Assignee: Will Berkeley
>Priority: Major
> Fix For: 1.9.0
>
>
> Kudu currently doesn't have the concept of rack awareness, so any kind of 
> rack failure can result in data loss (assuming that Kudu has tablet servers 
> in multiple racks).
> This changes how the master picks hosts to send new tablets to, and we also 
> need to implement a way for the master to map hostname names to racks (could 
> be similar to Hadoop).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KUDU-2648) compaction does not run

2019-01-10 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley reassigned KUDU-2648:
---

Assignee: Will Berkeley

> compaction does not run
> ---
>
> Key: KUDU-2648
> URL: https://issues.apache.org/jira/browse/KUDU-2648
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet
>Affects Versions: 1.7.0
> Environment: 3 master nodes, 4c32g, ubuntu16.04
> 3 data nodes, 8c64g, 1.8T ssd, ubuntu16.04
>Reporter: huajian
>Assignee: Will Berkeley
>Priority: Major
>  Labels: compact
>
> Here is a table: project_construction_record, 62 columns, 170k records, no 
> partitioning.
> The table has many CRUD operations every day.
> I ran a simple SQL query on it (using Impala): 
>  
> {code:java}
> SELECT * FROM project_construction_record ORDER BY id LIMIT 1{code}
> It takes 7 seconds.
> By checking the profile, I found this:
> {quote}
> h4. KUDU_SCAN_NODE (id=0) (6.06 s)
>  * BytesRead: *0 bytes*
>  * CollectionItemsRead: *0*
>  * InactiveTotalTime: *0 ns*
>  * KuduRemoteScanTokens: *0*
>  * NumScannerThreadsStarted: *1*
>  * PeakMemoryUsage: *3.4 MB*
>  * RowsRead: *177,007*
>  * RowsReturned: *177,007*
>  * RowsReturnedRate: *29188/sec*
>  * ScanRangesComplete: *1*
>  * ScannerThreadsInvoluntaryContextSwitches: *0*
>  * ScannerThreadsTotalWallClockTime: *6.09 s*
>  ** MaterializeTupleTime(*): *6.06 s*
>  ** ScannerThreadsSysTime: *48 ms*
>  ** ScannerThreadsUserTime: *172 ms*{quote}
> So I checked the scan for this SQL, and found this:
> |column|cells read|bytes read|blocks read|
> |id|176.92k|1.91M|19.96k|
> |org_id|176.92k|1.91M|19.96k|
> |work_date|176.92k|2.03M|19.96k|
> |description|176.92k|1.21M|19.96k|
> |user_name|176.92k|775.9K|19.96k|
> |spot_name|176.92k|825.8K|19.96k|
> |spot_start_pile|176.92k|778.7K|19.96k|
> |spot_end_pile|176.92k|780.4K|19.96k|
> |..|..|..|..|
> There are so many blocks read.
> Then I ran the _*kudu fs list*_ command, and I got a 70M data report; here is 
> the bottom:
>  
> {code:java}
> 0b6ac30b449043a68905e02b797144fc | 25024 | 40310988 | column
>  0b6ac30b449043a68905e02b797144fc | 25024 | 40310989 | column
>  0b6ac30b449043a68905e02b797144fc | 25024 | 40310990 | column
>  0b6ac30b449043a68905e02b797144fc | 25024 | 40310991 | column
>  0b6ac30b449043a68905e02b797144fc | 25024 | 40310992 | column
>  0b6ac30b449043a68905e02b797144fc | 25024 | 40310993 | column
>  0b6ac30b449043a68905e02b797144fc | 25024 | 40310996 | undo
>  0b6ac30b449043a68905e02b797144fc | 25024 | 40310994 | bloom
>  0b6ac30b449043a68905e02b797144fc | 25024 | 40310995 | adhoc-index{code}
>  
> There are 25024 rowsets, and more than 1M blocks in the tablet.
> I left the maintenance and the compaction flags at their defaults, only 
> changing tablet_history_max_age_sec to one day:
>  
>  
> {code:java}
> --maintenance_manager_history_size=8
> --maintenance_manager_num_threads=1
> --maintenance_manager_polling_interval_ms=250
> --budgeted_compaction_target_rowset_size=33554432
> --compaction_approximation_ratio=1.049523162842
> --compaction_minimum_improvement=0.009997764825821
> --deltafile_default_block_size=32768
> --deltafile_default_compression_codec=lz4
> --default_composite_key_index_block_size_bytes=4096
> --tablet_delta_store_major_compact_min_ratio=0.1000149011612
> --tablet_delta_store_minor_compact_max=1000
> --mrs_use_codegen=true
> --compaction_policy_dump_svgs_pattern=
> --enable_undo_delta_block_gc=true
> --fault_crash_before_flush_tablet_meta_after_compaction=0
> --fault_crash_before_flush_tablet_meta_after_flush_mrs=0
> --max_cell_size_bytes=65536
> --max_encoded_key_size_bytes=16384
> --tablet_bloom_block_size=4096
> --tablet_bloom_target_fp_rate=9.997473787516e-05
> --tablet_compaction_budget_mb=128
> --tablet_history_max_age_sec=86400{code}
> So my question is: *why does the compaction not run? Is it a bug? And what 
> can I do to compact manually?* 
> It is a production environment, and many other tables have the same issue; the 
> performance is getting slower and slower.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2306) ksck should also detect bad master replicas

2019-01-10 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-2306.
-
   Resolution: Done
Fix Version/s: 1.8.0

Added in 91dd09051.

> ksck should also detect bad master replicas
> ---
>
> Key: KUDU-2306
> URL: https://issues.apache.org/jira/browse/KUDU-2306
> Project: Kudu
>  Issue Type: Improvement
>  Components: ksck, ops-tooling
>Affects Versions: 1.6.0
>Reporter: Mike Percy
>Assignee: Will Berkeley
>Priority: Major
> Fix For: 1.8.0
>
>
> ksck should detect bad master replicas, not just bad tserver tablet replicas.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2386) Show experimental and unsafe flags in ksck output

2019-01-10 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-2386.
-
   Resolution: Done
Fix Version/s: 1.8.0

Added in 66e41bda1.

> Show experimental and unsafe flags in ksck output
> -
>
> Key: KUDU-2386
> URL: https://issues.apache.org/jira/browse/KUDU-2386
> Project: Kudu
>  Issue Type: Improvement
>  Components: ksck, supportability
>Reporter: Grant Henke
>Assignee: Will Berkeley
>Priority: Major
> Fix For: 1.8.0
>
>
> Often the first step in diagnosing a Kudu issue is to run the ksck tool. 
> Because using experimental or unsafe flags can be a main culprit in 
> unexpected behavior we should output a warning in the ksck output that they 
> are set and potentially list them. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2571) Allow ksck checksum to continue as long as it is making progress

2019-01-10 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-2571.
-
   Resolution: Done
Fix Version/s: 1.8.0

In 0d4740b7692c0cb5c39cca73276b395921bd3ae4 (the fix for KUDU-2179), I upped 
the default timeout to 24 hours, with an additional idle timeout that makes 
sure the checksum is making progress. I think it's fair to call this done.
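
The combination can be pictured with a small sketch (names and values are 
illustrative, not ksck's actual code): the checksum may run up to the large 
overall deadline, but it aborts early only if no replica reports progress 
within the idle timeout.

{code:java}
import java.time.Duration;
import java.time.Instant;

// Illustrative model of "large overall timeout plus idle timeout": the run only fails early
// when no progress has been observed for idleTimeout; otherwise it may run to the deadline.
public class ChecksumTimeoutSketch {
  private final Instant deadline;
  private final Duration idleTimeout;
  private Instant lastProgress = Instant.now();

  ChecksumTimeoutSketch(Duration overallTimeout, Duration idleTimeout) {
    this.deadline = Instant.now().plus(overallTimeout);
    this.idleTimeout = idleTimeout;
  }

  // Called whenever a tablet replica reports newly checksummed bytes.
  void onProgress() {
    lastProgress = Instant.now();
  }

  boolean shouldAbort(Instant now) {
    boolean pastDeadline = now.isAfter(deadline);
    boolean idleTooLong = Duration.between(lastProgress, now).compareTo(idleTimeout) > 0;
    return pastDeadline || idleTooLong;
  }

  public static void main(String[] args) {
    ChecksumTimeoutSketch t = new ChecksumTimeoutSketch(Duration.ofHours(24), Duration.ofMinutes(30));
    System.out.println(t.shouldAbort(Instant.now())); // false right after the run starts
  }
}
{code}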

> Allow ksck checksum to continue as long as it is making progress
> 
>
> Key: KUDU-2571
> URL: https://issues.apache.org/jira/browse/KUDU-2571
> Project: Kudu
>  Issue Type: Improvement
>  Components: ksck
>Reporter: Will Berkeley
>Assignee: Will Berkeley
>Priority: Major
> Fix For: 1.8.0
>
>
> ksck's checksumming routine times out after 1 hour, by default, but I've seen 
> that be too short a time. We can bump the timeout, but to find out if a 
> timeout is too short you have to hit it, then restart the whole checksum, or 
> manually figure out which tablets still need to be checksummed. We should 
> allow the checksum to continue as long as it periodically makes progress.
> Probably also need to address KUDU-2179 for this to be helpful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KUDU-2527) Add Describe Table Tool

2019-01-10 Thread Will Berkeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Berkeley resolved KUDU-2527.
-
   Resolution: Done
Fix Version/s: 1.9.0

Added in 614b446e1f9291189657b265830d05d0395c5d1c.

> Add Describe Table Tool
> ---
>
> Key: KUDU-2527
> URL: https://issues.apache.org/jira/browse/KUDU-2527
> Project: Kudu
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Will Berkeley
>Priority: Major
> Fix For: 1.9.0
>
>
> Add a tool to describe a table on the cli with similar information shown in 
> the table web ui. Perhaps include a verbosity flag or the option to provide 
> what "columns" of information to include. 
> Example: 
> {code}
> kudu table describe   ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2661) Add a tool to reconstruct master data from the tablet servers

2019-01-09 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738806#comment-16738806
 ] 

Will Berkeley commented on KUDU-2661:
-

WIP tool available here: https://gerrit.cloudera.org/#/c/9490/

If you are thinking about using this uncommitted, experimental tool, make sure 
there is no other route to recovery first, and know that there is no guarantee 
that the tool will work.

> Add a tool to reconstruct master data from the tablet servers
> -
>
> Key: KUDU-2661
> URL: https://issues.apache.org/jira/browse/KUDU-2661
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Will Berkeley
>Assignee: Will Berkeley
>Priority: Major
>
> On rare occasions, a cluster might lose all its masters permanently, 
> especially if there is only a single master. This would mean a total loss of 
> the whole cluster unless the user is willing to use tools to dump the data 
> from each tablet server, save it, and reprocess it back into a useful form 
> again. However, in many cases, it's actually possible to reconstruct a 
> workable master state from information on the tablet servers. A tool to do 
> this could be a lifesaver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2661) Add a tool to reconstruct master data from the tablet servers

2019-01-09 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2661:
---

 Summary: Add a tool to reconstruct master data from the tablet 
servers
 Key: KUDU-2661
 URL: https://issues.apache.org/jira/browse/KUDU-2661
 Project: Kudu
  Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: Will Berkeley
Assignee: Will Berkeley


On rare occasions, a cluster might lose all its masters permanently, especially 
if there is only a single master. This would mean a total loss of the whole 
cluster unless the user is willing to use tools to dump the data from each 
tablet server, save it, and reprocess it back into a useful form again. 
However, in many cases, it's actually possible to reconstruct a workable master 
state from information on the tablet servers. A tool to do this could be a 
lifesaver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2658) Failure in write_throttling-itest

2019-01-09 Thread Will Berkeley (JIRA)
Will Berkeley created KUDU-2658:
---

 Summary: Failure in write_throttling-itest
 Key: KUDU-2658
 URL: https://issues.apache.org/jira/browse/KUDU-2658
 Project: Kudu
  Issue Type: Bug
Affects Versions: 1.8.0
Reporter: Will Berkeley
 Attachments: write_throttling-itest.txt

{noformat}
I0109 00:00:39.419872  9190 write_throttling-itest.cc:108] Iteration 0 qps: 
122.07
/home/jenkins-slave/workspace/kudu-master/3/src/kudu/integration-tests/write_throttling-itest.cc:109:
 Failure
Expected: (qps) <= (TARGET_QPS * 1.2f), actual: 122.07 vs 120
{noformat}

Full test log attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-2655) Add metrics for metadata directory I/O

2019-01-07 Thread Will Berkeley (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736159#comment-16736159
 ] 

Will Berkeley commented on KUDU-2655:
-

It might be difficult to maintain the same metrics if we radically change how 
metadata is stored, but in the meantime I think these metrics might still be 
useful.

> Add metrics for metadata directory I/O
> --
>
> Key: KUDU-2655
> URL: https://issues.apache.org/jira/browse/KUDU-2655
> Project: Kudu
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 1.8.0
>Reporter: Will Berkeley
>Assignee: Will Berkeley
>Priority: Major
>
> There's good metrics for block manager (data dir) and WAL operations, like 
> {{block_manager_total_bytes_written}}, {{block_manager_total_bytes_read}}, 
> {{log_bytes_logged }}, and the {{log_append_latency}} histogram. What we are 
> missing are metrics about the amount of metadata I/O. It'd be nice to add
> * metadata_bytes_read
> * metadata_bytes_written
> * latency histograms for bytes read and bytes written



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

