[jira] [Updated] (KUDU-3128) Cross compile Kudu Spark for Scala 2.11 & 2.12

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3128:
--
Component/s: build

> Cross compile Kudu Spark for Scala 2.11 & 2.12
> --
>
> Key: KUDU-3128
> URL: https://issues.apache.org/jira/browse/KUDU-3128
> Project: Kudu
>  Issue Type: Improvement
>  Components: build, spark
>Affects Versions: 1.12.0
>Reporter: Grant Henke
>Assignee: Grant Henke
>Priority: Major
>
> Currently we only publish kudu-spark for Scala 2.11. We should also compile 
> and publish a 2.12 version of the kudu-spark integration now that Spark 
> supports Scala 2.12.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3131) test rw_mutex-test hangs sometimes if build_type is release

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3131:
--
Component/s: test

> test rw_mutex-test hangs sometimes if build_type is release
> ---
>
> Key: KUDU-3131
> URL: https://issues.apache.org/jira/browse/KUDU-3131
> Project: Kudu
>  Issue Type: Sub-task
>  Components: test
>Reporter: huangtianhua
>Priority: Major
>
> We built and tested Kudu on aarch64. In release mode one test sometimes hangs 
> (maybe a deadlock?); the console output is as follows:
> {noformat}
> [==] Running 2 tests from 1 test case.
> [--] Global test environment set-up.
> [--] 2 tests from Priorities/RWMutexTest
> [ RUN  ] Priorities/RWMutexTest.TestDeadlocks/0
> {noformat}
> It seems to be OK in debug mode.
> For now only this one test fails sometimes on aarch64. [~aserbin] [~adar], 
> would you please have a look at this or give us some suggestions? Thanks very 
> much.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3123) tracing.html doesn't render on newer browsers

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3123:
--
Labels: supportability  (was: )

> tracing.html doesn't render on newer browsers
> -
>
> Key: KUDU-3123
> URL: https://issues.apache.org/jira/browse/KUDU-3123
> Project: Kudu
>  Issue Type: Bug
>  Components: ui
>Reporter: Andrew Wong
>Priority: Major
>  Labels: supportability
>
> I tried opening the tracing.html page using Google Chrome Version 
> 81.0.4044.138, and the page was blank. Upon inspection, it seems Chrome no 
> longer supports {{registerElement}}:
> {code:java}
> tracing.js:31 Uncaught TypeError: document.registerElement is not a function
> at tracing.js:31
> at tracing.js:31 {code}
> This was reported to the Chromium project as 
> [https://bugs.chromium.org/p/chromium/issues/detail?id=1036492], which has 
> been closed. We should update the trace viewer version in thirdparty to 
> include whatever fixes are necessary, since tracing is pretty valuable in a 
> pinch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3118) Validate --tserver_enforce_access_control is set when authorization is enabled in Master

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3118:
--
Component/s: security
 authz

> Validate --tserver_enforce_access_control is set when authorization is 
> enabled in Master 
> -
>
> Key: KUDU-3118
> URL: https://issues.apache.org/jira/browse/KUDU-3118
> Project: Kudu
>  Issue Type: Task
>  Components: authz, security
>Reporter: Hao Hao
>Priority: Minor
>
> As mentioned in the code review 
> [https://gerrit.cloudera.org/c/15897/1/docs/security.adoc#476], it would be 
> nice to add some validation (maybe in ksck or something) that this flag is set 
> when fine-grained authorization is enabled on the master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3111) Make IWYU process freestanding headers

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3111:
--
Component/s: build

> Make IWYU process freestanding headers
> 
>
> Key: KUDU-3111
> URL: https://issues.apache.org/jira/browse/KUDU-3111
> Project: Kudu
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.10.1, 1.11.0, 1.12.0, 
> 1.11.1
>Reporter: Alexey Serbin
>Priority: Major
>
> When working out of the compilation database, IWYU processes only associated 
> headers, i.e. {{.h}} files that pair with corresponding {{.cc}} files.   It would 
> be nice to make IWYU process so-called freestanding header files as well.  [This 
> thread|https://github.com/include-what-you-use/include-what-you-use/issues/268]
>  contains very useful information on the topic.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3109) Log administrative operations

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3109:
--
Component/s: security

> Log administrative operations
> -
>
> Key: KUDU-3109
> URL: https://issues.apache.org/jira/browse/KUDU-3109
> Project: Kudu
>  Issue Type: Task
>  Components: security
>Reporter: Attila Bukor
>Priority: Minor
>
> Sometimes it's impossible to determine what caused an issue when 
> administrators run unsafe commands on the cluster. Logging these in an audit 
> log would help.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3090) Add owner concept in Kudu

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3090:
--
Component/s: security
 authz

> Add owner concept in Kudu
> -
>
> Key: KUDU-3090
> URL: https://issues.apache.org/jira/browse/KUDU-3090
> Project: Kudu
>  Issue Type: New Feature
>  Components: authz, security
>Reporter: Hao Hao
>Assignee: Attila Bukor
>Priority: Major
>  Labels: roadmap-candidate
>
> As mentioned in the Ranger integration design doc, Ranger supports ownership 
> privilege by creating a default policy that allows the \{OWNER} of a resource to 
> access it without manually creating an additional policy. Unless Kudu actually 
> has full support for owners, the ownership privilege is not possible with the 
> Ranger integration.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (KUDU-3091) Support ownership privilege with Ranger

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke reassigned KUDU-3091:
-

Assignee: Attila Bukor

> Support ownership privilege with Ranger
> ---
>
> Key: KUDU-3091
> URL: https://issues.apache.org/jira/browse/KUDU-3091
> Project: Kudu
>  Issue Type: Task
>Reporter: Hao Hao
>Assignee: Attila Bukor
>Priority: Major
>
> Currently, the ownership privilege in Ranger is not available as Kudu has no 
> concept of an owner, and does not store owner information internally. It would 
> be nice to enable it once Kudu introduces owners.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3091) Support ownership privilege with Ranger

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3091:
--
Component/s: security
 ranger
 authz

> Support ownership privilege with Ranger
> ---
>
> Key: KUDU-3091
> URL: https://issues.apache.org/jira/browse/KUDU-3091
> Project: Kudu
>  Issue Type: Task
>  Components: authz, ranger, security
>Reporter: Hao Hao
>Assignee: Attila Bukor
>Priority: Major
>
> Currently, the ownership privilege in Ranger is not available as Kudu has no 
> concept of an owner, and does not store owner information internally. It would 
> be nice to enable it once Kudu introduces owners.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3089) ERROR when running tests on ARM64 server with TSAN or ASAN enabled

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3089:
--
Component/s: test

> ERROR when running tests on ARM64 server with TSAN or ASAN enabled
> --
>
> Key: KUDU-3089
> URL: https://issues.apache.org/jira/browse/KUDU-3089
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Reporter: liusheng
>Priority: Major
>
>  
> For now I am trying to build and test Kudu on an ARM server. For the Debug build 
> type, the build process and almost all tests pass, but for the TSAN and 
> ASAN build types, I can build successfully, yet all the test cases 
> raise the following error.
> {code:java}
> root@kudu-asan2:/opt/kudu/build/asan# bin/kudu-ts-cli-test
> AddressSanitizer:DEADLYSIGNAL
> =
> ==14378==ERROR: AddressSanitizer: SEGV on unknown address 0x (pc 
> 0x bp 0xc2649d10 sp 0xc2649d10 T0)
> ==14378==Hint: pc points to the zero page.
> ==14378==The signal is caused by a READ memory access.
> ==14378==Hint: address points to the zero page.AddressSanitizer can not 
> provide additional info.
> SUMMARY: AddressSanitizer: SEGV () 
> ==14378==ABORTING
> {code}
> I have struggled with this for a while but made no progress. Could anyone help 
> or give any suggestions? Thanks a lot!
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3084) Multiple time sources with fallback behavior between them

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3084:
--
Labels: clock roadmap-candidate usability  (was: clock)

> Multiple time sources with fallback behavior between them
> -
>
> Key: KUDU-3084
> URL: https://issues.apache.org/jira/browse/KUDU-3084
> Project: Kudu
>  Issue Type: Improvement
>  Components: master, tserver
>Reporter: Alexey Serbin
>Priority: Major
>  Labels: clock, roadmap-candidate, usability
>
> [~tlipcon] suggested an alternative approach to configure and select 
> HybridClock's time source.
> Kudu servers could maintain multiple time sources and switch between them 
> with a fallback behavior.  The default or preferred time source might be any 
> of the existing ones (e.g., the built-in client), but when it's not 
> available, another available time source is selected (e.g., {{system}} -- the 
> NTP-synchronized local clock).  Switching between time sources can be done:
> * only upon startup/initialization
> * upon startup/initialization and later during normal run time
> The advantages are:
> * easier deployment and configuration of Kudu clusters
> * simplified upgrade path from older releases using {{system}} time source to 
> newer releases using {{builtin}} time source by default
> There are downsides, though.  Since the new way of maintaining the time source 
> is more dynamic, it can:
> * mask various configuration or network issues
> * result in different time sources being used within the same Kudu cluster due 
> to transient issues
> * introduce extra startup delay
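> A rough sketch of the startup-time fallback described above (the interface and 
> names here are hypothetical, not the actual HybridClock code):
> {code:java}
> import java.util.List;
> 
> /** Hypothetical abstraction over a HybridClock time source. */
> interface TimeSource {
>   String name();
>   boolean trySynchronize(long timeoutMillis);  // e.g. built-in NTP client, local system clock
> }
> 
> class TimeSourceSelector {
>   // Walk the preference-ordered list (e.g. [builtin, system]) and return the
>   // first source that synchronizes within the timeout; the simplest variant
>   // does this only once, at startup/initialization.
>   static TimeSource selectAtStartup(List<TimeSource> preferred, long timeoutMillis) {
>     for (TimeSource source : preferred) {
>       if (source.trySynchronize(timeoutMillis)) {
>         return source;
>       }
>     }
>     throw new IllegalStateException("no usable time source; refusing to start");
>   }
> }
> {code}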



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3077) Have client scanners prune the default projection based on the contents of their authz tokens

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3077:
--
Labels: usability  (was: )

> Have client scanners prune the default projection based on the contents of 
> their authz tokens
> -
>
> Key: KUDU-3077
> URL: https://issues.apache.org/jira/browse/KUDU-3077
> Project: Kudu
>  Issue Type: Improvement
>  Components: client, security
>Reporter: Andrew Wong
>Priority: Major
>  Labels: usability
>
> Today, if a scan is sent that contains a column that, per the sender's authz 
> token, the sender isn't authorized to see, the entire scan is rejected. This 
> is all well and good, but users may not be privy to what columns they are or 
> aren't allowed to scan. So, when the default projection is used (which scans 
> all columns), the scan is bound to be rejected if there are any privilege 
> restrictions.
> It'd be significantly more user-friendly if clients opaquely pruned the 
> default projection of unauthorized columns so that (assuming the authz token 
> is valid) default scans always succeed with just the columns the user is 
> authorized to see.
> Special care should be taken if the user has no column privileges, though; 
> passing an empty projection is taken to return the count of rows (which 
> requires the same privileges as {{COUNT(*)}}, which requires the same 
> privileges as {{SELECT(*)}}, i.e. {{SELECT ON TABLE}}) rather than an empty 
> set of rows. In such a case, clients should probably fail immediately, since 
> there are no table privileges and no column privileges in the authz token, so 
> any scan would be bound to fail.
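> A minimal sketch of the intended client-side behavior (the readable-column set 
> stands in for whatever the client learns from its authz token; that part of the 
> API is hypothetical here):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Set;
> import org.apache.kudu.ColumnSchema;
> import org.apache.kudu.client.KuduClient;
> import org.apache.kudu.client.KuduScanner;
> import org.apache.kudu.client.KuduTable;
> 
> public class PrunedScanExample {
>   // Builds a scanner whose default projection is pruned down to the columns
>   // the caller is authorized to read.
>   static KuduScanner prunedDefaultScanner(KuduClient client, KuduTable table,
>                                           Set<String> readableColumns) {
>     List<String> projection = new ArrayList<>();
>     for (ColumnSchema col : table.getSchema().getColumns()) {
>       if (readableColumns.contains(col.getName())) {
>         projection.add(col.getName());
>       }
>     }
>     if (projection.isEmpty()) {
>       // No column privileges at all: fail fast rather than send a scan the
>       // server is bound to reject (an empty projection means "count rows").
>       throw new IllegalStateException("no readable columns in " + table.getName());
>     }
>     return client.newScannerBuilder(table)
>         .setProjectedColumnNames(projection)
>         .build();
>   }
> }
> {code}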



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3073) BuiltinNtpWithMiniChronydTest.SyncAndUnsyncReferenceServers sometimes fails

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3073:
--
Component/s: test

> BuiltinNtpWithMiniChronydTest.SyncAndUnsyncReferenceServers sometimes fails
> ---
>
> Key: KUDU-3073
> URL: https://issues.apache.org/jira/browse/KUDU-3073
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.12.0
>Reporter: Alexey Serbin
>Priority: Major
> Attachments: ntp-test.txt.xz
>
>
> {noformat}
> src/kudu/clock/ntp-test.cc:478: Failure
> Value of: s.IsRuntimeError()  
>   
>   Actual: false   
>   
> Expected: true
>   
> OK
>   
> src/kudu/clock/ntp-test.cc:595: Failure
> Expected: CheckNoNtpSource(sync_servers_refs) doesn't generate new fatal 
> failures in the current thread.
>   Actual: it does. 
> {noformat}
> The log is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3076) Add a Kudu cli for granting/revoking Ranger privileges

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3076:
--
Component/s: security
 ops-tooling
 ranger

> Add a Kudu cli for granting/revoking Ranger privileges
> --
>
> Key: KUDU-3076
> URL: https://issues.apache.org/jira/browse/KUDU-3076
> Project: Kudu
>  Issue Type: Task
>  Components: ops-tooling, ranger, security
>Reporter: Hao Hao
>Priority: Major
>
> Even though Ranger has a GUI for policy management (and can be accessed via a 
> REST API), it would probably be more user-friendly to have a Kudu CLI tool for 
> granting and revoking privileges.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3060) Add a tool to identify potential performance bottlenecks

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3060:
--
Labels: roadmap-candidate  (was: )

> Add a tool to identify potential performance bottlenecks
> 
>
> Key: KUDU-3060
> URL: https://issues.apache.org/jira/browse/KUDU-3060
> Project: Kudu
>  Issue Type: Improvement
>  Components: CLI, perf, ui
>Reporter: Andrew Wong
>Priority: Major
>  Labels: roadmap-candidate
>
> When we hear users wondering why their workloads are slower than expected, 
> some common questions arise. It'd be great if we had a single tool (or a 
> single webpage) that aggregated and displayed useful information for a 
> specific tablet or table. Things like, for a specific table:
> - How many partitions and replicas exist for the table.
> - For those replicas, how they are distributed across tablet servers.
> - For those tablet servers, what the block cache configuration is, and what 
> the current block cache stats (hit ratio, evictions, etc) are.
> - For those tablet servers, which tablets have been written to recently.
> - For those tablet servers, which tablets within the target table have been 
> written to recently.
> - For those tablet servers, how many active and non-expired scanners exist.
> - For those tablet servers, which tablets within the target table have been 
> read from recently.
> - For those tablet servers, how many ongoing tablet copies there are both to 
> and from the server.
> - For those tablet servers, how many data directories there are.
> - For the data directories on those tablet servers, how many replicas are 
> spreading data in each directory, how many blocks there are in each, and how 
> much space is available in each.
> The list could go on and on. It probably makes sense to break the diagnostics 
> into different phases or goals, maybe along the lines of 1) identifying 
> hotspots of workloads and lag across tablet servers (e.g. a ton of writes 
> going to a single tserver), and 2) digging into a single tablet server to 
> understand how it's provisioned and whether that provisioning is sufficient.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3055) Lazily open cold tablets

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3055:
--
Labels: scalability  (was: )

> Lazily open cold tablets
> 
>
> Key: KUDU-3055
> URL: https://issues.apache.org/jira/browse/KUDU-3055
> Project: Kudu
>  Issue Type: New Feature
>  Components: master, tablet, tserver
>Reporter: Andrew Wong
>Priority: Major
>  Labels: scalability
>
> It might be useful in larger deployments to have the ability to lazily 
> bootstrap cold tablets.
> Currently, WAL replay consumes a significant amount of bootstrapping time. If 
> we know certain tablets are read infrequently, we ought to be able to 
> indicate that we only want to bootstrap and replay the WALs for tablets that 
> have been accessed recently.
> [This patch|https://github.com/apache/kudu/commit/ca957fb] gave us a metric 
> for hotness and coldness at the replica level -- we might want to consider 
> aggregating this on the master to determine what partitions are hot and cold, 
> and have the master signal to the appropriate tablet servers upon registering 
> that certain replicas should be bootstrapped. We might also consider 
> bootstrapping when a client first calls {{OpenTable()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-3054) Init kudu.write_duration accumulator lazily

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-3054.
---
Fix Version/s: NA
   Resolution: Duplicate

> Init kudu.write_duration accumulator lazily
> ---
>
> Key: KUDU-3054
> URL: https://issues.apache.org/jira/browse/KUDU-3054
> Project: Kudu
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: 1.9.0
>Reporter: liupengcheng
>Priority: Major
> Fix For: NA
>
> Attachments: durationHisto_large.png, durationhisto.png, 
> read_kudu_and_shuffle.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Recently, we encountered an issue in kudu-spark that causes Spark SQL 
> query failures:
> {noformat}
> Job aborted due to stage failure: Total size of serialized results of 942 
> tasks (2.0 GB) is bigger than spark.driver.maxResultSize (2.0 GB)
> {noformat}
> After careful debugging, we found that it's the kudu.write_duration 
> accumulators that make each Spark task's serialized result larger than 2 MB, so 
> the total size across all tasks of the stage exceeds the limit.
> However, this stage only reads a Kudu table and does a shuffle exchange; it 
> does not write to any Kudu tables.
> So I think we should initialize this accumulator lazily in KuduContext to 
> avoid such issues (see the sketch after the attached screenshots).
> !https://issues.apache.org/jira/secure/attachment/12993451/durationHisto_large.png!
>  
> !https://issues.apache.org/jira/secure/attachment/12993452/durationhisto.png!
> !https://issues.apache.org/jira/secure/attachment/12993453/read_kudu_and_shuffle.png!
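> A minimal sketch of the lazy-initialization idea (KuduContext itself is Scala; 
> the class and method names below are illustrative only):
> {code:java}
> import org.apache.spark.SparkContext;
> import org.apache.spark.util.LongAccumulator;
> 
> // Sketch: instead of registering the kudu.write_duration accumulator when the
> // context is constructed, register it on the driver the first time a write job
> // is submitted. Read-only jobs then never ship the accumulator with their tasks.
> public class LazyWriteMetrics {
>   private final SparkContext sc;
>   private LongAccumulator writeDurationHisto;  // null until the first write job
> 
>   public LazyWriteMetrics(SparkContext sc) {
>     this.sc = sc;
>   }
> 
>   // Called on the driver before launching a write job; the returned accumulator
>   // is captured by the write closure and updated on the executors.
>   synchronized LongAccumulator writeDurationAccumulator() {
>     if (writeDurationHisto == null) {
>       writeDurationHisto = sc.longAccumulator("kudu.write_duration");
>     }
>     return writeDurationHisto;
>   }
> }
> {code}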



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3041) Kudu Java client shade is incomplete

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3041:
--
Component/s: build

> Kudu Java client shade is incomplete
> 
>
> Key: KUDU-3041
> URL: https://issues.apache.org/jira/browse/KUDU-3041
> Project: Kudu
>  Issue Type: Bug
>  Components: build, client
>Affects Versions: 1.11.1
>Reporter: Ismaël Mejía
>Priority: Major
>
> While working on an update of the Kudu integration in Apache Beam (BEAM-5086) 
> we found this issue. We use a [tool to test for linkage 
> errors|https://github.com/GoogleCloudPlatform/cloud-opensource-java] and it 
> reports the classes that are missing but required by other classes.
> This is the result for the kudu-client case:
> {code:java}
> Class javax.servlet.ServletOutputStream is not found;
>  referenced by 1 class file
>  
> org.apache.kudu.shaded.org.jboss.netty.channel.socket.http.HttpTunnelingServlet
>  (kudu-client-1.11.1.jar)
> Class javax.servlet.http.HttpServlet is not found;
>  referenced by 1 class file
>  
> org.apache.kudu.shaded.org.jboss.netty.channel.socket.http.HttpTunnelingServlet
>  (kudu-client-1.11.1.jar)
> Class javax.servlet.ServletException is not found;
>  referenced by 1 class file
>  
> org.apache.kudu.shaded.org.jboss.netty.channel.socket.http.HttpTunnelingServlet
>  (kudu-client-1.11.1.jar)
> Class javax.servlet.ServletConfig is not found;
>  referenced by 1 class file
>  
> org.apache.kudu.shaded.org.jboss.netty.channel.socket.http.HttpTunnelingServlet
>  (kudu-client-1.11.1.jar)
> Class javax.servlet.http.HttpServletRequest is not found;
>  referenced by 1 class file
>  
> org.apache.kudu.shaded.org.jboss.netty.channel.socket.http.HttpTunnelingServlet
>  (kudu-client-1.11.1.jar)
> Class javax.servlet.http.HttpServletResponse is not found;
>  referenced by 1 class file
>  
> org.apache.kudu.shaded.org.jboss.netty.channel.socket.http.HttpTunnelingServlet
>  (kudu-client-1.11.1.jar)
> Class org.jboss.marshalling.ByteInput is not found;
>  referenced by 4 class files
>  
> org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.ChannelBufferByteInput
>  (kudu-client-1.11.1.jar)
>  
> org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.LimitingByteInput
>  (kudu-client-1.11.1.jar)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.ChannelBufferByteInput
>  (beam-vendor-grpc-1_21_0-0.1.jar)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.LimitingByteInput
>  (beam-vendor-grpc-1_21_0-0.1.jar)
> Class org.jboss.marshalling.ByteOutput is not found;
>  referenced by 2 class files
>  
> org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.ChannelBufferByteOutput
>  (kudu-client-1.11.1.jar)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.ChannelBufferByteOutput
>  (beam-vendor-grpc-1_21_0-0.1.jar)
> Class org.jboss.marshalling.Unmarshaller is not found;
>  referenced by 8 class files
>  
> org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.CompatibleMarshallingDecoder
>  (kudu-client-1.11.1.jar)
>  
> org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.ContextBoundUnmarshallerProvider
>  (kudu-client-1.11.1.jar)
>  
> org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.MarshallingDecoder
>  (kudu-client-1.11.1.jar)
>  
> org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.ThreadLocalUnmarshallerProvider
>  (kudu-client-1.11.1.jar)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.MarshallingDecoder
>  (beam-vendor-grpc-1_21_0-0.1.jar)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.CompatibleMarshallingDecoder
>  (beam-vendor-grpc-1_21_0-0.1.jar)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.ThreadLocalUnmarshallerProvider
>  (beam-vendor-grpc-1_21_0-0.1.jar)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.ContextBoundUnmarshallerProvider
>  (beam-vendor-grpc-1_21_0-0.1.jar)
> Class org.jboss.marshalling.Marshaller is not found;
>  referenced by 6 class files
>  
> org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.CompatibleMarshallingEncoder
>  (kudu-client-1.11.1.jar)
>  
> org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.MarshallingEncoder
>  (kudu-client-1.11.1.jar)
>  
> org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.ThreadLocalMarshallerProvider
>  (kudu-client-1.11.1.jar)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.CompatibleMarshallingEncoder
>  (beam-vendor-grpc-1_21_0-0.1.jar)
>  
> org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.MarshallingEncoder
>  (beam-vendor-grpc-1_21_0-0.1.jar)
>  
> org.apache.beam.vend

[jira] [Updated] (KUDU-3037) HMS notification log listener runs in follower masters

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3037:
--
Labels: trivial  (was: )

> HMS notification log listener runs in follower masters
> --
>
> Key: KUDU-3037
> URL: https://issues.apache.org/jira/browse/KUDU-3037
> Project: Kudu
>  Issue Type: Improvement
>  Components: hms, master
>Affects Versions: 1.11.1
>Reporter: Adar Dembo
>Priority: Major
>  Labels: trivial
>
> Besides wasting resources, this also emits the same log message every second:
> {noformat}
> I0108 21:05:07.748235  1443 hms_notification_log_listener.cc:227] Skipping 
> Hive Metastore notification log poll: Illegal state: Not the leader. Local 
> UUID: c77bd888023149729481b7fd041b5c83, Raft Consensus state: current_term: 6 
> committed_config { opid_index: -1 OBSOLETE_local: false peers { 
> permanent_uuid: "12e9881f0c094a38b83c3b00f71a0ef1" member_type: VOTER 
> last_known_addr { host: "127.0.60.126" port: 43289 } } peers { 
> permanent_uuid: "c77bd888023149729481b7fd041b5c83" member_type: VOTER 
> last_known_addr { host: "127.0.60.125" port: 38761 } } peers { 
> permanent_uuid: "c362d38459074310a6fbd3da10153538" member_type: VOTER 
> last_known_addr { host: "127.0.60.124" port: 34903 } } }
> {noformat}
> We could throttle the log message, though perhaps the more complete solution 
> is to only activate the log listener in the leader master.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-3034) Kudu query performance issue: querying 390,000 rows takes 89 seconds

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-3034.
---
Fix Version/s: NA
   Resolution: Incomplete

> Kudu query performance issue: querying 390,000 rows takes 89 seconds
> -
>
> Key: KUDU-3034
> URL: https://issues.apache.org/jira/browse/KUDU-3034
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.7.1
>Reporter: SeaAndHill
>Priority: Major
> Fix For: NA
>
> Attachments: memory.jpg, query.jpg, threads.jpg
>
>
> See the attached screenshots for the configuration: 50 GB of memory is 
> configured and the data size is only 9 MB, but the query takes 89 seconds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3033) Add min/max values for the non-primary key columns in the metadata of rowsets/datablocks

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3033:
--
Component/s: perf

> Add min/max values for the non-primary key columns in the metadata of 
> rowsets/datablocks
> 
>
> Key: KUDU-3033
> URL: https://issues.apache.org/jira/browse/KUDU-3033
> Project: Kudu
>  Issue Type: New Feature
>  Components: cfile, perf, tablet
>Reporter: LiFu He
>Priority: Major
>
> It's possible to add min/max values for the non-primary key columns to the 
> metadata of each diskrowset/datablock, and then we can skip decoding/evaluating 
> unnecessary diskrowsets/datablocks while scanning. This is just like the "compute 
> stats" feature in Impala; the only difference is that Kudu supports 
> updates. So, the min/max values should be treated as invalid while scanning if 
> the columns have outstanding deltas.
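> A rough sketch of the intended skip logic (the types and fields here are 
> hypothetical, not the actual CFile/rowset metadata):
> {code:java}
> // Hypothetical per-block statistics for one column.
> class BlockStats {
>   long min;
>   long max;
>   boolean hasDeltas;  // outstanding UPDATEs/DELETEs make the stats unreliable
> }
> 
> class BlockSkipExample {
>   // Returns true if a block can be skipped for the predicate
>   // "col >= lower AND col <= upper". Blocks with deltas must still be scanned.
>   static boolean canSkip(BlockStats stats, long lower, long upper) {
>     if (stats.hasDeltas) {
>       return false;
>     }
>     return stats.max < lower || stats.min > upper;
>   }
> }
> {code}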



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3024) kudu-master: scale test for high number of simultaneous updates on tables metadata

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3024:
--
Component/s: test
 perf

> kudu-master: scale test for high number of simultaneous updates on tables 
> metadata
> ---
>
> Key: KUDU-3024
> URL: https://issues.apache.org/jira/browse/KUDU-3024
> Project: Kudu
>  Issue Type: Test
>  Components: perf, test
>Reporter: Alexey Serbin
>Priority: Major
>
> It would be nice to have a scale test to verify how Kudu masters behave when 
> there is a high number of almost simultaneous updates in tables' metadata 
> (e.g., ALTER table to add a new partition, add new columns, drop columns, 
> etc.).  At least, it would be able to spot issues like KUDU-3016.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3024) kudu-master: scale test for high number of simultaneous updates on tables metadata

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3024:
--
Labels: benchmarks  (was: )

> kudu-master: scale test for high number of simultaneous updates on tables 
> metadata
> ---
>
> Key: KUDU-3024
> URL: https://issues.apache.org/jira/browse/KUDU-3024
> Project: Kudu
>  Issue Type: Test
>  Components: perf, test
>Reporter: Alexey Serbin
>Priority: Major
>  Labels: benchmarks
>
> It would be nice to have a scale test to verify how Kudu masters behave when 
> there is a high number of almost simultaneous updates in tables' metadata 
> (e.g., ALTER table to add a new partition, add new columns, drop columns, 
> etc.).  At least, it would be able to spot issues like KUDU-3016.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3025) Add metric for the open file descriptors usage vs the limit

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3025:
--
Component/s: metrics

> Add metric for the open file descriptors usage vs the limit
> ---
>
> Key: KUDU-3025
> URL: https://issues.apache.org/jira/browse/KUDU-3025
> Project: Kudu
>  Issue Type: Improvement
>  Components: master, metrics, tserver
>Reporter: Alexey Serbin
>Priority: Major
>  Labels: Availability, observability, scalability
>
> In the case of even replica distribution across all available nodes, once one 
> tablet server hits the maximum number of open file descriptors and goes down 
> (e.g., upon hosting another tablet replica), the system will automatically 
> re-replicate tablet replicas from the tablet server, most likely bringing 
> other tablet servers down as well.  That's a cascading failure scenario that 
> nobody wants to experience.
> Monitoring the number of open file descriptors vs the limit can help to 
> prevent a full Kudu cluster outage in such a case, if operators are given a 
> chance to handle those situations proactively.  Once some threshold is 
> reached (e.g., 90%), an operator could update the limit via the corresponding 
> {{ulimit}} setting, preventing an outage.
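> The servers themselves are C++, but a small sketch of the proposed ratio (using 
> the JVM's Unix MX bean purely as an illustration) shows what the metric and a 
> 90% alerting threshold would look like:
> {code:java}
> import java.lang.management.ManagementFactory;
> import com.sun.management.UnixOperatingSystemMXBean;
> 
> public class FdUsageExample {
>   public static void main(String[] args) {
>     // Unix-only: open file descriptors vs. the per-process limit.
>     UnixOperatingSystemMXBean os =
>         (UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
>     long open = os.getOpenFileDescriptorCount();
>     long max = os.getMaxFileDescriptorCount();
>     double ratio = (double) open / max;
>     System.out.printf("open fds: %d / %d (%.0f%% of limit)%n", open, max, ratio * 100);
>     if (ratio >= 0.9) {
>       System.out.println("WARNING: approaching the open file descriptor limit");
>     }
>   }
> }
> {code}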



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3020) Add metric gauges for the size of incoming RPC request payload and number of RPC rejections due to payload size

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3020:
--
Component/s: metrics

> Add metric gauges for the size of incoming RPC request payload and number of 
> RPC rejections due to payload size
> ---
>
> Key: KUDU-3020
> URL: https://issues.apache.org/jira/browse/KUDU-3020
> Project: Kudu
>  Issue Type: Improvement
>  Components: master, metrics, tserver
>Reporter: Alexey Serbin
>Priority: Major
>  Labels: guidelines, observability, scalability, troubleshooting
>
> Kudu servers have a limit on the size of RPC's payload they accept: 
> {{\-\-rpc_max_message_size}}.
> It would be nice to introduce corresponding metrics to gauge how relevant the 
> current setting for the maximum RPC size is with regard to the incoming 
> requests.  That can help with pro-active tuning of a Kudu cluster to sustain 
> a planned increase in workload.  This is especially useful for tuning 
> parameters of Kudu masters to accommodate a higher number of tables/tablets in 
> a cluster (e.g., adding new tables or creating new partitions for already 
> existing tables).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-3018) Add tests for CLI tools

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-3018.
---
Fix Version/s: NA
   Resolution: Fixed

> Add tests for CLI tools
> ---
>
> Key: KUDU-3018
> URL: https://issues.apache.org/jira/browse/KUDU-3018
> Project: Kudu
>  Issue Type: Task
>Reporter: Attila Bukor
>Priority: Major
> Fix For: NA
>
>
> Currently the CLI tools are barely tested, so when introducing new features 
> it's easy to miss something that won't be supported by the tooling, or to 
> break the tooling outright.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3021) Add metric gauges for the size of transactions applied to tablets and number of rejected transactions due to their size

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3021:
--
Component/s: metrics

> Add metric gauges for the size of transactions applied to tablets and number 
> of rejected transactions due to their size
> ---
>
> Key: KUDU-3021
> URL: https://issues.apache.org/jira/browse/KUDU-3021
> Project: Kudu
>  Issue Type: Improvement
>  Components: master, metrics, tserver
>Reporter: Alexey Serbin
>Assignee: ZhangYao
>Priority: Major
>  Labels: observability, scalability, troubleshooting
>
> Kudu servers have a limit on the size of a transaction applied to a tablet: 
> {{\-\-tablet_transaction_memory}}.
> It would be nice to introduce corresponding metrics to gauge how relevant the 
> current setting is for the actual size of transactions applied in a running 
> Kudu cluster. That can help with pro-active tuning of a Kudu cluster to 
> sustain a planned increase in workload.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3013) Race in StopTabletITest.TestStoppedTabletsDontWrite

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3013:
--
Component/s: test

> Race in StopTabletITest.TestStoppedTabletsDontWrite
> ---
>
> Key: KUDU-3013
> URL: https://issues.apache.org/jira/browse/KUDU-3013
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Reporter: LiFu He
>Priority: Major
> Attachments: 
> jenkins-slave.1575252039.26703.311237e4f4a39e5fea3b175fbf12d3e4aa8674dc.81.0-artifacts.zip
>
>
> I met this issue on Jenkins this morning, and it seems there is a race in 
> StopTabletITest.TestStoppedTabletsDontWrite.
> {code:java}
> // code placeholder
> TransactionDriver::ApplyTask()Tablet::Stop()
>   | |
>   transaction_->Apply() |
>   | |
> tablet->ApplyRowOperations(state()) |
> (RESERVED -> APPLYING)  |
>   | |
>  StartApplying(tx_state);   |
>   |  
> set_state_unlocked(kStopped);
>   ApplyRowOperation()   |
>   | |
> CheckHasNotBeenStoppedUnlocked()|
> (return error since the tablet has been stopped)|
>   | |
> HandleFailure(s)|
>   | |
> transaction_->Finish(Transaction::ABORTED); |
>   | |
> state()->CommitOrAbort(result); |
>   | |
> ReleaseMvccTxn(result); |
>   | |
> mvcc_tx_->Abort();  |
>   | |
> manager_->AbortTransaction(timestamp_); |
>   | |
> if (PREDICT_FALSE(!is_open()))  |
>   | mvcc_.Close();
>   | |
>   |   open_.store(false);
> CHECK_EQ(old_state, RESERVED)   |
>(ASSERT failed)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3012) Throttle "Applying an operation in a closed session" warning log

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3012:
--
Labels: newbie trivial  (was: newbie)

> Throttle "Applying an operation in a closed session" warning log
> 
>
> Key: KUDU-3012
> URL: https://issues.apache.org/jira/browse/KUDU-3012
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 1.9.0
>Reporter: Grant Henke
>Priority: Major
>  Labels: newbie, trivial
>
> In NIFI-6895 it was reported that the log warning about applying an operation 
> in a closed session is occurring millions of times. Of course the NiFi 
> integration should fix the underlying issue, but we should throttle this 
> message to ensure it doesn't overflow logs.
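> A minimal sketch of time-based throttling for a warning like this (the names 
> are hypothetical, not the actual client code):
> {code:java}
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.atomic.AtomicLong;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> 
> public class ThrottledWarning {
>   private static final Logger LOG = LoggerFactory.getLogger(ThrottledWarning.class);
>   private static final long MIN_INTERVAL_NANOS = TimeUnit.SECONDS.toNanos(10);
>   private static final AtomicLong lastLogNanos = new AtomicLong();
>   private static final AtomicLong suppressed = new AtomicLong();
> 
>   // Logs the warning at most once every 10 seconds and counts what was dropped.
>   static void warnApplyInClosedSession() {
>     long now = System.nanoTime();
>     long last = lastLogNanos.get();
>     if (now - last >= MIN_INTERVAL_NANOS && lastLogNanos.compareAndSet(last, now)) {
>       LOG.warn("Applying an operation in a closed session ({} similar warnings suppressed)",
>                suppressed.getAndSet(0));
>     } else {
>       suppressed.incrementAndGet();
>     }
>   }
> }
> {code}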



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3003) TestAsyncKuduSession.testTabletCacheInvalidatedDuringWrites is flaky

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3003:
--
Component/s: test

> TestAsyncKuduSession.testTabletCacheInvalidatedDuringWrites is flaky
> 
>
> Key: KUDU-3003
> URL: https://issues.apache.org/jira/browse/KUDU-3003
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Reporter: Hao Hao
>Priority: Minor
> Attachments: test-output.txt
>
>
> testTabletCacheInvalidatedDuringWrites of the 
> org.apache.kudu.client.TestAsyncKuduSession test sometimes fails with an 
> error like the one below. I attached the full test log.
> {noformat}
> There was 1 failure:
> 1) 
> testTabletCacheInvalidatedDuringWrites(org.apache.kudu.client.TestAsyncKuduSession)
> org.apache.kudu.client.PleaseThrottleException: all buffers are currently 
> flushing
>   at 
> org.apache.kudu.client.AsyncKuduSession.apply(AsyncKuduSession.java:579)
>   at 
> org.apache.kudu.client.TestAsyncKuduSession.testTabletCacheInvalidatedDuringWrites(TestAsyncKuduSession.java:371)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2988) built-in NTP client: sometimes minichronyd fails to start with address already in use

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2988:
--
Component/s: test

> built-in NTP client: sometimes minichronyd fails to start with address 
> already in use
> -
>
> Key: KUDU-2988
> URL: https://issues.apache.org/jira/browse/KUDU-2988
> Project: Kudu
>  Issue Type: Sub-task
>  Components: clock, ntp-client, test
>Affects Versions: 1.11.0
>Reporter: Adar Dembo
>Priority: Major
>  Labels: clock
>
> From time to time some tests that use the built-in NTP client and MiniChronyd 
> fail. The failure usually looks like this:
> {noformat}
> I1013 22:02:28.429188 23364 hybrid_clock.cc:162] waiting up to 
> --ntp_initial_sync_wait_secs=10 seconds for the clock to synchronize
> I1013 22:02:38.430480 23364 builtin_ntp.cc:552] server 127.16.18.212:42817: 
> addresses=127.16.18.212:42817 current_address=127.16.18.212:42817 
> i_pkt_total_num=0 i_pkt_valid_num=0 o_pkt_total_num=20 o_pkt_timedout_num=14
> is_synchronized=false
> last_mono=0
> last_wall=0
> last_error=0
> now_mono=264459136130
> F1013 22:02:38.430524 23364 master_main.cc:105] Check failed: _s.ok() Bad 
> status: Service unavailable: Cannot initialize clock: timed out waiting for 
> clock synchronisation: wallclock is not synchronized: no valid NTP responses 
> yet
> *** Check failure stack trace: ***
> {noformat}
> I've also seen one failure like this:
> {noformat}
> [ RUN  ] BlockManagerType/TsRecoveryITest.TestCrashDuringLogReplay/0
> 2019-11-01T00:37:31Z chronyd version 3.5 starting (+CMDMON +NTP +REFCLOCK 
> +RTC -PRIVDROP -SCFILTER -SIGND +ASYNCDNS -SECHASH -IPV6 +DEBUG)
> 2019-11-01T00:37:32Z Disabled control of system clock
> Could not open connection to daemon
> W1101 00:37:32.553658   235 mini_chronyd.cc:189] Time spent starting chronyd: 
> real 1.110s user 0.000s sys 0.011s
> 2019-11-01T00:37:32Z chronyd exiting
> /home/jenkins-slave/workspace/kudu-master/2/src/kudu/integration-tests/external_mini_cluster-itest-base.cc:66:
>  Failure
> Failed
> Bad status: Timed out: failed to start NTP server 0: failed to contact 
> chronyd in 1.000s
> /home/jenkins-slave/workspace/kudu-master/2/src/kudu/integration-tests/ts_recovery-itest.cc:428:
>  Failure
> Expected: StartClusterOneTs({ "--fault_crash_during_log_replay=0.05" }) 
> doesn't generate new fatal failures in the current thread.
>   Actual: it does.
> I1101 00:37:32.564697   235 external_mini_cluster-itest-base.cc:80] Found 
> fatal failure
> I1101 00:37:32.566167   235 test_util.cc:136] 
> ---
> I1101 00:37:32.566222   235 test_util.cc:137] Had fatal failures, leaving 
> test files at 
> /tmp/dist-test-taskBEeRDj/test-tmp/ts_recovery-itest.0.BlockManagerType_TsRecoveryITest.TestCrashDuringLogReplay_0.1572568627695135-235
> [  FAILED  ] BlockManagerType/TsRecoveryITest.TestCrashDuringLogReplay/0, 
> where GetParam() = "file" (1125 ms)
> {noformat}
> In the first case it's pretty odd that despite the fact that MiniChronyd 
> failed to bind to a socket, we managed to convince ourselves that it was up 
> and running. My running theory is that chronyd doesn't exit when this happens 
> and still listens on the UNIX domain socket for our "server stats" query, 
> which then convinces us that chronyd is alive and well.
> Anyway, this is the cause of some test flakiness, so we should look into it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2985) CreateTableITest.TestSpreadReplicasEvenlyWithDimension scenario is flaky

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2985:
--
Component/s: test

> CreateTableITest.TestSpreadReplicasEvenlyWithDimension scenario is flaky
> 
>
> Key: KUDU-2985
> URL: https://issues.apache.org/jira/browse/KUDU-2985
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Reporter: Alexey Serbin
>Priority: Minor
> Attachments: create-table-itest.txt.xz
>
>
> Sometimes the {{CreateTableITest.TestSpreadReplicasEvenlyWithDimension}} 
> scenario fails (RELEASE build):
> {noformat}
> I1024 19:34:26.842279  7231 create-table-itest.cc:357] stddev = 3.03315   
>   
> src/kudu/integration-tests/create-table-itest.cc:358: Failure
> Expected: (stddev) <= (3.0), actual: 3.03315 vs 3  
> {noformat}
> Full log file is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2984) memory_gc-itest is flaky

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2984:
--
Component/s: test

> memory_gc-itest is flaky
> 
>
> Key: KUDU-2984
> URL: https://issues.apache.org/jira/browse/KUDU-2984
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.11.0, 1.12.0
>Reporter: Alexey Serbin
>Assignee: Yingchun Lai
>Priority: Minor
> Attachments: memory_gc-itest.txt.xz
>
>
> The {{memory_gc-itest}} fails from time to time with the following error message 
> (DEBUG build):
> {noformat}
> src/kudu/integration-tests/memory_gc-itest.cc:117: Failure
> Expected: (ratio) >= (0.1), actual: 0.0600604 vs 0.1
> tserver-2
> src/kudu/util/test_util.cc:339: Failure
> Failed
> Timed out waiting for assertion to pass.
> {noformat}
> The full log is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2975) Spread WAL across multiple data directories

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2975:
--
Labels: roadmap-candidate scalability  (was: )

> Spread WAL across multiple data directories
> ---
>
> Key: KUDU-2975
> URL: https://issues.apache.org/jira/browse/KUDU-2975
> Project: Kudu
>  Issue Type: New Feature
>  Components: fs, perf, tablet, tserver
>Reporter: LiFu He
>Assignee: YangSong
>Priority: Major
>  Labels: roadmap-candidate, scalability
> Attachments: network.png, tserver-WARNING.png, util.png
>
>
> Recently, we deployed a new Kudu cluster where every node has 12 SSDs. Then we 
> created a big table and loaded data into it through Flink.  We noticed that the 
> utilization of the one SSD used to store the WAL is 100% while the others are 
> idle. So, we suggest spreading the WAL across multiple data directories.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2975) Spread WAL across multiple data directories

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2975:
--
Component/s: perf

> Spread WAL across multiple data directories
> ---
>
> Key: KUDU-2975
> URL: https://issues.apache.org/jira/browse/KUDU-2975
> Project: Kudu
>  Issue Type: New Feature
>  Components: fs, perf, tablet, tserver
>Reporter: LiFu He
>Assignee: YangSong
>Priority: Major
> Attachments: network.png, tserver-WARNING.png, util.png
>
>
> Recently, we deployed a new Kudu cluster where every node has 12 SSDs. Then we 
> created a big table and loaded data into it through Flink.  We noticed that the 
> utilization of the one SSD used to store the WAL is 100% while the others are 
> idle. So, we suggest spreading the WAL across multiple data directories.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2969) UDF support for scans

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2969:
--
Labels: roadmap-candidate  (was: )

> UDF support for scans
> -
>
> Key: KUDU-2969
> URL: https://issues.apache.org/jira/browse/KUDU-2969
> Project: Kudu
>  Issue Type: New Feature
>  Components: tablet, tserver
>Affects Versions: 1.11.0
>Reporter: Adar Dembo
>Priority: Major
>  Labels: roadmap-candidate
>
> It would be nice if Kudu supported some form of user-defined functions (UDFs) 
> for use in scans. These could be used for custom comparisons, or for 
> comparisons between multiple columns (rather than between columns and 
> constant values).
> Impala supports [both Java-based and native 
> UDFs|https://impala.apache.org/docs/build/html/topics/impala_udf.html]. We 
> could explore doing something similar, or an approach with an IR like 
> [Weld|https://www.weld.rs].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2958) ClientTest.TestReplicatedTabletWritesWithLeaderElection is flaky

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2958:
--
Component/s: test

> ClientTest.TestReplicatedTabletWritesWithLeaderElection is flaky
> 
>
> Key: KUDU-2958
> URL: https://issues.apache.org/jira/browse/KUDU-2958
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.11.0
>Reporter: Alexey Serbin
>Assignee: ZhangYao
>Priority: Major
> Attachments: client-test.5.txt.xz
>
>
> The {{TestReplicatedTabletWritesWithLeaderElection}} of the {{client-test}} 
> is flaky.  Time to time in ASAN build configuration it fails with the 
> following error:
> {noformat}
> I0924 20:26:19.869351 14037 client-test.cc:4304] Counting rows... 
>   
> src/kudu/client/client-test.cc:4308: Failure
>   Expected: 2 * kNumRowsToWrite   
>   
>   Which is: 200   
>   
> To be equal to: CountRowsFromClient(table.get(), KuduClient::FIRST_REPLICA, 
> KuduScanner::READ_LATEST, kNoBound, kNoBound)
>   Which is: 100 
> {noformat}
> It seems there is an implicit assumption in the test about fast propagation of 
> Raft transactions to follower replicas.
> I attached the full log of the failed tests scenario.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2957) In HmsSentryConfigurations/MasterStressTest kudu-master crashes on an out-of-order notification log event

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2957.
---
Fix Version/s: NA
   Resolution: Won't Fix

The Sentry integration is removed.

> In HmsSentryConfigurations/MasterStressTest kudu-master crashes on an 
> out-of-order notification log event 
> 
>
> Key: KUDU-2957
> URL: https://issues.apache.org/jira/browse/KUDU-2957
> Project: Kudu
>  Issue Type: Bug
>  Components: hms, master, test
>Affects Versions: 1.11.0
>Reporter: Alexey Serbin
>Priority: Major
> Fix For: NA
>
> Attachments: master-stress-test.1.txt.xz
>
>
> The first relevant message is:
> {noformat}
> F0924 20:03:08.307613  1220 hms_notification_log_listener.cc:266] Received 
> out-of-order notification log event (last processed event ID: 22): 22 
> DROP_TABLE default.table_7dad03ec77524186956c5829457d06c7
> *** Check failure stack trace: ***
>   
> @ 0x7f63a55ae62d  google::LogMessage::Fail() at ??:0  
>   
> @ 0x7f63a55b064c  google::LogMessage::SendToLog() at ??:0 
>   
> @ 0x7f63a55ae189  google::LogMessage::Flush() at ??:0 
>   
> @ 0x7f63a55ae3a1  google::LogMessage::~LogMessage() at ??:0   
>   
> @ 0x7f63a70c0c4c  
> kudu::master::HmsNotificationLogListenerTask::Poll() at ??:0
> @ 0x7f63a70bff79  
> kudu::master::HmsNotificationLogListenerTask::RunLoop() at ??:0
> {noformat}
> In the DEBUG build, all three {{kudu-master}} processes crash and the next 
> attempt to create a table times out:
> {noformat}
> F0924 20:08:32.662717  1116 master-stress-test.cc:297] Check failed: _s.ok() 
> Bad status: Timed out: Error creating table 
> default.Table_e221014b8c604ed0b635168473827877 on the master: CreateTable 
> timed out after deadline expired: CreateTable passed its deadline: Timed out: 
> ConnectToClusterRpc(addrs: 
> 127.0.58.126:41719,127.0.58.125:36217,127.0.58.124:33155, num_attempts: 772) 
> passed its deadline: Not found: no leader found: ConnectToClusterRpc(addrs: 
> 127.0.58.126:41719,127.0.58.125:36217,127.0.58.124:33155, num_attempts: 1)
> {noformat}
> I attached the full log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2942) A rare flaky test for the aggregated live row count

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2942:
--
Component/s: test

> A rare flaky test for the aggregated live row count
> ---
>
> Key: KUDU-2942
> URL: https://issues.apache.org/jira/browse/KUDU-2942
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Reporter: LiFu He
>Priority: Major
> Attachments: ts_tablet_manager-itest.txt
>
>
> A few days ago, Adar hit a rare flaky test for the live row count in TSAN 
> mode.
>  
> {code:java}
> // code placeholder
> /home/jenkins-slave/workspace/kudu-master/3/src/kudu/integration-tests/ts_tablet_manager-itest.cc:642
>       Expected: live_row_count
>       Which is: 327
> To be equal to: table_info->GetMetrics()->live_row_count->value()
>       Which is: 654
> {code}
> It seems the metric value is doubled. And his full test output is in the 
> attachment.
>  
> I reviewed the previous patches and made some unusual guesses. I think one of 
> them could explain the issue:
> When one master just becomes the leader and there are two heartbeat messages 
> from the same tserver that are processed in parallel at 
> [Line4239|https://github.com/apache/kudu/blob/1bdae88faefe9b0d43b6897d96cd853bc5dd7353/src/kudu/master/catalog_manager.cc#L4239],
>  then the metric value will be doubled because the old tablet stats can be 
> accessed concurrently.
> Thus, the question becomes how to generate two heartbeat messages from the 
> same tserver at the same time? The possible answer is: [First heartbeat 
> message|https://github.com/apache/kudu/blob/1bdae88faefe9b0d43b6897d96cd853bc5dd7353/src/kudu/integration-tests/ts_tablet_manager-itest.cc#L741]
>  and [Second heartbeat 
> message|https://github.com/apache/kudu/blob/1bdae88faefe9b0d43b6897d96cd853bc5dd7353/src/kudu/integration-tests/ts_tablet_manager-itest.cc#L635]
> Please note that the above case is in an integration test environment, not production.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2950) Support restarting nodes in batches

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2950:
--
Component/s: ops-tooling

> Support restarting nodes in batches
> ---
>
> Key: KUDU-2950
> URL: https://issues.apache.org/jira/browse/KUDU-2950
> Project: Kudu
>  Issue Type: Improvement
>  Components: ops-tooling
>Reporter: Andrew Wong
>Priority: Major
>
> Once Kudu has the building blocks to orchestrate a rolling restart, it'd be 
> great if we could support restarting multiple nodes at a time.
> Location awareness would play a crucial role in this because, if used to 
> identify rack placement, we could bring down an entire rack at a time if we 
> wanted. If we did this, though, during the controlled restart of a given 
> rack, Kudu would be more vulnerable to the _unexpected_ downtime of another 
> rack.
> One approach would be to support something like [HDFS's upgrade 
> domains|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUpgradeDomain.html]:
> {quote}The idea is to group datanodes in a new dimension called upgrade 
> domain, in addition to the existing rack-based grouping. For example, we can 
> assign all datanodes in the first position of any rack to upgrade domain 
> ud_01, nodes in the second position to upgrade domain ud_02 and so on.
>  ...
>  By default, 3 replicas of any given block are placed on 3 different upgrade 
> domains. This means all datanodes belonging to a specific upgrade domain 
> collectively won’t store more than one replica of any block.
> {quote}
> The decoupling of physical groups from restartable groups should make batch 
> restarts more robust to rack failures.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2917) Split a tablet into primary key ranges by number of row

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2917:
--
Component/s: spark
 perf

> Split a tablet into primary key ranges by number of row
> ---
>
> Key: KUDU-2917
> URL: https://issues.apache.org/jira/browse/KUDU-2917
> Project: Kudu
>  Issue Type: Improvement
>  Components: perf, spark
>Reporter: Xu Yao
>Assignee: Xu Yao
>Priority: Major
>  Labels: impala
>
> Since we implemented 
> [KUDU-2437|https://issues.apache.org/jira/browse/KUDU-2437] and 
> [KUDU-2670|https://issues.apache.org/jira/browse/KUDU-2670], the spark job 
> can read data inside the tablet in parallel. However, we found in actual use 
> that splitting key ranges by size may leave some spark tasks as long tails. 
> (Some tasks read many more rows even though the data size of each KeyRange is 
> basically the same.)
> I think this issue is caused by the column-wise encoding and compression. 
> For example, when we store 1000 rows of data column-wise and most of the 
> values in a column are the same, less storage space is required. In contrast, 
> if the values are different, more storage is needed. So splitting the primary 
> key range by the number of rows might be a better choice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2933) Improvement estimation of spark relation size.

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2933:
--
Component/s: spark
 perf

> Improvement estimation of spark relation size.
> --
>
> Key: KUDU-2933
> URL: https://issues.apache.org/jira/browse/KUDU-2933
> Project: Kudu
>  Issue Type: Improvement
>  Components: perf, spark
>Reporter: ZhangYao
>Priority: Major
>
> Should take projection and predicates into consideration when estimating the 
> sizeInBytes in KuduRelation.
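
For illustration, one possible shape of such an estimate, as a rough Scala sketch (the scaling factors and the 0.5-per-predicate selectivity are assumptions for the sketch, not the actual KuduRelation logic):

{code:scala}
// Sketch: scale the raw table size by how much of the schema is projected and
// by a rough selectivity guess for the pushed-down predicates.
def estimateSizeInBytes(tableSizeBytes: Long,
                        projectedColumns: Int,
                        totalColumns: Int,
                        predicateCount: Int): Long = {
  val projectionRatio =
    if (totalColumns == 0) 1.0 else projectedColumns.toDouble / totalColumns
  // Assume each predicate filters out roughly half the rows (a common default).
  val selectivity = math.pow(0.5, predicateCount)
  math.max(1L, (tableSizeBytes * projectionRatio * selectivity).toLong)
}
{code}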



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2918) Rebalancer can fail when a service queue is full

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2918:
--
Labels: stability supportability  (was: )

> Rebalancer can fail when a service queue is full
> 
>
> Key: KUDU-2918
> URL: https://issues.apache.org/jira/browse/KUDU-2918
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI, ksck
>Affects Versions: 1.11.0
>Reporter: Adar Dembo
>Priority: Major
>  Labels: stability, supportability
>
> The various low-level RPCs issued by ksck aren't retried if the corresponding 
> service queues are full. These include GetConsensusState, GetStatus, and 
> ListTablets.
> Without retries, ksck (and the rebalancer) can fail midway:
> {noformat}
> I0812 11:21:10.669682 42799 rebalancer.cc:831] tablet 
> d729fb149e804696a0862adacb725d66: a0dca75bbbfb4de69616694834adf930 -> 
> 24d0eb73b3c64a0f901ae092186b3439 move is abandoned: Remote error: Service 
> unavailable: GetConsensusState request on kudu.consensus.ConsensusService 
> from 10.17.182.15:50754 dropped due to backpressure. The service queue is 
> full; it has 50 items.
> I0812 11:21:10.871894 42799 rebalancer.cc:239] re-synchronizing cluster state
> Illegal state: tablet server 0d88ff7360b74d1e81cd2ccd41fab8a5 
> (foo.bar.com:7050): unacceptable health status UNAVAILABLE
> {noformat}
> The helper classes in rpc/rpc.h may be useful here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2917) Split a tablet into primary key ranges by number of row

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2917:
--
Labels: impala  (was: )

> Split a tablet into primary key ranges by number of row
> ---
>
> Key: KUDU-2917
> URL: https://issues.apache.org/jira/browse/KUDU-2917
> Project: Kudu
>  Issue Type: Improvement
>Reporter: Xu Yao
>Assignee: Xu Yao
>Priority: Major
>  Labels: impala
>
> Since we implemented 
> [KUDU-2437|https://issues.apache.org/jira/browse/KUDU-2437] and 
> [KUDU-2670|https://issues.apache.org/jira/browse/KUDU-2670], the spark job 
> can read data inside the tablet in parallel. However, we found in actual use 
> that splitting key ranges by size may leave some spark tasks as long tails. 
> (Some tasks read many more rows even though the data size of each KeyRange is 
> basically the same.)
> I think this issue is caused by the column-wise encoding and compression. 
> For example, when we store 1000 rows of data column-wise and most of the 
> values in a column are the same, less storage space is required. In contrast, 
> if the values are different, more storage is needed. So splitting the primary 
> key range by the number of rows might be a better choice.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2915) Support to delete dead tservers from CLI

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2915:
--
Labels: supportability  (was: )

> Support to delete dead tservers from CLI
> 
>
> Key: KUDU-2915
> URL: https://issues.apache.org/jira/browse/KUDU-2915
> Project: Kudu
>  Issue Type: Improvement
>  Components: CLI, ops-tooling
>Affects Versions: 1.10.0
>Reporter: Hexin
>Assignee: Hexin
>Priority: Major
>  Labels: supportability
>
> Sometimes nodes in the cluster will crash due to machine problems such as 
> disk corruption, which can be quite common. However, if there are dead 
> tservers, the ksck result will always show an error (e.g. Not all Tablet Servers are 
> reachable) even though all tables have recovered and are healthy.
> Currently the only way to get a healthy ksck status is to restart all masters 
> one by one. In some cases, for example if a machine is completely 
> corrupted, we would like to get a healthy ksck status without restarting, since 
> after restarting the masters the cluster takes some time to recover, during 
> which scans and upserts to tables can be affected. The recovery 
> time can be long and mainly depends on the scale of the cluster. This problem 
> can be serious and annoying, especially when tservers crash frequently 
> in a large cluster.
> It would be valuable to have an easier way to delete dead tservers from the master; I 
> will add a kudu command to support this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-1701) Reduce contention in CatalogManager::ScopedLeaderSharedLock

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-1701.
---
Fix Version/s: 1.13.0
   Resolution: Fixed

> Reduce contention in CatalogManager::ScopedLeaderSharedLock 
> 
>
> Key: KUDU-1701
> URL: https://issues.apache.org/jira/browse/KUDU-1701
> Project: Kudu
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 1.0.0
>Reporter: Todd Lipcon
>Assignee: Alexey Serbin
>Priority: Minor
> Fix For: 1.13.0
>
>
> CatalogManager::ScopedLeaderSharedLock::ScopedLeaderSharedLock() currently 
> holds a spinlock while accessing consensus->ConsensusState(). That call makes 
> a copy of a protobuf, which requires allocation and a later deallocation. 
> Every master lookup RPC needs to go through this path, so this can become a 
> contention point under heavy multi-client load.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2915) Support to delete dead tservers from CLI

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2915:
--
Component/s: ops-tooling

> Support to delete dead tservers from CLI
> 
>
> Key: KUDU-2915
> URL: https://issues.apache.org/jira/browse/KUDU-2915
> Project: Kudu
>  Issue Type: Improvement
>  Components: CLI, ops-tooling
>Affects Versions: 1.10.0
>Reporter: Hexin
>Assignee: Hexin
>Priority: Major
>
> Sometimes nodes in the cluster will crash due to machine problems such as 
> disk corruption, which can be quite common. However, if there are dead 
> tservers, the ksck result will always show an error (e.g. Not all Tablet Servers are 
> reachable) even though all tables have recovered and are healthy.
> Currently the only way to get a healthy ksck status is to restart all masters 
> one by one. In some cases, for example if a machine is completely 
> corrupted, we would like to get a healthy ksck status without restarting, since 
> after restarting the masters the cluster takes some time to recover, during 
> which scans and upserts to tables can be affected. The recovery 
> time can be long and mainly depends on the scale of the cluster. This problem 
> can be serious and annoying, especially when tservers crash frequently 
> in a large cluster.
> It would be valuable to have an easier way to delete dead tservers from the master; I 
> will add a kudu command to support this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2915) Support to delete dead tservers from CLI

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2915:
--
Target Version/s:   (was: 1.10.0)

> Support to delete dead tservers from CLI
> 
>
> Key: KUDU-2915
> URL: https://issues.apache.org/jira/browse/KUDU-2915
> Project: Kudu
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 1.10.0
>Reporter: Hexin
>Assignee: Hexin
>Priority: Major
>
> Sometimes nodes in the cluster will crash due to machine problems such as 
> disk corruption, which can be quite common. However, if there are dead 
> tservers, the ksck result will always show an error (e.g. Not all Tablet Servers are 
> reachable) even though all tables have recovered and are healthy.
> Currently the only way to get a healthy ksck status is to restart all masters 
> one by one. In some cases, for example if a machine is completely 
> corrupted, we would like to get a healthy ksck status without restarting, since 
> after restarting the masters the cluster takes some time to recover, during 
> which scans and upserts to tables can be affected. The recovery 
> time can be long and mainly depends on the scale of the cluster. This problem 
> can be serious and annoying, especially when tservers crash frequently 
> in a large cluster.
> It would be valuable to have an easier way to delete dead tservers from the master; I 
> will add a kudu command to support this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3115) Improve scalability of Kudu masters

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3115:
--
Labels: scalability  (was: )

> Improve scalability of Kudu masters
> ---
>
> Key: KUDU-3115
> URL: https://issues.apache.org/jira/browse/KUDU-3115
> Project: Kudu
>  Issue Type: Improvement
>  Components: master
>Reporter: Alexey Serbin
>Priority: Major
>  Labels: scalability
>
> Currently, multiple masters in a multi-master Kudu cluster are used only for 
> high availability & fault tolerance use cases, but not for sharing the load 
> among the available master nodes.  For example, Kudu clients detect the current 
> leader master upon connecting to the cluster and send all their subsequent 
> requests to the leader master, so serving many more clients requires running 
> masters on more powerful nodes.  The current design assumes that masters store 
> and process the requests for metadata only, but that makes sense only up to 
> some limit on the rate of incoming client requests.
> It would be great to achieve better 'horizontal' scalability for Kudu masters.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3115) Improve scalability of Kudu masters

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3115:
--
Component/s: master

> Improve scalability of Kudu masters
> ---
>
> Key: KUDU-3115
> URL: https://issues.apache.org/jira/browse/KUDU-3115
> Project: Kudu
>  Issue Type: Improvement
>  Components: master
>Reporter: Alexey Serbin
>Priority: Major
>
> Currently, multiple masters in a multi-master Kudu cluster are used only for 
> high availability & fault tolerance use cases, but not for sharing the load 
> among the available master nodes.  For example, Kudu clients detect the current 
> leader master upon connecting to the cluster and send all their subsequent 
> requests to the leader master, so serving many more clients requires running 
> masters on more powerful nodes.  The current design assumes that masters store 
> and process the requests for metadata only, but that makes sense only up to 
> some limit on the rate of incoming client requests.
> It would be great to achieve better 'horizontal' scalability for Kudu masters.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2910) Add client cache/factory implementation to the kudu-client

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2910:
--
Component/s: supportability

> Add client cache/factory implementation to the kudu-client
> --
>
> Key: KUDU-2910
> URL: https://issues.apache.org/jira/browse/KUDU-2910
> Project: Kudu
>  Issue Type: Improvement
>  Components: client, supportability
>Reporter: Grant Henke
>Assignee: Sandish Kumar HN
>Priority: Major
>
> Often integrations should cache and use a shared client for all communication 
> to a given list of masters. This is seen in our own kudu-spark integration in 
> `KuduContext.KuduClientCache`. 
> It would be nice to add a generic implementation to the kudu-client so that 
> this code doesn't get re-written over and over. Additionally we can add more 
> complex logic if useful later. 
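
For illustration, a minimal Scala sketch of such a shared-client cache (the object name and shape here are hypothetical, not the proposed kudu-client API; it only relies on the existing KuduClientBuilder):

{code:scala}
import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}
import org.apache.kudu.client.KuduClient

object SharedKuduClients {
  // One client per master address list; clients are thread-safe and expensive
  // to create, so they should be shared across the whole application.
  private val clients = new ConcurrentHashMap[String, KuduClient]()

  def getOrCreate(masterAddresses: String): KuduClient =
    clients.computeIfAbsent(masterAddresses, new JFunction[String, KuduClient] {
      override def apply(masters: String): KuduClient =
        new KuduClient.KuduClientBuilder(masters).build()
    })
}

// Usage: val client = SharedKuduClients.getOrCreate("master1:7051,master2:7051")
{code}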



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2903) Durability testing framework and tests

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2903:
--
Labels: roadmap-candidate  (was: )

> Durability testing framework and tests
> --
>
> Key: KUDU-2903
> URL: https://issues.apache.org/jira/browse/KUDU-2903
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.11.0
>Reporter: Adar Dembo
>Priority: Critical
>  Labels: roadmap-candidate
>
> From time to time we get user reports of durability issues in Kudu. We try to 
> be good citizens and obey the POSIX spec w.r.t. durably storing data on disk, 
> but we lack any sort of tests that prove we're doing this correctly.
> Ideally, we'd have a framework that allows us to run a standard Kudu workload 
> while doing pathological things to a subset of nodes like:
> * Panicking the Linux kernel.
> * Abruptly cutting power.
> * Abruptly unmounting a filesystem or yanking a disk.
> Then we'd restart Kudu on the affected nodes and prove that all on-disk data 
> remains consistent.
> Without such a framework, we can only theorize issues and their possible 
> fixes. Some examples include KUDU-2195 and KUDU-2260.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2902) Productionize master state rebuilding tool

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2902.
---
Fix Version/s: NA
   Resolution: Duplicate

> Productionize master state rebuilding tool
> --
>
> Key: KUDU-2902
> URL: https://issues.apache.org/jira/browse/KUDU-2902
> Project: Kudu
>  Issue Type: Bug
>  Components: CLI, master
>Affects Versions: 1.11.0
>Reporter: Adar Dembo
>Priority: Major
> Fix For: NA
>
>
> Will authored a [CLI tool|https://gerrit.cloudera.org/c/9490/] that uses 
> cluster-wide tserver state to rebuild master state (i.e. tables and tablets). 
> We've seen this tool prove useful in some really gnarly support situations. 
> We should productionize it and merge it into the CLI.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2899) A flakiness in HmsSentryConfigurations/AlterTableRandomized.TestRandomSequence

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2899.
---
Fix Version/s: NA
   Resolution: Won't Fix

The Sentry integration is removed.

> A flakiness in HmsSentryConfigurations/AlterTableRandomized.TestRandomSequence
> --
>
> Key: KUDU-2899
> URL: https://issues.apache.org/jira/browse/KUDU-2899
> Project: Kudu
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.10.0
>Reporter: Alexey Serbin
>Priority: Major
> Fix For: NA
>
> Attachments: alter_table-randomized-test.01.txt.xz, 
> alter_table-randomized-test.1.txt.xz
>
>
> The {{HmsSentryConfigurations/AlterTableRandomized.TestRandomSequence/1}} 
> scenario of the {{alter_table_randomized-itest.cc}} exhibits flakiness in 
> TSAN builds, failing from time to time with errors like the one below:
> {noformat}
> F0719 06:51:55.884040   245 alter_table-randomized-test.cc:499] Check failed: 
> _s.ok() Bad status: Not found: The specified column does not exist
> {noformat}
> It's pretty clear what happened: the call timed out as seen from the 
> client side,
> {noformat}
> W0719 06:51:55.871377   445 rpcz_store.cc:253] Call 
> kudu.master.MasterService.AlterTable from 127.0.0.1:54308 (request call id 
> 178) took 10022 ms (10 s). Client timeout  ms (10 s)
> {noformat}
> and the client retried performing the same operation:
> {noformat}
> W0719 06:51:55.850235   874 master_proxy_rpc.cc:192] Re-attempting AlterTable 
> request to leader Master (127.0.61.126:33771)
> {noformat}
> but altering the table (dropping column {{c486}}) has actually succeeded:
> {noformat}
> W0719 06:51:55.868930   493 hms_client.cc:272] Time spent alter HMS table: 
> real 9.997s  user 0.000s sys 0.001s
> {noformat}
> {noformat}
> I0719 06:51:55.890976   990 tablet.cc:1259] T 
> 7f145a02995242298c013c329420b6f5 P d12954c90bf3483f80f0222ee1a74ef9: Alter 
> schema from (
> 10:key INT32 NOT NULL,
>   
> 11:c486 INT32 NULLABLE,   
>   
> PRIMARY KEY (key) 
>   
> ) version 1 to (  
>   
> 10:key INT32 NOT NULL,
>   
> 11:c746 INT32 NULLABLE,   
>   
> PRIMARY KEY (key) 
>   
> ) version 2 
> {noformat}
> I'm attaching full test log reference.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2898) KuduContext doesn't set a serialVersionUID

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2898:
--
Labels: beginner trivial  (was: )

> KuduContext doesn't set a serialVersionUID
> --
>
> Key: KUDU-2898
> URL: https://issues.apache.org/jira/browse/KUDU-2898
> Project: Kudu
>  Issue Type: Bug
>  Components: spark
>Reporter: Grant Henke
>Priority: Major
>  Labels: beginner, trivial
>
> It looks like KuduContext doesn't have an explicitly set `serialVersionUID` 
> which means that each release of spark-kudu is binary incompatible. 
> We should fix this and check for other classes that implement the 
> `Serializable` interface.
> There is some work to detect these breaks here: 
> https://gerrit.cloudera.org/#/c/13004/
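
For illustration, a minimal Scala sketch of the kind of fix this describes (the class and field names below are made up, not the actual KuduContext patch):

{code:scala}
// Pinning the serial version UID keeps Java serialization compatible across
// releases as long as the class stays structurally compatible.
@SerialVersionUID(1L)
class ExampleContext(val kuduMaster: String) extends Serializable {
  // A live client handle should not be serialized; recreate it lazily instead.
  @transient private lazy val client =
    new org.apache.kudu.client.KuduClient.KuduClientBuilder(kuduMaster).build()
}
{code}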



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2898) KuduContext doesn't set a serialVersionUID

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2898:
--
Component/s: spark

> KuduContext doesn't set a serialVersionUID
> --
>
> Key: KUDU-2898
> URL: https://issues.apache.org/jira/browse/KUDU-2898
> Project: Kudu
>  Issue Type: Bug
>  Components: spark
>Reporter: Grant Henke
>Priority: Major
>
> It looks like KuduContext doesn't have an explicitly set `serialVersionUID` 
> which means that each release of spark-kudu is binary incompatible. 
> We should fix this and check for other classes that implement the 
> `Serializable` interface.
> There is some work to detect these breaks here: 
> https://gerrit.cloudera.org/#/c/13004/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2893) No RHEL 7.x commands in troubleshooting doc

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2893:
--
Component/s: documentation

> No RHEL 7.x commands in troubleshooting doc
> ---
>
> Key: KUDU-2893
> URL: https://issues.apache.org/jira/browse/KUDU-2893
> Project: Kudu
>  Issue Type: Bug
>  Components: documentation
>Reporter: Peter Ebert
>Priority: Major
>
> The troubleshooting  ([https://kudu.apache.org/docs/troubleshooting.html]) 
> doc's command for starting ntp when it is installed but not running is out of date 
> for RHEL 7; it should be 'systemctl start ntpd'.
> Also worth noting that ntp is started immediately after installation, so the 
> command shouldn't need to be run on a freshly installed system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2890) Tablet memory Unreleased

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2890.
---
Fix Version/s: NA
   Resolution: Incomplete

> Tablet memory Unreleased
> 
>
> Key: KUDU-2890
> URL: https://issues.apache.org/jira/browse/KUDU-2890
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet
>Affects Versions: 1.6.0
>Reporter: zhiyezou
>Priority: Major
> Fix For: NA
>
> Attachments: image-2019-07-10-15-20-32-990.png
>
>
> This is the problem I encountered.
> !image-2019-07-10-15-20-32-990.png!
> I cannot understand the memory usage of this tablet: the total usage is larger 
> than the MemRowSet + DeltaMemRowSet. 
> Is there other memory used by the tablet, and how can I find it?
> The total memory usage grows up to the hard memory limit,
> and then this pattern repeats periodically.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2887) Expose the tablet statistics in Client API

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2887:
--
Labels: imapala  (was: )

> Expose the tablet statistics in Client API
> --
>
> Key: KUDU-2887
> URL: https://issues.apache.org/jira/browse/KUDU-2887
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Reporter: LiFu He
>Priority: Minor
>  Labels: imapala
>
> The patch about aggregating tablet statistics on the kudu-master is on the 
> way. I think it's important to expose these statistics in the client API so 
> that query engines can optimize their query plans. For example: (1) adjust 
> the order of scanning tables, (2) split a big tablet into multiple range 
> pieces (KUDU-2437) to improve concurrency automatically, (3) speed up 
> queries like "select count( *) from table".
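
For illustration, a rough Scala sketch of how a query engine might consume such statistics (the getTableStatistics accessor and its getters are a hypothetical API shape for this proposal, not a client method that existed at the time of this issue):

{code:scala}
import org.apache.kudu.client.KuduClient

val client = new KuduClient.KuduClientBuilder("master1:7051").build()
val table = client.openTable("metrics")

// Hypothetical statistics accessor: a query engine could use the reported
// live row count and on-disk size to order joins or split scans.
val stats = table.getTableStatistics()
println(s"rows=${stats.getLiveRowCount}, bytes=${stats.getOnDiskSize}")
{code}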



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2887) Expose the tablet statistics in Client API

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2887:
--
Labels: impala roadmap-candidate  (was: impala)

> Expose the tablet statistics in Client API
> --
>
> Key: KUDU-2887
> URL: https://issues.apache.org/jira/browse/KUDU-2887
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Reporter: LiFu He
>Priority: Minor
>  Labels: impala, roadmap-candidate
>
> The patch about aggregating tablet statistics on the kudu-master is on the 
> way. I think it's important to expose these statistics in the client API so 
> that query engines can optimize their query plans. For example: (1) adjust 
> the order of scanning tables, (2) split a big tablet into multiple range 
> pieces (KUDU-2437) to improve concurrency automatically, (3) speed up 
> queries like "select count( *) from table".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2887) Expose the tablet statistics in Client API

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2887:
--
Labels: impala  (was: imapala)

> Expose the tablet statistics in Client API
> --
>
> Key: KUDU-2887
> URL: https://issues.apache.org/jira/browse/KUDU-2887
> Project: Kudu
>  Issue Type: Improvement
>  Components: client
>Reporter: LiFu He
>Priority: Minor
>  Labels: impala
>
> The patch about aggregating tablet statistics on the kudu-master is on the 
> way. I think it's important to expose these statistics in the client API so 
> that query engines can optimize their query plans. For example: (1) adjust 
> the order of scanning tables, (2) split a big tablet into multiple range 
> pieces (KUDU-2437) to improve concurrency automatically, (3) speed up 
> queries like "select count( *) from table".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2877) Support logging to files and stderr at the same time

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2877.
---
Fix Version/s: 1.11.0
   Resolution: Fixed

Resolved via 
https://github.com/apache/kudu/commit/220391f850847232d9aff1dbd72caa2ce7ec3a1e

> Support logging to files and stderr at the same time
> 
>
> Key: KUDU-2877
> URL: https://issues.apache.org/jira/browse/KUDU-2877
> Project: Kudu
>  Issue Type: Improvement
>  Components: docker
>Reporter: Grant Henke
>Assignee: Sandish Kumar HN
>Priority: Minor
>  Labels: newbie
> Fix For: 1.11.0
>
>
> Support logging to both the _--log_dir_ directory and to stderr via 
> _--logtostderr_ at the same time. This could be done with a new flag.
> This would be useful in docker environments where it is standard to log to 
> stderr, but we still want the logs to be viewable in the web UI. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2799) Upgrade HMS integration to Hive 3

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2799.
---
Fix Version/s: 1.13.0
   Resolution: Fixed

> Upgrade HMS integration to Hive 3
> -
>
> Key: KUDU-2799
> URL: https://issues.apache.org/jira/browse/KUDU-2799
> Project: Kudu
>  Issue Type: Bug
>  Components: hms
>Affects Versions: 1.10.0
>Reporter: Adar Dembo
>Priority: Major
> Fix For: 1.13.0
>
>
> Currently our HMS integration depends on Hive 2. We should upgrade it to be 
> compatible with Hive 3.
> For a while we may want to actively test against both versions of Hive if 
> that's possible. If not, we should at least support both versions until 
> users/vendors have had enough time to complete their own upgrades.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2880) TestSecurity is flaky

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2880:
--
Component/s: test

> TestSecurity is flaky
> -
>
> Key: KUDU-2880
> URL: https://issues.apache.org/jira/browse/KUDU-2880
> Project: Kudu
>  Issue Type: Test
>  Components: test
>Reporter: Hao Hao
>Priority: Major
> Attachments: test-output.txt
>
>
> A recent run of TestSecurity failed with the following error:
> {noformat}
> There was 1 failure:
> 1) 
> testExternallyProvidedSubjectRefreshedExternally(org.apache.kudu.client.TestSecurity)
> org.apache.kudu.client.NonRecoverableException: cannot complete before 
> timeout: KuduRpc(method=ListTabletServers, tablet=null, attempt=26, 
> TimeoutTracker(timeout=3, elapsed=29608), Traces: [0ms] refreshing cache 
> from master, [46ms] Sub RPC ConnectToMaster: sending RPC to server 
> master-127.0.202.126:46581, [63ms] Sub RPC ConnectToMaster: sending RPC to 
> server master-127.0.202.124:43241, [69ms] Sub RPC ConnectToMaster: received 
> response from server master-127.0.202.126:46581: Network error: Failed to 
> connect to peer master-127.0.202.126:46581(127.0.202.126:46581): Connection 
> refused: /127.0.202.126:46581, [70ms] Sub RPC ConnectToMaster: sending RPC to 
> server master-127.0.202.125:43873, [250ms] Sub RPC ConnectToMaster: received 
> response from server master-127.0.202.125:43873: Network error: [peer 
> master-127.0.202.125:43873(127.0.202.125:43873)] unexpected exception from 
> downstream on [id: 0x2fae7299, /127.0.0.1:57014 => /127.0.202.125:43873], 
> [282ms] Sub RPC ConnectToMaster: received response from server 
> master-127.0.202.124:43241: OK, [336ms] delaying RPC due to: Service 
> unavailable: Master config 
> (127.0.202.126:46581,127.0.202.124:43241,127.0.202.125:43873) has no leader. 
> Exceptions received: org.apache.kudu.client.RecoverableException: Failed to 
> connect to peer master-127.0.202.126:46581(127.0.202.126:46581): Connection 
> refused: /127.0.202.126:46581,org.apache.kudu.client.RecoverableException: 
> [peer master-127.0.202.125:43873(127.0.202.125:43873)] unexpected exception 
> from downstream on [id: 0x2fae7299, /127.0.0.1:57014 => 
> /127.0.202.125:43873], [357ms] refreshing cache from master, [358ms] Sub RPC 
> ConnectToMaster: sending RPC to server master-127.0.202.126:46581, [358ms] 
> Sub RPC ConnectToMaster: sending RPC to server master-127.0.202.124:43241, 
> [360ms] Sub RPC ConnectToMaster: received response from server 
> master-127.0.202.126:46581: Network error: java.net.ConnectException: 
> Connection refused: /127.0.202.126:46581, [360ms] Sub RPC ConnectToMaster: 
> sending RPC to server master-127.0.202.125:43873, [361ms] Sub RPC 
> ConnectToMaster: received response from server master-127.0.202.125:43873: 
> Network error: Failed to connect to peer 
> master-127.0.202.125:43873(127.0.202.125:43873): Connection refused: 
> /127.0.202.125:43873, [363ms] Sub RPC ConnectToMaster: received response from 
> server master-127.0.202.124:43241: OK, [364ms] delaying RPC due to: Service 
> unavailable: Master config 
> (127.0.202.126:46581,127.0.202.124:43241,127.0.202.125:43873) has no leader. 
> Exceptions received: org.apache.kudu.client.RecoverableException: 
> java.net.ConnectException: Connection refused: 
> /127.0.202.126:46581,org.apache.kudu.client.RecoverableException: Failed to 
> connect to peer master-127.0.202.125:43873(127.0.202.125:43873): Connection 
> refused: /127.0.202.125:43873, [376ms] refreshing cache from master, [377ms] 
> Sub RPC ConnectToMaster: sending RPC to server master-127.0.202.126:46581, 
> [377ms] Sub RPC ConnectToMaster: sending RPC to server 
> master-127.0.202.124:43241, [378ms] Sub RPC ConnectToMaster: sending RPC to 
> server master-127.0.202.125:43873, [379ms] Sub RPC ConnectToMaster: received 
> response from server master-127.0.202.126:46581: Network error: Failed to 
> connect to peer master-127.0.202.126:46581(127.0.202.126:46581): Connection 
> refused: /127.0.202.126:46581, [381ms] Sub RPC ConnectToMaster: received 
> response from server master-127.0.202.125:43873: Network error: 
> java.net.ConnectException: Connection refused: /127.0.202.125:43873, [382ms] 
> Sub RPC ConnectToMaster: received response from server 
> master-127.0.202.124:43241: OK, [383ms] delaying RPC due to: Service 
> unavailable: Master config 
> (127.0.202.126:46581,127.0.202.124:43241,127.0.202.125:43873) has no leader. 
> Exceptions received: org.apache.kudu.client.RecoverableException: Failed to 
> connect to peer master-127.0.202.126:46581(127.0.202.126:46581): Connection 
> refused: /127.0.202.126:46581,org.apache.kudu.client.RecoverableException: 
> java.net.ConnectException: Connection refused: /127.0.202.125:43873, [397ms] 
> refreshing cache from master, [397ms] Sub RPC ConnectToMa

[jira] [Assigned] (KUDU-2799) Upgrade HMS integration to Hive 3

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke reassigned KUDU-2799:
-

Assignee: Grant Henke

> Upgrade HMS integration to Hive 3
> -
>
> Key: KUDU-2799
> URL: https://issues.apache.org/jira/browse/KUDU-2799
> Project: Kudu
>  Issue Type: Bug
>  Components: hms
>Affects Versions: 1.10.0
>Reporter: Adar Dembo
>Assignee: Grant Henke
>Priority: Major
> Fix For: 1.13.0
>
>
> Currently our HMS integration depends on Hive 2. We should upgrade it to be 
> compatible with Hive 3.
> For a while we may want to actively test against both versions of Hive if 
> that's possible. If not, we should at least support both versions until 
> users/vendors have had enough time to complete their own upgrades.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2798) Fix logging on deleted TSK entries

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2798:
--
Fix Version/s: 1.12.0
   Resolution: Fixed
   Status: Resolved  (was: In Review)

> Fix logging on deleted TSK entries
> --
>
> Key: KUDU-2798
> URL: https://issues.apache.org/jira/browse/KUDU-2798
> Project: Kudu
>  Issue Type: Task
>Affects Versions: 1.8.0, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.11.0, 1.11.1
>Reporter: Alexey Serbin
>Assignee: Alexey Serbin
>Priority: Minor
>  Labels: newbie
> Fix For: 1.12.0
>
>
> It seems the identifiers of the deleted TSK entries in the log lines below 
> need decoding:
> {noformat}
> I0312 15:17:14.808763 71553 catalog_manager.cc:4095] T 
>  P f05d759af7824df9aafedcc106674182: 
> Generated new TSK 2
> I0312 15:17:14.811144 71553 catalog_manager.cc:4133] T 
>  P f05d759af7824df9aafedcc106674182: Deleted 
> TSKs: �, �
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2829) GCC 9 compilation fails on linux system syscall support library -- dependency of breakpoint

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2829:
--
Component/s: build

> GCC 9 compilation fails on linux system syscall support library -- dependency 
> of breakpoint
> ---
>
> Key: KUDU-2829
> URL: https://issues.apache.org/jira/browse/KUDU-2829
> Project: Kudu
>  Issue Type: Bug
>  Components: build
>Reporter: Scott Reynolds
>Assignee: Scott Reynolds
>Priority: Major
>
> GCC 9.X adds a compilation failure when code attempts to clobber %rsp.
> [https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52813]
> GCC 9.X enforces this, which causes Kudu not to compile. The Linux 
> System Syscall Support library added a change to address this:
> [https://chromium.googlesource.com/linux-syscall-support/+/8048ece6c16c91acfe0d36d1d3cc0890ab6e945c%5E%21/#F0]
> We can either upgrade to a newer version or backport that patch



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-2865) Relax the requirements to get an authorization token

2020-06-03 Thread Grant Henke (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125046#comment-17125046
 ] 

Grant Henke commented on KUDU-2865:
---

Has this changed at all as a result of the Ranger integration? 

> Relax the requirements to get an authorization token
> 
>
> Key: KUDU-2865
> URL: https://issues.apache.org/jira/browse/KUDU-2865
> Project: Kudu
>  Issue Type: Improvement
>  Components: authz
>Affects Versions: 1.10.0
>Reporter: Andrew Wong
>Priority: Major
>
> Currently in order to do any DML with Kudu, a user must have any (i.e. 
> "METADATA") privilege on a table so the user can get an authorization token. 
> This is because authz token generation is piggy-backed on the GetTableSchema 
> endpoint, which does all-or-nothing authorization for the table.
> This isn't a great user experience, e.g. if a user only has column-level 
> privileges. Unless such a user _also_ had a table-level privilege (e.g. 
> insert privileges on the table), the user would be unable to scan the columns 
> through direct Kudu APIs. We should consider perhaps modifying the 
> GetTableSchema endpoint to return only the sub-schema and the privileges for 
> which the user has column-level privileges or higher.
> This user experience would be closer to what is supported by Apache Impala.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2871) TLS 1.3 not supported by krpc

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2871:
--
Target Version/s:   (was: 1.10.0)

> TLS 1.3 not supported by krpc
> -
>
> Key: KUDU-2871
> URL: https://issues.apache.org/jira/browse/KUDU-2871
> Project: Kudu
>  Issue Type: Bug
>  Components: master, rpc, security, tserver
>Affects Versions: 1.8.0, 1.9.0, 1.9.1
>Reporter: Todd Lipcon
>Priority: Major
>
> The TLS negotiation in our RPC protocol assumes a whole number of round trips 
> between client and server. For TLS 1.3, the exchange has 1.5 round trips (the 
> client is the last sender rather than the server) which breaks negotiation. 
> Most tests thus fail with OpenSSL 1.1.1.
> We should temporarily disable TLS 1.3 and then fix RPC to support this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2858) Update docker readme to be more user focused

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2858:
--
Component/s: documentation

> Update docker readme to be more user focused
> 
>
> Key: KUDU-2858
> URL: https://issues.apache.org/jira/browse/KUDU-2858
> Project: Kudu
>  Issue Type: Improvement
>  Components: docker, documentation
>Reporter: Grant Henke
>Assignee: Grant Henke
>Priority: Major
>  Labels: docker
>
> Now that the docker images are being published, we should update the readme 
> to focus less on building the images and more on using the already built 
> images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2857) Rewrite docker build script in python

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2857:
--
Labels: build docker  (was: docker)

> Rewrite docker build script in python
> -
>
> Key: KUDU-2857
> URL: https://issues.apache.org/jira/browse/KUDU-2857
> Project: Kudu
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Grant Henke
>Priority: Major
>  Labels: build, docker
>
> The docker build bash script has gotten sufficiently complicated that it 
> should be rewritten in python.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2811) Fuzz test needed for backup-restore

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2811:
--
Component/s: test

> Fuzz test needed for backup-restore
> ---
>
> Key: KUDU-2811
> URL: https://issues.apache.org/jira/browse/KUDU-2811
> Project: Kudu
>  Issue Type: Bug
>  Components: backup, test
>Affects Versions: 1.9.0
>Reporter: William Berkeley
>Priority: Major
>  Labels: backup
>
> We need to fuzz test backup-restore by having a test that creates a table 
> through a random sequence of operations while also randomly doing incremental 
> backups. We should then check the restored table against the original table.
> This would have caught KUDU-2809.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2828) Fix C++ code style and compile warning

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2828:
--
Component/s: build

> Fix C++ code style and compile warning
> --
>
> Key: KUDU-2828
> URL: https://issues.apache.org/jira/browse/KUDU-2828
> Project: Kudu
>  Issue Type: Improvement
>  Components: build
>Reporter: ZhangYao
>Assignee: ZhangYao
>Priority: Major
>
>     Currently I ran Kudu through C++ static analysis tools, which report some 
> warnings about the code. I will try to fix the significant warnings and push
> some commits.
> Examples include:
>  Unused variables in functions.
>  Uninitialized member variables.
>  and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2805) ClientTest.TestServerTooBusyRetry fails due to TSAN thread limit

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2805:
--
Component/s: test

> ClientTest.TestServerTooBusyRetry fails due to TSAN thread limit
> 
>
> Key: KUDU-2805
> URL: https://issues.apache.org/jira/browse/KUDU-2805
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.9.0
>Reporter: William Berkeley
>Priority: Major
> Attachments: client-test.tsanlimit.txt
>
>
> I've seen a couple instances where ClientTest.TestServerTooBusyRetry fails 
> after hitting the TSAN thread limit, after seemingly being stuck for 10 
> minutes or so. The end of the logs look like
> {noformat}
> W0428 12:20:07.406752 10297 debug-util.cc:397] Leaking SignalData structure 
> 0x7b08000c2ba0 after lost signal to thread 8435
> W0428 12:20:07.412693 10297 debug-util.cc:397] Leaking SignalData structure 
> 0x7b080019f2a0 after lost signal to thread 10185
> W0428 12:20:07.418191 10297 debug-util.cc:397] Leaking SignalData structure 
> 0x7b080018f060 after lost signal to thread 10361
> W0428 12:20:23.873589 10139 debug-util.cc:397] Leaking SignalData structure 
> 0x7b08000fc360 after lost signal to thread 8435
> W0428 12:20:23.878401 10139 debug-util.cc:397] Leaking SignalData structure 
> 0x7b08000ccf20 after lost signal to thread 10185
> W0428 12:20:23.884522 10139 debug-util.cc:397] Leaking SignalData structure 
> 0x7b0800051ae0 after lost signal to thread 10361
> W0428 12:22:03.715726 10297 debug-util.cc:397] Leaking SignalData structure 
> 0x7b08000f9280 after lost signal to thread 8435
> W0428 12:22:03.721261 10297 debug-util.cc:397] Leaking SignalData structure 
> 0x7b08001b0e40 after lost signal to thread 10185
> W0428 12:22:03.727725 10297 debug-util.cc:397] Leaking SignalData structure 
> 0x7b08000b7460 after lost signal to thread 10361
> W0428 12:22:11.928373 10139 debug-util.cc:397] Leaking SignalData structure 
> 0x7b0800044be0 after lost signal to thread 8435
> W0428 12:22:11.933187 10139 debug-util.cc:397] Leaking SignalData structure 
> 0x7b080018f3c0 after lost signal to thread 10185
> W0428 12:22:11.939275 10139 debug-util.cc:397] Leaking SignalData structure 
> 0x7b08001b3480 after lost signal to thread 10361
> ==8432==ThreadSanitizer: Thread limit (8128 threads) exceeded. Dying.
> {noformat}
> Some threads are unresponsive, even to the signals sent by the stack trace 
> collector thread. Unfortunately, there's nothing in the logs about those 
> threads.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2816) Failure due to column already present in HmsSentryConfigurations.AlterTableRandomized

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2816.
---
Fix Version/s: NA
   Resolution: Won't Fix

The Sentry integration is removed.

> Failure due to column already present in 
> HmsSentryConfigurations.AlterTableRandomized
> -
>
> Key: KUDU-2816
> URL: https://issues.apache.org/jira/browse/KUDU-2816
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Reporter: William Berkeley
>Priority: Major
> Fix For: NA
>
> Attachments: alter_table-randomized-test.1.txt
>
>
> {noformat}
> F0504 12:41:37.638859   231 alter_table-randomized-test.cc:499] Check failed: 
> _s.ok() Bad status: Already present: The column already exists: c310
> *** Check failure stack trace: ***
> *** Aborted at 1556973697 (unix time) try "date -d @1556973697" if you are 
> using GNU date ***
> PC: @ 0x7f698597bc37 gsignal
> *** SIGABRT (@0x3e800e7) received by PID 231 (TID 0x7f69a02ef900) from 
> PID 231; stack trace: ***
> @ 0x7f698d6c0330 (unknown) at ??:0
> @ 0x7f698597bc37 gsignal at ??:0
> @ 0x7f698597f028 abort at ??:0
> @ 0x7f6988cbfa29 google::logging_fail() at ??:0
> @ 0x7f6988cc131d google::LogMessage::Fail() at ??:0
> @ 0x7f6988cc31dd google::LogMessage::SendToLog() at ??:0
> @ 0x7f6988cc0e59 google::LogMessage::Flush() at ??:0
> @ 0x7f6988cc3c7f google::LogMessageFatal::~LogMessageFatal() at ??:0
> @   0x586325 kudu::MirrorTable::RandomAlterTable() at 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/alter_table-randomized-test.cc:499
> @   0x5805b4 
> kudu::AlterTableRandomized_TestRandomSequence_Test::TestBody() at 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/alter_table-randomized-test.cc:749
> @ 0x7f698ada0b98 
> testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0
> @ 0x7f698ad8e1b2 testing::Test::Run() at ??:0
> @ 0x7f698ad8e2f8 testing::TestInfo::Run() at ??:0
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2829) GCC 9 compilation fails on linux system syscall support library -- dependency of breakpoint

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2829:
--
Target Version/s:   (was: 1.10.0)

> GCC 9 compilation fails on linux system syscall support library -- dependency 
> of breakpoint
> ---
>
> Key: KUDU-2829
> URL: https://issues.apache.org/jira/browse/KUDU-2829
> Project: Kudu
>  Issue Type: Bug
>Reporter: Scott Reynolds
>Assignee: Scott Reynolds
>Priority: Major
>
> GCC 9.X adds a compilation failure when code attempts to clobber %rsp.
> [https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52813]
> GCC 9.X enforces this, which causes Kudu not to compile. The Linux 
> System Syscall Support library added a change to address this:
> [https://chromium.googlesource.com/linux-syscall-support/+/8048ece6c16c91acfe0d36d1d3cc0890ab6e945c%5E%21/#F0]
> We can either upgrade to a newer version or backport that patch



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2817) C++ Upgrades for before Kudu 1.13 release

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2817:
--
Component/s: build

> C++ Upgrades for before Kudu 1.13 release
> -
>
> Key: KUDU-2817
> URL: https://issues.apache.org/jira/browse/KUDU-2817
> Project: Kudu
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 1.10.0
>Reporter: Grant Henke
>Priority: Major
>
> We should consider reviewing and upgrading our dependencies before the next 
> release. Below is a list of current dependencies and their latest release.
>  * gflags: 2.2.0 (Nov 2016) -> 2.2.2 (Nov 2018)
>  * glog: 0.3.5 (May 2017) -> 0.4.0 (Mar 2019)
>  * gmock: 1.8.0 -> 1.8.1
>  * gperftools: 2.6.90 -> 2.7
>  * protobuf: 3.4.1 -> 3.7.1 (3.8.0 soon)
>  * cmake: 3.9.0 (Nov 2018) -> 3.14.3 (May 2019)
>  * snappy: 1.1.4 (Jan 2017) -> 1.1.7 (Aug 2017)
>  * lz4: r130 (patched, 2015) -> 1.9.1 (May 2019, expected perf gains)
>  * bitshuffle: 55f9b4c (patched, 2016) -> 0.3.5 (Nov 2018)
>  * zlib: 1.2.8 (Apr 2013) -> 1.2.11 (Jan 2017)
>  * libev: 4.20 -> 4.22
>  * rapidjson: 1.1.0 (current)
>  * squeasel: current
>  * mustache: 87a592e8aa04497764c533acd6e887618ca7b8a8 (Feb 2017) -> 
> cf5c3dd499ea2bc9eb5c2072fb551dc7af75aa57 (Jun 2017)
>  ** Consider using official mustache C++ support?
>  * curl: 7.59.0 (Mar 2018) -> 7.64.1 (Mar 2019)
>  * crcutil: current
>  * libunwind: 1.3-rc1 (patched, Nov 2017) -> 1.3.1 (Jan 2019)
>  * llvm: 6.0.0 (Mar 2018) -> 8.0.0 (Mar 2019)
>  * iwyu: 0.9 -> 0.12 (May 2019)
>  * nvml: 1.1 (2016) -> 1.6 (now called pmdk, Mar 2019)
>  ** Patch to replace with memkind is posted
>  * boost: 1.61.0 (patched, 2016) -> 1.70.0 (Apr 2019)
>  * breakpad: 9eac2058b70615519b2c4d8c6bdbfca1bd079e39 (Apr 2013) -> 
> 21b48a72aa50dde84149267f6b7402522b846b24 (Apr 2019)
>  * sparsepp: 47a55825ca3b35eab1ca22b7ab82b9544e32a9af (Nov 2016) -> 
> 5ca6de766db32b3fb08a040636423cd3988d2d4f (Jun 2018)
>  * thrift: 0.11 (Dec 2017) -> 0.12 (Dec 2018)
>  * bison: 3.0.4 (patched, 2015) -> 3.3 (Jan 2019)
>  * hive: 498021fa15186aee8b282d3c032fbd2cede6bec4 (commit in Hive 2) -> 3.1.1 
> (Oct 2018)
>  * hadoop: 2.8.5 (Sept 2018) -> 3.1.2 (Feb 2019)
>  * sentry: 505b42e81a9d85c4ebe8db3f48ad7a6e824a5db5 (commit in Master)
>  * python: 2.7.13 -> (a lot of choices here)
> A quick risk/reward review should be done and we should upgrade the 
> dependencies that are expected to be beneficial. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2815) RaftConsensusNonVoterITest.PromoteAndDemote fails if manually-run election fails.

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2815:
--
Component/s: test

> RaftConsensusNonVoterITest.PromoteAndDemote fails if manually-run election 
> fails.
> -
>
> Key: KUDU-2815
> URL: https://issues.apache.org/jira/browse/KUDU-2815
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.9.0
>Reporter: William Berkeley
>Priority: Major
> Attachments: raft_consensus_nonvoter-itest.txt, 
> raft_consensus_nonvoter-itest.txt
>
>
> RaftConsensusNonVoterITest.PromoteAndDemote disables normal leader elections 
> and runs an election manually, to avoid some previous flakiness. 
> Unfortunately, this introduces flakiness, because, rarely, the manual 
> election fails when the vote requests time out. The candidate concludes it 
> has lost the election, and then after that the two other voters vote yes.
> The timeout for vote requests is 170ms, which is pretty short. If it were 
> raised to, say, 5s, the test would probably not be flaky anymore.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2816) Failure due to column already present in HmsSentryConfigurations.AlterTableRandomized

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2816:
--
Component/s: test

> Failure due to column already present in 
> HmsSentryConfigurations.AlterTableRandomized
> -
>
> Key: KUDU-2816
> URL: https://issues.apache.org/jira/browse/KUDU-2816
> Project: Kudu
>  Issue Type: Bug
>  Components: test
>Reporter: William Berkeley
>Priority: Major
> Attachments: alter_table-randomized-test.1.txt
>
>
> {noformat}
> F0504 12:41:37.638859   231 alter_table-randomized-test.cc:499] Check failed: 
> _s.ok() Bad status: Already present: The column already exists: c310
> *** Check failure stack trace: ***
> *** Aborted at 1556973697 (unix time) try "date -d @1556973697" if you are 
> using GNU date ***
> PC: @ 0x7f698597bc37 gsignal
> *** SIGABRT (@0x3e800e7) received by PID 231 (TID 0x7f69a02ef900) from 
> PID 231; stack trace: ***
> @ 0x7f698d6c0330 (unknown) at ??:0
> @ 0x7f698597bc37 gsignal at ??:0
> @ 0x7f698597f028 abort at ??:0
> @ 0x7f6988cbfa29 google::logging_fail() at ??:0
> @ 0x7f6988cc131d google::LogMessage::Fail() at ??:0
> @ 0x7f6988cc31dd google::LogMessage::SendToLog() at ??:0
> @ 0x7f6988cc0e59 google::LogMessage::Flush() at ??:0
> @ 0x7f6988cc3c7f google::LogMessageFatal::~LogMessageFatal() at ??:0
> @   0x586325 kudu::MirrorTable::RandomAlterTable() at 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/alter_table-randomized-test.cc:499
> @   0x5805b4 
> kudu::AlterTableRandomized_TestRandomSequence_Test::TestBody() at 
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/alter_table-randomized-test.cc:749
> @ 0x7f698ada0b98 
> testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0
> @ 0x7f698ad8e1b2 testing::Test::Run() at ??:0
> @ 0x7f698ad8e2f8 testing::TestInfo::Run() at ??:0
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2822) Kudu create table problem

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2822.
---
Fix Version/s: NA
   Resolution: Incomplete

> Kudu create table  problem
> --
>
> Key: KUDU-2822
> URL: https://issues.apache.org/jira/browse/KUDU-2822
> Project: Kudu
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: kun'qin 
>Priority: Major
> Fix For: NA
>
>
> There are five tservers, each with 775 partitions. After creating Kudu tables 
> through Impala with 100 partitions each, the number of partitions per tserver 
> keeps growing, increasing to 1500+



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2838) HmsSentryConfigurations/AlterTableRandomized.TestRandomSequence is flaky

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2838.
---
Fix Version/s: NA
   Resolution: Won't Fix

The Sentry integration is removed.

> HmsSentryConfigurations/AlterTableRandomized.TestRandomSequence is flaky
> 
>
> Key: KUDU-2838
> URL: https://issues.apache.org/jira/browse/KUDU-2838
> Project: Kudu
>  Issue Type: Bug
>  Components: master
>Affects Versions: 1.10.0
>Reporter: Alexey Serbin
>Priority: Major
> Fix For: NA
>
> Attachments: alter_table-randomized-test.1.txt.xz
>
>
> The {{HmsSentryConfigurations/AlterTableRandomized.TestRandomSequence}} test 
> is flaky.  Sometimes, it fails with the following error:
> {noformat}
> Bad status: Already present: Error creating table default.test_table on the 
> master: table default.test_table already exists with id 
> b1359f3663b34ce2a01009f6538dbffc
> {noformat}
> The log is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2829) GCC 9 compilation fails on linux system syscall support library -- dependency of breakpoint

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2829.
---
Fix Version/s: 1.13.0
   Resolution: Fixed

Resolved via 
https://github.com/apache/kudu/commit/c5fd2e44245502c3c7f8dd0f2cd7639815fef0aa

> GCC 9 compilation fails on linux system syscall support library -- dependency 
> of breakpoint
> ---
>
> Key: KUDU-2829
> URL: https://issues.apache.org/jira/browse/KUDU-2829
> Project: Kudu
>  Issue Type: Bug
>  Components: build
>Reporter: Scott Reynolds
>Assignee: Scott Reynolds
>Priority: Major
> Fix For: 1.13.0
>
>
> GCC 9.X adds a compilation failure when code attempts to clobber %rsp.
> [https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52813]
> Because GCC 9.X now enforces this, Kudu no longer compiles. The Linux 
> System Syscall Support library added a change to adjust for that:
> [https://chromium.googlesource.com/linux-syscall-support/+/8048ece6c16c91acfe0d36d1d3cc0890ab6e945c%5E%21/#F0]
> We can either upgrade to a newer version or backport that patch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2782) Implement distributed tracing support in Kudu

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2782:
--
Labels: roadmap-candidate supportability  (was: )

> Implement distributed tracing support in Kudu
> -
>
> Key: KUDU-2782
> URL: https://issues.apache.org/jira/browse/KUDU-2782
> Project: Kudu
>  Issue Type: Task
>  Components: ops-tooling
>Reporter: Mike Percy
>Priority: Major
>  Labels: roadmap-candidate, supportability
>
> It would be useful to implement distributed tracing support in Kudu, 
> especially something like OpenTracing support that we could use with Zipkin, 
> Jaeger, DataDog, etc. Particularly useful would be auto-sampled and on-demand 
> traces of write RPCs since that would help us identify slow nodes or hotspots 
> in the replication group and troubleshoot performance and stability issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2789) Document how to use/run the backup and restore jobs

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2789.
---
Fix Version/s: 1.10.0
   Resolution: Fixed

> Document how to use/run the backup and restore jobs
> ---
>
> Key: KUDU-2789
> URL: https://issues.apache.org/jira/browse/KUDU-2789
> Project: Kudu
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Grant Henke
>Priority: Major
>  Labels: backup
> Fix For: 1.10.0
>
>
> Before the backup and restore functionality is considered GA we should be 
> sure it's well documented.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2779) MasterStressTest is flaky when HMS is enabled

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2779:
--
Component/s: test

> MasterStressTest is flaky when HMS is enabled
> -
>
> Key: KUDU-2779
> URL: https://issues.apache.org/jira/browse/KUDU-2779
> Project: Kudu
>  Issue Type: Test
>  Components: test
>Reporter: Hao Hao
>Priority: Major
> Attachments: master-stress-test.1.txt.xz
>
>
> Encountered failure in master-stress-test.cc when HMS integration is enabled: 
> {noformat}
> 22:30:11.487 [HMS - ERROR - pool-8-thread-2] (HiveAlterHandler.java:341) 
> Failed to alter table default.table_1529084adeeb48719dd0a1d18572b357
> 22:30:11.494 [HMS - ERROR - pool-8-thread-3] (HiveAlterHandler.java:341) 
> Failed to alter table default.table_4657eb1f8bbe4b60b03db2cbf07803a3
> 22:30:11.506 [HMS - ERROR - pool-8-thread-2] (RetryingHMSHandler.java:200) 
> MetaException(message:java.lang.IllegalStateException: Event not set up 
> correctly)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6189)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:4063)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_environment_context(HiveMetaStore.java:4020)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy24.alter_table_with_environment_context(Unknown 
> Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:11631)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:11615)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:103)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Event not set up correctly
>   at 
> org.apache.hadoop.hive.metastore.messaging.AlterTableMessage.checkValid(AlterTableMessage.java:49)
>   at 
> org.apache.hadoop.hive.metastore.messaging.json.JSONAlterTableMessage.(JSONAlterTableMessage.java:57)
>   at 
> org.apache.hadoop.hive.metastore.messaging.json.JSONMessageFactory.buildAlterTableMessage(JSONMessageFactory.java:115)
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener.onAlterTable(DbNotificationListener.java:187)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$8.notify(MetaStoreListenerNotifier.java:107)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:175)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:205)
>   at 
> org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:317)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:4049)
>   ... 16 more
> Caused by: org.apache.thrift.protocol.TProtocolException: Unexpected 
> character:{
>   at 
> org.apache.thrift.protocol.TJSONProtocol.readJSONSyntaxChar(TJSONProtocol.java:337)
>   at 
> org.apache.thrift.protocol.TJSONProtocol$JSONPairContext.read(TJSONProtocol.java:246)
>   at 
> org.apache.thrift.protocol.TJSONProtocol.readJSONObjectStart(TJSONProtocol.java:793)
>   at 
> org.apache.thrift.protocol.TJSONProtocol.readStructBegin(TJSONProtocol.java:840)
>   at 
> org.apache.hadoop.hive.metastore.api.Table$TableStandardScheme.read(Table.java:1577)
>   at 
> org.apache.hadoop.hive.metastore.api.Table$TableStandardScheme.read(Table.java:1573)
>   at org.apache.hadoop.hive.metastore.api.Table.read(Table.java:1407)
>   at org.apache.thrift.TDeserializer.deserialize(TDeserializer.java:81)
>   at org.apache.thr

[jira] [Updated] (KUDU-2778) Explore limitations of multi-master Kudu deployments with more than 3 masters

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2778:
--
Labels: roadmap-candidate supportability  (was: )

> Explore limitations of multi-master Kudu deployments with more than 3 masters
> -
>
> Key: KUDU-2778
> URL: https://issues.apache.org/jira/browse/KUDU-2778
> Project: Kudu
>  Issue Type: Task
>  Components: master
>Reporter: Alexey Serbin
>Priority: Major
>  Labels: roadmap-candidate, supportability
>
> Currently, the recommended limit of Kudu masters in a multi-master deployment 
> is 3 (i.e. no more than 3 masters are recommended): 
> https://github.com/apache/kudu/blob/branch-1.9.x/docs/known_issues.adoc#scale
> It would be nice to clarify whether there is anything substantial behind that 
> limit.  As of now the recommendation stems from the fact that all of our 
> multi-master tests and tested deployments use 3 masters.  Overall, being able 
> to deploy 5 or more masters in case of bigger clusters makes sense from the 
> HA perspective.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2970) Fine-grained authorization with Ranger

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2970.
---
Fix Version/s: 1.12.0
   Resolution: Fixed

>  Fine-grained authorization with Ranger
> ---
>
> Key: KUDU-2970
> URL: https://issues.apache.org/jira/browse/KUDU-2970
> Project: Kudu
>  Issue Type: New Feature
>  Components: security
>Affects Versions: 1.11.0
>Reporter: Hao Hao
>Assignee: Hao Hao
>Priority: Major
> Fix For: 1.12.0
>
>
> With the completion of Kudu’s integration with Apache Sentry, fine-grained 
> authorization capabilities have been added to Kudu. However, because Apache 
> Ranger has wider adoption and provides more comprehensive security features 
> (such as attribute-based access control, auditing, etc.) than Sentry, it is 
> important for Kudu to also integrate with Ranger.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-3080) Add SSL support to MiniRanger

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3080:
--
Component/s: test

> Add SSL support to MiniRanger
> -
>
> Key: KUDU-3080
> URL: https://issues.apache.org/jira/browse/KUDU-3080
> Project: Kudu
>  Issue Type: Sub-task
>  Components: test
>Reporter: Attila Bukor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-3078) Ranger integration testing

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-3078.
---
Fix Version/s: 1.12.0
   Resolution: Fixed

> Ranger integration testing
> --
>
> Key: KUDU-3078
> URL: https://issues.apache.org/jira/browse/KUDU-3078
> Project: Kudu
>  Issue Type: Sub-task
>Reporter: Attila Bukor
>Assignee: Attila Bukor
>Priority: Major
> Fix For: 1.12.0
>
>
> The Ranger integration should be properly tested before we can remove the 
> experimental flag.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2882) Increase the timeout interval for TestSentryClientMetrics.Basic

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2882.
---
Fix Version/s: NA
   Resolution: Won't Fix

The Sentry integration is removed.

> Increase the timeout interval for TestSentryClientMetrics.Basic
> ---
>
> Key: KUDU-2882
> URL: https://issues.apache.org/jira/browse/KUDU-2882
> Project: Kudu
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 1.10.0
>Reporter: LiFu He
>Priority: Minor
> Fix For: NA
>
>
> When I run the test cases of 1.10.0-RC2, 'TestSentryClientMetrics.Basic' 
> behaves a bit strangely. Sometimes it passes, but sometimes it doesn't. Today, 
> I took a close look at the output log and found some useful info:
> {code:java}
> // code placeholder
> I0701 16:37:24.925388 33240 thread.cc:675] Ended thread 33240 - thread 
> pool:Sentry [worker]
> I0701 16:37:24.925501 33015 thread.cc:624] Started thread 33436 - thread 
> pool:Sentry [worker]
> I0701 16:37:25.322556 33015 mini_sentry.cc:164] Pausing Sentry
> W0701 16:37:27.331832 33436 sentry_client.cc:134] Time spent starting Sentry 
> client: real 1.999s user 0.000s sys 0.000s
> W0701 16:37:27.331894 33436 client.h:352] Failed to connect to Sentry 
> (127.32.61.193:59755): Timed out: failed to open Sentry connection: 
> THRIFT_EAGAIN (timed out)
> I0701 16:37:27.331986 33015 mini_sentry.cc:172] Resuming Sentry
> /mnt/ddb/2/helif/apache/kudu/src/kudu/master/sentry_authz_provider-test.cc:1415:
>  Failure
> Expected: (200) < (hist->histogram()->MaxValue()), actual: 200 vs 
> 1999002
> I0701 16:37:27.332604 33015 mini_sentry.cc:155] Stopping Sentry
> {code}
> Then I looked through the file 'sentry_authz_provider-test.cc', and it seems the 
> timeout value is too short:
> [https://github.com/apache/kudu/blob/5c652defff422f908dacc11011dc6ae59bf49be5/src/kudu/master/sentry_authz_provider-test.cc#L1396]
> Perhaps we can increase this value (default 60 seconds) to 4 or 5 seconds to 
> avoid the failures; Alexey Serbin (not sure) and I have both hit this problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2784) MasterSentryTest.TestTableOwnership is flaky

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2784.
---
Fix Version/s: NA
   Resolution: Won't Fix

The Sentry integration is removed.

> MasterSentryTest.TestTableOwnership is flaky
> 
>
> Key: KUDU-2784
> URL: https://issues.apache.org/jira/browse/KUDU-2784
> Project: Kudu
>  Issue Type: Test
>Reporter: Hao Hao
>Assignee: Hao Hao
>Priority: Major
> Fix For: NA
>
> Attachments: master_sentry-itest.2.txt
>
>
> Encountered a failure with the following error:
> {noformat}
> W0423 04:49:43.773183  1862 sentry_authz_provider.cc:269] Action  on 
> table  with authorizable scope  is not permitted for 
> user 
> I0423 04:49:43.773447  1862 rpcz_store.cc:269] Call 
> kudu.master.MasterService.DeleteTable from 127.0.0.1:44822 (request call id 
> 6) took 2093ms. Request Metrics: 
> {"Sentry.queue_time_us":33,"Sentry.run_cpu_time_us":390,"Sentry.run_wall_time_us":18856}
> /home/jenkins-slave/workspace/kudu-master/1/src/kudu/integration-tests/master_sentry-itest.cc:446:
>  Failure
> Failed
> Bad status: Not authorized: unauthorized action
> {noformat}
> This could be because the owner privilege hasn't been reflected yet for ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2773) MiniSentry: JAVA_TOOL_OPTIONS env variable sometimes is not picked up

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2773.
---
Fix Version/s: NA
   Resolution: Won't Fix

The Sentry integration is removed.

> MiniSentry: JAVA_TOOL_OPTIONS env variable sometimes is not picked up
> -
>
> Key: KUDU-2773
> URL: https://issues.apache.org/jira/browse/KUDU-2773
> Project: Kudu
>  Issue Type: Bug
>  Components: authz, security, test
>Reporter: Alexey Serbin
>Priority: Major
> Fix For: NA
>
>
> In MiniSentry, the {{JAVA_TOOL_OPTIONS}} environment variable is set in the 
> environment upon every start of the Sentry process by the {{MiniSentry::Start()}} 
> method.  In most cases it is picked up, but sometimes that fails in the 
> {{TestSentryClientMetrics.Basic}} test scenario.  When that happens, the 
> test fails.  An example failure log is below:
> {noformat}
> [--] 1 test from TestSentryClientMetrics  
>   
> [ RUN  ] TestSentryClientMetrics.Basic
>   
> Picked up JAVA_TOOL_OPTIONS:  
>   
> 16:45:18.723 [SENTRY - WARN - main] (Log4JLogger.java:96) Metadata has 
> jdbc-type of null yet this is not valid. Ignored
> 16:45:20.587 [SENTRY - WARN - main] (Log4JLogger.java:96) Metadata has 
> jdbc-type of null yet this is not valid. Ignored
> 16:45:22.701 [SENTRY - WARN - sentry-service] (Log4JLogger.java:96) Metadata 
> has jdbc-type of null yet this is not valid. Ignored
> Sentry service is ready to serve client requests  
>   
> W0409 16:45:23.483585 28712 client.h:346] Failed to connect to Sentry 
> (127.23.212.1:35614): Not authorized: failed to open Sentry connection: 
> Configuration file does not specify default realm
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/sentry_authz_provider-test.cc:686:
>  Failure
>   Expected: 1 
>   
> To be equal to: GetTasksSuccessful()  
>   
>   Which is: 0 
>   
> I0409 16:45:23.520182 24400 test_util.cc:135] 
> ---
> I0409 16:45:23.520295 24400 test_util.cc:136] Had fatal failures, leaving 
> test files at 
> /data/somelongdirectorytoavoidrpathissues/src/kudutest/sentry_authz_provider-test.1.TestSentryClientMetrics.Basic.1554853465542407-24400
> [  FAILED  ] TestSentryClientMetrics.Basic (11490 ms) 
>   
> [--] 1 test from TestSentryClientMetrics (11490 ms total) 
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2766) Add metrics to HMS client for better observability

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2766:
--
Component/s: metrics

> Add metrics to HMS client for better observability
> --
>
> Key: KUDU-2766
> URL: https://issues.apache.org/jira/browse/KUDU-2766
> Project: Kudu
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 1.10.0
>Reporter: Alexey Serbin
>Assignee: Alexey Serbin
>Priority: Major
>
> It would be nice to add metrics into HMS client for better observability of 
> RPC communication between the client and HMS service. 
> The following changelist might be a useful reference:
>   https://gerrit.cloudera.org/12951/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2769) Investigate NotAuthorized responses from Sentry on ListPrivilegesByUser in case of non-existent user

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2769.
---
Fix Version/s: NA
   Resolution: Won't Fix

The Sentry integration is removed.

> Investigate NotAuthorized responses from Sentry on ListPrivilegesByUser in 
> case of non-existent user
> 
>
> Key: KUDU-2769
> URL: https://issues.apache.org/jira/browse/KUDU-2769
> Project: Kudu
>  Issue Type: Task
>Reporter: Alexey Serbin
>Priority: Major
> Fix For: NA
>
>
> It would be nice to clarify the behavior of both Sentry and Kudu's wrapper 
> around Sentry's HA client in {{src/kudu/thrift/client.h}} when retrieving 
> privileges for a non-existent user.  Right now it seems Sentry 
> responds with something that {{HaClient}} converts into 
> {{Status::NotAuthorized}}, and that error causes the client to re-connect to 
> the Sentry service (which is sub-optimal?).  So, a couple of questions to 
> clarify:
> * Is it legitimate behavior for Sentry to respond with something 
> that's converted into {{Status::NotAuthorized}} by the {{HaClient}}?
> * Is it really necessary for the {{HaClient}} to reconnect to Sentry upon 
> seeing a {{Status::NotAuthorized}} status code from Sentry?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2756) RemoteKsckTest.TestClusterWithLocation failed with master consensus conflicts

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2756:
--
Component/s: test

> RemoteKsckTest.TestClusterWithLocation failed with master consensus conflicts
> -
>
> Key: KUDU-2756
> URL: https://issues.apache.org/jira/browse/KUDU-2756
> Project: Kudu
>  Issue Type: Test
>  Components: test
>Reporter: Hao Hao
>Priority: Major
> Attachments: ksck_remote-test.txt
>
>
> RemoteKsckTest.TestClusterWithLocation is still flaky after fix from 
> KUDU-2748 and failed with the following error.
> {noformat}
> I0401 16:42:06.135743 18496 sys_catalog.cc:340] T 
>  P 1afc84687f934a5a8055897bbf6c2a92 
> [sys.catalog]: This master's current role is: LEADER
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/ksck_remote-test.cc:542:
>  Failure
> Failed
> Bad status: Corruption: there are master consensus conflicts
> /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/test_util.cc:326:
>  Failure
> Failed
> Timed out waiting for assertion to pass.
> I0401 16:42:35.964449 12160 tablet_server.cc:165] TabletServer shutting 
> down...
> {noformat}
> Attached the full log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2758) TLS socket writes in 16kb chunks with intervening epoll/setsockopt syscalls

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2758:
--
Labels: impala  (was: )

> TLS socket writes in 16kb chunks with intervening epoll/setsockopt syscalls
> ---
>
> Key: KUDU-2758
> URL: https://issues.apache.org/jira/browse/KUDU-2758
> Project: Kudu
>  Issue Type: Bug
>  Components: perf, rpc, security
>Reporter: Todd Lipcon
>Priority: Major
>  Labels: impala
>
> I noticed that krpc has the following syscall pattern:
> 
> {code}
>  rpc reactor-231 23122 [002] 35488410.994309: syscalls:sys_enter_epoll_wait: 
> epfd: 0x0007, events: 0x02137520, maxevents: 0x0040, timeout: 
> 0x0050
>  rpc reactor-231 23122 [002] 35488410.994310: syscalls:sys_exit_epoll_wait: 
> 0x1
>  rpc reactor-231 23122 [002] 35488410.994313: syscalls:sys_enter_setsockopt: 
> fd: 0x0011, level: 0x0006, optname: 0x0003, optval: 
> 0x7fc80910175c, optlen: 0x0004
>  rpc reactor-231 23122 [002] 35488410.994314: syscalls:sys_exit_setsockopt: 
> 0x0
>  rpc reactor-231 23122 [002] 35488410.994351: syscalls:sys_enter_write: fd: 
> 0x0011, buf: 0x7fc7e8059e93, count: 0x401d
>  rpc reactor-231 23122 [002] 35488410.994370: syscalls:sys_exit_write: 0x401d
>  rpc reactor-231 23122 [002] 35488410.994372: syscalls:sys_enter_setsockopt: 
> fd: 0x0011, level: 0x0006, optname: 0x0003, optval: 
> 0x7fc80910175c, optlen: 0x0004
>  rpc reactor-231 23122 [002] 35488410.994378: syscalls:sys_exit_setsockopt: 
> 0x0
> {code}
> This block of syscalls repeats in a pretty tight loop -- epoll_wait, CORK, 
> write, UNCORK. The writes are always 0x401d bytes (just more than 16kb). I 
> found the following in the SSL_write manpage:
> {quote}
> SSL_write() will only return with success, when the complete contents of buf 
> of length num has been written. This default behaviour can be changed with 
> the SSL_MODE_ENABLE_PARTIAL_WRITE option of ssl_ctx_set_mode(3). When this 
> flag is set, SSL_write() will also return with success, when a partial write 
> has been successfully completed. In this case the SSL_write() operation is 
> considered completed. The bytes are sent and a new SSL_write() operation with 
> a new buffer (with the already sent bytes removed) must be started. A partial 
> write is performed with the size of a message block, which is 16kB for 
> SSLv3/TLSv1.
> {quote}
> It seems likely we should loop the writes before uncorking -- either until 
> we run into a temporary socket error or run out of data to write.
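> A minimal sketch of that suggested change (assuming 
> {{SSL_MODE_ENABLE_PARTIAL_WRITE}} is set, as in the manpage excerpt above; the 
> {{DoCorkedWrite}} helper is hypothetical, not Kudu's actual reactor code): cork 
> once, keep calling SSL_write until the payload is drained or the socket would 
> block, then uncork once.
> {code:c++}
> #include <netinet/in.h>
> #include <netinet/tcp.h>
> #include <sys/socket.h>
> #include <sys/types.h>
> #include <openssl/ssl.h>
> 
> // Writes the whole buffer while the socket stays corked, so each ~16kB TLS
> // record no longer gets its own cork/uncork + epoll round trip.
> ssize_t DoCorkedWrite(SSL* ssl, int fd, const char* buf, size_t len) {
>   int one = 1, zero = 0;
>   setsockopt(fd, IPPROTO_TCP, TCP_CORK, &one, sizeof(one));    // cork once
>   size_t written = 0;
>   bool fatal = false;
>   while (written < len) {
>     int n = SSL_write(ssl, buf + written, static_cast<int>(len - written));
>     if (n > 0) {
>       written += n;               // one TLS record (<= 16kB) went out; keep looping
>       continue;
>     }
>     int err = SSL_get_error(ssl, n);
>     if (err == SSL_ERROR_WANT_WRITE || err == SSL_ERROR_WANT_READ) {
>       break;                      // socket temporarily full: go back to epoll
>     }
>     fatal = true;                 // hard TLS/socket error
>     break;
>   }
>   setsockopt(fd, IPPROTO_TCP, TCP_CORK, &zero, sizeof(zero));  // uncork once
>   return fatal ? -1 : static_cast<ssize_t>(written);
> }
> {code}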



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2754) Keep a maximum number of old log files

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2754:
--
Labels: beginner trivial  (was: )

> Keep a maximum number of old log files
> --
>
> Key: KUDU-2754
> URL: https://issues.apache.org/jira/browse/KUDU-2754
> Project: Kudu
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Grant Henke
>Priority: Major
>  Labels: beginner, trivial
>
> Kudu generates various log files 
> (INFO, WARNING, ERROR, diagnostic, minidumps, etc.). To avoid running out 
> of logging space, it would be nice if a user could configure the maximum 
> number of each log file type to keep.
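> A minimal sketch of the kind of cleanup such a setting could drive (a 
> hypothetical {{PruneOldLogs}} helper, not the actual glog integration): keep 
> only the newest N files per log type and delete the rest.
> {code:c++}
> #include <algorithm>
> #include <filesystem>
> #include <string>
> #include <vector>
> 
> namespace fs = std::filesystem;
> 
> // Keeps only the newest `max_files` logs whose name starts with `prefix`
> // (e.g. "kudu-tserver.INFO."), deleting the older ones.
> void PruneOldLogs(const std::string& log_dir, const std::string& prefix,
>                   size_t max_files) {
>   std::vector<fs::path> matches;
>   for (const auto& entry : fs::directory_iterator(log_dir)) {
>     if (entry.is_regular_file() &&
>         entry.path().filename().string().rfind(prefix, 0) == 0) {
>       matches.push_back(entry.path());
>     }
>   }
>   // Sort newest first by modification time.
>   std::sort(matches.begin(), matches.end(),
>             [](const fs::path& a, const fs::path& b) {
>               return fs::last_write_time(a) > fs::last_write_time(b);
>             });
>   for (size_t i = max_files; i < matches.size(); ++i) {
>     fs::remove(matches[i]);
>   }
> }
> {code}
> A setting per log type (INFO, WARNING, minidumps, ...) could then invoke this 
> with the matching filename prefix on each log roll.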



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KUDU-2751) Java tests that start an HMS client fail when run on JDK10+

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KUDU-2751.
---
Fix Version/s: 1.13.0
   Resolution: Fixed

Fixed by upgrading the Hive dependency.

> Java tests that start an HMS client fail when run on JDK10+
> ---
>
> Key: KUDU-2751
> URL: https://issues.apache.org/jira/browse/KUDU-2751
> Project: Kudu
>  Issue Type: Bug
>  Components: java, test
>Affects Versions: 1.10.0
>Reporter: Adar Dembo
>Priority: Major
> Fix For: 1.13.0
>
>
> They may fail on JDK9 as well, with something like this:
> {noformat}
> MetaException(message:Got exception: java.lang.ClassCastException 
> java.base/[Ljava.lang.Object; cannot be cast to java.base/[Ljava.net.URI;)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1389)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:204)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:129)
>   at 
> org.apache.kudu.hive.metastore.TestKuduMetastorePlugin.setUp(TestKuduMetastorePlugin.java:108)
> {noformat}
> I tracked this down and filed HIVE-21508. We should see if we can find some 
> sort of workaround that isn't necessarily upgrading to a newer Hive artifact 
> (or maybe we should upgrade our Hive dependency).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2731) Getting column schema information from KuduSchema requires copying a KuduColumnSchema object

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2731:
--
Component/s: perf

> Getting column schema information from KuduSchema requires copying a 
> KuduColumnSchema object
> 
>
> Key: KUDU-2731
> URL: https://issues.apache.org/jira/browse/KUDU-2731
> Project: Kudu
>  Issue Type: Improvement
>  Components: perf
>Affects Versions: 1.9.0
>Reporter: William Berkeley
>Priority: Major
>
> I'm looking at a CPU profile of Impala inserting into Kudu. 
> {{KuduTableSink::Send}} has code that schematically does the following:
> {noformat}
> for each row in the batch {
>   for each column {
>     if (schema.Column(col_idx).isNullable()) {
>       write->mutable_row()->SetNull(col);
>     }
>   }
> }
> {noformat}
> See 
> [kudu-table-sink.cc|https://github.com/apache/impala/blob/branch-3.1.0/be/src/exec/kudu-table-sink.cc#L236].
>  However, {{KuduSchema::Column}} copies the column schema and returns it by 
> value, so the if statement constructs and destroys a column schema object 
> just to check if the column is nullable.
> This is by far the biggest user of CPU in the Impala process (35% or so). The 
> workload might be I/O bound writing to Kudu anyway, though. Nevertheless, we 
> should provide a way to avoid this copying in the API, either by adding a 
> method like
> {noformat}
> class KuduSchema {
>   const KuduColumnSchema& get_column(int idx);
> }
> {noformat}
> or a method like
> {noformat}
> class KuduSchema {
>   bool is_column_nullable(int idx);
> }
> {noformat}
> The former is the most flexible while the latter frees the client from 
> worrying about holding the ref longer than the KuduColumnSchema object lives. 
> We might need to add a number of methods similar to the latter method to 
> cover other potentially useful things like checking encoding, type, etc.
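> As a caller-side illustration of the cost being described (a sketch assuming 
> only the existing {{KuduSchema::Column()}} accessor, with a hypothetical 
> {{ComputeNullability}} helper), the copy can also be hoisted out of the per-row 
> loop by caching the nullability of each column once per batch:
> {code:c++}
> #include <cstddef>
> #include <vector>
> #include <kudu/client/schema.h>
> 
> // Pays the KuduColumnSchema copy once per column instead of once per
> // row * column in the hot loop.
> std::vector<bool> ComputeNullability(const kudu::client::KuduSchema& schema) {
>   std::vector<bool> is_nullable(schema.num_columns());
>   for (size_t i = 0; i < schema.num_columns(); ++i) {
>     is_nullable[i] = schema.Column(i).is_nullable();  // single copy per column
>   }
>   return is_nullable;
> }
> {code}
> With the cached vector, the per-row check becomes {{if (is_nullable[col_idx])}}, 
> and the accessors proposed above remain the cleaner long-term fix.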



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KUDU-2727) Contention on the Raft consensus lock can cause tablet service queue overflows

2020-06-03 Thread Grant Henke (Jira)


 [ 
https://issues.apache.org/jira/browse/KUDU-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-2727:
--
Component/s: perf

> Contention on the Raft consensus lock can cause tablet service queue overflows
> --
>
> Key: KUDU-2727
> URL: https://issues.apache.org/jira/browse/KUDU-2727
> Project: Kudu
>  Issue Type: Improvement
>  Components: perf
>Reporter: William Berkeley
>Assignee: Mike Percy
>Priority: Major
>
> Here's stacks illustrating the phenomenon:
> {noformat}
>   tids=[2201]
> 0x379ba0f710 
>0x1fb951a base::internal::SpinLockDelay()
>0x1fb93b7 base::SpinLock::SlowLock()
> 0xb4e68e kudu::consensus::Peer::SignalRequest()
> 0xb9c0df kudu::consensus::PeerManager::SignalRequest()
> 0xb8c178 kudu::consensus::RaftConsensus::Replicate()
> 0xaab816 kudu::tablet::TransactionDriver::Prepare()
> 0xaac0ed kudu::tablet::TransactionDriver::PrepareTask()
>0x1fa37ed kudu::ThreadPool::DispatchThread()
>0x1f9c2a1 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
>   tids=[4515]
> 0x379ba0f710 
>0x1fb951a base::internal::SpinLockDelay()
>0x1fb93b7 base::SpinLock::SlowLock()
> 0xb74c60 kudu::consensus::RaftConsensus::NotifyCommitIndex()
> 0xb59307 kudu::consensus::PeerMessageQueue::NotifyObserversTask()
> 0xb54058 
> _ZN4kudu8internal7InvokerILi2ENS0_9BindStateINS0_15RunnableAdapterIMNS_9consensus16PeerMessageQueueEFvRKSt8functionIFvPNS4_24PeerMessageQueueObserverEEFvPS5_SC_EFvNS0_17UnretainedWrapperIS5_EEZNS5_34NotifyObserversOfCommitIndexChangeElEUlS8_E_EEESH_E3RunEPNS0_13BindStateBaseE
>0x1fa37ed kudu::ThreadPool::DispatchThread()
>0x1f9c2a1 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
>   tids=[22185,22194,22193,22188,22187,22186]
> 0x379ba0f710 
>0x1fb951a base::internal::SpinLockDelay()
>0x1fb93b7 base::SpinLock::SlowLock()
> 0xb8bff8 
> kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()
> 0xaaaef9 kudu::tablet::TransactionDriver::ExecuteAsync()
> 0xaa3742 kudu::tablet::TabletReplica::SubmitWrite()
> 0x92812d kudu::tserver::TabletServiceImpl::Write()
>0x1e28f3c kudu::rpc::GeneratedServiceIf::Handle()
>0x1e2986a kudu::rpc::ServicePool::RunThread()
>0x1f9c2a1 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
>   tids=[22192,22191]
> 0x379ba0f710 
>0x1fb951a base::internal::SpinLockDelay()
>0x1fb93b7 base::SpinLock::SlowLock()
>0x1e13dec kudu::rpc::ResultTracker::TrackRpc()
>0x1e28ef5 kudu::rpc::GeneratedServiceIf::Handle()
>0x1e2986a kudu::rpc::ServicePool::RunThread()
>0x1f9c2a1 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
>   tids=[4426]
> 0x379ba0f710 
>0x206d3d0 
>0x212fd25 google::protobuf::Message::SpaceUsedLong()
>0x211dee4 
> google::protobuf::internal::GeneratedMessageReflection::SpaceUsedLong()
> 0xb6658e kudu::consensus::LogCache::AppendOperations()
> 0xb5c539 kudu::consensus::PeerMessageQueue::AppendOperations()
> 0xb5c7c7 kudu::consensus::PeerMessageQueue::AppendOperation()
> 0xb7c675 
> kudu::consensus::RaftConsensus::AppendNewRoundToQueueUnlocked()
> 0xb8c147 kudu::consensus::RaftConsensus::Replicate()
> 0xaab816 kudu::tablet::TransactionDriver::Prepare()
> 0xaac0ed kudu::tablet::TransactionDriver::PrepareTask()
>0x1fa37ed kudu::ThreadPool::DispatchThread()
>0x1f9c2a1 kudu::Thread::SuperviseThread()
> 0x379ba079d1 start_thread
> 0x379b6e88fd clone
> {noformat}
> {{kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()}} needs to 
> take the lock to check the term and the Raft role. When many RPCs come in for 
> the same tablet, the contention can hog service threads and cause queue 
> overflows on busy systems.
> Yugabyte switched their equivalent lock to be an atomic that allows them to 
> read the term and role wait-free.
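> A rough sketch of that wait-free idea (hypothetical types, not Kudu's or 
> Yugabyte's actual code): pack the term and role into one atomic 64-bit word so 
> readers such as CheckLeadershipAndBindTerm() never touch the consensus lock.
> {code:c++}
> #include <atomic>
> #include <cstdint>
> #include <utility>
> 
> enum class RaftRole : uint8_t { LEADER, FOLLOWER, LEARNER, NON_PARTICIPANT };
> 
> // Low 56 bits hold the term, high 8 bits the role. Writers still update
> // under the existing consensus lock; readers load wait-free.
> class TermAndRole {
>  public:
>   void Store(int64_t term, RaftRole role) {
>     state_.store((static_cast<uint64_t>(role) << 56) |
>                      (static_cast<uint64_t>(term) & kTermMask),
>                  std::memory_order_release);
>   }
>   std::pair<int64_t, RaftRole> Load() const {
>     uint64_t v = state_.load(std::memory_order_acquire);
>     return {static_cast<int64_t>(v & kTermMask),
>             static_cast<RaftRole>(v >> 56)};
>   }
>  private:
>   static constexpr uint64_t kTermMask = (1ULL << 56) - 1;
>   std::atomic<uint64_t> state_{0};
> };
> {code}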



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

