[jira] [Updated] (KUDU-3128) Cross compile Kudu Spark for Scala 2.11 & 2.12
[ https://issues.apache.org/jira/browse/KUDU-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3128: -- Component/s: build > Cross compile Kudu Spark for Scala 2.11 & 2.12 > -- > > Key: KUDU-3128 > URL: https://issues.apache.org/jira/browse/KUDU-3128 > Project: Kudu > Issue Type: Improvement > Components: build, spark >Affects Versions: 1.12.0 >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > > Currently we only publish kudu-spark for Scala 2.11. We should also compile > and publish a 2.12 version of the kudu-spark integration now that Spark > supports Scala 2.12. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3131) test rw_mutex-test hangs sometimes if build_type is release
[ https://issues.apache.org/jira/browse/KUDU-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3131: -- Component/s: test > test rw_mutex-test hangs sometimes if build_type is release > --- > > Key: KUDU-3131 > URL: https://issues.apache.org/jira/browse/KUDU-3131 > Project: Kudu > Issue Type: Sub-task > Components: test >Reporter: huangtianhua >Priority: Major > > Built and tested Kudu on aarch64; in release mode there is a test that hangs > sometimes (maybe a deadlock?). The console output is as follows: > [==] Running 2 tests from 1 test case. > [--] Global test environment set-up. > [--] 2 tests from Priorities/RWMutexTest > [ RUN ] Priorities/RWMutexTest.TestDeadlocks/0 > It seems to be OK in debug mode. > Now only this one test fails sometimes on aarch64. [~aserbin] [~adar] would > you please have a look at this? Or give us some suggestions, thanks very > much. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3123) tracing.html doesn't render on newer browsers
[ https://issues.apache.org/jira/browse/KUDU-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3123: -- Labels: supportability (was: ) > tracing.html doesn't render on newer browsers > - > > Key: KUDU-3123 > URL: https://issues.apache.org/jira/browse/KUDU-3123 > Project: Kudu > Issue Type: Bug > Components: ui >Reporter: Andrew Wong >Priority: Major > Labels: supportability > > I tried opening the tracing.html page using Google Chrome Version > 81.0.4044.138, and the page was blank. Upon inspecting, it seems Chrome no > longer supports {{registerElement}}: > {code:java} > tracing.js:31 Uncaught TypeError: document.registerElement is not a function > at tracing.js:31 > at tracing.js:31 {code} > This was reported to the Chromium project as > [https://bugs.chromium.org/p/chromium/issues/detail?id=1036492], which has > been closed. We should update the trace viewer version in thirdparty to > include whatever fixes are necessary, since tracing is pretty valuable in a > pinch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3118) Validate --tserver_enforce_access_control is set when authorization is enabled in Master
[ https://issues.apache.org/jira/browse/KUDU-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3118: -- Component/s: security authz > Validate --tserver_enforce_access_control is set when authorization is > enabled in Master > - > > Key: KUDU-3118 > URL: https://issues.apache.org/jira/browse/KUDU-3118 > Project: Kudu > Issue Type: Task > Components: authz, security >Reporter: Hao Hao >Priority: Minor > > As mentioned in the code review > [https://gerrit.cloudera.org/c/15897/1/docs/security.adoc#476], it would be > nice to add some validation (maybe in ksck or something) that this is set if > fine-grained authorization is enabled on the master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3111) Make IWYU process freestanding headers
[ https://issues.apache.org/jira/browse/KUDU-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3111: -- Component/s: build > Make IWYU process freestanding headers > > > Key: KUDU-3111 > URL: https://issues.apache.org/jira/browse/KUDU-3111 > Project: Kudu > Issue Type: Improvement > Components: build >Affects Versions: 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.10.1, 1.11.0, 1.12.0, > 1.11.1 >Reporter: Alexey Serbin >Priority: Major > > When working out of the compilation database, IWYU processes only associated > headers, i.e. {{.h}} files that pair with corresponding {{.cc}} files. It would > be nice to make IWYU process so-called freestanding header files as well. [This > thread|https://github.com/include-what-you-use/include-what-you-use/issues/268] > contains very useful information on the topic. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3109) Log administrative operations
[ https://issues.apache.org/jira/browse/KUDU-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3109: -- Component/s: security > Log administrative operations > - > > Key: KUDU-3109 > URL: https://issues.apache.org/jira/browse/KUDU-3109 > Project: Kudu > Issue Type: Task > Components: security >Reporter: Attila Bukor >Priority: Minor > > Sometimes it's impossible to determine what caused an issue when > administrators run unsafe commands on the cluster. Logging these in an audit > log would help. -- This message was sent by Atlassian Jira (v8.3.4#803005)
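A minimal sketch of the kind of record such an audit log could capture; the {{AdminAuditLog}} class and its fields are hypothetical illustrations, not existing Kudu code:

{code:java}
import java.time.Instant;
import java.util.logging.Logger;

// Hypothetical audit logger for administrative commands. One line per
// operation -- timestamp, caller, and the exact command -- is usually
// enough to reconstruct what an operator did to the cluster.
public class AdminAuditLog {
  private static final Logger LOG = Logger.getLogger("kudu.audit");

  public static void record(String user, String command, boolean unsafe) {
    LOG.info(String.format("%s user=%s unsafe=%b command=%s",
        Instant.now(), user, unsafe, command));
  }

  public static void main(String[] args) {
    record("admin1", "kudu remote_replica unsafe_change_config ...", true);
  }
}
{code}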
[jira] [Updated] (KUDU-3090) Add owner concept in Kudu
[ https://issues.apache.org/jira/browse/KUDU-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3090: -- Component/s: security authz > Add owner concept in Kudu > - > > Key: KUDU-3090 > URL: https://issues.apache.org/jira/browse/KUDU-3090 > Project: Kudu > Issue Type: New Feature > Components: authz, security >Reporter: Hao Hao >Assignee: Attila Bukor >Priority: Major > Labels: roadmap-candidate > > As mentioned in the Ranger integration design doc, Ranger supports ownership > privilege by creating a default policy that allows the \{OWNER} of a resource to > access it without manually creating an additional policy. Unless Kudu actually > has full support for owners, ownership privileges are not possible with the Ranger > integration. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KUDU-3091) Support ownership privilege with Ranger
[ https://issues.apache.org/jira/browse/KUDU-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke reassigned KUDU-3091: - Assignee: Attila Bukor > Support ownership privilege with Ranger > --- > > Key: KUDU-3091 > URL: https://issues.apache.org/jira/browse/KUDU-3091 > Project: Kudu > Issue Type: Task >Reporter: Hao Hao >Assignee: Attila Bukor >Priority: Major > > Currently, ownership privilege in Ranger is not available as Kudu has no > concept of an owner and does not store owner information internally. It would > be nice to enable it once Kudu introduces owners. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3091) Support ownership privilege with Ranger
[ https://issues.apache.org/jira/browse/KUDU-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3091: -- Component/s: security ranger authz > Support ownership privilege with Ranger > --- > > Key: KUDU-3091 > URL: https://issues.apache.org/jira/browse/KUDU-3091 > Project: Kudu > Issue Type: Task > Components: authz, ranger, security >Reporter: Hao Hao >Assignee: Attila Bukor >Priority: Major > > Currently, ownership privilege in Ranger is not available as Kudu has no > concept of an owner and does not store owner information internally. It would > be nice to enable it once Kudu introduces owners. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3089) ERROR when running tests on ARM64 server with TSAN or ASAN enabled
[ https://issues.apache.org/jira/browse/KUDU-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3089: -- Component/s: test > ERROR when running tests on ARM64 server with TSAN or ASAN enabled > -- > > Key: KUDU-3089 > URL: https://issues.apache.org/jira/browse/KUDU-3089 > Project: Kudu > Issue Type: Bug > Components: test >Reporter: liusheng >Priority: Major > > > For now I am trying to build and test Kudu on an ARM server. For the Debug build > type, the build process and almost all tests pass, but for the TSAN and > ASAN build types, I can build successfully, yet all the test cases > raise the following error: > {code:java} > root@kudu-asan2:/opt/kudu/build/asan# bin/kudu-ts-cli-test > AddressSanitizer:DEADLYSIGNAL > = > ==14378==ERROR: AddressSanitizer: SEGV on unknown address 0x (pc > 0x bp 0xc2649d10 sp 0xc2649d10 T0) > ==14378==Hint: pc points to the zero page. > ==14378==The signal is caused by a READ memory access. > ==14378==Hint: address points to the zero page.AddressSanitizer can not > provide additional info. > SUMMARY: AddressSanitizer: SEGV () > ==14378==ABORTING > {code} > I have struggled with this for a while with no progress; could anyone give any > suggestions? Thanks a lot! > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3084) Multiple time sources with fallback behavior between them
[ https://issues.apache.org/jira/browse/KUDU-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3084: -- Labels: clock roadmap-candidate usability (was: clock) > Multiple time sources with fallback behavior between them > - > > Key: KUDU-3084 > URL: https://issues.apache.org/jira/browse/KUDU-3084 > Project: Kudu > Issue Type: Improvement > Components: master, tserver >Reporter: Alexey Serbin >Priority: Major > Labels: clock, roadmap-candidate, usability > > [~tlipcon] suggested an alternative approach to configure and select > HybridClock's time source. > Kudu servers could maintain multiple time sources and switch between them > with a fallback behavior. The default or preferred time source might be any > of the existing ones (e.g., the built-in NTP client), but when it's not > available, another available time source is selected (e.g., {{system}} -- the > NTP-synchronized local clock). Switching between time sources can be done: > * only upon startup/initialization > * upon startup/initialization and later during normal run time > The advantages are: > * easier deployment and configuration of Kudu clusters > * simplified upgrade path from older releases using the {{system}} time source to > newer releases using the {{builtin}} time source by default > There are downsides, though. Since the new way of maintaining the time source is > more dynamic, it can: > * mask various configuration or network issues > * result in different time sources within the same Kudu cluster due to > transient issues > * introduce extra startup delay -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3077) Have client scanners prune the default projection based on the contents of their authz tokens
[ https://issues.apache.org/jira/browse/KUDU-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3077: -- Labels: usability (was: ) > Have client scanners prune the default projection based on the contents of > their authz tokens > - > > Key: KUDU-3077 > URL: https://issues.apache.org/jira/browse/KUDU-3077 > Project: Kudu > Issue Type: Improvement > Components: client, security >Reporter: Andrew Wong >Priority: Major > Labels: usability > > Today, if a scan is sent that contains a column that, per the sender's authz > token, the sender isn't authorized to see, the entire scan is rejected. This > is all well and good, but users may not be privy to what columns they are or > aren't allowed to scan. So, when the default projection is used (which scans > all columns), the scan is bound to be rejected if there are any privilege > restrictions. > It'd be significantly more user-friendly if clients transparently pruned the > default projection of unauthorized columns so that (assuming the authz token > is valid) default scans always succeed with just the columns the user is > authorized to see. > Special care should be taken if the user has no column privileges, though; > passing an empty projection is taken to return the count of rows (which > requires the same privileges as {{COUNT(*)}}, which requires the same > privileges as {{SELECT(*)}}, i.e. {{SELECT ON TABLE}}) rather than an empty > set of rows. In such a case, clients should probably fail immediately, since > there are no table privileges and no column privileges in the authz token, so > any scan would be bound to fail. -- This message was sent by Atlassian Jira (v8.3.4#803005)
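A rough sketch of the client-side pruning with the Java client; {{setProjectedColumnNames}} is the real scanner-builder method, while the {{isAuthorizedColumn}} predicate is a hypothetical stand-in for the authz-token lookup:

{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.kudu.ColumnSchema;
import org.apache.kudu.client.KuduClient;
import org.apache.kudu.client.KuduScanner;
import org.apache.kudu.client.KuduTable;

public class PrunedScan {
  // Stand-in for consulting the column privileges in the authz token;
  // the real check would live inside the client library.
  static boolean isAuthorizedColumn(String colName) {
    return !colName.equals("ssn"); // e.g. no privilege on "ssn"
  }

  public static KuduScanner buildScanner(KuduClient client, KuduTable table) {
    List<String> allowed = new ArrayList<>();
    for (ColumnSchema col : table.getSchema().getColumns()) {
      if (isAuthorizedColumn(col.getName())) {
        allowed.add(col.getName());
      }
    }
    if (allowed.isEmpty()) {
      // An empty projection has COUNT(*) semantics, which needs SELECT
      // ON TABLE; fail fast instead of sending a scan that must fail.
      throw new IllegalStateException("no authorized columns to scan");
    }
    return client.newScannerBuilder(table)
        .setProjectedColumnNames(allowed)
        .build();
  }
}
{code}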
[jira] [Updated] (KUDU-3073) BuiltinNtpWithMiniChronydTest.SyncAndUnsyncReferenceServers sometimes fails
[ https://issues.apache.org/jira/browse/KUDU-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3073: -- Component/s: test > BuiltinNtpWithMiniChronydTest.SyncAndUnsyncReferenceServers sometimes fails > --- > > Key: KUDU-3073 > URL: https://issues.apache.org/jira/browse/KUDU-3073 > Project: Kudu > Issue Type: Bug > Components: test >Affects Versions: 1.12.0 >Reporter: Alexey Serbin >Priority: Major > Attachments: ntp-test.txt.xz > > > {noformat} > src/kudu/clock/ntp-test.cc:478: Failure > Value of: s.IsRuntimeError() > > Actual: false > > Expected: true > > OK > > src/kudu/clock/ntp-test.cc:595: Failure > Expected: CheckNoNtpSource(sync_servers_refs) doesn't generate new fatal > failures in the current thread. > Actual: it does. > {noformat} > The log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3076) Add a Kudu cli for granting/revoking Ranger privileges
[ https://issues.apache.org/jira/browse/KUDU-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3076: -- Component/s: security ops-tooling ranger > Add a Kudu cli for granting/revoking Ranger privileges > -- > > Key: KUDU-3076 > URL: https://issues.apache.org/jira/browse/KUDU-3076 > Project: Kudu > Issue Type: Task > Components: ops-tooling, ranger, security >Reporter: Hao Hao >Priority: Major > > Even though Ranger has a GUI for policies management (and can be accessed via > REST API), it probably will be more user friendly to have a Kudu cli tool for > granting and revoking privileges. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3060) Add a tool to identify potential performance bottlenecks
[ https://issues.apache.org/jira/browse/KUDU-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3060: -- Labels: roadmap-candidate (was: ) > Add a tool to identify potential performance bottlenecks > > > Key: KUDU-3060 > URL: https://issues.apache.org/jira/browse/KUDU-3060 > Project: Kudu > Issue Type: Improvement > Components: CLI, perf, ui >Reporter: Andrew Wong >Priority: Major > Labels: roadmap-candidate > > When we hear users wondering why their workloads are slower than expected, > some common questions arise. It'd be great if we had a single tool (or a > single webpage) that aggregated and displayed useful information for a > specific tablet or table. Things like, for a specific table: > - How many partitions and replicas exist for the table. > - For those replicas, how they are distributed across tablet servers. > - For those tablet servers, what the block cache configuration is, and what > the current block cache stats (hit ratio, evictions, etc) are. > - For those tablet servers, which tablets have been written to recently. > - For those tablet servers, which tablets within the target table have been > written to recently. > - For those tablet servers, how many active and non-expired scanners exist. > - For those tablet servers, which tablets within the target table have been > read from recently. > - For those tablet servers, how many ongoing tablet copies there are both to > and from the server. > - For those tablet servers, how many data directories there are. > - For the data directories on those tablet servers, how many replicas are > spreading data in each directory, how many blocks there are in each, and how > much space is available in each. > The list could go on and on. It probably makes sense to break the diagnostics > into different phases or goals, maybe along the lines of 1) identifying > hotspots of workloads and lag across tablet servers (e.g. a ton of writes > going to a single tserver), and 2) digging into a single tablet server to > understand how it's provisioned and whether that provisioning is sufficient. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3055) Lazily open cold tablets
[ https://issues.apache.org/jira/browse/KUDU-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3055: -- Labels: scalability (was: ) > Lazily open cold tablets > > > Key: KUDU-3055 > URL: https://issues.apache.org/jira/browse/KUDU-3055 > Project: Kudu > Issue Type: New Feature > Components: master, tablet, tserver >Reporter: Andrew Wong >Priority: Major > Labels: scalability > > It might be useful in larger deployments to have the ability to lazily > bootstrap cold tablets. > Currently, WAL replay consumes a significant amount of bootstrapping time. If > we know certain tablets are read infrequently, we ought to be able to > indicate that we only want to bootstrap and replay the WALs for tablets that > have been accessed recently. > [This patch|https://github.com/apache/kudu/commit/ca957fb] gave us a metric > for hotness and coldness at the replica level -- we might want to consider > aggregating this on the master to determine what partitions are hot and cold, > and have the master signal to the appropriate tablet servers upon registering > that certain replicas should be bootstrapped. We might also consider > bootstrapping when a client first calls {{OpenTable()}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-3054) Init kudu.write_duration accumulator lazily
[ https://issues.apache.org/jira/browse/KUDU-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-3054. --- Fix Version/s: NA Resolution: Duplicate > Init kudu.write_duration accumulator lazily > --- > > Key: KUDU-3054 > URL: https://issues.apache.org/jira/browse/KUDU-3054 > Project: Kudu > Issue Type: Improvement > Components: spark >Affects Versions: 1.9.0 >Reporter: liupengcheng >Priority: Major > Fix For: NA > > Attachments: durationHisto_large.png, durationhisto.png, > read_kudu_and_shuffle.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > We encountered an issue in kudu-spark that causes Spark SQL > query failures: > ``` > Job aborted due to stage failure: Total size of serialized results of 942 > tasks (2.0 GB) is bigger than spark.driver.maxResultSize (2.0 GB) > ``` > After careful debugging, we found that it's the kudu.write_duration > accumulators that cause a single Spark task's serialized size to exceed 2 MB, so the total task size of > the stage exceeds the limit. > However, this stage only reads a Kudu table and does a shuffle exchange; it does not > write to any Kudu tables. > So I think we should init this accumulator lazily in KuduContext to avoid such > issues. > !https://issues.apache.org/jira/secure/attachment/12993451/durationHisto_large.png! > > !https://issues.apache.org/jira/secure/attachment/12993452/durationhisto.png! > !https://issues.apache.org/jira/secure/attachment/12993453/read_kudu_and_shuffle.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
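The fix itself would land in the Scala {{KuduContext}}; as a sketch, the underlying idea is plain lazy initialization, shown here in Java. With this pattern, a read-only job never constructs the accumulator, so its serialized task size stays small:

{code:java}
// Lazy-initialization sketch: the heavyweight object (standing in for
// the kudu.write_duration accumulator) is only created on first use,
// so jobs that never write never pay its serialization cost.
public class LazyHolder<T> {
  private volatile T value;
  private final java.util.function.Supplier<T> factory;

  public LazyHolder(java.util.function.Supplier<T> factory) {
    this.factory = factory;
  }

  public T get() {
    T v = value;
    if (v == null) {
      synchronized (this) {   // double-checked locking with volatile
        v = value;
        if (v == null) {
          value = v = factory.get();
        }
      }
    }
    return v;
  }
}
{code}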
[jira] [Updated] (KUDU-3041) Kudu Java client shade is incomplete
[ https://issues.apache.org/jira/browse/KUDU-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3041: -- Component/s: build > Kudu Java client shade is incomplete > > > Key: KUDU-3041 > URL: https://issues.apache.org/jira/browse/KUDU-3041 > Project: Kudu > Issue Type: Bug > Components: build, client >Affects Versions: 1.11.1 >Reporter: Ismaël Mejía >Priority: Major > > While working on an update of the Kudu integration in Apache Beam (BEAM-5086) > we found this issue. We use a [tool to test for linkage > errors|https://github.com/GoogleCloudPlatform/cloud-opensource-java] and it > reports the classes that are missing but required by other classes. > This is the result for the kudu-client case: > {code:java} > Class javax.servlet.ServletOutputStream is not found; > referenced by 1 class file > > org.apache.kudu.shaded.org.jboss.netty.channel.socket.http.HttpTunnelingServlet > (kudu-client-1.11.1.jar) > Class javax.servlet.http.HttpServlet is not found; > referenced by 1 class file > > org.apache.kudu.shaded.org.jboss.netty.channel.socket.http.HttpTunnelingServlet > (kudu-client-1.11.1.jar) > Class javax.servlet.ServletException is not found; > referenced by 1 class file > > org.apache.kudu.shaded.org.jboss.netty.channel.socket.http.HttpTunnelingServlet > (kudu-client-1.11.1.jar) > Class javax.servlet.ServletConfig is not found; > referenced by 1 class file > > org.apache.kudu.shaded.org.jboss.netty.channel.socket.http.HttpTunnelingServlet > (kudu-client-1.11.1.jar) > Class javax.servlet.http.HttpServletRequest is not found; > referenced by 1 class file > > org.apache.kudu.shaded.org.jboss.netty.channel.socket.http.HttpTunnelingServlet > (kudu-client-1.11.1.jar) > Class javax.servlet.http.HttpServletResponse is not found; > referenced by 1 class file > > org.apache.kudu.shaded.org.jboss.netty.channel.socket.http.HttpTunnelingServlet > (kudu-client-1.11.1.jar) > Class org.jboss.marshalling.ByteInput is not found; > referenced by 4 class files > > org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.ChannelBufferByteInput > (kudu-client-1.11.1.jar) > > org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.LimitingByteInput > (kudu-client-1.11.1.jar) > > org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.ChannelBufferByteInput > (beam-vendor-grpc-1_21_0-0.1.jar) > > org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.LimitingByteInput > (beam-vendor-grpc-1_21_0-0.1.jar) > Class org.jboss.marshalling.ByteOutput is not found; > referenced by 2 class files > > org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.ChannelBufferByteOutput > (kudu-client-1.11.1.jar) > > org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.ChannelBufferByteOutput > (beam-vendor-grpc-1_21_0-0.1.jar) > Class org.jboss.marshalling.Unmarshaller is not found; > referenced by 8 class files > > org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.CompatibleMarshallingDecoder > (kudu-client-1.11.1.jar) > > org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.ContextBoundUnmarshallerProvider > (kudu-client-1.11.1.jar) > > org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.MarshallingDecoder > (kudu-client-1.11.1.jar) > > org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.ThreadLocalUnmarshallerProvider > (kudu-client-1.11.1.jar) > > org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.MarshallingDecoder > 
(beam-vendor-grpc-1_21_0-0.1.jar) > > org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.CompatibleMarshallingDecoder > (beam-vendor-grpc-1_21_0-0.1.jar) > > org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.ThreadLocalUnmarshallerProvider > (beam-vendor-grpc-1_21_0-0.1.jar) > > org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.ContextBoundUnmarshallerProvider > (beam-vendor-grpc-1_21_0-0.1.jar) > Class org.jboss.marshalling.Marshaller is not found; > referenced by 6 class files > > org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.CompatibleMarshallingEncoder > (kudu-client-1.11.1.jar) > > org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.MarshallingEncoder > (kudu-client-1.11.1.jar) > > org.apache.kudu.shaded.org.jboss.netty.handler.codec.marshalling.ThreadLocalMarshallerProvider > (kudu-client-1.11.1.jar) > > org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.CompatibleMarshallingEncoder > (beam-vendor-grpc-1_21_0-0.1.jar) > > org.apache.beam.vendor.grpc.v1p21p0.io.netty.handler.codec.marshalling.MarshallingEncoder > (beam-vendor-grpc-1_21_0-0.1.jar) > > org.apache.beam.vend
[jira] [Updated] (KUDU-3037) HMS notification log listener runs in follower masters
[ https://issues.apache.org/jira/browse/KUDU-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3037: -- Labels: trivial (was: ) > HMS notification log listener runs in follower masters > -- > > Key: KUDU-3037 > URL: https://issues.apache.org/jira/browse/KUDU-3037 > Project: Kudu > Issue Type: Improvement > Components: hms, master >Affects Versions: 1.11.1 >Reporter: Adar Dembo >Priority: Major > Labels: trivial > > Besides wasting resources, this also emits the same log message every second: > {noformat} > I0108 21:05:07.748235 1443 hms_notification_log_listener.cc:227] Skipping > Hive Metastore notification log poll: Illegal state: Not the leader. Local > UUID: c77bd888023149729481b7fd041b5c83, Raft Consensus state: current_term: 6 > committed_config { opid_index: -1 OBSOLETE_local: false peers { > permanent_uuid: "12e9881f0c094a38b83c3b00f71a0ef1" member_type: VOTER > last_known_addr { host: "127.0.60.126" port: 43289 } } peers { > permanent_uuid: "c77bd888023149729481b7fd041b5c83" member_type: VOTER > last_known_addr { host: "127.0.60.125" port: 38761 } } peers { > permanent_uuid: "c362d38459074310a6fbd3da10153538" member_type: VOTER > last_known_addr { host: "127.0.60.124" port: 34903 } } } > {noformat} > We could throttle the log message, though perhaps the more complete solution > is to only activate the log listener in the leader master. -- This message was sent by Atlassian Jira (v8.3.4#803005)
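A sketch of the "leader-only" half of that fix; the {{Master}} interface and poll method here are placeholders, since the real listener is C++ code inside the Kudu master:

{code:java}
// Sketch: follower masters skip the poll entirely, so they neither
// waste resources nor log "Not the leader" every second.
public class NotificationLogPoller implements Runnable {
  interface Master { boolean isLeader(); }

  private final Master master;

  NotificationLogPoller(Master master) { this.master = master; }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        if (master.isLeader()) {
          pollNotificationLog(); // only the leader polls the HMS log
        }
        Thread.sleep(1000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  private void pollNotificationLog() { /* fetch and apply HMS events */ }
}
{code}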
[jira] [Resolved] (KUDU-3034) Kudu query performance issue: querying 390,000 rows takes 89 seconds
[ https://issues.apache.org/jira/browse/KUDU-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-3034. --- Fix Version/s: NA Resolution: Incomplete > Kudu query performance issue: querying 390,000 rows takes 89 seconds > - > > Key: KUDU-3034 > URL: https://issues.apache.org/jira/browse/KUDU-3034 > Project: Kudu > Issue Type: Improvement >Affects Versions: 1.7.1 >Reporter: SeaAndHill >Priority: Major > Fix For: NA > > Attachments: memory.jpg, query.jpg, threads.jpg > > > See the attached screenshots for the configuration: 50 GB of memory is configured and the data size is only 9 MB, but the query takes 89 seconds. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3033) Add min/max values for the non-primary key columns in the metadata of rowsets/datablocks
[ https://issues.apache.org/jira/browse/KUDU-3033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3033: -- Component/s: perf > Add min/max values for the non-primary key columns in the metadata of > rowsets/datablocks > > > Key: KUDU-3033 > URL: https://issues.apache.org/jira/browse/KUDU-3033 > Project: Kudu > Issue Type: New Feature > Components: cfile, perf, tablet >Reporter: LiFu He >Priority: Major > > It's possible to add min/max values for the non-primary key columns in the > metadata of diskrowset/datablock, and then we can skip decoding/evaluating > the unnecessary diskrowsets/datablocks while scanning. This is just like the "compute > stats" feature in Impala; the only difference is that Kudu supports > updates. So, the min/max values should be treated as invalid for columns that have > deltas at scan time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
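The scan-time benefit this enables is the classic zone-map check; a sketch with hypothetical block metadata:

{code:java}
// Zone-map style pruning sketch: skip a data block when the predicate
// range cannot overlap the block's [min, max]. The check is only safe
// when the block has no deltas, per the caveat above.
public class BlockPruning {
  static class BlockMeta {
    final long min, max;
    final boolean hasDeltas;
    BlockMeta(long min, long max, boolean hasDeltas) {
      this.min = min; this.max = max; this.hasDeltas = hasDeltas;
    }
  }

  // Predicate: lower <= value <= upper.
  static boolean canSkip(BlockMeta block, long lower, long upper) {
    if (block.hasDeltas) {
      return false; // min/max may be stale; must read the block
    }
    return upper < block.min || lower > block.max;
  }

  public static void main(String[] args) {
    BlockMeta b = new BlockMeta(100, 200, false);
    System.out.println(canSkip(b, 300, 400)); // true: disjoint ranges
    System.out.println(canSkip(b, 150, 160)); // false: overlaps
  }
}
{code}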
[jira] [Updated] (KUDU-3024) kudu-master: scale test for high number of simultaneous updates on tables metadata
[ https://issues.apache.org/jira/browse/KUDU-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3024: -- Component/s: test perf > kudu-master: scale test for high number of simultaneous updates on tables > metadata > --- > > Key: KUDU-3024 > URL: https://issues.apache.org/jira/browse/KUDU-3024 > Project: Kudu > Issue Type: Test > Components: perf, test >Reporter: Alexey Serbin >Priority: Major > > It would be nice to have a scale test to verify how Kudu masters behave when > there is a high number of almost simultaneous updates in tables' metadata > (e.g., ALTER table to add a new partition, add new columns, drop columns, > etc.). At a minimum, it would be able to spot issues like KUDU-3016. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3024) kudu-master: scale test for high number of simultaneous updates on tables metadata
[ https://issues.apache.org/jira/browse/KUDU-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3024: -- Labels: benchmarks (was: ) > kudu-master: scale test for high number of simultaneous updates on tables > metadata > --- > > Key: KUDU-3024 > URL: https://issues.apache.org/jira/browse/KUDU-3024 > Project: Kudu > Issue Type: Test > Components: perf, test >Reporter: Alexey Serbin >Priority: Major > Labels: benchmarks > > It would be nice to have a scale test to verify how Kudu masters behave when > there is a high number of almost simultaneous updates in tables' metadata > (e.g., ALTER table to add a new partition, add new columns, drop columns, > etc.). At a minimum, it would be able to spot issues like KUDU-3016. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3025) Add metric for the open file descriptors usage vs the limit
[ https://issues.apache.org/jira/browse/KUDU-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3025: -- Component/s: metrics > Add metric for the open file descriptors usage vs the limit > --- > > Key: KUDU-3025 > URL: https://issues.apache.org/jira/browse/KUDU-3025 > Project: Kudu > Issue Type: Improvement > Components: master, metrics, tserver >Reporter: Alexey Serbin >Priority: Major > Labels: Availability, observability, scalability > > In the case of even replica distribution across all available nodes, once one > tablet server hits the maximum number of open file descriptors and goes down > (e.g., upon hosting another tablet replica), the system will automatically > re-replicate tablet replicas from that tablet server, most likely bringing > other tablet servers down as well. That's a cascading failure scenario that > nobody wants to experience. > Monitoring the number of open file descriptors vs the limit can help > prevent a full Kudu cluster outage in such cases, if operators are given a > chance to handle those situations proactively. Once some threshold is > reached (e.g., 90%), an operator could update the limit via the corresponding > {{ulimit}} setting, preventing an outage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
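The ratio itself is cheap to compute. A sketch on the JVM via the platform MXBean (the Kudu servers are C++ and would read {{/proc}} or {{getrlimit}} instead; the 90% threshold mirrors the example above):

{code:java}
import java.lang.management.ManagementFactory;
import com.sun.management.UnixOperatingSystemMXBean;

// Computes open-FD usage as a fraction of the process limit -- the
// ratio the proposed metric would expose.
public class FdUsage {
  public static void main(String[] args) {
    UnixOperatingSystemMXBean os = (UnixOperatingSystemMXBean)
        ManagementFactory.getOperatingSystemMXBean();
    long open = os.getOpenFileDescriptorCount();
    long max = os.getMaxFileDescriptorCount();
    double ratio = (double) open / max;
    System.out.printf("fds: %d/%d (%.0f%% used)%n", open, max, ratio * 100);
    if (ratio > 0.9) {
      System.err.println("WARNING: approaching file descriptor limit");
    }
  }
}
{code}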
[jira] [Updated] (KUDU-3020) Add metric gauges for the size of incoming RPC request payload and number of RPC rejections due to payload size
[ https://issues.apache.org/jira/browse/KUDU-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3020: -- Component/s: metrics > Add metric gauges for the size of incoming RPC request payload and number of > RPC rejections due to payload size > --- > > Key: KUDU-3020 > URL: https://issues.apache.org/jira/browse/KUDU-3020 > Project: Kudu > Issue Type: Improvement > Components: master, metrics, tserver >Reporter: Alexey Serbin >Priority: Major > Labels: guidelines, observability, scalability, troubleshooting > > Kudu servers have a limit on the size of the RPC payloads they accept: > {{\-\-rpc_max_message_size}}. > It would be nice to introduce corresponding metrics to gauge how relevant the > current setting for the maximum RPC size is with regard to the incoming > requests. That can help with pro-active tuning of a Kudu cluster to sustain > a planned increase in workload. This is especially useful for tuning > parameters of Kudu masters to accommodate a higher number of tables/tablets in > a cluster (e.g., adding new tables or creating new partitions for already > existing tables). -- This message was sent by Atlassian Jira (v8.3.4#803005)
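A sketch of what the pair of metrics could look like: a coarse histogram of incoming payload sizes plus a counter of rejections against the configured maximum. The bucket bounds and class names are illustrative, not Kudu's actual metrics API:

{code:java}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Histogram of incoming RPC payload sizes plus a rejection counter,
// mirroring the --rpc_max_message_size check.
public class RpcSizeMetrics {
  private static final long[] BOUNDS = {1 << 10, 1 << 16, 1 << 20, 1 << 26};
  private final AtomicLongArray buckets = new AtomicLongArray(BOUNDS.length + 1);
  private final AtomicLong rejected = new AtomicLong();
  private final long maxMessageSize;

  public RpcSizeMetrics(long maxMessageSize) {
    this.maxMessageSize = maxMessageSize;
  }

  /** Records the payload size; returns true if the request is accepted. */
  public boolean recordAndCheck(long payloadBytes) {
    int i = 0;
    while (i < BOUNDS.length && payloadBytes > BOUNDS[i]) {
      i++;
    }
    buckets.incrementAndGet(i); // histogram of all incoming payloads
    if (payloadBytes > maxMessageSize) {
      rejected.incrementAndGet(); // count of oversized, rejected RPCs
      return false;
    }
    return true;
  }
}
{code}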
[jira] [Resolved] (KUDU-3018) Add tests for CLI tools
[ https://issues.apache.org/jira/browse/KUDU-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-3018. --- Fix Version/s: NA Resolution: Fixed > Add tests for CLI tools > --- > > Key: KUDU-3018 > URL: https://issues.apache.org/jira/browse/KUDU-3018 > Project: Kudu > Issue Type: Task >Reporter: Attila Bukor >Priority: Major > Fix For: NA > > > Currently the CLI tools are barely tested, so when introducing new features it's > easy to miss something that won't be supported by the tooling, or to > break the tooling outright. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3021) Add metric gauges for the size of transactions applied to tablets and number of rejected transactions due to their size
[ https://issues.apache.org/jira/browse/KUDU-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3021: -- Component/s: metrics > Add metric gauges for the size of transactions applied to tablets and number > of rejected transactions due to their size > --- > > Key: KUDU-3021 > URL: https://issues.apache.org/jira/browse/KUDU-3021 > Project: Kudu > Issue Type: Improvement > Components: master, metrics, tserver >Reporter: Alexey Serbin >Assignee: ZhangYao >Priority: Major > Labels: observability, scalability, troubleshooting > > Kudu servers have a limit on the size of a transaction applied to a tablet: > {{\-\-tablet_transaction_memory}} > It would be nice to introduce corresponding metrics to gauge how relevant the > current setting is for the actual size of transactions applied in a running > Kudu cluster. That can help with pro-active tuning of a Kudu cluster to > sustain a planned increase in workload. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3013) Race in StopTabletITest.TestStoppedTabletsDontWrite
[ https://issues.apache.org/jira/browse/KUDU-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3013: -- Component/s: test > Race in StopTabletITest.TestStoppedTabletsDontWrite > --- > > Key: KUDU-3013 > URL: https://issues.apache.org/jira/browse/KUDU-3013 > Project: Kudu > Issue Type: Bug > Components: test >Reporter: LiFu He >Priority: Major > Attachments: > jenkins-slave.1575252039.26703.311237e4f4a39e5fea3b175fbf12d3e4aa8674dc.81.0-artifacts.zip > > > I met this issue on Jenkins this morning, and it seems there is a race in > StopTabletITest.TestStoppedTabletsDontWrite. > {code:java} > // code placeholder > TransactionDriver::ApplyTask()Tablet::Stop() > | | > transaction_->Apply() | > | | > tablet->ApplyRowOperations(state()) | > (RESERVED -> APPLYING) | > | | > StartApplying(tx_state); | > | > set_state_unlocked(kStopped); > ApplyRowOperation() | > | | > CheckHasNotBeenStoppedUnlocked()| > (return error since the tablet has been stopped)| > | | > HandleFailure(s)| > | | > transaction_->Finish(Transaction::ABORTED); | > | | > state()->CommitOrAbort(result); | > | | > ReleaseMvccTxn(result); | > | | > mvcc_tx_->Abort(); | > | | > manager_->AbortTransaction(timestamp_); | > | | > if (PREDICT_FALSE(!is_open())) | > | mvcc_.Close(); > | | > | open_.store(false); > CHECK_EQ(old_state, RESERVED) | >(ASSERT failed) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3012) Throttle "Applying an operation in a closed session" warning log
[ https://issues.apache.org/jira/browse/KUDU-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3012: -- Labels: newbie trivial (was: newbie) > Throttle "Applying an operation in a closed session" warning log > > > Key: KUDU-3012 > URL: https://issues.apache.org/jira/browse/KUDU-3012 > Project: Kudu > Issue Type: Improvement > Components: client >Affects Versions: 1.9.0 >Reporter: Grant Henke >Priority: Major > Labels: newbie, trivial > > In NIFI-6895 it was reported that the log warning about applying an operation > in a closed session is occurring millions of times. Of course the NiFi > integration should fix the underlying issue, but we should throttle this > message to ensure it doesn't overflow logs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
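A sketch of a simple time-based throttle the client-side warning could go through; the interval, class name, and wording are illustrative only:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Emits the warning at most once per interval and reports how many
// occurrences were suppressed in between, so logs stay readable even
// when a caller triggers the condition millions of times.
public class LogThrottler {
  private final long intervalMillis;
  private final AtomicLong nextAllowed = new AtomicLong();
  private final AtomicLong suppressed = new AtomicLong();

  public LogThrottler(long intervalMillis) {
    this.intervalMillis = intervalMillis;
  }

  public void warn(String message) {
    long now = System.currentTimeMillis();
    long next = nextAllowed.get();
    if (now >= next && nextAllowed.compareAndSet(next, now + intervalMillis)) {
      long skipped = suppressed.getAndSet(0);
      System.err.println("WARN: " + message
          + (skipped > 0 ? " (" + skipped + " similar messages suppressed)" : ""));
    } else {
      suppressed.incrementAndGet();
    }
  }
}
{code}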
[jira] [Updated] (KUDU-3003) TestAsyncKuduSession.testTabletCacheInvalidatedDuringWrites is flaky
[ https://issues.apache.org/jira/browse/KUDU-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3003: -- Component/s: test > TestAsyncKuduSession.testTabletCacheInvalidatedDuringWrites is flaky > > > Key: KUDU-3003 > URL: https://issues.apache.org/jira/browse/KUDU-3003 > Project: Kudu > Issue Type: Bug > Components: test >Reporter: Hao Hao >Priority: Minor > Attachments: test-output.txt > > > testTabletCacheInvalidatedDuringWrites of the > org.apache.kudu.client.TestAsyncKuduSession test sometimes fails with an > error like below. I attached full test log. > {noformat} > There was 1 failure: > 1) > testTabletCacheInvalidatedDuringWrites(org.apache.kudu.client.TestAsyncKuduSession) > org.apache.kudu.client.PleaseThrottleException: all buffers are currently > flushing > at > org.apache.kudu.client.AsyncKuduSession.apply(AsyncKuduSession.java:579) > at > org.apache.kudu.client.TestAsyncKuduSession.testTabletCacheInvalidatedDuringWrites(TestAsyncKuduSession.java:371) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2988) built-in NTP client: sometimes minichronyd fails to start with address already in use
[ https://issues.apache.org/jira/browse/KUDU-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2988: -- Component/s: test > built-in NTP client: sometimes minichronyd fails to start with address > already in use > - > > Key: KUDU-2988 > URL: https://issues.apache.org/jira/browse/KUDU-2988 > Project: Kudu > Issue Type: Sub-task > Components: clock, ntp-client, test >Affects Versions: 1.11.0 >Reporter: Adar Dembo >Priority: Major > Labels: clock > > From time to time some tests that use the built-in NTP client and MiniChronyd > fail. The failure usually looks like this: > {noformat} > I1013 22:02:28.429188 23364 hybrid_clock.cc:162] waiting up to > --ntp_initial_sync_wait_secs=10 seconds for the clock to synchronize > I1013 22:02:38.430480 23364 builtin_ntp.cc:552] server 127.16.18.212:42817: > addresses=127.16.18.212:42817 current_address=127.16.18.212:42817 > i_pkt_total_num=0 i_pkt_valid_num=0 o_pkt_total_num=20 o_pkt_timedout_num=14 > is_synchronized=false > last_mono=0 > last_wall=0 > last_error=0 > now_mono=264459136130 > F1013 22:02:38.430524 23364 master_main.cc:105] Check failed: _s.ok() Bad > status: Service unavailable: Cannot initialize clock: timed out waiting for > clock synchronisation: wallclock is not synchronized: no valid NTP responses > yet > *** Check failure stack trace: *** > {noformat} > I've also seen one failure like this: > {noformat} > [ RUN ] BlockManagerType/TsRecoveryITest.TestCrashDuringLogReplay/0 > 2019-11-01T00:37:31Z chronyd version 3.5 starting (+CMDMON +NTP +REFCLOCK > +RTC -PRIVDROP -SCFILTER -SIGND +ASYNCDNS -SECHASH -IPV6 +DEBUG) > 2019-11-01T00:37:32Z Disabled control of system clock > Could not open connection to daemon > W1101 00:37:32.553658 235 mini_chronyd.cc:189] Time spent starting chronyd: > real 1.110s user 0.000s sys 0.011s > 2019-11-01T00:37:32Z chronyd exiting > /home/jenkins-slave/workspace/kudu-master/2/src/kudu/integration-tests/external_mini_cluster-itest-base.cc:66: > Failure > Failed > Bad status: Timed out: failed to start NTP server 0: failed to contact > chronyd in 1.000s > /home/jenkins-slave/workspace/kudu-master/2/src/kudu/integration-tests/ts_recovery-itest.cc:428: > Failure > Expected: StartClusterOneTs({ "--fault_crash_during_log_replay=0.05" }) > doesn't generate new fatal failures in the current thread. > Actual: it does. > I1101 00:37:32.564697 235 external_mini_cluster-itest-base.cc:80] Found > fatal failure > I1101 00:37:32.566167 235 test_util.cc:136] > --- > I1101 00:37:32.566222 235 test_util.cc:137] Had fatal failures, leaving > test files at > /tmp/dist-test-taskBEeRDj/test-tmp/ts_recovery-itest.0.BlockManagerType_TsRecoveryITest.TestCrashDuringLogReplay_0.1572568627695135-235 > [ FAILED ] BlockManagerType/TsRecoveryITest.TestCrashDuringLogReplay/0, > where GetParam() = "file" (1125 ms) > {noformat} > In the first case it's pretty odd that despite the fact that MiniChronyd > failed to bind to a socket, we managed to convince ourselves that it was up > and running. My running theory is that chronyd doesn't exit when this happens > and still listens on the UNIX domain socket for our "server stats" query, > which then convinces us that chronyd is alive and well. > Anyway, this is the cause of some test flakiness, so we should look into it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2985) CreateTableITest.TestSpreadReplicasEvenlyWithDimension scenario is flaky
[ https://issues.apache.org/jira/browse/KUDU-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2985: -- Component/s: test > CreateTableITest.TestSpreadReplicasEvenlyWithDimension scenario is flaky > > > Key: KUDU-2985 > URL: https://issues.apache.org/jira/browse/KUDU-2985 > Project: Kudu > Issue Type: Bug > Components: test >Reporter: Alexey Serbin >Priority: Minor > Attachments: create-table-itest.txt.xz > > > Sometimes the {{CreateTableITest.TestSpreadReplicasEvenlyWithDimension}} > scenario fails (RELEASE build): > {noformat} > I1024 19:34:26.842279 7231 create-table-itest.cc:357] stddev = 3.03315 > > src/kudu/integration-tests/create-table-itest.cc:358: Failure > Expected: (stddev) <= (3.0), actual: 3.03315 vs 3 > {noformat} > Full log file is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2984) memory_gc-itest is flaky
[ https://issues.apache.org/jira/browse/KUDU-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2984: -- Component/s: test > memory_gc-itest is flaky > > > Key: KUDU-2984 > URL: https://issues.apache.org/jira/browse/KUDU-2984 > Project: Kudu > Issue Type: Bug > Components: test >Affects Versions: 1.11.0, 1.12.0 >Reporter: Alexey Serbin >Assignee: Yingchun Lai >Priority: Minor > Attachments: memory_gc-itest.txt.xz > > > The {{memory_gc-itest}} fails from time to time with the following error message > (DEBUG build): > {noformat} > src/kudu/integration-tests/memory_gc-itest.cc:117: Failure > Expected: (ratio) >= (0.1), actual: 0.0600604 vs 0.1 > tserver-2 > src/kudu/util/test_util.cc:339: Failure > Failed > Timed out waiting for assertion to pass. > {noformat} > The full log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2975) Spread WAL across multiple data directories
[ https://issues.apache.org/jira/browse/KUDU-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2975: -- Labels: roadmap-candidate scalability (was: ) > Spread WAL across multiple data directories > --- > > Key: KUDU-2975 > URL: https://issues.apache.org/jira/browse/KUDU-2975 > Project: Kudu > Issue Type: New Feature > Components: fs, perf, tablet, tserver >Reporter: LiFu He >Assignee: YangSong >Priority: Major > Labels: roadmap-candidate, scalability > Attachments: network.png, tserver-WARNING.png, util.png > > > Recently, we deployed a new Kudu cluster where every node has 12 SSDs. Then, we > created a big table and loaded data into it through Flink. We noticed that the > utilization of the one SSD used to store the WAL is 100% while the others are idle. So, > we suggest spreading the WAL across multiple data directories. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2975) Spread WAL across multiple data directories
[ https://issues.apache.org/jira/browse/KUDU-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2975: -- Component/s: perf > Spread WAL across multiple data directories > --- > > Key: KUDU-2975 > URL: https://issues.apache.org/jira/browse/KUDU-2975 > Project: Kudu > Issue Type: New Feature > Components: fs, perf, tablet, tserver >Reporter: LiFu He >Assignee: YangSong >Priority: Major > Attachments: network.png, tserver-WARNING.png, util.png > > > Recently, we deployed a new Kudu cluster where every node has 12 SSDs. Then, we > created a big table and loaded data into it through Flink. We noticed that the > utilization of the one SSD used to store the WAL is 100% while the others are idle. So, > we suggest spreading the WAL across multiple data directories. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2969) UDF support for scans
[ https://issues.apache.org/jira/browse/KUDU-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2969: -- Labels: roadmap-candidate (was: ) > UDF support for scans > - > > Key: KUDU-2969 > URL: https://issues.apache.org/jira/browse/KUDU-2969 > Project: Kudu > Issue Type: New Feature > Components: tablet, tserver >Affects Versions: 1.11.0 >Reporter: Adar Dembo >Priority: Major > Labels: roadmap-candidate > > It would be nice if Kudu supported some form of user-defined functions (UDFs) > for use in scans. These could be used for custom comparisons, or for > comparisons between multiple columns (rather than between columns and > constant values). > Impala supports [both Java-based and native > UDFs|https://impala.apache.org/docs/build/html/topics/impala_udf.html]. We > could explore doing something similar, or an approach with an IR like > [Weld|https://www.weld.rs]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2958) ClientTest.TestReplicatedTabletWritesWithLeaderElection is flaky
[ https://issues.apache.org/jira/browse/KUDU-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2958: -- Component/s: test > ClientTest.TestReplicatedTabletWritesWithLeaderElection is flaky > > > Key: KUDU-2958 > URL: https://issues.apache.org/jira/browse/KUDU-2958 > Project: Kudu > Issue Type: Bug > Components: test >Affects Versions: 1.11.0 >Reporter: Alexey Serbin >Assignee: ZhangYao >Priority: Major > Attachments: client-test.5.txt.xz > > > The {{TestReplicatedTabletWritesWithLeaderElection}} scenario of {{client-test}} > is flaky. From time to time, in the ASAN build configuration, it fails with the > following error: > {noformat} > I0924 20:26:19.869351 14037 client-test.cc:4304] Counting rows... > > src/kudu/client/client-test.cc:4308: Failure > Expected: 2 * kNumRowsToWrite > > Which is: 200 > > To be equal to: CountRowsFromClient(table.get(), KuduClient::FIRST_REPLICA, > KuduScanner::READ_LATEST, kNoBound, kNoBound) > Which is: 100 > {noformat} > It seems there is an implicit assumption in the test about fast propagation of > Raft transactions to follower replicas. > I attached the full log of the failed test scenario. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2957) In HmsSentryConfigurations/MasterStressTest kudu-master crashes on an out-of-order notification log event
[ https://issues.apache.org/jira/browse/KUDU-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2957. --- Fix Version/s: NA Resolution: Won't Fix The Sentry integration is removed. > In HmsSentryConfigurations/MasterStressTest kudu-master crashes on an > out-of-order notification log event > > > Key: KUDU-2957 > URL: https://issues.apache.org/jira/browse/KUDU-2957 > Project: Kudu > Issue Type: Bug > Components: hms, master, test >Affects Versions: 1.11.0 >Reporter: Alexey Serbin >Priority: Major > Fix For: NA > > Attachments: master-stress-test.1.txt.xz > > > The first relevant message is: > {noformat} > F0924 20:03:08.307613 1220 hms_notification_log_listener.cc:266] Received > out-of-order notification log event (last processed event ID: 22): 22 > DROP_TABLE default.table_7dad03ec77524186956c5829457d06c7 > *** Check failure stack trace: *** > > @ 0x7f63a55ae62d google::LogMessage::Fail() at ??:0 > > @ 0x7f63a55b064c google::LogMessage::SendToLog() at ??:0 > > @ 0x7f63a55ae189 google::LogMessage::Flush() at ??:0 > > @ 0x7f63a55ae3a1 google::LogMessage::~LogMessage() at ??:0 > > @ 0x7f63a70c0c4c > kudu::master::HmsNotificationLogListenerTask::Poll() at ??:0 > @ 0x7f63a70bff79 > kudu::master::HmsNotificationLogListenerTask::RunLoop() at ??:0 > {noformat} > In the DEBUG build, all three {{kudu-master}} processes crash and the next > attempt to create a table times out: > {noformat} > F0924 20:08:32.662717 1116 master-stress-test.cc:297] Check failed: _s.ok() > Bad status: Timed out: Error creating table > default.Table_e221014b8c604ed0b635168473827877 on the master: CreateTable > timed out after deadline expired: CreateTable passed its deadline: Timed out: > ConnectToClusterRpc(addrs: > 127.0.58.126:41719,127.0.58.125:36217,127.0.58.124:33155, num_attempts: 772) > passed its deadline: Not found: no leader found: ConnectToClusterRpc(addrs: > 127.0.58.126:41719,127.0.58.125:36217,127.0.58.124:33155, num_attempts: 1) > {noformat} > I attached the full log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2942) A rare flaky test for the aggregated live row count
[ https://issues.apache.org/jira/browse/KUDU-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2942: -- Component/s: test > A rare flaky test for the aggregated live row count > --- > > Key: KUDU-2942 > URL: https://issues.apache.org/jira/browse/KUDU-2942 > Project: Kudu > Issue Type: Bug > Components: test >Reporter: LiFu He >Priority: Major > Attachments: ts_tablet_manager-itest.txt > > > A few days ago, Adar hit a rare flaky test for the live row count in TSAN mode. > > {code:java} > // code placeholder > /home/jenkins-slave/workspace/kudu-master/3/src/kudu/integration-tests/ts_tablet_manager-itest.cc:642 > Expected: live_row_count > Which is: 327 > To be equal to: table_info->GetMetrics()->live_row_count->value() > Which is: 654 > {code} > It seems the metric value is doubled; his full test output is in the > attachment. > > I reviewed the previous patches and made some unusual guesses. I think one of > them could explain the issue: > When one master has just become the leader and there are two heartbeat messages > from the same tserver that are processed in parallel at > [Line4239|https://github.com/apache/kudu/blob/1bdae88faefe9b0d43b6897d96cd853bc5dd7353/src/kudu/master/catalog_manager.cc#L4239], > the metric value will be doubled because the old tablet stats can be > accessed concurrently. > Thus, the question becomes: how can two heartbeat messages from the same > tserver be generated at the same time? The possible answer is: [First heartbeat > message|https://github.com/apache/kudu/blob/1bdae88faefe9b0d43b6897d96cd853bc5dd7353/src/kudu/integration-tests/ts_tablet_manager-itest.cc#L741] > and [Second heartbeat > message|https://github.com/apache/kudu/blob/1bdae88faefe9b0d43b6897d96cd853bc5dd7353/src/kudu/integration-tests/ts_tablet_manager-itest.cc#L635] > Please don't forget the above case is in an integration test environment, not production. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2950) Support restarting nodes in batches
[ https://issues.apache.org/jira/browse/KUDU-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2950: -- Component/s: ops-tooling > Support restarting nodes in batches > --- > > Key: KUDU-2950 > URL: https://issues.apache.org/jira/browse/KUDU-2950 > Project: Kudu > Issue Type: Improvement > Components: ops-tooling >Reporter: Andrew Wong >Priority: Major > > Once Kudu has the building blocks to orchestrate a rolling restart, it'd be > great if we could support restarting multiple nodes at a time. > Location awareness would play a crucial role in this because, if used to > identify rack placement, we could bring down an entire rack at a time if we > wanted. If we did this, though, during the controlled restart of a given > rack, Kudu would be more vulnerable to the _unexpected_ downtime of another > rack. > One approach would be to support something like [HDFS's upgrade > domains|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUpgradeDomain.html]: > {quote}The idea is to group datanodes in a new dimension called upgrade > domain, in addition to the existing rack-based grouping. For example, we can > assign all datanodes in the first position of any rack to upgrade domain > ud_01, nodes in the second position to upgrade domain ud_02 and so on. > ... > By default, 3 replicas of any given block are placed on 3 different upgrade > domains. This means all datanodes belonging to a specific upgrade domain > collectively won’t store more than one replica of any block. > {quote} > The decoupling of physical groups from restartable groups should make batch > restarts more robust to rack failures. -- This message was sent by Atlassian Jira (v8.3.4#803005)
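A sketch of the upgrade-domain assignment rule quoted above (a node's position within its rack determines its domain); the data shape and names here are illustrative, not an HDFS or Kudu API:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UpgradeDomains {
  /** Upgrade domain = 1-based position of a node within its rack. */
  static Map<Integer, List<String>> assign(Map<String, List<String>> racks) {
    Map<Integer, List<String>> domains = new TreeMap<>();
    for (List<String> nodes : racks.values()) {
      for (int pos = 0; pos < nodes.size(); pos++) {
        domains.computeIfAbsent(pos + 1, k -> new ArrayList<>())
               .add(nodes.get(pos));
      }
    }
    return domains;
  }

  public static void main(String[] args) {
    Map<String, List<String>> racks = Map.of(
        "rack1", List.of("ts1", "ts2"),
        "rack2", List.of("ts3", "ts4"));
    // Domain 1 holds the first node of each rack, domain 2 the second;
    // restarting one domain takes down at most one node per rack, so a
    // replica set spread across racks keeps a majority alive.
    System.out.println(assign(racks));
  }
}
{code}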
[jira] [Updated] (KUDU-2917) Split a tablet into primary key ranges by number of rows
[ https://issues.apache.org/jira/browse/KUDU-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2917: -- Component/s: spark perf > Split a tablet into primary key ranges by number of rows > --- > > Key: KUDU-2917 > URL: https://issues.apache.org/jira/browse/KUDU-2917 > Project: Kudu > Issue Type: Improvement > Components: perf, spark >Reporter: Xu Yao >Assignee: Xu Yao >Priority: Major > Labels: impala > > Since we implemented > [KUDU-2437|https://issues.apache.org/jira/browse/KUDU-2437] and > [KUDU-2670|https://issues.apache.org/jira/browse/KUDU-2670], Spark jobs > can read data inside a tablet in parallel. However, we found in actual use > that splitting the key range by size may leave some Spark tasks with long read > tails (some tasks read many more rows even when the data size of each key range > is basically the same). > I think this issue is caused by columnar encoding and compression. > For example, say we store 1000 rows of data in columnar format: if most of the > column values are the same, less storage space is required; conversely, if > the values differ, more storage is needed. So splitting > the primary key range by the number of rows might be a better choice, as sketched below. -- This message was sent by Atlassian Jira (v8.3.4#803005)
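A sketch of the row-based splitting idea, assuming per-chunk row counts are available from rowset statistics (a hypothetical input here):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Splitting a tablet's key space by row count rather than on-disk size:
// with columnar encodings, equal-sized byte ranges can hold wildly
// different row counts, so equal-row ranges balance scan tasks better.
public class RowBasedSplit {
  /** Greedily groups consecutive chunks so each split holds ~targetRows. */
  static List<Integer> splitPoints(long[] rowsPerChunk, long targetRows) {
    List<Integer> splits = new ArrayList<>();
    long acc = 0;
    for (int i = 0; i < rowsPerChunk.length; i++) {
      acc += rowsPerChunk[i];
      if (acc >= targetRows && i < rowsPerChunk.length - 1) {
        splits.add(i); // split after chunk i
        acc = 0;
      }
    }
    return splits;
  }

  public static void main(String[] args) {
    // Same byte size per chunk, very different row counts:
    long[] rows = {1000, 50, 60, 900, 70, 1100};
    System.out.println(splitPoints(rows, 1000)); // [0, 3]
  }
}
{code}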
[jira] [Updated] (KUDU-2933) Improve estimation of Spark relation size.
[ https://issues.apache.org/jira/browse/KUDU-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2933: -- Component/s: spark perf > Improve estimation of Spark relation size. > -- > > Key: KUDU-2933 > URL: https://issues.apache.org/jira/browse/KUDU-2933 > Project: Kudu > Issue Type: Improvement > Components: perf, spark >Reporter: ZhangYao >Priority: Major > > We should take the projection and predicates into consideration when estimating the > sizeInBytes in KuduRelation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
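A back-of-the-envelope sketch of the proposed estimate: scale the raw table size by the projected-column fraction and an assumed predicate selectivity (both inputs are illustrative assumptions, not values Kudu exposes today). A tighter sizeInBytes matters because Spark uses it to decide, e.g., whether a broadcast join is viable:

{code:java}
// Scales raw table size by column and predicate selectivity instead of
// reporting the full table size for every scan.
public class RelationSizeEstimate {
  static long estimateSizeInBytes(long tableBytes,
                                  int projectedColumns,
                                  int totalColumns,
                                  double predicateSelectivity) {
    double columnFraction = (double) projectedColumns / totalColumns;
    return (long) (tableBytes * columnFraction * predicateSelectivity);
  }

  public static void main(String[] args) {
    // 100 GiB table, 2 of 20 columns projected, ~10% of rows selected:
    long est = estimateSizeInBytes(100L << 30, 2, 20, 0.1);
    System.out.println(est); // ~1 GiB instead of 100 GiB
  }
}
{code}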
[jira] [Updated] (KUDU-2918) Rebalancer can fail when a service queue is full
[ https://issues.apache.org/jira/browse/KUDU-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2918: -- Labels: stability supportability (was: ) > Rebalancer can fail when a service queue is full > > > Key: KUDU-2918 > URL: https://issues.apache.org/jira/browse/KUDU-2918 > Project: Kudu > Issue Type: Bug > Components: CLI, ksck >Affects Versions: 1.11.0 >Reporter: Adar Dembo >Priority: Major > Labels: stability, supportability > > The various low-level RPCs issued by ksck aren't retried if the corresponding > service queues are full. These include GetConsensusState, GetStatus, and > ListTablets. > Without retries, ksck (and the rebalancer) can fail midway: > {noformat} > I0812 11:21:10.669682 42799 rebalancer.cc:831] tablet > d729fb149e804696a0862adacb725d66: a0dca75bbbfb4de69616694834adf930 -> > 24d0eb73b3c64a0f901ae092186b3439 move is abandoned: Remote error: Service > unavailable: GetConsensusState request on kudu.consensus.ConsensusService > from 10.17.182.15:50754 dropped due to backpressure. The service queue is > full; it has 50 items. > I0812 11:21:10.871894 42799 rebalancer.cc:239] re-synchronizing cluster state > Illegal state: tablet server 0d88ff7360b74d1e81cd2ccd41fab8a5 > (foo.bar.com:7050): unacceptable health status UNAVAILABLE > {noformat} > The helper classes in rpc/rpc.h may be useful here. -- This message was sent by Atlassian Jira (v8.3.4#803005)
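As an illustration only (ksck and the rebalancer are C++ and would use the helpers in rpc/rpc.h mentioned above), a generic retry loop for an idempotent call that fails with the "service queue is full" backpressure error might look like this sketch; the error-matching predicate is an assumption, not Kudu's actual error classification.
{code:scala}
// Sketch: retry an idempotent call on backpressure with capped
// exponential backoff.
def callWithRetries[T](maxAttempts: Int)(call: () => T): T = {
  var backoffMs = 50L
  var attempt = 1
  while (true) {
    try {
      return call()
    } catch {
      case e: Exception
          if attempt < maxAttempts &&
            Option(e.getMessage).exists(_.contains("service queue is full")) =>
        Thread.sleep(backoffMs)
        backoffMs = math.min(backoffMs * 2, 5000L)
        attempt += 1
    }
  }
  sys.error("unreachable")
}
{code}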
[jira] [Updated] (KUDU-2917) Split a tablet into primary key ranges by number of rows
[ https://issues.apache.org/jira/browse/KUDU-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2917: -- Labels: impala (was: ) > Split a tablet into primary key ranges by number of rows > --- > > Key: KUDU-2917 > URL: https://issues.apache.org/jira/browse/KUDU-2917 > Project: Kudu > Issue Type: Improvement >Reporter: Xu Yao >Assignee: Xu Yao >Priority: Major > Labels: impala > > Since we implemented > [KUDU-2437|https://issues.apache.org/jira/browse/KUDU-2437] and > [KUDU-2670|https://issues.apache.org/jira/browse/KUDU-2670], a Spark job > can read data inside a tablet in parallel. However, we found in actual use > that splitting the key range by size may leave some Spark tasks as long tails. > (Some tasks read more rows even when the data size of each KeyRange is basically the > same.) > I think this issue is caused by column-wise encoding and compression. > For example, if we store 1000 rows of data column-wise and most of the values > in a column are the same, less storage space is required; if the > values differ, more storage is needed. So splitting the primary key range > by the number of rows might be a better choice. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2915) Support deleting dead tservers from CLI
[ https://issues.apache.org/jira/browse/KUDU-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2915: -- Labels: supportability (was: ) > Support deleting dead tservers from CLI > > > Key: KUDU-2915 > URL: https://issues.apache.org/jira/browse/KUDU-2915 > Project: Kudu > Issue Type: Improvement > Components: CLI, ops-tooling >Affects Versions: 1.10.0 >Reporter: Hexin >Assignee: Hexin >Priority: Major > Labels: supportability > > Sometimes nodes in the cluster will crash due to machine problems such as > disk corruption, which can be very common. However, if there are some dead > tservers, the ksck result will always show an error (e.g. Not all Tablet Servers are > reachable) even though all tables have recovered to be healthy. > Currently the only way to get a healthy ksck status is to restart all masters > one by one. In some cases, for example if a machine is completely > corrupted, we want to get a healthy ksck status without restarting, since > after restarting masters the cluster will take some time to recover, during > which scans and upserts to tables are affected. The recovery > time can be long, depending mainly on the scale of the cluster. This problem > can be serious and annoying, especially when tservers crash frequently > in a large cluster. > It would be valuable to have an easier way to delete dead tservers from the master; I > will add a kudu command to support this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-1701) Reduce contention in CatalogManager::ScopedLeaderSharedLock
[ https://issues.apache.org/jira/browse/KUDU-1701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-1701. --- Fix Version/s: 1.13.0 Resolution: Fixed > Reduce contention in CatalogManager::ScopedLeaderSharedLock > > > Key: KUDU-1701 > URL: https://issues.apache.org/jira/browse/KUDU-1701 > Project: Kudu > Issue Type: Improvement > Components: master >Affects Versions: 1.0.0 >Reporter: Todd Lipcon >Assignee: Alexey Serbin >Priority: Minor > Fix For: 1.13.0 > > > CatalogManager::ScopedLeaderSharedLock::ScopedLeaderSharedLock() currently > holds a spinlock while accessing consensus->ConsensusState(). That call makes > a copy of a protobuf, which requires allocation and a later deallocation. > Every master lookup RPC needs to go through this path, so this can become a > contention point under heavy multi-client load. -- This message was sent by Atlassian Jira (v8.3.4#803005)
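The general shape of such a fix, as a language-neutral sketch (not the actual C++ change): hold the lock only long enough to grab a reference to the current state, and leave any allocation-heavy copying outside the critical section.
{code:scala}
import java.util.concurrent.locks.ReentrantLock

// Sketch: the critical section is reduced to a cheap reference read;
// expensive copies of the snapshot happen after the lock is released.
class SharedState[T <: AnyRef](initial: T) {
  private val lock = new ReentrantLock()
  private var state: T = initial

  def snapshot(): T = {
    lock.lock()
    try state // cheap: just return the current reference
    finally lock.unlock()
  }
}
{code}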
[jira] [Updated] (KUDU-2915) Support deleting dead tservers from CLI
[ https://issues.apache.org/jira/browse/KUDU-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2915: -- Component/s: ops-tooling > Support deleting dead tservers from CLI > > > Key: KUDU-2915 > URL: https://issues.apache.org/jira/browse/KUDU-2915 > Project: Kudu > Issue Type: Improvement > Components: CLI, ops-tooling >Affects Versions: 1.10.0 >Reporter: Hexin >Assignee: Hexin >Priority: Major > > Sometimes nodes in the cluster will crash due to machine problems such as > disk corruption, which can be very common. However, if there are some dead > tservers, the ksck result will always show an error (e.g. Not all Tablet Servers are > reachable) even though all tables have recovered to be healthy. > Currently the only way to get a healthy ksck status is to restart all masters > one by one. In some cases, for example if a machine is completely > corrupted, we want to get a healthy ksck status without restarting, since > after restarting masters the cluster will take some time to recover, during > which scans and upserts to tables are affected. The recovery > time can be long, depending mainly on the scale of the cluster. This problem > can be serious and annoying, especially when tservers crash frequently > in a large cluster. > It would be valuable to have an easier way to delete dead tservers from the master; I > will add a kudu command to support this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2915) Support deleting dead tservers from CLI
[ https://issues.apache.org/jira/browse/KUDU-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2915: -- Target Version/s: (was: 1.10.0) > Support deleting dead tservers from CLI > > > Key: KUDU-2915 > URL: https://issues.apache.org/jira/browse/KUDU-2915 > Project: Kudu > Issue Type: Improvement > Components: CLI >Affects Versions: 1.10.0 >Reporter: Hexin >Assignee: Hexin >Priority: Major > > Sometimes nodes in the cluster will crash due to machine problems such as > disk corruption, which can be very common. However, if there are some dead > tservers, the ksck result will always show an error (e.g. Not all Tablet Servers are > reachable) even though all tables have recovered to be healthy. > Currently the only way to get a healthy ksck status is to restart all masters > one by one. In some cases, for example if a machine is completely > corrupted, we want to get a healthy ksck status without restarting, since > after restarting masters the cluster will take some time to recover, during > which scans and upserts to tables are affected. The recovery > time can be long, depending mainly on the scale of the cluster. This problem > can be serious and annoying, especially when tservers crash frequently > in a large cluster. > It would be valuable to have an easier way to delete dead tservers from the master; I > will add a kudu command to support this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3115) Improve scalability of Kudu masters
[ https://issues.apache.org/jira/browse/KUDU-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3115: -- Labels: scalability (was: ) > Improve scalability of Kudu masters > --- > > Key: KUDU-3115 > URL: https://issues.apache.org/jira/browse/KUDU-3115 > Project: Kudu > Issue Type: Improvement > Components: master >Reporter: Alexey Serbin >Priority: Major > Labels: scalability > > Currently, multiple masters in a multi-master Kudu cluster are used only for > high availability & fault tolerance use cases, but not for sharing the load > among the available master nodes. For example, Kudu clients detect the current > leader master upon connecting to the cluster and send all their subsequent > requests to the leader master, so serving many more clients requires running > masters on more powerful nodes. The current design assumes that masters store > and process the requests for metadata only, but that makes sense only up to > some limit on the rate of incoming client requests. > It would be great to achieve better 'horizontal' scalability for Kudu masters. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3115) Improve scalability of Kudu masters
[ https://issues.apache.org/jira/browse/KUDU-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3115: -- Component/s: master > Improve scalability of Kudu masters > --- > > Key: KUDU-3115 > URL: https://issues.apache.org/jira/browse/KUDU-3115 > Project: Kudu > Issue Type: Improvement > Components: master >Reporter: Alexey Serbin >Priority: Major > > Currently, multiple masters in a multi-master Kudu cluster are used only for > high availability & fault tolerance use cases, but not for sharing the load > among the available master nodes. For example, Kudu clients detect the current > leader master upon connecting to the cluster and send all their subsequent > requests to the leader master, so serving many more clients requires running > masters on more powerful nodes. The current design assumes that masters store > and process the requests for metadata only, but that makes sense only up to > some limit on the rate of incoming client requests. > It would be great to achieve better 'horizontal' scalability for Kudu masters. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2910) Add client cache/factory implementation to the kudu-client
[ https://issues.apache.org/jira/browse/KUDU-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2910: -- Component/s: supportability > Add client cache/factory implementation to the kudu-client > -- > > Key: KUDU-2910 > URL: https://issues.apache.org/jira/browse/KUDU-2910 > Project: Kudu > Issue Type: Improvement > Components: client, supportability >Reporter: Grant Henke >Assignee: Sandish Kumar HN >Priority: Major > > Often integrations should cache and use a shared client for all communication > to a given list of masters. This is seen in our own kudu-spark integration in > `KuduContext.KuduClientCache`. > It would be nice to add a generic implementation to the kudu-client so that > this code doesn't get re-written over and over. Additionally we can add more > complex logic if useful later. -- This message was sent by Atlassian Jira (v8.3.4#803005)
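A minimal sketch of what such a cache could look like (the object name is hypothetical; kudu-spark's KuduContext.KuduClientCache is the in-tree precedent, and the factory lambda assumes Scala 2.12+ SAM conversion): one shared KuduClient per distinct master address list, created on first use.
{code:scala}
import java.util.concurrent.ConcurrentHashMap
import org.apache.kudu.client.KuduClient

// Sketch: cache one KuduClient per master address list so integrations
// share a client instead of building a new one per use.
object KuduClientCache {
  private val clients = new ConcurrentHashMap[String, KuduClient]()

  def getClient(masterAddresses: String): KuduClient =
    clients.computeIfAbsent(
      masterAddresses,
      addrs => new KuduClient.KuduClientBuilder(addrs).build())
}
{code}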
[jira] [Updated] (KUDU-2903) Durability testing framework and tests
[ https://issues.apache.org/jira/browse/KUDU-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2903: -- Labels: roadmap-candidate (was: ) > Durability testing framework and tests > -- > > Key: KUDU-2903 > URL: https://issues.apache.org/jira/browse/KUDU-2903 > Project: Kudu > Issue Type: Bug > Components: test >Affects Versions: 1.11.0 >Reporter: Adar Dembo >Priority: Critical > Labels: roadmap-candidate > > From time to time we get user reports of durability issues in Kudu. We try to > be good citizens and obey the POSIX spec w.r.t. durably storing data on disk, > but we lack any sort of tests that prove we're doing this correctly. > Ideally, we'd have a framework that allows us to run a standard Kudu workload > while doing pathological things to a subset of nodes like: > * Panicking the Linux kernel. > * Abruptly cutting power. > * Abruptly unmounting a filesystem or yanking a disk. > Then we'd restart Kudu on the affected nodes and prove that all on-disk data > remains consistent. > Without such a framework, we can only theorize about issues and their possible > fixes. Some examples include KUDU-2195 and KUDU-2260. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2902) Productionize master state rebuilding tool
[ https://issues.apache.org/jira/browse/KUDU-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2902. --- Fix Version/s: NA Resolution: Duplicate > Productionize master state rebuilding tool > -- > > Key: KUDU-2902 > URL: https://issues.apache.org/jira/browse/KUDU-2902 > Project: Kudu > Issue Type: Bug > Components: CLI, master >Affects Versions: 1.11.0 >Reporter: Adar Dembo >Priority: Major > Fix For: NA > > > Will authored a [CLI tool|https://gerrit.cloudera.org/c/9490/] that uses > cluster-wide tserver state to rebuild master state (i.e. tables and tablets). > We've seen this tool prove useful in some really gnarly support situations. > We should productionize it and merge it into the CLI. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2899) A flakiness in HmsSentryConfigurations/AlterTableRandomized.TestRandomSequence
[ https://issues.apache.org/jira/browse/KUDU-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2899. --- Fix Version/s: NA Resolution: Won't Fix The Sentry integration is removed. > A flakiness in HmsSentryConfigurations/AlterTableRandomized.TestRandomSequence > -- > > Key: KUDU-2899 > URL: https://issues.apache.org/jira/browse/KUDU-2899 > Project: Kudu > Issue Type: Bug > Components: client >Affects Versions: 1.10.0 >Reporter: Alexey Serbin >Priority: Major > Fix For: NA > > Attachments: alter_table-randomized-test.01.txt.xz, > alter_table-randomized-test.1.txt.xz > > > The {{HmsSentryConfigurations/AlterTableRandomized.TestRandomSequence/1}} > scenario of {{alter_table_randomized-itest.cc}} exhibits flakiness in > TSAN builds, from time to time failing with errors like the one below: > {noformat} > F0719 06:51:55.884040 245 alter_table-randomized-test.cc:499] Check failed: > _s.ok() Bad status: Not found: The specified column does not exist > {noformat} > It's pretty clear what happened: the call timed out as seen from the > client side, > {noformat} > W0719 06:51:55.871377 445 rpcz_store.cc:253] Call > kudu.master.MasterService.AlterTable from 127.0.0.1:54308 (request call id > 178) took 10022 ms (10 s). Client timeout ms (10 s) > {noformat} > and the client retried performing the same operation: > {noformat} > W0719 06:51:55.850235 874 master_proxy_rpc.cc:192] Re-attempting AlterTable > request to leader Master (127.0.61.126:33771) > {noformat} > but altering the table (dropping column {{c486}}) has actually succeeded: > {noformat} > W0719 06:51:55.868930 493 hms_client.cc:272] Time spent alter HMS table: > real 9.997s user 0.000s sys 0.001s > {noformat} > {noformat} > I0719 06:51:55.890976 990 tablet.cc:1259] T > 7f145a02995242298c013c329420b6f5 P d12954c90bf3483f80f0222ee1a74ef9: Alter > schema from ( > 10:key INT32 NOT NULL, > > 11:c486 INT32 NULLABLE, > > PRIMARY KEY (key) > > ) version 1 to ( > > 10:key INT32 NOT NULL, > > 11:c746 INT32 NULLABLE, > > PRIMARY KEY (key) > > ) version 2 > {noformat} > I'm attaching the full test log for reference. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2898) KuduContext doesn't set a serialVersionUID
[ https://issues.apache.org/jira/browse/KUDU-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2898: -- Labels: beginner trivial (was: ) > KuduContext doesn't set a serialVersionUID > -- > > Key: KUDU-2898 > URL: https://issues.apache.org/jira/browse/KUDU-2898 > Project: Kudu > Issue Type: Bug > Components: spark >Reporter: Grant Henke >Priority: Major > Labels: beginner, trivial > > It looks like KuduContext doesn't have an explicitly set `serialVersionUID`, > which means that each release of kudu-spark is binary incompatible. > We should fix this and check for other classes that implement the > `Serializable` interface. > There is some work to detect these breaks here: > https://gerrit.cloudera.org/#/c/13004/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2898) KuduContext doesn't set a serialVersionUID
[ https://issues.apache.org/jira/browse/KUDU-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2898: -- Component/s: spark > KuduContext doesn't set a serialVersionUID > -- > > Key: KUDU-2898 > URL: https://issues.apache.org/jira/browse/KUDU-2898 > Project: Kudu > Issue Type: Bug > Components: spark >Reporter: Grant Henke >Priority: Major > > It looks like KuduContext doesn't have an explicitly set `serialVersionUID`, > which means that each release of kudu-spark is binary incompatible. > We should fix this and check for other classes that implement the > `Serializable` interface. > There is some work to detect these breaks here: > https://gerrit.cloudera.org/#/c/13004/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
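The fix itself is a one-line annotation; a sketch for a Scala class in the spirit of KuduContext (the class shown here is hypothetical):
{code:scala}
// Pinning the serialVersionUID keeps serialized instances compatible
// across releases, rather than letting the compiler-generated UID
// change whenever the class's shape changes.
@SerialVersionUID(1L)
class ExampleContext(val kuduMaster: String) extends Serializable
{code}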
[jira] [Updated] (KUDU-2893) No RHEL 7.x commands in troubleshooting doc
[ https://issues.apache.org/jira/browse/KUDU-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2893: -- Component/s: documentation > No RHEL 7.x commands in troubleshooting doc > --- > > Key: KUDU-2893 > URL: https://issues.apache.org/jira/browse/KUDU-2893 > Project: Kudu > Issue Type: Bug > Components: documentation >Reporter: Peter Ebert >Priority: Major > > The troubleshooting ([https://kudu.apache.org/docs/troubleshooting.html]) > doc's command for starting ntp if it is installed but not running is out of date > for RHEL 7; it should be 'systemctl start ntpd'. > Also worth noting that ntp is started immediately after installation, so the > command shouldn't be needed if ntp was just installed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2890) Tablet memory Unreleased
[ https://issues.apache.org/jira/browse/KUDU-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2890. --- Fix Version/s: NA Resolution: Incomplete > Tablet memory Unreleased > > > Key: KUDU-2890 > URL: https://issues.apache.org/jira/browse/KUDU-2890 > Project: Kudu > Issue Type: Bug > Components: tablet >Affects Versions: 1.6.0 >Reporter: zhiyezou >Priority: Major > Fix For: NA > > Attachments: image-2019-07-10-15-20-32-990.png > > > This is the problem I encountered. > !image-2019-07-10-15-20-32-990.png! > I cannot understand the memory usage of this tablet: the total usage is larger > than the MemRowSet + DeltaMemRowSet. > Is there any other memory used by the tablet, and how can I find it? > The total memory usage will grow up to the hard memory limit, > then the cycle repeats. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2887) Expose the tablet statistics in Client API
[ https://issues.apache.org/jira/browse/KUDU-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2887: -- Labels: imapala (was: ) > Expose the tablet statistics in Client API > -- > > Key: KUDU-2887 > URL: https://issues.apache.org/jira/browse/KUDU-2887 > Project: Kudu > Issue Type: Improvement > Components: client >Reporter: LiFu He >Priority: Minor > Labels: imapala > > The patch for aggregating tablet statistics on the kudu-master is on the > way. I think it's important to expose these statistics in the client API so > that query engines can optimize their query plans. For example: (1) adjust > the order of scanning tables, (2) split a big tablet into multiple range > pieces (KUDU-2437) to improve concurrency automatically, (3) speed up > queries like "select count(*) from table". -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2887) Expose the tablet statistics in Client API
[ https://issues.apache.org/jira/browse/KUDU-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2887: -- Labels: impala roadmap-candidate (was: impala) > Expose the tablet statistics in Client API > -- > > Key: KUDU-2887 > URL: https://issues.apache.org/jira/browse/KUDU-2887 > Project: Kudu > Issue Type: Improvement > Components: client >Reporter: LiFu He >Priority: Minor > Labels: impala, roadmap-candidate > > The patch for aggregating tablet statistics on the kudu-master is on the > way. I think it's important to expose these statistics in the client API so > that query engines can optimize their query plans. For example: (1) adjust > the order of scanning tables, (2) split a big tablet into multiple range > pieces (KUDU-2437) to improve concurrency automatically, (3) speed up > queries like "select count(*) from table". -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2887) Expose the tablet statistics in Client API
[ https://issues.apache.org/jira/browse/KUDU-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2887: -- Labels: impala (was: imapala) > Expose the tablet statistics in Client API > -- > > Key: KUDU-2887 > URL: https://issues.apache.org/jira/browse/KUDU-2887 > Project: Kudu > Issue Type: Improvement > Components: client >Reporter: LiFu He >Priority: Minor > Labels: impala > > The patch for aggregating tablet statistics on the kudu-master is on the > way. I think it's important to expose these statistics in the client API so > that query engines can optimize their query plans. For example: (1) adjust > the order of scanning tables, (2) split a big tablet into multiple range > pieces (KUDU-2437) to improve concurrency automatically, (3) speed up > queries like "select count(*) from table". -- This message was sent by Atlassian Jira (v8.3.4#803005)
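For reference, a sketch of how a query engine might consume such statistics through the Java client, assuming a table-statistics accessor along the lines of KuduTable.getTableStatistics() exposing live row count and on-disk size (exact naming and availability vary by version; the master address and table name are hypothetical):
{code:scala}
import org.apache.kudu.client.KuduClient

// Sketch: read table-level statistics to drive planning decisions such
// as scan ordering or count(*) shortcuts.
val client = new KuduClient.KuduClientBuilder("master-1:7051").build()
val table = client.openTable("metrics")
val stats = table.getTableStatistics()
println(s"rows=${stats.getLiveRowCount}, bytes=${stats.getOnDiskSize}")
{code}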
[jira] [Resolved] (KUDU-2877) Support logging to files and stderr at the same time
[ https://issues.apache.org/jira/browse/KUDU-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2877. --- Fix Version/s: 1.11.0 Resolution: Fixed Resolved via https://github.com/apache/kudu/commit/220391f850847232d9aff1dbd72caa2ce7ec3a1e > Support logging to files and stderr at the same time > > > Key: KUDU-2877 > URL: https://issues.apache.org/jira/browse/KUDU-2877 > Project: Kudu > Issue Type: Improvement > Components: docker >Reporter: Grant Henke >Assignee: Sandish Kumar HN >Priority: Minor > Labels: newbie > Fix For: 1.11.0 > > > Support logging to both the _--log_dir_ directory and to stderr via > _--logtostderr_ at the same time. This could be done with a new flag. > This would be useful in docker environments where it is standard to log to > stderr, but we still want the logs to be viewable in the web ui. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2799) Upgrade HMS integration to Hive 3
[ https://issues.apache.org/jira/browse/KUDU-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2799. --- Fix Version/s: 1.13.0 Resolution: Fixed > Upgrade HMS integration to Hive 3 > - > > Key: KUDU-2799 > URL: https://issues.apache.org/jira/browse/KUDU-2799 > Project: Kudu > Issue Type: Bug > Components: hms >Affects Versions: 1.10.0 >Reporter: Adar Dembo >Priority: Major > Fix For: 1.13.0 > > > Currently our HMS integration depends on Hive 2. We should upgrade it to be > compatible with Hive 3. > For a while we may want to actively test against both versions of Hive if > that's possible. If not, we should at least support both versions until > users/vendors have had enough time to complete their own upgrades. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2880) TestSecurity is flaky
[ https://issues.apache.org/jira/browse/KUDU-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2880: -- Component/s: test > TestSecurity is flaky > - > > Key: KUDU-2880 > URL: https://issues.apache.org/jira/browse/KUDU-2880 > Project: Kudu > Issue Type: Test > Components: test >Reporter: Hao Hao >Priority: Major > Attachments: test-output.txt > > > A recent run of TestSecurity failed with the following error: > {noformat} > There was 1 failure: > 1) > testExternallyProvidedSubjectRefreshedExternally(org.apache.kudu.client.TestSecurity) > org.apache.kudu.client.NonRecoverableException: cannot complete before > timeout: KuduRpc(method=ListTabletServers, tablet=null, attempt=26, > TimeoutTracker(timeout=3, elapsed=29608), Traces: [0ms] refreshing cache > from master, [46ms] Sub RPC ConnectToMaster: sending RPC to server > master-127.0.202.126:46581, [63ms] Sub RPC ConnectToMaster: sending RPC to > server master-127.0.202.124:43241, [69ms] Sub RPC ConnectToMaster: received > response from server master-127.0.202.126:46581: Network error: Failed to > connect to peer master-127.0.202.126:46581(127.0.202.126:46581): Connection > refused: /127.0.202.126:46581, [70ms] Sub RPC ConnectToMaster: sending RPC to > server master-127.0.202.125:43873, [250ms] Sub RPC ConnectToMaster: received > response from server master-127.0.202.125:43873: Network error: [peer > master-127.0.202.125:43873(127.0.202.125:43873)] unexpected exception from > downstream on [id: 0x2fae7299, /127.0.0.1:57014 => /127.0.202.125:43873], > [282ms] Sub RPC ConnectToMaster: received response from server > master-127.0.202.124:43241: OK, [336ms] delaying RPC due to: Service > unavailable: Master config > (127.0.202.126:46581,127.0.202.124:43241,127.0.202.125:43873) has no leader. > Exceptions received: org.apache.kudu.client.RecoverableException: Failed to > connect to peer master-127.0.202.126:46581(127.0.202.126:46581): Connection > refused: /127.0.202.126:46581,org.apache.kudu.client.RecoverableException: > [peer master-127.0.202.125:43873(127.0.202.125:43873)] unexpected exception > from downstream on [id: 0x2fae7299, /127.0.0.1:57014 => > /127.0.202.125:43873], [357ms] refreshing cache from master, [358ms] Sub RPC > ConnectToMaster: sending RPC to server master-127.0.202.126:46581, [358ms] > Sub RPC ConnectToMaster: sending RPC to server master-127.0.202.124:43241, > [360ms] Sub RPC ConnectToMaster: received response from server > master-127.0.202.126:46581: Network error: java.net.ConnectException: > Connection refused: /127.0.202.126:46581, [360ms] Sub RPC ConnectToMaster: > sending RPC to server master-127.0.202.125:43873, [361ms] Sub RPC > ConnectToMaster: received response from server master-127.0.202.125:43873: > Network error: Failed to connect to peer > master-127.0.202.125:43873(127.0.202.125:43873): Connection refused: > /127.0.202.125:43873, [363ms] Sub RPC ConnectToMaster: received response from > server master-127.0.202.124:43241: OK, [364ms] delaying RPC due to: Service > unavailable: Master config > (127.0.202.126:46581,127.0.202.124:43241,127.0.202.125:43873) has no leader. 
> Exceptions received: org.apache.kudu.client.RecoverableException: > java.net.ConnectException: Connection refused: > /127.0.202.126:46581,org.apache.kudu.client.RecoverableException: Failed to > connect to peer master-127.0.202.125:43873(127.0.202.125:43873): Connection > refused: /127.0.202.125:43873, [376ms] refreshing cache from master, [377ms] > Sub RPC ConnectToMaster: sending RPC to server master-127.0.202.126:46581, > [377ms] Sub RPC ConnectToMaster: sending RPC to server > master-127.0.202.124:43241, [378ms] Sub RPC ConnectToMaster: sending RPC to > server master-127.0.202.125:43873, [379ms] Sub RPC ConnectToMaster: received > response from server master-127.0.202.126:46581: Network error: Failed to > connect to peer master-127.0.202.126:46581(127.0.202.126:46581): Connection > refused: /127.0.202.126:46581, [381ms] Sub RPC ConnectToMaster: received > response from server master-127.0.202.125:43873: Network error: > java.net.ConnectException: Connection refused: /127.0.202.125:43873, [382ms] > Sub RPC ConnectToMaster: received response from server > master-127.0.202.124:43241: OK, [383ms] delaying RPC due to: Service > unavailable: Master config > (127.0.202.126:46581,127.0.202.124:43241,127.0.202.125:43873) has no leader. > Exceptions received: org.apache.kudu.client.RecoverableException: Failed to > connect to peer master-127.0.202.126:46581(127.0.202.126:46581): Connection > refused: /127.0.202.126:46581,org.apache.kudu.client.RecoverableException: > java.net.ConnectException: Connection refused: /127.0.202.125:43873, [397ms] > refreshing cache from master, [397ms] Sub RPC ConnectToMa
[jira] [Assigned] (KUDU-2799) Upgrade HMS integration to Hive 3
[ https://issues.apache.org/jira/browse/KUDU-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke reassigned KUDU-2799: - Assignee: Grant Henke > Upgrade HMS integration to Hive 3 > - > > Key: KUDU-2799 > URL: https://issues.apache.org/jira/browse/KUDU-2799 > Project: Kudu > Issue Type: Bug > Components: hms >Affects Versions: 1.10.0 >Reporter: Adar Dembo >Assignee: Grant Henke >Priority: Major > Fix For: 1.13.0 > > > Currently our HMS integration depends on Hive 2. We should upgrade it to be > compatible with Hive 3. > For a while we may want to actively test against both versions of Hive if > that's possible. If not, we should at least support both versions until > users/vendors have had enough time to complete their own upgrades. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2798) Fix logging on deleted TSK entries
[ https://issues.apache.org/jira/browse/KUDU-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2798: -- Fix Version/s: 1.12.0 Resolution: Fixed Status: Resolved (was: In Review) > Fix logging on deleted TSK entries > -- > > Key: KUDU-2798 > URL: https://issues.apache.org/jira/browse/KUDU-2798 > Project: Kudu > Issue Type: Task >Affects Versions: 1.8.0, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.11.0, 1.11.1 >Reporter: Alexey Serbin >Assignee: Alexey Serbin >Priority: Minor > Labels: newbie > Fix For: 1.12.0 > > > It seems the identifiers of the deleted TSK entries in the log lines below > need decoding: > {noformat} > I0312 15:17:14.808763 71553 catalog_manager.cc:4095] T > P f05d759af7824df9aafedcc106674182: > Generated new TSK 2 > I0312 15:17:14.811144 71553 catalog_manager.cc:4133] T > P f05d759af7824df9aafedcc106674182: Deleted > TSKs: �, � > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
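The generic shape of the fix, as an illustrative sketch (not the actual Kudu C++ change): render opaque identifiers as hex before putting them in a log line, so entries like the mojibake above stay readable.
{code:scala}
// Sketch: hex-encode an opaque binary identifier for logging.
def toHex(id: Array[Byte]): String =
  id.map(b => f"${b & 0xff}%02x").mkString
{code}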
[jira] [Updated] (KUDU-2829) GCC 9 compilation fails on linux system syscall support library -- dependency of breakpad
[ https://issues.apache.org/jira/browse/KUDU-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2829: -- Component/s: build > GCC 9 compilation fails on linux system syscall support library -- dependency > of breakpad > --- > > Key: KUDU-2829 > URL: https://issues.apache.org/jira/browse/KUDU-2829 > Project: Kudu > Issue Type: Bug > Components: build >Reporter: Scott Reynolds >Assignee: Scott Reynolds >Priority: Major > > GCC 9.x makes it a compilation failure when code attempts to clobber %rsp: > [https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52813] > This enforcement causes Kudu not to compile. The Linux > System Syscall Support library added a change to address that: > [https://chromium.googlesource.com/linux-syscall-support/+/8048ece6c16c91acfe0d36d1d3cc0890ab6e945c%5E%21/#F0] > We can either upgrade to a newer version or backport that patch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-2865) Relax the requirements to get an authorization token
[ https://issues.apache.org/jira/browse/KUDU-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125046#comment-17125046 ] Grant Henke commented on KUDU-2865: --- Has this changed at all as a result of the Ranger integration? > Relax the requirements to get an authorization token > > > Key: KUDU-2865 > URL: https://issues.apache.org/jira/browse/KUDU-2865 > Project: Kudu > Issue Type: Improvement > Components: authz >Affects Versions: 1.10.0 >Reporter: Andrew Wong >Priority: Major > > Currently in order to do any DML with Kudu, a user must have any (i.e. > "METADATA") privilege on a table so the user can get an authorization token. > This is because authz token generation is piggy-backed on the GetTableSchema > endpoint, which does all-or-nothing authorization for the table. > This isn't a great user experience, e.g. if a user only has column-level > privileges. Unless such a user _also_ had a table-level privilege (e.g. > insert privileges on the table), the user would be unable to scan the columns > through direct Kudu APIs. We should consider modifying the > GetTableSchema endpoint to return only the sub-schema and the privileges for > which the user has column-level privileges or higher. > This user experience would be closer to what is supported by Apache Impala. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2871) TLS 1.3 not supported by krpc
[ https://issues.apache.org/jira/browse/KUDU-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2871: -- Target Version/s: (was: 1.10.0) > TLS 1.3 not supported by krpc > - > > Key: KUDU-2871 > URL: https://issues.apache.org/jira/browse/KUDU-2871 > Project: Kudu > Issue Type: Bug > Components: master, rpc, security, tserver >Affects Versions: 1.8.0, 1.9.0, 1.9.1 >Reporter: Todd Lipcon >Priority: Major > > The TLS negotiation in our RPC protocol assumes a whole number of round trips > between client and server. For TLS 1.3, the exchange has 1.5 round trips (the > client is the last sender rather than the server) which breaks negotiation. > Most tests thus fail with OpenSSL 1.1.1. > We should temporarily disable TLS 1.3 and then fix RPC to support this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
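As an illustration of the "temporarily disable TLS 1.3" mitigation (a JVM-side sketch only; krpc itself is C++ on OpenSSL, where the equivalent is capping the maximum protocol version):
{code:scala}
import javax.net.ssl.SSLContext

// Sketch: cap the enabled protocol versions at TLS 1.2 so the handshake
// keeps the whole-number-of-round-trips shape the RPC protocol assumes.
val engine = SSLContext.getDefault.createSSLEngine()
engine.setEnabledProtocols(Array("TLSv1.2"))
{code}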
[jira] [Updated] (KUDU-2858) Update docker readme to be more user focused
[ https://issues.apache.org/jira/browse/KUDU-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2858: -- Component/s: documentation > Update docker readme to be more user focused > > > Key: KUDU-2858 > URL: https://issues.apache.org/jira/browse/KUDU-2858 > Project: Kudu > Issue Type: Improvement > Components: docker, documentation >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > Labels: docker > > Now that the docker images are being published, we should update the readme > to focus less on building the images and more on using the already built > images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2857) Rewrite docker build script in python
[ https://issues.apache.org/jira/browse/KUDU-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2857: -- Labels: build docker (was: docker) > Rewrite docker build script in python > - > > Key: KUDU-2857 > URL: https://issues.apache.org/jira/browse/KUDU-2857 > Project: Kudu > Issue Type: Improvement >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > Labels: build, docker > > The docker build bash script has gotten sufficiently complicated that it > should be rewritten in python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2811) Fuzz test needed for backup-restore
[ https://issues.apache.org/jira/browse/KUDU-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2811: -- Component/s: test > Fuzz test needed for backup-restore > --- > > Key: KUDU-2811 > URL: https://issues.apache.org/jira/browse/KUDU-2811 > Project: Kudu > Issue Type: Bug > Components: backup, test >Affects Versions: 1.9.0 >Reporter: William Berkeley >Priority: Major > Labels: backup > > We need to fuzz test backup-restore by having a test that creates a table > through a random sequence of operations while also randomly doing incremental > backups. We should then check the restored table against the original table. > This would have caught KUDU-2809. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2828) Fix C++ code style and compile warnings
[ https://issues.apache.org/jira/browse/KUDU-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2828: -- Component/s: build > Fix C++ code style and compile warnings > -- > > Key: KUDU-2828 > URL: https://issues.apache.org/jira/browse/KUDU-2828 > Project: Kudu > Issue Type: Improvement > Components: build >Reporter: ZhangYao >Assignee: ZhangYao >Priority: Major > > Currently I run Kudu through C++ static analysis tools, which report some > warnings about the code. I will try to fix the significant warnings and push > some commits. > For example: > Unused variables in functions. > Uninitialized member variables. > And so on. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2805) ClientTest.TestServerTooBusyRetry fails due to TSAN thread limit
[ https://issues.apache.org/jira/browse/KUDU-2805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2805: -- Component/s: test > ClientTest.TestServerTooBusyRetry fails due to TSAN thread limit > > > Key: KUDU-2805 > URL: https://issues.apache.org/jira/browse/KUDU-2805 > Project: Kudu > Issue Type: Bug > Components: test >Affects Versions: 1.9.0 >Reporter: William Berkeley >Priority: Major > Attachments: client-test.tsanlimit.txt > > > I've seen a couple instances where ClientTest.TestServerTooBusyRetry fails > after hitting the TSAN thread limit, after seemingly being stuck for 10 > minutes or so. The end of the logs look like > {noformat} > W0428 12:20:07.406752 10297 debug-util.cc:397] Leaking SignalData structure > 0x7b08000c2ba0 after lost signal to thread 8435 > W0428 12:20:07.412693 10297 debug-util.cc:397] Leaking SignalData structure > 0x7b080019f2a0 after lost signal to thread 10185 > W0428 12:20:07.418191 10297 debug-util.cc:397] Leaking SignalData structure > 0x7b080018f060 after lost signal to thread 10361 > W0428 12:20:23.873589 10139 debug-util.cc:397] Leaking SignalData structure > 0x7b08000fc360 after lost signal to thread 8435 > W0428 12:20:23.878401 10139 debug-util.cc:397] Leaking SignalData structure > 0x7b08000ccf20 after lost signal to thread 10185 > W0428 12:20:23.884522 10139 debug-util.cc:397] Leaking SignalData structure > 0x7b0800051ae0 after lost signal to thread 10361 > W0428 12:22:03.715726 10297 debug-util.cc:397] Leaking SignalData structure > 0x7b08000f9280 after lost signal to thread 8435 > W0428 12:22:03.721261 10297 debug-util.cc:397] Leaking SignalData structure > 0x7b08001b0e40 after lost signal to thread 10185 > W0428 12:22:03.727725 10297 debug-util.cc:397] Leaking SignalData structure > 0x7b08000b7460 after lost signal to thread 10361 > W0428 12:22:11.928373 10139 debug-util.cc:397] Leaking SignalData structure > 0x7b0800044be0 after lost signal to thread 8435 > W0428 12:22:11.933187 10139 debug-util.cc:397] Leaking SignalData structure > 0x7b080018f3c0 after lost signal to thread 10185 > W0428 12:22:11.939275 10139 debug-util.cc:397] Leaking SignalData structure > 0x7b08001b3480 after lost signal to thread 10361 > ==8432==ThreadSanitizer: Thread limit (8128 threads) exceeded. Dying. > {noformat} > Some threads are unresponsive, even to the signals sent by the stack trace > collector thread. Unfortunately, there's nothing in the logs about those > threads. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2816) Failure due to column already present in HmsSentryConfigurations.AlterTableRandomized
[ https://issues.apache.org/jira/browse/KUDU-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2816. --- Fix Version/s: NA Resolution: Won't Fix The Sentry integration is removed. > Failure due to column already present in > HmsSentryConfigurations.AlterTableRandomized > - > > Key: KUDU-2816 > URL: https://issues.apache.org/jira/browse/KUDU-2816 > Project: Kudu > Issue Type: Bug > Components: test >Reporter: William Berkeley >Priority: Major > Fix For: NA > > Attachments: alter_table-randomized-test.1.txt > > > {noformat} > F0504 12:41:37.638859 231 alter_table-randomized-test.cc:499] Check failed: > _s.ok() Bad status: Already present: The column already exists: c310 > *** Check failure stack trace: *** > *** Aborted at 1556973697 (unix time) try "date -d @1556973697" if you are > using GNU date *** > PC: @ 0x7f698597bc37 gsignal > *** SIGABRT (@0x3e800e7) received by PID 231 (TID 0x7f69a02ef900) from > PID 231; stack trace: *** > @ 0x7f698d6c0330 (unknown) at ??:0 > @ 0x7f698597bc37 gsignal at ??:0 > @ 0x7f698597f028 abort at ??:0 > @ 0x7f6988cbfa29 google::logging_fail() at ??:0 > @ 0x7f6988cc131d google::LogMessage::Fail() at ??:0 > @ 0x7f6988cc31dd google::LogMessage::SendToLog() at ??:0 > @ 0x7f6988cc0e59 google::LogMessage::Flush() at ??:0 > @ 0x7f6988cc3c7f google::LogMessageFatal::~LogMessageFatal() at ??:0 > @ 0x586325 kudu::MirrorTable::RandomAlterTable() at > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/alter_table-randomized-test.cc:499 > @ 0x5805b4 > kudu::AlterTableRandomized_TestRandomSequence_Test::TestBody() at > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/alter_table-randomized-test.cc:749 > @ 0x7f698ada0b98 > testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0 > @ 0x7f698ad8e1b2 testing::Test::Run() at ??:0 > @ 0x7f698ad8e2f8 testing::TestInfo::Run() at ??:0 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2829) GCC 9 compilation fails on linux system syscall support library -- dependency of breakpad
[ https://issues.apache.org/jira/browse/KUDU-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2829: -- Target Version/s: (was: 1.10.0) > GCC 9 compilation fails on linux system syscall support library -- dependency > of breakpad > --- > > Key: KUDU-2829 > URL: https://issues.apache.org/jira/browse/KUDU-2829 > Project: Kudu > Issue Type: Bug >Reporter: Scott Reynolds >Assignee: Scott Reynolds >Priority: Major > > GCC 9.x makes it a compilation failure when code attempts to clobber %rsp: > [https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52813] > This enforcement causes Kudu not to compile. The Linux > System Syscall Support library added a change to address that: > [https://chromium.googlesource.com/linux-syscall-support/+/8048ece6c16c91acfe0d36d1d3cc0890ab6e945c%5E%21/#F0] > We can either upgrade to a newer version or backport that patch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2817) C++ Upgrades for before Kudu 1.13 release
[ https://issues.apache.org/jira/browse/KUDU-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2817: -- Component/s: build > C++ Upgrades for before Kudu 1.13 release > - > > Key: KUDU-2817 > URL: https://issues.apache.org/jira/browse/KUDU-2817 > Project: Kudu > Issue Type: Improvement > Components: build >Affects Versions: 1.10.0 >Reporter: Grant Henke >Priority: Major > > We should consider reviewing and upgrading our dependencies before the next > release. Below is a list of current dependencies and their latest release. > * gflags: 2.2.0 (Nov 2016) -> 2.2.2 (Nov 2018) > * glog: 0.3.5 (May 2017) -> 0.4.0 (Mar 2019) > * gmock: 1.8.0 -> 1.8.1 > * gperftools: 2.6.90 -> 2.7 > * protobuf: 3.4.1 -> 3.7.1 (3.8.0 soon) > * cmake: 3.9.0 (Nov 2018) -> 3.14.3 (May 2019) > * snappy: 1.1.4 (Jan 2017) -> 1.1.7 (Aug 2017) > * lz4: r130 (patched, 2015) -> 1.9.1 (May 2019, expected perf gains) > * bitshuffle: 55f9b4c (patched, 2016) -> 0.3.5 (Nov 2018) > * zlib: 1.2.8 (Apr 2013) -> 1.2.11 (Jan 2017) > * libev: 4.20 -> 4.22 > * rapidjson: 1.1.0 (current) > * squeasel: current > * mustache: 87a592e8aa04497764c533acd6e887618ca7b8a8 (Feb 2017) -> > cf5c3dd499ea2bc9eb5c2072fb551dc7af75aa57 (Jun 2017) > ** Consider using official mustache C++ support? > * curl: 7.59.0 (Mar 2018) -> 7.64.1 (Mar 2019) > * crcutil: current > * libunwind: 1.3-rc1 (patched, Nov 2017) -> 1.3.1 (Jan 2019) > * llvm: 6.0.0 (Mar 2018) -> 8.0.0 (Mar 2019) > * iwyu: 0.9 -> 0.12 (May 2019) > * nvml: 1.1 (2016) -> 1.6 (now called pmdk, Mar 2019) > ** Patch to replace with memkind is posted > * boost: 1.61.0 (patched, 2016) -> 1.70.0 (Apr 2019) > * breakpad: 9eac2058b70615519b2c4d8c6bdbfca1bd079e39 (Apr 2013) -> > 21b48a72aa50dde84149267f6b7402522b846b24 (Apr 2019) > * sparsepp: 47a55825ca3b35eab1ca22b7ab82b9544e32a9af (Nov 2016) -> > 5ca6de766db32b3fb08a040636423cd3988d2d4f (Jun 2018) > * thrift: 0.11 (Dec 2017) -> 0.12 (Dec 2018) > * bison: 3.0.4 (patched, 2015) -> 3.3 (Jan 2019) > * hive: 498021fa15186aee8b282d3c032fbd2cede6bec4 (commit in Hive 2) -> 3.1.1 > (Oct 2018) > * hadoop: 2.8.5 (Sept 2018) -> 3.1.2 (Feb 2019) > * sentry: 505b42e81a9d85c4ebe8db3f48ad7a6e824a5db5 (commit in Master) > * python: 2.7.13 -> (a lot of choices here) > A quick risk/reward review should be done and we should upgrade the > dependencies that are expected to be beneficial. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2815) RaftConsensusNonVoterITest.PromoteAndDemote fails if manually-run election fails.
[ https://issues.apache.org/jira/browse/KUDU-2815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2815: -- Component/s: test > RaftConsensusNonVoterITest.PromoteAndDemote fails if manually-run election > fails. > - > > Key: KUDU-2815 > URL: https://issues.apache.org/jira/browse/KUDU-2815 > Project: Kudu > Issue Type: Bug > Components: test >Affects Versions: 1.9.0 >Reporter: William Berkeley >Priority: Major > Attachments: raft_consensus_nonvoter-itest.txt, > raft_consensus_nonvoter-itest.txt > > > RaftConsensusNonVoterITest.PromoteAndDemote disables normal leader elections > and runs an election manually, to avoid some previous flakiness. > Unfortunately, this introduces flakiness, because, rarely, the manual > election fails when the vote requests time out. The candidate concludes it > has lost the election, and then after that the two other voters vote yes. > The timeout for vote requests is 170ms, which is pretty short. If it were > raised to, say, 5s, the test would probably not be flaky anymore. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2816) Failure due to column already present in HmsSentryConfigurations.AlterTableRandomized
[ https://issues.apache.org/jira/browse/KUDU-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2816: -- Component/s: test > Failure due to column already present in > HmsSentryConfigurations.AlterTableRandomized > - > > Key: KUDU-2816 > URL: https://issues.apache.org/jira/browse/KUDU-2816 > Project: Kudu > Issue Type: Bug > Components: test >Reporter: William Berkeley >Priority: Major > Attachments: alter_table-randomized-test.1.txt > > > {noformat} > F0504 12:41:37.638859 231 alter_table-randomized-test.cc:499] Check failed: > _s.ok() Bad status: Already present: The column already exists: c310 > *** Check failure stack trace: *** > *** Aborted at 1556973697 (unix time) try "date -d @1556973697" if you are > using GNU date *** > PC: @ 0x7f698597bc37 gsignal > *** SIGABRT (@0x3e800e7) received by PID 231 (TID 0x7f69a02ef900) from > PID 231; stack trace: *** > @ 0x7f698d6c0330 (unknown) at ??:0 > @ 0x7f698597bc37 gsignal at ??:0 > @ 0x7f698597f028 abort at ??:0 > @ 0x7f6988cbfa29 google::logging_fail() at ??:0 > @ 0x7f6988cc131d google::LogMessage::Fail() at ??:0 > @ 0x7f6988cc31dd google::LogMessage::SendToLog() at ??:0 > @ 0x7f6988cc0e59 google::LogMessage::Flush() at ??:0 > @ 0x7f6988cc3c7f google::LogMessageFatal::~LogMessageFatal() at ??:0 > @ 0x586325 kudu::MirrorTable::RandomAlterTable() at > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/alter_table-randomized-test.cc:499 > @ 0x5805b4 > kudu::AlterTableRandomized_TestRandomSequence_Test::TestBody() at > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/alter_table-randomized-test.cc:749 > @ 0x7f698ada0b98 > testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0 > @ 0x7f698ad8e1b2 testing::Test::Run() at ??:0 > @ 0x7f698ad8e2f8 testing::TestInfo::Run() at ??:0 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2822) Kudu create table problem
[ https://issues.apache.org/jira/browse/KUDU-2822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2822. --- Fix Version/s: NA Resolution: Incomplete > Kudu create table problem > -- > > Key: KUDU-2822 > URL: https://issues.apache.org/jira/browse/KUDU-2822 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.8.0 >Reporter: kun'qin  >Priority: Major > Fix For: NA > > > There are five tservers, each with 775 partitions. After creating a Kudu table with > 100 partitions through Impala, the number of partitions per tserver kept growing, > increasing to 1500+. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2838) HmsSentryConfigurations/AlterTableRandomized.TestRandomSequence is flaky
[ https://issues.apache.org/jira/browse/KUDU-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2838. --- Fix Version/s: NA Resolution: Won't Fix The Sentry integration is removed. > HmsSentryConfigurations/AlterTableRandomized.TestRandomSequence is flaky > > > Key: KUDU-2838 > URL: https://issues.apache.org/jira/browse/KUDU-2838 > Project: Kudu > Issue Type: Bug > Components: master >Affects Versions: 1.10.0 >Reporter: Alexey Serbin >Priority: Major > Fix For: NA > > Attachments: alter_table-randomized-test.1.txt.xz > > > The {{HmsSentryConfigurations/AlterTableRandomized.TestRandomSequence}} test > is flaky. Sometimes, it fails with the following error: > {noformat} > Bad status: Already present: Error creating table default.test_table on the > master: table default.test_table already exists with id > b1359f3663b34ce2a01009f6538dbffc > {noformat} > The log is attached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2829) GCC 9 compilation fails on linux system syscall support library -- dependency of breakpad
[ https://issues.apache.org/jira/browse/KUDU-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2829. --- Fix Version/s: 1.13.0 Resolution: Fixed Resolved via https://github.com/apache/kudu/commit/c5fd2e44245502c3c7f8dd0f2cd7639815fef0aa > GCC 9 compilation fails on linux system syscall support library -- dependency > of breakpad > --- > > Key: KUDU-2829 > URL: https://issues.apache.org/jira/browse/KUDU-2829 > Project: Kudu > Issue Type: Bug > Components: build >Reporter: Scott Reynolds >Assignee: Scott Reynolds >Priority: Major > Fix For: 1.13.0 > > > GCC 9.x makes it a compilation failure when code attempts to clobber %rsp: > [https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52813] > This enforcement causes Kudu not to compile. The Linux > System Syscall Support library added a change to address that: > [https://chromium.googlesource.com/linux-syscall-support/+/8048ece6c16c91acfe0d36d1d3cc0890ab6e945c%5E%21/#F0] > We can either upgrade to a newer version or backport that patch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2782) Implement distributed tracing support in Kudu
[ https://issues.apache.org/jira/browse/KUDU-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2782: -- Labels: roadmap-candidate supportability (was: ) > Implement distributed tracing support in Kudu > - > > Key: KUDU-2782 > URL: https://issues.apache.org/jira/browse/KUDU-2782 > Project: Kudu > Issue Type: Task > Components: ops-tooling >Reporter: Mike Percy >Priority: Major > Labels: roadmap-candidate, supportability > > It would be useful to implement distributed tracing support in Kudu, > especially something like OpenTracing support that we could use with Zipkin, > Jaeger, DataDog, etc. Particularly useful would be auto-sampled and on-demand > traces of write RPCs since that would help us identify slow nodes or hotspots > in the replication group and troubleshoot performance and stability issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2789) Document how to use/run the backup and restore jobs
[ https://issues.apache.org/jira/browse/KUDU-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2789. --- Fix Version/s: 1.10.0 Resolution: Fixed > Document how to use/run the backup and restore jobs > --- > > Key: KUDU-2789 > URL: https://issues.apache.org/jira/browse/KUDU-2789 > Project: Kudu > Issue Type: Improvement >Affects Versions: 1.9.0 >Reporter: Grant Henke >Priority: Major > Labels: backup > Fix For: 1.10.0 > > > Before the backup and restore functionality is considered GA we should be > sure it's well documented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2779) MasterStressTest is flaky when HMS is enabled
[ https://issues.apache.org/jira/browse/KUDU-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2779: -- Component/s: test > MasterStressTest is flaky when HMS is enabled > - > > Key: KUDU-2779 > URL: https://issues.apache.org/jira/browse/KUDU-2779 > Project: Kudu > Issue Type: Test > Components: test >Reporter: Hao Hao >Priority: Major > Attachments: master-stress-test.1.txt.xz > > > Encountered failure in master-stress-test.cc when HMS integration is enabled: > {noformat} > 22:30:11.487 [HMS - ERROR - pool-8-thread-2] (HiveAlterHandler.java:341) > Failed to alter table default.table_1529084adeeb48719dd0a1d18572b357 > 22:30:11.494 [HMS - ERROR - pool-8-thread-3] (HiveAlterHandler.java:341) > Failed to alter table default.table_4657eb1f8bbe4b60b03db2cbf07803a3 > 22:30:11.506 [HMS - ERROR - pool-8-thread-2] (RetryingHMSHandler.java:200) > MetaException(message:java.lang.IllegalStateException: Event not set up > correctly) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6189) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:4063) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_environment_context(HiveMetaStore.java:4020) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) > at com.sun.proxy.$Proxy24.alter_table_with_environment_context(Unknown > Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:11631) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:11615) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:103) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.IllegalStateException: Event not set up correctly > at > org.apache.hadoop.hive.metastore.messaging.AlterTableMessage.checkValid(AlterTableMessage.java:49) > at > org.apache.hadoop.hive.metastore.messaging.json.JSONAlterTableMessage.(JSONAlterTableMessage.java:57) > at > org.apache.hadoop.hive.metastore.messaging.json.JSONMessageFactory.buildAlterTableMessage(JSONMessageFactory.java:115) > at > org.apache.hive.hcatalog.listener.DbNotificationListener.onAlterTable(DbNotificationListener.java:187) > at > org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$8.notify(MetaStoreListenerNotifier.java:107) > at > org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:175) > at > org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:205) > 
at > org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:317) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:4049) > ... 16 more > Caused by: org.apache.thrift.protocol.TProtocolException: Unexpected > character:{ > at > org.apache.thrift.protocol.TJSONProtocol.readJSONSyntaxChar(TJSONProtocol.java:337) > at > org.apache.thrift.protocol.TJSONProtocol$JSONPairContext.read(TJSONProtocol.java:246) > at > org.apache.thrift.protocol.TJSONProtocol.readJSONObjectStart(TJSONProtocol.java:793) > at > org.apache.thrift.protocol.TJSONProtocol.readStructBegin(TJSONProtocol.java:840) > at > org.apache.hadoop.hive.metastore.api.Table$TableStandardScheme.read(Table.java:1577) > at > org.apache.hadoop.hive.metastore.api.Table$TableStandardScheme.read(Table.java:1573) > at org.apache.hadoop.hive.metastore.api.Table.read(Table.java:1407) > at org.apache.thrift.TDeserializer.deserialize(TDeserializer.java:81) > at org.apache.thr
[jira] [Updated] (KUDU-2778) Explore limitations of multi-master Kudu deployments with more than 3 masters
[ https://issues.apache.org/jira/browse/KUDU-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2778: -- Labels: roadmap-candidate supportability (was: ) > Explore limitations of multi-master Kudu deployments with more than 3 masters > - > > Key: KUDU-2778 > URL: https://issues.apache.org/jira/browse/KUDU-2778 > Project: Kudu > Issue Type: Task > Components: master >Reporter: Alexey Serbin >Priority: Major > Labels: roadmap-candidate, supportability > > Currently, the recommended limit of Kudu masters in a multi-master deployment > is 3 (i.e. no more than 3 masters are recommended): > https://github.com/apache/kudu/blob/branch-1.9.x/docs/known_issues.adoc#scale > It would be nice to clarify whether there is anything substantial behind that > limit. As of now the recommendation stems from the fact that all of our > multi-master tests and tested deployments use 3 masters. Overall, being able > to deploy 5 or more masters for bigger clusters makes sense from the > HA perspective. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2970) Fine-grained authorization with Ranger
[ https://issues.apache.org/jira/browse/KUDU-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2970. --- Fix Version/s: 1.12.0 Resolution: Fixed > Fine-grained authorization with Ranger > --- > > Key: KUDU-2970 > URL: https://issues.apache.org/jira/browse/KUDU-2970 > Project: Kudu > Issue Type: New Feature > Components: security >Affects Versions: 1.11.0 >Reporter: Hao Hao >Assignee: Hao Hao >Priority: Major > Fix For: 1.12.0 > > > With the completion of Kudu’s integration with Apache Sentry, fine-grained > authorization capabilities have been added to Kudu. However, because Apache > Ranger has wider adoption and provides more comprehensive security features > (such as attribute-based access control, auditing, etc.) than Sentry, it is > important for Kudu to also integrate with Ranger. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-3080) Add SSL support to MiniRanger
[ https://issues.apache.org/jira/browse/KUDU-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-3080: -- Component/s: test > Add SSL support to MiniRanger > - > > Key: KUDU-3080 > URL: https://issues.apache.org/jira/browse/KUDU-3080 > Project: Kudu > Issue Type: Sub-task > Components: test >Reporter: Attila Bukor >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-3078) Ranger integration testing
[ https://issues.apache.org/jira/browse/KUDU-3078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-3078. --- Fix Version/s: 1.12.0 Resolution: Fixed > Ranger integration testing > -- > > Key: KUDU-3078 > URL: https://issues.apache.org/jira/browse/KUDU-3078 > Project: Kudu > Issue Type: Sub-task >Reporter: Attila Bukor >Assignee: Attila Bukor >Priority: Major > Fix For: 1.12.0 > > > The Ranger integration should be properly tested before we can remove the > experimental flag. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2882) Increase the timeout interval for TestSentryClientMetrics.Basic
[ https://issues.apache.org/jira/browse/KUDU-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2882. --- Fix Version/s: NA Resolution: Won't Fix The Sentry integration is removed. > Increase the timeout interval for TestSentryClientMetrics.Basic > --- > > Key: KUDU-2882 > URL: https://issues.apache.org/jira/browse/KUDU-2882 > Project: Kudu > Issue Type: Improvement > Components: master >Affects Versions: 1.10.0 >Reporter: LiFu He >Priority: Minor > Fix For: NA > > > When I run the test cases of 1.10.0-RC2, the 'TestSentryClientMetrics.Basic' > test is a little bit strange. Sometimes it works, but sometimes it doesn't. Today, > I took a close look at the output log and found some useful info: > {code:java} > // code placeholder > I0701 16:37:24.925388 33240 thread.cc:675] Ended thread 33240 - thread > pool:Sentry [worker] > I0701 16:37:24.925501 33015 thread.cc:624] Started thread 33436 - thread > pool:Sentry [worker] > I0701 16:37:25.322556 33015 mini_sentry.cc:164] Pausing Sentry > W0701 16:37:27.331832 33436 sentry_client.cc:134] Time spent starting Sentry > client: real 1.999s user 0.000s sys 0.000s > W0701 16:37:27.331894 33436 client.h:352] Failed to connect to Sentry > (127.32.61.193:59755): Timed out: failed to open Sentry connection: > THRIFT_EAGAIN (timed out) > I0701 16:37:27.331986 33015 mini_sentry.cc:172] Resuming Sentry > /mnt/ddb/2/helif/apache/kudu/src/kudu/master/sentry_authz_provider-test.cc:1415: > Failure > Expected: (200) < (hist->histogram()->MaxValue()), actual: 200 vs > 1999002 > I0701 16:37:27.332604 33015 mini_sentry.cc:155] Stopping Sentry > {code} > Then I looked through the file 'sentry_authz_provider-test.cc', and it seems the > timeout value is too short: > [https://github.com/apache/kudu/blob/5c652defff422f908dacc11011dc6ae59bf49be5/src/kudu/master/sentry_authz_provider-test.cc#L1396] > Perhaps we can increase this value to 4 or 5 seconds to > avoid the failures (the default is 60 seconds, but judging from the log above the test appears to use a timeout of about 2 seconds), though so far only Alexey Serbin (not sure) and I have hit this problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2784) MasterSentryTest.TestTableOwnership is flaky
[ https://issues.apache.org/jira/browse/KUDU-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2784. --- Fix Version/s: NA Resolution: Won't Fix The Sentry integration is removed. > MasterSentryTest.TestTableOwnership is flaky > > > Key: KUDU-2784 > URL: https://issues.apache.org/jira/browse/KUDU-2784 > Project: Kudu > Issue Type: Test >Reporter: Hao Hao >Assignee: Hao Hao >Priority: Major > Fix For: NA > > Attachments: master_sentry-itest.2.txt > > > Encountered a failure with the following error: > {noformat} > W0423 04:49:43.773183 1862 sentry_authz_provider.cc:269] Action on > table with authorizable scope is not permitted for > user > I0423 04:49:43.773447 1862 rpcz_store.cc:269] Call > kudu.master.MasterService.DeleteTable from 127.0.0.1:44822 (request call id > 6) took 2093ms. Request Metrics: > {"Sentry.queue_time_us":33,"Sentry.run_cpu_time_us":390,"Sentry.run_wall_time_us":18856} > /home/jenkins-slave/workspace/kudu-master/1/src/kudu/integration-tests/master_sentry-itest.cc:446: > Failure > Failed > Bad status: Not authorized: unauthorized action > {noformat} > This could be because the owner privilege hasn't been reflected yet for ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2773) MiniSentry: JAVA_TOOL_OPTIONS env variable sometimes is not picked up
[ https://issues.apache.org/jira/browse/KUDU-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2773. --- Fix Version/s: NA Resolution: Won't Fix The Sentry integration is removed. > MiniSentry: JAVA_TOOL_OPTIONS env variable sometimes is not picked up > - > > Key: KUDU-2773 > URL: https://issues.apache.org/jira/browse/KUDU-2773 > Project: Kudu > Issue Type: Bug > Components: authz, security, test >Reporter: Alexey Serbin >Priority: Major > Fix For: NA > > > In MiniSentry, the {{JAVA_TOOL_OPTIONS}} environment variable is set in the > environment upon every start of the Sentry process by the {{MiniSentry::Start()}} > method. In most cases it is picked up, but sometimes that fails in the > {{TestSentryClientMetrics.Basic}} test scenario. When that happens, the > test fails. An example failure log is below: > {noformat} > [--] 1 test from TestSentryClientMetrics > > [ RUN ] TestSentryClientMetrics.Basic > > Picked up JAVA_TOOL_OPTIONS: > > 16:45:18.723 [SENTRY - WARN - main] (Log4JLogger.java:96) Metadata has > jdbc-type of null yet this is not valid. Ignored > 16:45:20.587 [SENTRY - WARN - main] (Log4JLogger.java:96) Metadata has > jdbc-type of null yet this is not valid. Ignored > 16:45:22.701 [SENTRY - WARN - sentry-service] (Log4JLogger.java:96) Metadata > has jdbc-type of null yet this is not valid. Ignored > Sentry service is ready to serve client requests > > W0409 16:45:23.483585 28712 client.h:346] Failed to connect to Sentry > (127.23.212.1:35614): Not authorized: failed to open Sentry connection: > Configuration file does not specify default realm > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/master/sentry_authz_provider-test.cc:686: > Failure > Expected: 1 > > To be equal to: GetTasksSuccessful() > > Which is: 0 > > I0409 16:45:23.520182 24400 test_util.cc:135] > --- > I0409 16:45:23.520295 24400 test_util.cc:136] Had fatal failures, leaving > test files at > /data/somelongdirectorytoavoidrpathissues/src/kudutest/sentry_authz_provider-test.1.TestSentryClientMetrics.Basic.1554853465542407-24400 > [ FAILED ] TestSentryClientMetrics.Basic (11490 ms) > > [--] 1 test from TestSentryClientMetrics (11490 ms total) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2766) Add metrics to HMS client for better observability
[ https://issues.apache.org/jira/browse/KUDU-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2766: -- Component/s: metrics > Add metrics to HMS client for better observability > -- > > Key: KUDU-2766 > URL: https://issues.apache.org/jira/browse/KUDU-2766 > Project: Kudu > Issue Type: Improvement > Components: metrics >Affects Versions: 1.10.0 >Reporter: Alexey Serbin >Assignee: Alexey Serbin >Priority: Major > > It would be nice to add metrics to the HMS client for better observability of > the RPC communication between the client and the HMS service. > The following changelist might be a useful reference: > https://gerrit.cloudera.org/12951/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KUDU-2769) Investigate NotAuthorized responses from Sentry on ListPrivilegesByUser in case of non-existent user
[ https://issues.apache.org/jira/browse/KUDU-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2769. --- Fix Version/s: NA Resolution: Won't Fix The Sentry integration is removed. > Investigate NotAuthorized responses from Sentry on ListPrivilegesByUser in > case of non-existent user > > > Key: KUDU-2769 > URL: https://issues.apache.org/jira/browse/KUDU-2769 > Project: Kudu > Issue Type: Task >Reporter: Alexey Serbin >Priority: Major > Fix For: NA > > > It would be nice to clarify the behavior of both Sentry and Kudu's wrapper > around Sentry's HA client in {{src/kudu/thrift/client.h}} when > retrieving privileges for a non-existent user. Right now it seems Sentry > responds with something that {{HaClient}} converts into > {{Status::NotAuthorized}}, and that error causes the client to re-connect to > the Sentry service (which is sub-optimal?). So, a couple of questions to > clarify: > * Is it legit behavior on the Sentry side to respond with something > that's converted into {{Status::NotAuthorized}} by the {{HaClient}}? > * Is it really necessary for the {{HaClient}} to reconnect to Sentry upon > 'sensing' a {{Status::NotAuthorized}} status code from Sentry? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2756) RemoteKsckTest.TestClusterWithLocation failed with master consensus conflicts
[ https://issues.apache.org/jira/browse/KUDU-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2756: -- Component/s: test > RemoteKsckTest.TestClusterWithLocation failed with master consensus conflicts > - > > Key: KUDU-2756 > URL: https://issues.apache.org/jira/browse/KUDU-2756 > Project: Kudu > Issue Type: Test > Components: test >Reporter: Hao Hao >Priority: Major > Attachments: ksck_remote-test.txt > > > RemoteKsckTest.TestClusterWithLocation is still flaky after the fix from > KUDU-2748 and fails with the following error. > {noformat} > I0401 16:42:06.135743 18496 sys_catalog.cc:340] T > P 1afc84687f934a5a8055897bbf6c2a92 > [sys.catalog]: This master's current role is: LEADER > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/tools/ksck_remote-test.cc:542: > Failure > Failed > Bad status: Corruption: there are master consensus conflicts > /data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/util/test_util.cc:326: > Failure > Failed > Timed out waiting for assertion to pass. > I0401 16:42:35.964449 12160 tablet_server.cc:165] TabletServer shutting > down... > {noformat} > Attached is the full log. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2758) TLS socket writes in 16kb chunks with intervening epoll/setsockopt syscalls
[ https://issues.apache.org/jira/browse/KUDU-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2758: -- Labels: impala (was: ) > TLS socket writes in 16kb chunks with intervening epoll/setsockopt syscalls > --- > > Key: KUDU-2758 > URL: https://issues.apache.org/jira/browse/KUDU-2758 > Project: Kudu > Issue Type: Bug > Components: perf, rpc, security >Reporter: Todd Lipcon >Priority: Major > Labels: impala > > I noticed that krpc has the following syscall pattern: >  > {code} > rpc reactor-231 23122 [002] 35488410.994309: syscalls:sys_enter_epoll_wait: > epfd: 0x0007, events: 0x02137520, maxevents: 0x0040, timeout: > 0x0050 > rpc reactor-231 23122 [002] 35488410.994310: syscalls:sys_exit_epoll_wait: > 0x1 > rpc reactor-231 23122 [002] 35488410.994313: syscalls:sys_enter_setsockopt: > fd: 0x0011, level: 0x0006, optname: 0x0003, optval: > 0x7fc80910175c, optlen: 0x0004 > rpc reactor-231 23122 [002] 35488410.994314: syscalls:sys_exit_setsockopt: > 0x0 > rpc reactor-231 23122 [002] 35488410.994351: syscalls:sys_enter_write: fd: > 0x0011, buf: 0x7fc7e8059e93, count: 0x401d > rpc reactor-231 23122 [002] 35488410.994370: syscalls:sys_exit_write: 0x401d > rpc reactor-231 23122 [002] 35488410.994372: syscalls:sys_enter_setsockopt: > fd: 0x0011, level: 0x0006, optname: 0x0003, optval: > 0x7fc80910175c, optlen: 0x0004 > rpc reactor-231 23122 [002] 35488410.994378: syscalls:sys_exit_setsockopt: > 0x0 > {code} > This block of syscalls repeats in a pretty tight loop -- epoll_wait, CORK, > write, UNCORK. The writes are always 0x401d bytes (just more than 16kb). I > found the following in the ssl_write manpage: > {quote} > SSL_write() will only return with success, when the complete contents of buf > of length num has been written. This default behaviour can be changed with > the SSL_MODE_ENABLE_PARTIAL_WRITE option of ssl_ctx_set_mode(3). When this > flag is set, SSL_write() will also return with success, when a partial write > has been successfully completed. In this case the SSL_write() operation is > considered completed. The bytes are sent and a new SSL_write() operation with > a new buffer (with the already sent bytes removed) must be started. A partial > write is performed with the size of a message block, which is 16kB for > SSLv3/TLSv1. > {quote} > Seems likely we should be looping the writes before uncorking -- either until > we run into a temporary socket error or run out of stuff to write. -- This message was sent by Atlassian Jira (v8.3.4#803005)
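For KUDU-2758 above, a minimal sketch of the suggested fix: loop SSL_write() inside a single cork/uncork pair instead of re-corking around every ~16kB record. This is not Kudu's actual reactor code -- CorkedSslWrite() and SetTcpCork() are hypothetical helpers, TCP_CORK is Linux-specific, and it assumes the SSL object is configured with SSL_MODE_ENABLE_PARTIAL_WRITE; only the OpenSSL and socket calls are real APIs.
{code:cpp}
#include <openssl/ssl.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: toggles TCP_CORK (Linux-only) on the socket fd.
static void SetTcpCork(int fd, int on) {
  setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

// Writes as much of 'buf' as possible in one corked burst. Returns the
// number of bytes written, or -1 on a hard SSL error. With
// SSL_MODE_ENABLE_PARTIAL_WRITE set, each SSL_write() returns after at
// most one ~16kB TLS record, so we keep looping instead of uncorking.
ssize_t CorkedSslWrite(SSL* ssl, int fd, const char* buf, size_t len) {
  SetTcpCork(fd, 1);
  size_t written = 0;
  while (written < len) {
    int n = SSL_write(ssl, buf + written, static_cast<int>(len - written));
    if (n > 0) {
      written += static_cast<size_t>(n);  // another record done; keep going
      continue;
    }
    int err = SSL_get_error(ssl, n);
    if (err == SSL_ERROR_WANT_WRITE || err == SSL_ERROR_WANT_READ) {
      break;  // temporary socket error: uncork, retry on the next epoll wakeup
    }
    SetTcpCork(fd, 0);
    return -1;  // hard error
  }
  SetTcpCork(fd, 0);  // uncork once, after draining everything we could
  return static_cast<ssize_t>(written);
}
{code}
With this pattern the per-record epoll/setsockopt churn collapses to two setsockopt calls per reactor wakeup, no matter how many 16kB TLS records the payload spans.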
[jira] [Updated] (KUDU-2754) Keep a maximum number of old log files
[ https://issues.apache.org/jira/browse/KUDU-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2754: -- Labels: beginner trivial (was: ) > Keep a maximum number of old log files > -- > > Key: KUDU-2754 > URL: https://issues.apache.org/jira/browse/KUDU-2754 > Project: Kudu > Issue Type: Improvement >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > Labels: beginner, trivial > > Kudu generates various log files > (INFO, WARNING, ERROR, diagnostic, minidumps, etc.). To prevent running out > of log space, it would be nice if a user could configure the maximum > number of each log file type to keep. -- This message was sent by Atlassian Jira (v8.3.4#803005)
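For KUDU-2754 above, a minimal C++17 sketch of what such a retention pass could look like (not Kudu's implementation; the function name and the idea of invoking it after each log roll are assumptions):
{code:cpp}
#include <algorithm>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Deletes the oldest files in 'log_dir' whose names start with 'prefix'
// (e.g. "kudu-tserver.INFO.") until at most 'max_files' remain.
void PruneOldLogs(const fs::path& log_dir, const std::string& prefix,
                  size_t max_files) {
  std::vector<fs::directory_entry> logs;
  for (const auto& entry : fs::directory_iterator(log_dir)) {
    if (entry.is_regular_file() &&
        entry.path().filename().string().rfind(prefix, 0) == 0) {
      logs.push_back(entry);
    }
  }
  if (logs.size() <= max_files) return;
  // Sort oldest-first by last-write time, then drop the excess from the front.
  std::sort(logs.begin(), logs.end(),
            [](const fs::directory_entry& a, const fs::directory_entry& b) {
              return a.last_write_time() < b.last_write_time();
            });
  for (size_t i = 0; i + max_files < logs.size(); ++i) {
    fs::remove(logs[i].path());
  }
}
{code}
Each log file type (INFO, WARNING, diagnostic, ...) would get its own prefix and, presumably, its own configurable limit.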
[jira] [Resolved] (KUDU-2751) Java tests that start an HMS client fail when run on JDK10+
[ https://issues.apache.org/jira/browse/KUDU-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke resolved KUDU-2751. --- Fix Version/s: 1.13.0 Resolution: Fixed Fixed via upgrading the Hive dependency. > Java tests that start an HMS client fail when run on JDK10+ > --- > > Key: KUDU-2751 > URL: https://issues.apache.org/jira/browse/KUDU-2751 > Project: Kudu > Issue Type: Bug > Components: java, test >Affects Versions: 1.10.0 >Reporter: Adar Dembo >Priority: Major > Fix For: 1.13.0 > > > They may fail on JDK9 as well, with something like this: > {noformat} > MetaException(message:Got exception: java.lang.ClassCastException > java.base/[Ljava.lang.Object; cannot be cast to java.base/[Ljava.net.URI;) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1389) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:204) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:129) > at > org.apache.kudu.hive.metastore.TestKuduMetastorePlugin.setUp(TestKuduMetastorePlugin.java:108) > {noformat} > I tracked this down and filed HIVE-21508. We should see if we can find some > sort of workaround that isn't necessarily upgrading to a newer Hive artifact > (or maybe we should upgrade our Hive dependency). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KUDU-2731) Getting column schema information from KuduSchema requires copying a KuduColumnSchema object
[ https://issues.apache.org/jira/browse/KUDU-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2731: -- Component/s: perf > Getting column schema information from KuduSchema requires copying a > KuduColumnSchema object > > > Key: KUDU-2731 > URL: https://issues.apache.org/jira/browse/KUDU-2731 > Project: Kudu > Issue Type: Improvement > Components: perf >Affects Versions: 1.9.0 >Reporter: William Berkeley >Priority: Major > > I'm looking at a CPU profile of Impala inserting into Kudu. > {{KuduTableSink::Send}} has code that schematically does the following: > {noformat} > for each row in the batch > for each column > if (schema.Column(col_idx).isNullable()) { > write->mutable_row()->SetNull(col); > } > } > } > {noformat} > See > [kudu-table-sink.cc|https://github.com/apache/impala/blob/branch-3.1.0/be/src/exec/kudu-table-sink.cc#L236]. > However, {{KuduSchema::Column}} copies the column schema and returns it by > value, so the if statement constructs and destroys a column schema object > just to check if the column is nullable. > This is by far the biggest user of CPU in the Impala process (35% or so). The > workload might be I/O bound writing to Kudu anyway, though. Nevertheless, we > should provide a way to avoid this copying in the API, either by adding a > method like > {noformat} > class KuduSchema { > const KuduColumnSchema& get_column(int idx); > } > {noformat} > or a method like > {noformat} > class KuduSchema { > bool is_column_nullable(int idx); > } > {noformat} > The former is the most flexible while the latter frees the client from > worrying about holding the ref longer than the KuduColumnSchema object lives. > We might need to add a number of methods similar to the latter method to > cover other potentially useful things like checking encoding, type, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
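Until the API in KUDU-2731 grows such an accessor, callers can at least pay the copy once per column rather than once per cell. A minimal sketch against the existing C++ client API (the batch/write variables are schematic, mirroring the Impala loop quoted above):
{code:cpp}
// Hoist the KuduColumnSchema copies out of the row loop: one copy per
// column per batch instead of one per cell. KuduSchema::Column() still
// returns by value, but is_nullable() is now read from a cached bitmap.
const int num_cols = static_cast<int>(schema.num_columns());
std::vector<bool> nullable(num_cols);
for (int i = 0; i < num_cols; ++i) {
  nullable[i] = schema.Column(i).is_nullable();  // the only copies made
}
for (kudu::client::KuduWriteOperation* write : batch) {  // schematic batch
  for (int col = 0; col < num_cols; ++col) {
    if (nullable[col]) {
      write->mutable_row()->SetNull(col);
    }
  }
}
{code}
The proposed get_column()/is_column_nullable() methods would make this hoisting unnecessary, but the cache works with the API as it stands.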
[jira] [Updated] (KUDU-2727) Contention on the Raft consensus lock can cause tablet service queue overflows
[ https://issues.apache.org/jira/browse/KUDU-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Henke updated KUDU-2727: -- Component/s: perf > Contention on the Raft consensus lock can cause tablet service queue overflows > -- > > Key: KUDU-2727 > URL: https://issues.apache.org/jira/browse/KUDU-2727 > Project: Kudu > Issue Type: Improvement > Components: perf >Reporter: William Berkeley >Assignee: Mike Percy >Priority: Major > > Here's stacks illustrating the phenomenon: > {noformat} > tids=[2201] > 0x379ba0f710 >0x1fb951a base::internal::SpinLockDelay() >0x1fb93b7 base::SpinLock::SlowLock() > 0xb4e68e kudu::consensus::Peer::SignalRequest() > 0xb9c0df kudu::consensus::PeerManager::SignalRequest() > 0xb8c178 kudu::consensus::RaftConsensus::Replicate() > 0xaab816 kudu::tablet::TransactionDriver::Prepare() > 0xaac0ed kudu::tablet::TransactionDriver::PrepareTask() >0x1fa37ed kudu::ThreadPool::DispatchThread() >0x1f9c2a1 kudu::Thread::SuperviseThread() > 0x379ba079d1 start_thread > 0x379b6e88fd clone > tids=[4515] > 0x379ba0f710 >0x1fb951a base::internal::SpinLockDelay() >0x1fb93b7 base::SpinLock::SlowLock() > 0xb74c60 kudu::consensus::RaftConsensus::NotifyCommitIndex() > 0xb59307 kudu::consensus::PeerMessageQueue::NotifyObserversTask() > 0xb54058 > _ZN4kudu8internal7InvokerILi2ENS0_9BindStateINS0_15RunnableAdapterIMNS_9consensus16PeerMessageQueueEFvRKSt8functionIFvPNS4_24PeerMessageQueueObserverEEFvPS5_SC_EFvNS0_17UnretainedWrapperIS5_EEZNS5_34NotifyObserversOfCommitIndexChangeElEUlS8_E_EEESH_E3RunEPNS0_13BindStateBaseE >0x1fa37ed kudu::ThreadPool::DispatchThread() >0x1f9c2a1 kudu::Thread::SuperviseThread() > 0x379ba079d1 start_thread > 0x379b6e88fd clone > tids=[22185,22194,22193,22188,22187,22186] > 0x379ba0f710 >0x1fb951a base::internal::SpinLockDelay() >0x1fb93b7 base::SpinLock::SlowLock() > 0xb8bff8 > kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm() > 0xaaaef9 kudu::tablet::TransactionDriver::ExecuteAsync() > 0xaa3742 kudu::tablet::TabletReplica::SubmitWrite() > 0x92812d kudu::tserver::TabletServiceImpl::Write() >0x1e28f3c kudu::rpc::GeneratedServiceIf::Handle() >0x1e2986a kudu::rpc::ServicePool::RunThread() >0x1f9c2a1 kudu::Thread::SuperviseThread() > 0x379ba079d1 start_thread > 0x379b6e88fd clone > tids=[22192,22191] > 0x379ba0f710 >0x1fb951a base::internal::SpinLockDelay() >0x1fb93b7 base::SpinLock::SlowLock() >0x1e13dec kudu::rpc::ResultTracker::TrackRpc() >0x1e28ef5 kudu::rpc::GeneratedServiceIf::Handle() >0x1e2986a kudu::rpc::ServicePool::RunThread() >0x1f9c2a1 kudu::Thread::SuperviseThread() > 0x379ba079d1 start_thread > 0x379b6e88fd clone > tids=[4426] > 0x379ba0f710 >0x206d3d0 >0x212fd25 google::protobuf::Message::SpaceUsedLong() >0x211dee4 > google::protobuf::internal::GeneratedMessageReflection::SpaceUsedLong() > 0xb6658e kudu::consensus::LogCache::AppendOperations() > 0xb5c539 kudu::consensus::PeerMessageQueue::AppendOperations() > 0xb5c7c7 kudu::consensus::PeerMessageQueue::AppendOperation() > 0xb7c675 > kudu::consensus::RaftConsensus::AppendNewRoundToQueueUnlocked() > 0xb8c147 kudu::consensus::RaftConsensus::Replicate() > 0xaab816 kudu::tablet::TransactionDriver::Prepare() > 0xaac0ed kudu::tablet::TransactionDriver::PrepareTask() >0x1fa37ed kudu::ThreadPool::DispatchThread() >0x1f9c2a1 kudu::Thread::SuperviseThread() > 0x379ba079d1 start_thread > 0x379b6e88fd clone > {noformat} > {{kudu::consensus::RaftConsensus::CheckLeadershipAndBindTerm()}} needs to > take the lock to check the term and the Raft role. 
When many RPCs come in for > the same tablet, the contention can hog service threads and cause queue > overflows on busy systems. > Yugabyte switched their equivalent lock to an atomic that allows them to read the term and role wait-free. -- This message was sent by Atlassian Jira (v8.3.4#803005)
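A sketch of the wait-free alternative mentioned above (illustrative only -- the packing layout and class are neither Kudu's nor Yugabyte's actual code): fold the term and role into one atomic 64-bit word, so hot-path checks like CheckLeadershipAndBindTerm() need no lock.
{code:cpp}
#include <atomic>
#include <cstdint>

enum class RaftRole : uint8_t { kFollower = 0, kLeader = 1, kLearner = 2 };

// Term in the low 56 bits, role in the high 8 bits. Both are read and
// updated together with a single atomic word, so readers never block.
class AtomicTermAndRole {
 public:
  // Called under the consensus lock when the term or role changes.
  void Set(uint64_t term, RaftRole role) {
    state_.store((static_cast<uint64_t>(role) << 56) | (term & kTermMask),
                 std::memory_order_release);
  }

  // Wait-free: a single atomic load on the RPC hot path.
  void Get(uint64_t* term, RaftRole* role) const {
    const uint64_t s = state_.load(std::memory_order_acquire);
    *term = s & kTermMask;
    *role = static_cast<RaftRole>(s >> 56);
  }

 private:
  static constexpr uint64_t kTermMask = (1ULL << 56) - 1;
  std::atomic<uint64_t> state_{0};
};
{code}
Readers do a single acquire load; only state changes (elections, term bumps) still take the consensus lock, keeping the packed word consistent with the rest of the Raft state.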