[jira] [Created] (KUDU-1912) Tablet startup fails on transaction memory consumption
Jean-Daniel Cryans created KUDU-1912:

Summary: Tablet startup fails on transaction memory consumption
Key: KUDU-1912
URL: https://issues.apache.org/jira/browse/KUDU-1912
Project: Kudu
Issue Type: Bug
Components: tablet
Affects Versions: 1.2.0
Reporter: Jean-Daniel Cryans

As reported by a user on Slack:
{code}
W0307 20:03:46.820791 25594 transaction_tracker.cc:108] Transaction failed, tablet 7bb5e24d7521458d91ad06736a9f7685 transaction memory consumption (66447925) has exceeded its limit (67108864) or the limit of an ancestral tracker
E0307 20:03:46.820821 25594 ts_tablet_manager.cc:776] T 7bb5e24d7521458d91ad06736a9f7685 P d4a26cb0d6994266a68dc76d983e454a: Tablet failed to start: Service unavailable: Transaction failed, tablet 7bb5e24d7521458d91ad06736a9f7685 transaction memory consumption (66447925) has exceeded its limit (67108864) or the limit of an ancestral tracker
{code}
Then since it's a failed state, the replica doesn't get kicked out of the configuration and so the tablet stays under-replicated.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
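Note that the consumption in the log (66447925) is still below the limit (67108864). One plausible reading, offered here as an assumption rather than a statement about Kudu's code, is that the limit is checked against the current consumption plus the size of the incoming transaction, at the tablet's tracker and at every ancestor. The sketch below models that in Python; the `MemTracker` class, its methods, and the tracker names are hypothetical and are not Kudu's API.

```python
from typing import Iterator, Optional


class MemTracker:
    """Toy hierarchical memory tracker (hypothetical, not Kudu's class)."""

    def __init__(self, name: str, limit: int,
                 parent: Optional["MemTracker"] = None) -> None:
        self.name = name
        self.limit = limit        # bytes; a negative value means "no limit"
        self.parent = parent
        self.consumption = 0      # bytes currently charged to this tracker

    def _chain(self) -> Iterator["MemTracker"]:
        # Yield this tracker, then each ancestor up to the root.
        node: Optional["MemTracker"] = self
        while node is not None:
            yield node
            node = node.parent

    def try_consume(self, nbytes: int) -> bool:
        """Charge nbytes to this tracker and all ancestors, or to none."""
        chain = list(self._chain())
        for tracker in chain:
            if tracker.limit >= 0 and tracker.consumption + nbytes > tracker.limit:
                return False      # this tracker or an ancestor would be exceeded
        for tracker in chain:
            tracker.consumption += nbytes
        return True

    def release(self, nbytes: int) -> None:
        for tracker in self._chain():
            tracker.consumption -= nbytes


# Numbers taken from the log above; the tracker names are made up.
server = MemTracker("server", limit=-1)
txn = MemTracker("txn_tracker", limit=67108864, parent=server)
txn.try_consume(66447925)
headroom = txn.limit - txn.consumption   # bytes left before the limit is hit
```

Under these assumptions, any further transaction larger than `headroom` bytes is rejected even though the reported consumption is under the limit, which would match the shape of the warning.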
[jira] [Resolved] (KUDU-1887) Allow RPC handlers to discard inbound transfer
[ https://issues.apache.org/jira/browse/KUDU-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-1887.
-------------------------------
Resolution: Fixed
Fix Version/s: 1.4.0

> Allow RPC handlers to discard inbound transfer
> ----------------------------------------------
>
> Key: KUDU-1887
> URL: https://issues.apache.org/jira/browse/KUDU-1887
> Project: Kudu
> Issue Type: Improvement
> Components: rpc
> Affects Versions: 1.2.0
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Priority: Minor
> Fix For: 1.4.0
>
> This is a general feature request for using KRPC in Impala, not something
> that affects Kudu itself right now AFAIK.
> A common approach in communication patterns where a lot of flows fan in to a
> single server is for the server to delay returning a response to a client for
> a while, in order to implement some kind of flow control when the server is
> at capacity.
> If a client sends a lot of data (perhaps by sidecar - KUDU-1866), there's
> currently no way AFAICT to retain the {{RpcContext}} needed to delay sending
> the response, but to drop the associated transfer buffer (that, presumably,
> is putting the server over its capacity).
> So we could have {{RpcContext::DiscardTransfer()}} which drops the
> {{InboundCall}}'s {{InboundTransfer}}. Since this is likely to be called after
> handling the request, the request protobuf should still be independently
> attached to the {{RpcContext}}. After {{DiscardTransfer}}, it's an error to
> look at any inbound sidecars.
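The ownership split proposed in KUDU-1887 can be illustrated with a small Python model. This is only a sketch of the idea described in the issue: the parsed request stays attached to the context while the bulky transfer buffer is dropped. The classes below are toy stand-ins, and every method name other than the proposed `DiscardTransfer` (spelled `discard_transfer` here) is hypothetical.

```python
from typing import List, Optional


class InboundTransfer:
    """Toy stand-in for the raw received bytes of one call."""

    def __init__(self, payload: bytes, sidecars: List[bytes]) -> None:
        self.payload = payload
        self.sidecars = sidecars


class RpcContext:
    """Toy context that owns the parsed request separately from the transfer."""

    def __init__(self, request: dict, transfer: InboundTransfer) -> None:
        self.request = request    # parsed request, independent of the transfer
        self._transfer: Optional[InboundTransfer] = transfer
        self.responded = False

    def discard_transfer(self) -> None:
        # Drop the only reference to the buffer so its memory can be reclaimed.
        self._transfer = None

    def get_inbound_sidecar(self, idx: int) -> bytes:
        if self._transfer is None:
            raise RuntimeError("inbound transfer was discarded")
        return self._transfer.sidecars[idx]

    def respond_success(self) -> None:
        self.responded = True


# A handler consumes the sidecar, frees the buffer, and responds later.
ctx = RpcContext({"query_id": 7}, InboundTransfer(b"header", [b"x" * 1024]))
first = ctx.get_inbound_sidecar(0)
ctx.discard_transfer()
# ... the server delays here for flow control, holding only the small context ...
ctx.respond_success()
```

After `discard_transfer()`, the request is still readable through `ctx.request`, while any sidecar access raises, mirroring the "it's an error to look at any inbound sidecars" rule in the issue.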
[jira] [Commented] (KUDU-1890) Allow renaming of primary key column
[ https://issues.apache.org/jira/browse/KUDU-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901836#comment-15901836 ]

Todd Lipcon commented on KUDU-1890:
-----------------------------------
Should we resolve this as a duplicate of KUDU-1626? Or close that one as a dup? They seem to be the same.

> Allow renaming of primary key column
> ------------------------------------
>
> Key: KUDU-1890
> URL: https://issues.apache.org/jira/browse/KUDU-1890
> Project: Kudu
> Issue Type: Improvement
> Affects Versions: 1.2.0
> Reporter: Ram Mettu
> Assignee: Ram Mettu
> Priority: Minor
>
> The current version provides functionality to rename any non-primary key
> column of the table; the request is to remove the restriction on renaming
> primary key columns.
> The workaround is very time consuming: create a new table and recopy the data
> from the old table into the new table.
[jira] [Commented] (KUDU-1890) Allow renaming of primary key column
[ https://issues.apache.org/jira/browse/KUDU-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901962#comment-15901962 ]

Ram Mettu commented on KUDU-1890:
---------------------------------
No problem, we can close this one.

> Allow renaming of primary key column
> ------------------------------------
>
> Key: KUDU-1890
> URL: https://issues.apache.org/jira/browse/KUDU-1890
> Project: Kudu
> Issue Type: Improvement
> Affects Versions: 1.2.0
> Reporter: Ram Mettu
> Assignee: Ram Mettu
> Priority: Minor
>
> The current version provides functionality to rename any non-primary key
> column of the table; the request is to remove the restriction on renaming
> primary key columns.
> The workaround is very time consuming: create a new table and recopy the data
> from the old table into the new table.
[jira] [Closed] (KUDU-1890) Allow renaming of primary key column
[ https://issues.apache.org/jira/browse/KUDU-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ram Mettu closed KUDU-1890.
--------------------------
Resolution: Duplicate

Duplicate of KUDU-1626

> Allow renaming of primary key column
> ------------------------------------
>
> Key: KUDU-1890
> URL: https://issues.apache.org/jira/browse/KUDU-1890
> Project: Kudu
> Issue Type: Improvement
> Affects Versions: 1.2.0
> Reporter: Ram Mettu
> Assignee: Ram Mettu
> Priority: Minor
>
> The current version provides functionality to rename any non-primary key
> column of the table; the request is to remove the restriction on renaming
> primary key columns.
> The workaround is very time consuming: create a new table and recopy the data
> from the old table into the new table.
[jira] [Commented] (KUDU-1626) Allow renaming primary key columns
[ https://issues.apache.org/jira/browse/KUDU-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901968#comment-15901968 ]

Ram Mettu commented on KUDU-1626:
---------------------------------
https://gerrit.cloudera.org/#/c/6078/

> Allow renaming primary key columns
> ----------------------------------
>
> Key: KUDU-1626
> URL: https://issues.apache.org/jira/browse/KUDU-1626
> Project: Kudu
> Issue Type: Improvement
> Affects Versions: 1.0.0
> Reporter: Dan Burkert
> Assignee: Ram Mettu
>
> Kudu unnecessarily restricts primary key columns from being renamed. This is
> of particular importance since column renaming is the only workaround for
> Impala and Spark not being able to use columns with upper case and non-ascii
> characters.
[jira] [Created] (KUDU-1913) Tablet server should handle failure gracefully instead of crashing
Juan Yu created KUDU-1913:
--------------------------
Summary: Tablet server should handle failure gracefully instead of crashing
Key: KUDU-1913
URL: https://issues.apache.org/jira/browse/KUDU-1913
Project: Kudu
Issue Type: Bug
Reporter: Juan Yu

When adding lots of range partitions, all tablet servers crashed with the following error:
F0308 14:51:04.109369 12952 raft_consensus.cc:1985] Check failed: _s.ok() Bad status: Runtime error: Could not create thread: Resource temporarily unavailable (error 11)
Tablet server should handle error/failure more gracefully instead of crashing.
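On Linux, error 11 is EAGAIN, which is what thread creation reports when the process or system has hit a thread limit. As a minimal illustration of "handle the failure instead of crashing", the Python sketch below turns a failed thread start into an error value the caller can act on. It is not Kudu's code: `start_thread` and its `thread_factory` parameter are hypothetical names, and the sketch relies only on the fact that CPython's `Thread.start()` raises `RuntimeError` when the OS refuses to create a thread.

```python
import threading
from typing import Callable, Optional, Tuple


def start_thread(
    target: Callable[[], None],
    thread_factory: Callable[..., threading.Thread] = threading.Thread,
) -> Tuple[Optional[threading.Thread], Optional[str]]:
    """Start a thread; return (thread, None) on success or (None, error)."""
    try:
        thread = thread_factory(target=target, daemon=True)
        thread.start()
    except RuntimeError as exc:
        # Report the failure to the caller instead of aborting the process.
        return None, f"Runtime error: Could not create thread: {exc}"
    return thread, None
```

A caller could then fail only the operation that needed the thread, for example rejecting the new partition, while the rest of the server keeps running.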
[jira] [Created] (KUDU-1914) Add positive test cases for Web UI .htpasswd support
Hao Hao created KUDU-1914:
--------------------------
Summary: Add positive test cases for Web UI .htpasswd support
Key: KUDU-1914
URL: https://issues.apache.org/jira/browse/KUDU-1914
Project: Kudu
Issue Type: Test
Components: security
Affects Versions: 1.3.0
Reporter: Hao Hao
Priority: Minor

We have a negative test for web UI basic HTTP authentication. It would be nice to add a positive test to ensure that, when HTTP authentication is enabled, a client supplying the correct user and password can connect to the web server successfully.
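For reference, a positive test of HTTP basic authentication boils down to sending an `Authorization: Basic <base64(user:password)>` header and expecting the request to succeed. The Python sketch below shows both halves in miniature. It is an illustration only: the function names are hypothetical, and the plaintext `accounts` dictionary is a simplifying assumption, since a real `.htpasswd` file stores password hashes.

```python
import base64
import hmac
from typing import Dict


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def is_authorized(header: str, accounts: Dict[str, str]) -> bool:
    """Check a Basic header against a toy user -> plaintext password map."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        return False
    try:
        decoded = base64.b64decode(token).decode("utf-8")
    except ValueError:
        return False              # malformed base64 or not valid UTF-8
    user, _, password = decoded.partition(":")
    expected = accounts.get(user)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking how much of the password matched.
    return hmac.compare_digest(expected.encode("utf-8"), password.encode("utf-8"))
```

A positive test asserts that the correct credentials are accepted; the existing negative test corresponds to asserting that wrong credentials are rejected.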
[jira] [Commented] (KUDU-1913) Tablet server should handle failure gracefully instead of crashing
[ https://issues.apache.org/jira/browse/KUDU-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902278#comment-15902278 ]

Todd Lipcon commented on KUDU-1913:
-----------------------------------
I think it's running out of threads on a tablet server. We have some plans to reduce the number of threads created - right now we create at least two per partition on a tablet server, so if you try to put many thousands of partitions on one server, you'll hit this. The known limitations document does recommend keeping to hundreds of tablets per server max in current versions.

> Tablet server should handle failure gracefully instead of crashing
> ------------------------------------------------------------------
>
> Key: KUDU-1913
> URL: https://issues.apache.org/jira/browse/KUDU-1913
> Project: Kudu
> Issue Type: Bug
> Reporter: Juan Yu
>
> When adding lots of range partitions, all tablet servers crashed with the
> following error:
> F0308 14:51:04.109369 12952 raft_consensus.cc:1985] Check failed: _s.ok() Bad
> status: Runtime error: Could not create thread: Resource temporarily
> unavailable (error 11)
> Tablet server should handle error/failure more gracefully instead of crashing.
[jira] [Updated] (KUDU-1913) Tablet server runs out of threads when creating lots of tablets
[ https://issues.apache.org/jira/browse/KUDU-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated KUDU-1913:
------------------------------
Target Version/s: 1.4.0
Labels: data-scalability (was: )
Component/s: log, consensus
Summary: Tablet server runs out of threads when creating lots of tablets (was: Tablet server should handle failure gracefully instead of crashing)

> Tablet server runs out of threads when creating lots of tablets
> ---------------------------------------------------------------
>
> Key: KUDU-1913
> URL: https://issues.apache.org/jira/browse/KUDU-1913
> Project: Kudu
> Issue Type: Bug
> Components: consensus, log
> Reporter: Juan Yu
> Labels: data-scalability
>
> When adding lots of range partitions, all tablet servers crashed with the
> following error:
> F0308 14:51:04.109369 12952 raft_consensus.cc:1985] Check failed: _s.ok() Bad
> status: Runtime error: Could not create thread: Resource temporarily
> unavailable (error 11)
> Tablet server should handle error/failure more gracefully instead of crashing.
[jira] [Commented] (KUDU-1913) Tablet server runs out of threads when creating lots of tablets
[ https://issues.apache.org/jira/browse/KUDU-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902470#comment-15902470 ]

Juan Yu commented on KUDU-1913:
-------------------------------
I know there is a recommendation to stay within hundreds of tablets per server, and when creating a table there is also a 60-bucket limit check to avoid creating too many partitions. But there is no warning when adding a range partition, so it's very easy to hit the limit, and it will cause many servers to crash at the same time, not just a single one. Could an upper limit check (total tablets per server) be added to avoid this?

> Tablet server runs out of threads when creating lots of tablets
> ---------------------------------------------------------------
>
> Key: KUDU-1913
> URL: https://issues.apache.org/jira/browse/KUDU-1913
> Project: Kudu
> Issue Type: Bug
> Components: consensus, log
> Reporter: Juan Yu
> Labels: data-scalability
>
> When adding lots of range partitions, all tablet servers crashed with the
> following error:
> F0308 14:51:04.109369 12952 raft_consensus.cc:1985] Check failed: _s.ok() Bad
> status: Runtime error: Could not create thread: Resource temporarily
> unavailable (error 11)
> Tablet server should handle error/failure more gracefully instead of crashing.
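The check requested here could look roughly like the Python sketch below, which also applies the "at least two threads per partition" figure from the earlier comment on this issue as a lower-bound estimate. All names are hypothetical, and `max_tablets_per_server` is an assumed parameter for illustration, not an existing Kudu setting.

```python
from typing import Tuple

# "At least two per partition", per the earlier comment on this issue.
THREADS_PER_TABLET = 2


def estimated_threads(num_tablets: int,
                      threads_per_tablet: int = THREADS_PER_TABLET) -> int:
    """Lower-bound estimate of the threads needed to host num_tablets."""
    return num_tablets * threads_per_tablet


def check_add_partitions(current_tablets: int, new_tablets: int,
                         max_tablets_per_server: int) -> Tuple[bool, str]:
    """Reject a partition change that would push a server past the cap."""
    total = current_tablets + new_tablets
    if total > max_tablets_per_server:
        return False, (f"would host {total} tablets, "
                       f"limit is {max_tablets_per_server}")
    return True, "ok"
```

For example, 5000 tablets on one server would need at least `estimated_threads(5000)` threads, which is 10000 before counting any other server threads. A check like `check_add_partitions` would run when a range partition is added, so the request fails with a message instead of the servers crashing.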
[jira] [Updated] (KUDU-1554) Tombstoned replicas remain on TS even after table is deleted
[ https://issues.apache.org/jira/browse/KUDU-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated KUDU-1554:
------------------------------
Affects Version/s: 1.2.0 (was: 0.10.0)
Target Version/s: 1.4.0

The above-mentioned "still references orphaned blocks" was spotted again in the wild on 1.2.

> Tombstoned replicas remain on TS even after table is deleted
> ------------------------------------------------------------
>
> Key: KUDU-1554
> URL: https://issues.apache.org/jira/browse/KUDU-1554
> Project: Kudu
> Issue Type: Bug
> Components: master, tserver
> Affects Versions: 1.2.0
> Reporter: Todd Lipcon
> Priority: Minor
>
> If a replica is deleted on a live table, a tombstone replica is left with
> TABLET_DATA_TOMBSTONED state. If the table is then deleted, those tombstones
> aren't cleaned up, and will remain on the tserver until the next time the
> tserver restarts.
> Not a big deal, but it may be confusing to users to see these tombstones
> sticking around.
[jira] [Updated] (KUDU-1038) Deleting a tablet should also delete its log recovery directory, if any
[ https://issues.apache.org/jira/browse/KUDU-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated KUDU-1038:
------------------------------
Component/s: tablet

> Deleting a tablet should also delete its log recovery directory, if any
> -----------------------------------------------------------------------
>
> Key: KUDU-1038
> URL: https://issues.apache.org/jira/browse/KUDU-1038
> Project: Kudu
> Issue Type: Bug
> Components: consensus, tablet
> Affects Versions: Feature Complete
> Reporter: Mike Percy
> Assignee: Mike Percy
> Priority: Minor
>
> Deleting a tablet should also delete its log recovery directory, if any.
[jira] [Resolved] (KUDU-693) UpdateConsensus and RequestConsensusVote RPC callbacks block reactor
[ https://issues.apache.org/jira/browse/KUDU-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-693.
------------------------------
Resolution: Duplicate
Fix Version/s: n/a

I haven't seen this be a problem since making glog async

> UpdateConsensus and RequestConsensusVote RPC callbacks block reactor
> --------------------------------------------------------------------
>
> Key: KUDU-693
> URL: https://issues.apache.org/jira/browse/KUDU-693
> Project: Kudu
> Issue Type: Bug
> Components: consensus
> Affects Versions: Private Beta
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Minor
> Fix For: n/a
>
> After adding logging when RPC callbacks block the reactor for too long, I see
> a fair number of logs on the YCSB tablet servers indicating that these two
> RPC calls have blocked the reactor for anywhere between 100ms and almost a
> second. This implies they could probably cause deadlocks as well, and
> definitely impact latency of all other RPCs on the server.