[jira] [Assigned] (KUDU-2901) Load balance on tserver
[ https://issues.apache.org/jira/browse/KUDU-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yingchun Lai reassigned KUDU-2901: -- Assignee: Yingchun Lai > Load balance on tserver > --- > > Key: KUDU-2901 > URL: https://issues.apache.org/jira/browse/KUDU-2901 > Project: Kudu > Issue Type: Improvement > Components: fs, tserver >Reporter: Yingchun Lai >Assignee: Yingchun Lai >Priority: Minor > > On a single tserver, new tablet assignment is based on the tablet count > currently assigned to each data directory: the directory with the fewest > tablets is chosen. > In the real world, a cluster may have many tables, and tablets may have > different sizes. Large tablets can end up assigned to the same directory, > so the result is not balanced. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (KUDU-2906) Don't allow elections when server clocks are too out of sync
Andrew Wong created KUDU-2906: - Summary: Don't allow elections when server clocks are too out of sync Key: KUDU-2906 URL: https://issues.apache.org/jira/browse/KUDU-2906 Project: Kudu Issue Type: Bug Components: consensus Affects Versions: 1.10.0 Reporter: Andrew Wong In cases where machine clocks are not properly synchronized, if a tablet replica is elected leader whose clock happens to be very far in the future (greater than --max_clock_sync_error_usec=10 sec), it's possible that any writes that go to that tablet will be rejected by the followers, but persisted to the leader's WAL. Then, upon fixing the clock on that machine, the replica may try to replay the future op, but fail to replay it because the op timestamp is too far in the future, with errors like: {code:java} F0715 12:03:09.369819 3500 tablet_bootstrap.cc:904] Check failed: _s.ok() Bad status: Invalid argument: Tried to update clock beyond the max. error.{code} Dumping a recovery WAL, I could see: {code:java} 130.138@6400743143334211584 REPLICATE NO_OP id { term: 130 index: 138 } timestamp: 6400743143334211584 op_type: NO_OP noop_request { } COMMIT 130.138 op_type: NO_OP commited_op_id { term: 130 index: 138 } 131.139@6400743925559676928 REPLICATE NO_OP id { term: 131 index: 139 } timestamp: 6400743925559676928 op_type: NO_OP noop_request { } COMMIT 131.139 op_type: NO_OP commited_op_id { term: 131 index: 139 } 132.140@11589864471731939930 REPLICATE NO_OP id { term: 132 index: 140 } timestamp: 11589864471731939930 op_type: NO_OP noop_request { }{code} Note the drastic jump in timestamp. In this specific case, we verified that the replayed WAL wasn't that far behind the recovery WAL, which had the future timestamps, so we could just delete the recovery WAL and bootstrap from the replayed WAL. It would have been nice had those bad ops not been written at all, maybe by preventing an election between such mismatched servers in the first place. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
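As a rough illustration of the guard suggested above, here is a minimal sketch of a pre-vote clock check. The names (ShouldGrantVote, kMaxClockSyncErrorUs) are hypothetical and not Kudu's actual consensus API; it only shows the idea of refusing to vote for a candidate whose clock is too far from the local one.

{code:cpp}
// Illustrative sketch only: refuse to grant a vote when the candidate's clock
// is too far from ours. GrantVote-style wiring is hypothetical, not Kudu's API.
#include <cstdint>
#include <iostream>

constexpr int64_t kMaxClockSyncErrorUs = 10 * 1000 * 1000;  // 10 seconds

// Decide whether the local replica should grant a vote, given the candidate's
// physical clock reading carried in the (hypothetical) vote request.
bool ShouldGrantVote(int64_t candidate_clock_us, int64_t local_clock_us) {
  int64_t skew = candidate_clock_us - local_clock_us;
  if (skew < 0) skew = -skew;
  // Electing a candidate whose clock exceeds the tolerated error could produce
  // ops with timestamps the followers cannot apply.
  return skew <= kMaxClockSyncErrorUs;
}

int main() {
  int64_t local = 1563195789000000;              // local clock, microseconds
  int64_t candidate = local + 45 * 1000 * 1000;  // candidate is 45s in the future
  std::cout << (ShouldGrantVote(candidate, local) ? "grant" : "deny") << "\n";
  return 0;
}
{code}

A real implementation would also have to propagate the candidate's clock reading (and its error bound) in the vote request itself, which this sketch simply assumes.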
[jira] [Commented] (KUDU-1973) Coalesce RPCs destined for the same server
[ https://issues.apache.org/jira/browse/KUDU-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893246#comment-16893246 ] Andrew Wong commented on KUDU-1973: --- I'll note that on a very dense cluster (5000-7000 replicas per node), the consensus traffic between servers made it very difficult to get the output of a ksck, since ksck tries to get consensus state from the tablet servers. The overall state of the cluster was known to be quite poor, so it's very possible that there were many elections going on in the background; we ended up restarting the entire cluster with a higher Raft heartbeat interval to let things settle first before we were able to get a usable ksck output. > Coalesce RPCs destined for the same server > -- > > Key: KUDU-1973 > URL: https://issues.apache.org/jira/browse/KUDU-1973 > Project: Kudu > Issue Type: Sub-task > Components: rpc, tserver >Affects Versions: 1.4.0 >Reporter: Adar Dembo >Priority: Major > Labels: data-scalability > > The krpc subsystem ensures that only one _connection_ exists between any pair > of nodes, but it doesn't coalesce the _RPCs_ themselves. In clusters with > dense nodes (especially with a lot of tablets), there's often a great number > of RPCs sent between pairs of nodes. > We should explore ways of coalescing those RPCs. I don't know whether that > would happen within the krpc system itself (i.e. in a payload-agnostic way), > or whether we'd only coalesce RPCs known to be "hot" (like UpdateConsensus). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
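For illustration only, a minimal sketch of per-destination payload coalescing, which is one possible shape of the idea discussed in this issue. Coalescer and SendBatch are hypothetical names and not part of krpc.

{code:cpp}
// Illustrative sketch only: queue outbound payloads per destination and flush
// them as one batched send instead of one RPC per payload.
#include <iostream>
#include <map>
#include <string>
#include <vector>

void SendBatch(const std::string& dest, const std::vector<std::string>& payloads) {
  std::cout << "sending " << payloads.size() << " coalesced payloads to " << dest << "\n";
}

class Coalescer {
 public:
  void Enqueue(const std::string& dest, std::string payload) {
    pending_[dest].push_back(std::move(payload));
  }
  // Called periodically, or when a queue grows large.
  void Flush() {
    for (auto& [dest, payloads] : pending_) {
      if (!payloads.empty()) {
        SendBatch(dest, payloads);
        payloads.clear();
      }
    }
  }
 private:
  std::map<std::string, std::vector<std::string>> pending_;
};

int main() {
  Coalescer c;
  c.Enqueue("tserver-a:7050", "UpdateConsensus tablet-1");
  c.Enqueue("tserver-a:7050", "UpdateConsensus tablet-2");
  c.Enqueue("tserver-b:7050", "UpdateConsensus tablet-3");
  c.Flush();
  return 0;
}
{code}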
[jira] [Updated] (KUDU-2905) Impala queries failed when master's IPKI CA info changed
[ https://issues.apache.org/jira/browse/KUDU-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Serbin updated KUDU-2905: Component/s: security > Impala queries failed when master's IPKI CA info changed > > > Key: KUDU-2905 > URL: https://issues.apache.org/jira/browse/KUDU-2905 > Project: Kudu > Issue Type: Bug > Components: client, master, security >Affects Versions: 1.11.0 >Reporter: Adar Dembo >Priority: Major > > Saw this in a user report. > The cluster in question lost its master and the state was rebuilt from that > of the tservers (see KUDU-2902). In doing so, the master lost its IPKI cert > info and a new record was generated. > After the Kudu cluster was operational, the existing Impala cluster could not > issue queries. All queries failed with an error like this: > {noformat} > Unable to open Kudu table: Runtime error: Client connection negotiation > failed: client connection to 10.38.202.4:7051: TLS Handshake error: > error:04067084:rsa routines:RSA_EAY_PUBLIC_DECRYPT:data too large for > modulus:rsa_eay.c:738 error:0D0C5006:asn1 encoding > routines:ASN1_item_verify:EVP lib:a_verify.c:249 error:14090086:SSL > routines:ssl3_get_server_certificate:certificate verify failed:s3_clnt.c:1264 > {noformat} > Restarting Impala fixed it. > I'm not sure if this is an issue with how Impala caches KuduClient instances, > or if it's an issue with how the client itself caches the master's CA > certificate. For now I'm assuming this is a Kudu issue and the client needs > to detect this error and invalidate existing certificates. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (KUDU-2905) Impala queries failed when master's IPKI CA info changed
Adar Dembo created KUDU-2905: Summary: Impala queries failed when master's IPKI CA info changed Key: KUDU-2905 URL: https://issues.apache.org/jira/browse/KUDU-2905 Project: Kudu Issue Type: Bug Components: client, master Affects Versions: 1.11.0 Reporter: Adar Dembo Saw this in a user report. The cluster in question lost its master and the state was rebuilt from that of the tservers (see KUDU-2902). In doing so, the master lost its IPKI cert info and a new record was generated. After the Kudu cluster was operational, the existing Impala cluster could not issue queries. All queries failed with an error like this: {noformat} Unable to open Kudu table: Runtime error: Client connection negotiation failed: client connection to 10.38.202.4:7051: TLS Handshake error: error:04067084:rsa routines:RSA_EAY_PUBLIC_DECRYPT:data too large for modulus:rsa_eay.c:738 error:0D0C5006:asn1 encoding routines:ASN1_item_verify:EVP lib:a_verify.c:249 error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed:s3_clnt.c:1264 {noformat} Restarting Impala fixed it. I'm not sure if this is an issue with how Impala caches KuduClient instances, or if it's an issue with how the client itself caches the master's CA certificate. For now I'm assuming this is a Kudu issue and the client needs to detect this error and invalidate existing certificates. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
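A minimal sketch of the client-side behavior suggested in the last paragraph, with entirely hypothetical names (TryConnect, ConnectWithCaRefresh); it only illustrates dropping a cached CA certificate after a verification failure and retrying, not the Kudu client's actual negotiation code.

{code:cpp}
// Illustrative sketch only: invalidate a cached CA cert after a
// certificate-verification failure and retry the connection.
#include <iostream>
#include <optional>
#include <string>

enum class ConnectResult { kOk, kCertVerifyFailed, kOtherError };

ConnectResult TryConnect(const std::optional<std::string>& cached_ca) {
  // Stand-in for TLS negotiation; fails verification when the cached CA no
  // longer matches what the master presents.
  return (cached_ca && *cached_ca == "stale-ca") ? ConnectResult::kCertVerifyFailed
                                                 : ConnectResult::kOk;
}

bool ConnectWithCaRefresh(std::optional<std::string>& cached_ca) {
  ConnectResult r = TryConnect(cached_ca);
  if (r == ConnectResult::kCertVerifyFailed && cached_ca) {
    std::cout << "certificate verify failed; discarding cached CA and retrying\n";
    cached_ca.reset();          // invalidate the stale trust anchor
    r = TryConnect(cached_ca);  // next negotiation fetches the new CA cert
  }
  return r == ConnectResult::kOk;
}

int main() {
  std::optional<std::string> cached_ca = "stale-ca";
  std::cout << (ConnectWithCaRefresh(cached_ca) ? "connected" : "failed") << "\n";
  return 0;
}
{code}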
[jira] [Created] (KUDU-2904) Master shouldn't allow master tablet operations after a disk failure
Adar Dembo created KUDU-2904: Summary: Master shouldn't allow master tablet operations after a disk failure Key: KUDU-2904 URL: https://issues.apache.org/jira/browse/KUDU-2904 Project: Kudu Issue Type: Bug Components: fs, master Affects Versions: 1.11.0 Reporter: Adar Dembo The master doesn't register any FS error handlers, which means that in the event of a disk failure that doesn't intrinsically crash the server (i.e. a disk failure to one of several directories), the master tablet is not failed and may undergo additional MM ops. This is forbidden: the invariant is that a tablet with a failed disk should itself fail. In the master perhaps the behavior should be more severe (i.e. perhaps the master should crash itself). This surfaced with a user report of multiple minor delta compactions on a master even after one of them had failed during a SyncDir() call on its superblock flush. The metadata was corrupt: the blocks added to the superblock by the compaction were marked as deleted in the LBM. It's unclear whether the in-memory state of the superblock was corrupted by the failure and subsequent compactions, or whether the corruption was caused by something else. Either way, no operations should have been permitted following the initial failure. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
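For illustration, a minimal sketch of wiring a disk-failure callback that halts the process, which is roughly the "more severe" behavior suggested above for the master. ErrorManager and RegisterErrorHandler are hypothetical names, not Kudu's fs error-handling API.

{code:cpp}
// Illustrative sketch only: register a disk-failure handler that aborts the
// master instead of allowing further maintenance ops against corrupt state.
#include <cstdlib>
#include <functional>
#include <iostream>
#include <string>

class ErrorManager {
 public:
  void RegisterErrorHandler(std::function<void(const std::string&)> handler) {
    handler_ = std::move(handler);
  }
  void RunErrorNotificationCb(const std::string& dir) {
    if (handler_) handler_(dir);
  }
 private:
  std::function<void(const std::string&)> handler_;
};

int main() {
  ErrorManager em;
  em.RegisterErrorHandler([](const std::string& dir) {
    std::cerr << "data directory failed: " << dir << "; aborting master\n";
    std::abort();  // crash rather than risk flushing inconsistent superblock state
  });
  // em.RunErrorNotificationCb("/data/1");  // would abort the process
  std::cout << "handler registered\n";
  return 0;
}
{code}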
[jira] [Created] (KUDU-2903) Durability testing framework and tests
Adar Dembo created KUDU-2903: Summary: Durability testing framework and tests Key: KUDU-2903 URL: https://issues.apache.org/jira/browse/KUDU-2903 Project: Kudu Issue Type: Bug Components: test Affects Versions: 1.11.0 Reporter: Adar Dembo From time to time we get user reports of durability issues in Kudu. We try to be good citizens and obey the POSIX spec w.r.t. durably storing data on disk, but we lack any sort of tests that prove we're doing this correctly. Ideally, we'd have a framework that allows us to run a standard Kudu workload while doing pathological things to a subset of nodes like: * Panicking the Linux kernel. * Abruptly cutting power. * Abruptly unmounting a filesystem or yanking a disk. Then we'd restart Kudu on the affected nodes and prove that all on-disk data remains consistent. Without such a framework, we can only theorize issues and their possible fixes. Some examples include KUDU-2195 and KUDU-2260. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (KUDU-2163) names of "rpc_authentication" do not match in documents
[ https://issues.apache.org/jira/browse/KUDU-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo resolved KUDU-2163. -- Resolution: Fixed Fix Version/s: (was: n/a) 1.11.0 Fixed in commit 0c8e7ca. > names of "rpc_authentication" do not match in documents > --- > > Key: KUDU-2163 > URL: https://issues.apache.org/jira/browse/KUDU-2163 > Project: Kudu > Issue Type: Bug > Components: documentation >Reporter: Jiahongchao >Assignee: Adar Dembo >Priority: Minor > Fix For: 1.11.0 > > > in > [http://kudu.apache.org/docs/configuration_reference.html#kudu-master_rpc_authentication],it > is "rpc_authentication" > in [http://kudu.apache.org/docs/security.html],it is "rpc-authentication" -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (KUDU-934) Crash between creating log metadata and log container prevents restart
[ https://issues.apache.org/jira/browse/KUDU-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo resolved KUDU-934. - Resolution: Duplicate Fix Version/s: n/a Target Version/s: M5 (was: 1.8.0) This got fixed back in September of 2016, and was tracked in KUDU-668. > Crash between creating log metadata and log container prevents restart > -- > > Key: KUDU-934 > URL: https://issues.apache.org/jira/browse/KUDU-934 > Project: Kudu > Issue Type: Bug > Components: fs >Affects Versions: Private Beta >Reporter: Todd Lipcon >Priority: Major > Fix For: n/a > > > If the TS crashes after making a log metadata file, but before making the > associated data file, then it will fail to restart with a message like: > I0804 11:39:24.298252 50360 server_base.cc:139] Could not load existing FS > layout: Not found: Could not open container e29e2bd193374c85b026a48314d22069: > /tmp/kudutest-4100/alter_table-randomized-test.AlterTableRandomized.TestRandomSequence.1438713552754543-46329/minicluster-data/ts-0/data/e29e2bd193374c85b026a48314d22069.data: > No such file or directory (error 2) -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (KUDU-2901) Load balance on tserver
[ https://issues.apache.org/jira/browse/KUDU-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893186#comment-16893186 ] Andrew Wong commented on KUDU-2901: --- I agree, using space would be beneficial, particularly in guarding against disk space issues. The current assignment policy prioritizes tablet count balance for the sake of mitigating the effects of a bad disk more evenly across the replicas on a tablet server. Using free space instead of replica count would achieve this at a more granular level and would provide a more useful metric than the number of replicas affected by a disk failure – it would tell us the number of bytes that need to be re-replicated because of a failure. Additionally, it might be useful to consider free space when creating new blocks, not just when creating new tablets. > Load balance on tserver > --- > > Key: KUDU-2901 > URL: https://issues.apache.org/jira/browse/KUDU-2901 > Project: Kudu > Issue Type: Improvement > Components: fs, tserver >Reporter: Yingchun Lai >Priority: Minor > > On a single tserver, new tablet assignment is based on the tablet count > currently assigned to each data directory: the directory with the fewest > tablets is chosen. > In the real world, a cluster may have many tables, and tablets may have > different sizes. Large tablets can end up assigned to the same directory, > so the result is not balanced. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
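A minimal sketch of the two placement policies being compared here (fewest tablets vs. most free space). DataDir and the picker functions are hypothetical names, not Kudu's fs manager API; the example only shows how the two heuristics can disagree.

{code:cpp}
// Illustrative sketch only: choose a data directory for a new replica either
// by tablet count (current policy) or by free bytes (proposed policy).
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct DataDir {
  std::string path;
  int tablet_count;
  int64_t free_bytes;
};

// Current policy: fewest tablets.
const DataDir& PickByTabletCount(const std::vector<DataDir>& dirs) {
  return *std::min_element(dirs.begin(), dirs.end(),
      [](const DataDir& a, const DataDir& b) { return a.tablet_count < b.tablet_count; });
}

// Proposed policy: most free space.
const DataDir& PickByFreeSpace(const std::vector<DataDir>& dirs) {
  return *std::max_element(dirs.begin(), dirs.end(),
      [](const DataDir& a, const DataDir& b) { return a.free_bytes < b.free_bytes; });
}

int main() {
  std::vector<DataDir> dirs = {
      {"/data/1", 10, 500LL << 30},  // fewer tablets, but they are large
      {"/data/2", 12, 900LL << 30},
  };
  std::cout << "by count: " << PickByTabletCount(dirs).path
            << ", by free space: " << PickByFreeSpace(dirs).path << "\n";
  return 0;
}
{code}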
[jira] [Assigned] (KUDU-2192) KRPC should have a timer to close stuck connections
[ https://issues.apache.org/jira/browse/KUDU-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-2192: Assignee: Adar Dembo (was: Michael Ho) > KRPC should have a timer to close stuck connections > --- > > Key: KUDU-2192 > URL: https://issues.apache.org/jira/browse/KUDU-2192 > Project: Kudu > Issue Type: Improvement > Components: rpc >Reporter: Michael Ho >Assignee: Adar Dembo >Priority: Major > > If the remote host goes down or its network gets unplugged, all pending RPCs > to that host will be stuck if there is no timeout specified. While those RPCs > which have finished sending their payloads or those which haven't started > sending payloads can be cancelled quickly, those in mid-transmission (i.e. an > RPC at the front of the outbound queue with part of its payload sent already) > cannot be cancelled until the payload has been completely sent. Therefore, > it's beneficial to have a timeout to kill a connection if it's not making any > progress for an extended period of time so the RPC will fail and get unstuck. > The timeout may need to be conservatively large to avoid aggressive closing > of connections due to transient network issues. One can consider augmenting > the existing maintenance thread logic which checks for idle connections to > check for this kind of timeout. Please feel free to propose other > alternatives (e.g. TCP keepalive timeout) in this JIRA. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (KUDU-2192) KRPC should have a timer to close stuck connections
[ https://issues.apache.org/jira/browse/KUDU-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-2192: Assignee: (was: Adar Dembo) > KRPC should have a timer to close stuck connections > --- > > Key: KUDU-2192 > URL: https://issues.apache.org/jira/browse/KUDU-2192 > Project: Kudu > Issue Type: Improvement > Components: rpc >Reporter: Michael Ho >Priority: Major > > If the remote host goes down or its network gets unplugged, all pending RPCs > to that host will be stuck if there is no timeout specified. While those RPCs > which have finished sending their payloads or those which haven't started > sending payloads can be cancelled quickly, those in mid-transmission (i.e. an > RPC at the front of the outbound queue with part of its payload sent already) > cannot be cancelled until the payload has been completely sent. Therefore, > it's beneficial to have a timeout to kill a connection if it's not making any > progress for an extended period of time so the RPC will fail and get unstuck. > The timeout may need to be conservatively large to avoid aggressive closing > of connections due to transient network issues. One can consider augmenting > the existing maintenance thread logic which checks for idle connections to > check for this kind of timeout. Please feel free to propose other > alternatives (e.g. TCP keepalive timeout) in this JIRA. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (KUDU-2902) Productionize master state rebuilding tool
Adar Dembo created KUDU-2902: Summary: Productionize master state rebuilding tool Key: KUDU-2902 URL: https://issues.apache.org/jira/browse/KUDU-2902 Project: Kudu Issue Type: Bug Components: CLI, master Affects Versions: 1.11.0 Reporter: Adar Dembo Will authored a [CLI tool|https://gerrit.cloudera.org/c/9490/] that uses cluster-wide tserver state to rebuild master state (i.e. tables and tablets). We've seen this tool prove useful in some really gnarly support situations. We should productionize it and merge it into the CLI. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (KUDU-2192) KRPC should have a timer to close stuck connections
[ https://issues.apache.org/jira/browse/KUDU-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893181#comment-16893181 ] Michael Ho commented on KUDU-2192: -- Yes, another potential TODO is to add TCP_USER_TIMEOUT for all outgoing traffic just in case the network is truly stuck for whatever reason. However, this may be slightly tricky to tune, and false positives may lead to RPCs failing prematurely. Not sure if I will get to that part soon, though, so please feel free to reassign. > KRPC should have a timer to close stuck connections > --- > > Key: KUDU-2192 > URL: https://issues.apache.org/jira/browse/KUDU-2192 > Project: Kudu > Issue Type: Improvement > Components: rpc >Reporter: Michael Ho >Assignee: Michael Ho >Priority: Major > > If the remote host goes down or its network gets unplugged, all pending RPCs > to that host will be stuck if there is no timeout specified. While those RPCs > which have finished sending their payloads or those which haven't started > sending payloads can be cancelled quickly, those in mid-transmission (i.e. an > RPC at the front of the outbound queue with part of its payload sent already) > cannot be cancelled until the payload has been completely sent. Therefore, > it's beneficial to have a timeout to kill a connection if it's not making any > progress for an extended period of time so the RPC will fail and get unstuck. > The timeout may need to be conservatively large to avoid aggressive closing > of connections due to transient network issues. One can consider augmenting > the existing maintenance thread logic which checks for idle connections to > check for this kind of timeout. Please feel free to propose other > alternatives (e.g. TCP keepalive timeout) in this JIRA. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
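Since TCP_USER_TIMEOUT came up, here is a minimal Linux-only sketch of applying it to a socket. The setsockopt option itself is the real Linux interface; the surrounding wiring and the conservative 5-minute value are illustrative assumptions, not Kudu's rpc code.

{code:cpp}
// Illustrative sketch only (Linux-specific): abort a connection if transmitted
// data stays unacknowledged for longer than the configured timeout.
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

#ifndef TCP_USER_TIMEOUT
#define TCP_USER_TIMEOUT 18  // defined by Linux >= 2.6.37
#endif

// Returns 0 on success, -1 on failure (errno set).
int SetStuckConnectionTimeout(int fd, unsigned int timeout_ms) {
  return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &timeout_ms, sizeof(timeout_ms));
}

int main() {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) { perror("socket"); return 1; }
  // Conservative 5-minute timeout to avoid failing RPCs on transient blips.
  if (SetStuckConnectionTimeout(fd, 5 * 60 * 1000) != 0) {
    fprintf(stderr, "setsockopt(TCP_USER_TIMEOUT): %s\n", strerror(errno));
  }
  close(fd);
  return 0;
}
{code}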
[jira] [Assigned] (KUDU-1441) "show create table" does not show partitioning columns
[ https://issues.apache.org/jira/browse/KUDU-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-1441: Assignee: Adar Dembo (was: Will Berkeley) > "show create table" does not show partitioning columns > -- > > Key: KUDU-1441 > URL: https://issues.apache.org/jira/browse/KUDU-1441 > Project: Kudu > Issue Type: Bug > Components: impala, master >Affects Versions: 0.8.0 >Reporter: nick >Assignee: Adar Dembo >Priority: Major > Fix For: NA > > > KUDU master web UI and Impala's "show create table" statement does not show > "Distributed By" clause for a table created with hash or range keys. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (KUDU-1441) "show create table" does not show partitioning columns
[ https://issues.apache.org/jira/browse/KUDU-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-1441: Assignee: (was: Adar Dembo) > "show create table" does not show partitioning columns > -- > > Key: KUDU-1441 > URL: https://issues.apache.org/jira/browse/KUDU-1441 > Project: Kudu > Issue Type: Bug > Components: impala, master >Affects Versions: 0.8.0 >Reporter: nick >Priority: Major > Fix For: NA > > > KUDU master web UI and Impala's "show create table" statement does not show > "Distributed By" clause for a table created with hash or range keys. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KUDU-1441) "show create table" does not show partitioning columns
[ https://issues.apache.org/jira/browse/KUDU-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-1441: - Fix Version/s: (was: NA) > "show create table" does not show partitioning columns > -- > > Key: KUDU-1441 > URL: https://issues.apache.org/jira/browse/KUDU-1441 > Project: Kudu > Issue Type: Bug > Components: impala, master >Affects Versions: 0.8.0 >Reporter: nick >Priority: Major > > KUDU master web UI and Impala's "show create table" statement does not show > "Distributed By" clause for a table created with hash or range keys. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (KUDU-1566) JIRA updater script linking a gerrit patch to JIRA automatically
[ https://issues.apache.org/jira/browse/KUDU-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo resolved KUDU-1566. -- Resolution: Fixed > JIRA updater script linking a gerrit patch to JIRA automatically > > > Key: KUDU-1566 > URL: https://issues.apache.org/jira/browse/KUDU-1566 > Project: Kudu > Issue Type: Task >Reporter: Dinesh Bhat >Assignee: Dinesh Bhat >Priority: Minor > Fix For: NA > > > At times, I have found it hard to track a particular JIRA to a gerrit patch > and vice versa to gain more context on a submitted change, code review > discussions, etc. I am hoping this will bridge the gap between the review > system and JIRA tracking. > Currently, not all of our commits carry JIRA numbers, but this could apply > to whichever gerrit patch carries one in its commit message. I > have come across such scripts before, so spinning one shouldn't be that hard. > Though not as fancy as the below link, we could just add a gerrit link to the > JIRA comment section whenever a change is submitted (or perhaps posted for > review). > https://marketplace.atlassian.com/plugins/com.xiplink.jira.git.jira_git_plugin/cloud/overview -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KUDU-1754) Columns should default to NULL opposed to NOT NULL
[ https://issues.apache.org/jira/browse/KUDU-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-1754: - Fix Version/s: (was: NA) > Columns should default to NULL opposed to NOT NULL > --- > > Key: KUDU-1754 > URL: https://issues.apache.org/jira/browse/KUDU-1754 > Project: Kudu > Issue Type: Bug > Components: api >Affects Versions: 1.2.0 >Reporter: Mostafa Mokhtar >Priority: Major > > Columns default to "NOT NULL" if the nullability field is not specified. > This behavior opposes Oracle, Teradata, MSSqlserver, MySQL... -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (KUDU-1754) Columns should default to NULL opposed to NOT NULL
[ https://issues.apache.org/jira/browse/KUDU-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-1754: Assignee: Adar Dembo > Columns should default to NULL opposed to NOT NULL > --- > > Key: KUDU-1754 > URL: https://issues.apache.org/jira/browse/KUDU-1754 > Project: Kudu > Issue Type: Bug > Components: api >Affects Versions: 1.2.0 >Reporter: Mostafa Mokhtar >Assignee: Adar Dembo >Priority: Major > > Columns default to "NOT NULL" if the nullability field is not specified. > This behavior opposes Oracle, Teradata, MSSqlserver, MySQL... -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KUDU-2581) Reject the scan request if it has reached the num or memory threshold
[ https://issues.apache.org/jira/browse/KUDU-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-2581: - Affects Version/s: (was: NA) 1.8.0 > Reject the scan request if it has reached the num or memory threshold > - > > Key: KUDU-2581 > URL: https://issues.apache.org/jira/browse/KUDU-2581 > Project: Kudu > Issue Type: New Feature > Components: tserver >Affects Versions: 1.8.0 >Reporter: Hexin >Priority: Major > > Sometimes we have a lot of scans, which may exceed the soft memory limit. The > greater the excess, the higher the chance of throttling writes. We should > add a way to reject a scan request once the soft memory limit has been > reached or the number of scanners has reached a threshold. The threshold > should be estimated from the actual memory of the system running Kudu. This > way, the system can leave more room for writes. > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
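A minimal sketch of the admission check described above: reject a new scan when memory use is past the soft limit or too many scanners are already active. The class, thresholds, and error handling are hypothetical assumptions, not Kudu's tserver code.

{code:cpp}
// Illustrative sketch only: gate new scans on a soft memory limit and an
// active-scanner cap; rejected scans should be answered with a retryable error.
#include <atomic>
#include <cstdint>
#include <iostream>

class ScanAdmission {
 public:
  ScanAdmission(int64_t soft_memory_limit, int max_scanners)
      : soft_memory_limit_(soft_memory_limit), max_scanners_(max_scanners) {}

  // Returns true if a new scan may start now.
  bool AdmitScan(int64_t memory_in_use) {
    if (memory_in_use >= soft_memory_limit_) return false;
    int cur = active_scanners_.load();
    while (cur < max_scanners_) {
      if (active_scanners_.compare_exchange_weak(cur, cur + 1)) return true;
    }
    return false;
  }
  void ScanFinished() { active_scanners_.fetch_sub(1); }

 private:
  const int64_t soft_memory_limit_;
  const int max_scanners_;
  std::atomic<int> active_scanners_{0};
};

int main() {
  ScanAdmission adm(6LL << 30, /*max_scanners=*/100);  // assumed ~6 GiB soft limit
  std::cout << (adm.AdmitScan(/*memory_in_use=*/7LL << 30) ? "admitted" : "rejected")
            << "\n";
  return 0;
}
{code}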
[jira] [Assigned] (KUDU-1754) Columns should default to NULL opposed to NOT NULL
[ https://issues.apache.org/jira/browse/KUDU-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-1754: Assignee: (was: Adar Dembo) > Columns should default to NULL opposed to NOT NULL > --- > > Key: KUDU-1754 > URL: https://issues.apache.org/jira/browse/KUDU-1754 > Project: Kudu > Issue Type: Bug > Components: api >Affects Versions: 1.2.0 >Reporter: Mostafa Mokhtar >Priority: Major > > Columns default to "NOT NULL" if the nullability field is not specified. > This behavior opposes Oracle, Teradata, MSSqlserver, MySQL... -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (KUDU-2602) testRandomBackupAndRestore is flaky
[ https://issues.apache.org/jira/browse/KUDU-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo resolved KUDU-2602. -- Resolution: Fixed Fix Version/s: (was: NA) 1.10.0 Will fixed the follow-on source of flakiness in 1b4910977. > testRandomBackupAndRestore is flaky > --- > > Key: KUDU-2602 > URL: https://issues.apache.org/jira/browse/KUDU-2602 > Project: Kudu > Issue Type: Bug >Reporter: Hao Hao >Assignee: Will Berkeley >Priority: Major > Fix For: 1.10.0 > > Attachments: TEST-org.apache.kudu.backup.TestKuduBackup.xml > > > Saw the following failure with testRandomBackupAndRestore: > {noformat} > java.lang.AssertionError: > expected:<21> but was:<20> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.kudu.backup.TestKuduBackup.testRandomBackupAndRestore(TestKuduBackup.scala:99) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > at org.apache.kudu.junit.RetryRule$RetryStatement.evaluate(RetryRule.java:72) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at 
java.lang.reflect.Method.invoke(Method.java:483) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) > at > org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) > at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) > at > org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBa
[jira] [Updated] (KUDU-2602) testRandomBackupAndRestore is flaky
[ https://issues.apache.org/jira/browse/KUDU-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-2602: - Affects Version/s: 1.10.0 > testRandomBackupAndRestore is flaky > --- > > Key: KUDU-2602 > URL: https://issues.apache.org/jira/browse/KUDU-2602 > Project: Kudu > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Hao Hao >Assignee: Will Berkeley >Priority: Major > Fix For: 1.10.0 > > Attachments: TEST-org.apache.kudu.backup.TestKuduBackup.xml > > > Saw the following failure with testRandomBackupAndRestore: > {noformat} > java.lang.AssertionError: > expected:<21> but was:<20> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.kudu.backup.TestKuduBackup.testRandomBackupAndRestore(TestKuduBackup.scala:99) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > at org.apache.kudu.junit.RetryRule$RetryStatement.evaluate(RetryRule.java:72) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) > at > org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) > at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) > at > org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155) > at > org.gradle.internal.remote.internal.hub.Message
[jira] [Updated] (KUDU-2581) Reject the scan request if it has reached the num or memory threshold
[ https://issues.apache.org/jira/browse/KUDU-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-2581: - Fix Version/s: (was: NA) > Reject the scan request if it has reached the num or memory threshold > - > > Key: KUDU-2581 > URL: https://issues.apache.org/jira/browse/KUDU-2581 > Project: Kudu > Issue Type: New Feature > Components: tserver >Affects Versions: NA >Reporter: Hexin >Priority: Major > > Sometimes we have a lot of scans, which may exceed the soft memory limit. The > greater the excess, the higher the chance of throttling writes. We should > add a way to reject a scan request once the soft memory limit has been > reached or the number of scanners has reached a threshold. The threshold > should be estimated from the actual memory of the system running Kudu. This > way, the system can leave more room for writes. > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (KUDU-1767) Reordering of client operations from the same KuduSession is possible
[ https://issues.apache.org/jira/browse/KUDU-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-1767: Assignee: Adar Dembo > Reordering of client operations from the same KuduSession is possible > - > > Key: KUDU-1767 > URL: https://issues.apache.org/jira/browse/KUDU-1767 > Project: Kudu > Issue Type: Bug > Components: client, tablet >Affects Versions: 1.1.0 >Reporter: Mike Percy >Assignee: Adar Dembo >Priority: Major > Fix For: n/a > > > It is possible for client operations written via the same KuduSession to be > reordered on the server side in MANUAL_FLUSH and AUTO_BACKGROUND_FLUSH modes. > This violates our desired consistency guarantees. > This may occur because we allow concurrent flushes from the client for > throughput reasons and there is nothing enforcing the well-ordering of lock > acquisition from a single client session on the server side. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (KUDU-1767) Reordering of client operations from the same KuduSession is possible
[ https://issues.apache.org/jira/browse/KUDU-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-1767: Assignee: (was: Adar Dembo) > Reordering of client operations from the same KuduSession is possible > - > > Key: KUDU-1767 > URL: https://issues.apache.org/jira/browse/KUDU-1767 > Project: Kudu > Issue Type: Bug > Components: client, tablet >Affects Versions: 1.1.0 >Reporter: Mike Percy >Priority: Major > Fix For: n/a > > > It is possible for client operations written via the same KuduSession to be > reordered on the server side in MANUAL_FLUSH and AUTO_BACKGROUND_FLUSH modes. > This violates our desired consistency guarantees. > This may occur because we allow concurrent flushes from the client for > throughput reasons and there is nothing enforcing the well-ordering of lock > acquisition from a single client session on the server side. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KUDU-1767) Reordering of client operations from the same KuduSession is possible
[ https://issues.apache.org/jira/browse/KUDU-1767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-1767: - Fix Version/s: (was: n/a) > Reordering of client operations from the same KuduSession is possible > - > > Key: KUDU-1767 > URL: https://issues.apache.org/jira/browse/KUDU-1767 > Project: Kudu > Issue Type: Bug > Components: client, tablet >Affects Versions: 1.1.0 >Reporter: Mike Percy >Priority: Major > > It is possible for client operations written via the same KuduSession to be > reordered on the server side in MANUAL_FLUSH and AUTO_BACKGROUND_FLUSH modes. > This violates our desired consistency guarantees. > This may occur because we allow concurrent flushes from the client for > throughput reasons and there is nothing enforcing the well-ordering of lock > acquisition from a single client session on the server side. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
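A minimal sketch of one possible server-side fix for the reordering described above: tag each flushed batch with a per-session sequence number and refuse to apply batches that arrive out of order, forcing a retry. The names are hypothetical, and the sketch glosses over retries, session expiry, and how the sequence number is carried in the write RPC.

{code:cpp}
// Illustrative sketch only: apply a session's batches strictly in issue order.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

class SessionOrderEnforcer {
 public:
  // Returns true if the batch with this sequence number may be applied now.
  // A batch that skips ahead is rejected and should be retried once its
  // missing predecessor has been applied.
  bool StartBatch(const std::string& session_id, uint64_t seq_no) {
    uint64_t& next = next_seq_[session_id];  // starts at 0 for new sessions
    if (seq_no != next) return false;
    ++next;
    return true;
  }
 private:
  std::map<std::string, uint64_t> next_seq_;
};

int main() {
  SessionOrderEnforcer enforcer;
  std::cout << enforcer.StartBatch("session-1", 0) << "\n";  // 1: applied
  std::cout << enforcer.StartBatch("session-1", 2) << "\n";  // 0: out of order, rejected
  std::cout << enforcer.StartBatch("session-1", 1) << "\n";  // 1: applied
  return 0;
}
{code}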
[jira] [Assigned] (KUDU-2163) names of "rpc_authentication" do not match in documents
[ https://issues.apache.org/jira/browse/KUDU-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-2163: Assignee: Adar Dembo > names of "rpc_authentication" do not match in documents > --- > > Key: KUDU-2163 > URL: https://issues.apache.org/jira/browse/KUDU-2163 > Project: Kudu > Issue Type: Bug > Components: documentation >Reporter: Jiahongchao >Assignee: Adar Dembo >Priority: Minor > Fix For: n/a > > > in > [http://kudu.apache.org/docs/configuration_reference.html#kudu-master_rpc_authentication],it > is "rpc_authentication" > in [http://kudu.apache.org/docs/security.html],it is "rpc-authentication" -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (KUDU-2646) kudu restart the tablets stats from INITIALIZED change to RUNNING cost a few days
[ https://issues.apache.org/jira/browse/KUDU-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo resolved KUDU-2646. -- Resolution: Invalid > kudu restart the tablets stats from INITIALIZED change to RUNNING cost a few > days > - > > Key: KUDU-2646 > URL: https://issues.apache.org/jira/browse/KUDU-2646 > Project: Kudu > Issue Type: Bug >Reporter: qinzl_1 >Priority: Major > Fix For: n/a > > Attachments: kudu-tserver (1).INFO.gz > > > [^kudu-tserver (1).INFO.gz] I installed Kudu from Cloudera Manager; I have 3 > masters and 4 tablet servers, with no special configuration. When I restart > the servers, they cannot serve requests. I found all tablet servers were INITIALIZED, > and it took a long time for them to change to RUNNING. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (KUDU-2192) KRPC should have a timer to close stuck connections
[ https://issues.apache.org/jira/browse/KUDU-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16893174#comment-16893174 ] Adar Dembo commented on KUDU-2192: -- [~kwho] is there anything left to do for this JIRA? > KRPC should have a timer to close stuck connections > --- > > Key: KUDU-2192 > URL: https://issues.apache.org/jira/browse/KUDU-2192 > Project: Kudu > Issue Type: Improvement > Components: rpc >Reporter: Michael Ho >Assignee: Michael Ho >Priority: Major > > If the remote host goes down or its network gets unplugged, all pending RPCs > to that host will be stuck if there is no timeout specified. While those RPCs > which have finished sending their payloads or those which haven't started > sending payloads can be cancelled quickly, those in mid-transmission (i.e. an > RPC at the front of the outbound queue with part of its payload sent already) > cannot be cancelled until the payload has been completely sent. Therefore, > it's beneficial to have a timeout to kill a connection if it's not making any > progress for an extended period of time so the RPC will fail and get unstuck. > The timeout may need to be conservatively large to avoid aggressive closing > of connections due to transient network issues. One can consider augmenting > the existing maintenance thread logic which checks for idle connections to > check for this kind of timeout. Please feel free to propose other > alternatives (e.g. TCP keepalive timeout) in this JIRA. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KUDU-2169) Allow replicas that do not exist to vote
[ https://issues.apache.org/jira/browse/KUDU-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-2169: - Fix Version/s: (was: n/a) > Allow replicas that do not exist to vote > > > Key: KUDU-2169 > URL: https://issues.apache.org/jira/browse/KUDU-2169 > Project: Kudu > Issue Type: Sub-task > Components: consensus >Reporter: Mike Percy >Priority: Major > > In certain scenarios it is desirable for replicas that do not exist on a > tablet server to be able to vote. After the implementation of KUDU-871, > tombstoned tablets are now able to vote. However, there are circumstances (at > least in a pre- KUDU-1097 world) where voters that do not have a copy of a > replica (running or tombstoned) would be needed to vote to ensure > availability in certain edge-case failure scenarios. > The quick justification for why it would be safe for a non-existent replica > to vote is that it would be equivalent to a replica that has simply not yet > replicated any WAL entries, in which case it would be legal to vote for any > candidate. Of course, a candidate would only ask such a replica to vote for > it if it believed that replica to be a voter in its config. > Some additional discussion can be found here: > https://github.com/apache/kudu/blob/master/docs/design-docs/raft-tablet-copy.md#should-a-server-be-allowed-to-vote-if-it-does_not_exist-or-is-deleted > What follows is an example of a scenario where "non-existent" replicas being > able to vote would be desired: > In a 3-2-3 re-replication paradigm, the leader (A) of a 3-replica config \{A, > B, C\} evicts one replica (C). Then, the leader (A) adds a new voter (D). > Before A is able to replicate this config change to B or D, A is partitioned > from a network perspective. However A writes this config change to its local > WAL. After this, the entire cluster is brought down, the network is restored, > and the entire cluster is restarted. However, B fails to come back online due > to a hardware failure. > The only way to automatically recover in this scenario is to allow D, which > has no concept of the tablet being discussed, to vote for A to become leader, > which will then tablet copy to D and make the tablet available for writes. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KUDU-2192) KRPC should have a timer to close stuck connections
[ https://issues.apache.org/jira/browse/KUDU-2192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-2192: - Fix Version/s: (was: n/a) > KRPC should have a timer to close stuck connections > --- > > Key: KUDU-2192 > URL: https://issues.apache.org/jira/browse/KUDU-2192 > Project: Kudu > Issue Type: Improvement > Components: rpc >Reporter: Michael Ho >Assignee: Michael Ho >Priority: Major > > If the remote host goes down or its network gets unplugged, all pending RPCs > to that host will be stuck if there is no timeout specified. While those RPCs > which have finished sending their payloads or those which haven't started > sending payloads can be cancelled quickly, those in mid-transmission (i.e. an > RPC at the front of the outbound queue with part of its payload sent already) > cannot be cancelled until the payload has been completely sent. Therefore, > it's beneficial to have a timeout to kill a connection if it's not making any > progress for an extended period of time so the RPC will fail and get unstuck. > The timeout may need to be conservatively large to avoid aggressive closing > of connections due to transient network issues. One can consider augmenting > the existing maintenance thread logic which checks for idle connections to > check for this kind of timeout. Please feel free to propose other > alternatives (e.g. TCP keepalive timeout) in this JIRA. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KUDU-2195) Enforce durability happened before relationships on multiple disks
[ https://issues.apache.org/jira/browse/KUDU-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-2195: - Fix Version/s: (was: 1.9.0) > Enforce durability happened before relationships on multiple disks > -- > > Key: KUDU-2195 > URL: https://issues.apache.org/jira/browse/KUDU-2195 > Project: Kudu > Issue Type: Bug > Components: consensus, tablet >Affects Versions: 1.9.0 >Reporter: David Alves >Priority: Major > > When using weaker durability semantics (e.g. when log_force_fsync is off) we > should still enforce certain happened-before relationships which are not > currently being enforced when using different disks for the wal and data. > The two cases that come to mind where this is relevant are: > 1) cmeta (c) -> wal (w) : We flush cmeta before flushing the wal (for > instance on term change) with the intention that either {}, \{c} or \{c, w} > were made durable. > 2) wal (w) -> tablet meta (t): We flush the wal before tablet metadata to > make sure that all commit messages that refer to on disk row sets (and > deltas) are on disk before the row sets they point to, i.e. with the > intention that either {}, \{w} or \{w, t} were made durable. > With strong durability semantics these are always made durable in the right > order. With weaker semantics that is not the case though. If using the same > disk for both the wal and data then the invariants are still preserved, as > buffers will be flushed in the right order but if using different disks for > the wal and data (and because cmeta is stored with the data) that is not > always the case. > 1) in ext4 is actually safe, because we perform an fsync (indirect, rename() > implies fsync in ext4) when flushing cmeta. But it is not for xfs. > 2) Is not safe in either filesystem. > --- Possible solutions -- > For 1): Store cmeta with the wal; actually always fsync cmeta. > For 2): Store tablet meta with the wal; always fsync the wal before flushing > tablet meta. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
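A minimal sketch of enforcing case 2) above (wal (w) -> tablet meta (t)) by explicitly fsyncing the WAL before flushing the tablet metadata, even when routine WAL appends are not fsynced. The paths and helper names are hypothetical, not Kudu's log or fs API.

{code:cpp}
// Illustrative sketch only: make the wal -> tablet-metadata ordering explicit.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

// Force everything appended to the WAL so far onto stable storage.
int SyncWal(const char* wal_path) {
  int fd = open(wal_path, O_WRONLY);
  if (fd < 0) return -1;
  int rc = fsync(fd);
  close(fd);
  return rc;
}

int FlushTabletMetadata(const char* wal_path, const char* meta_path) {
  // Invariant: commit messages referencing new rowsets must be durable before
  // the superblock that points at those rowsets.
  if (SyncWal(wal_path) != 0) return -1;
  // ... write and fsync the new superblock at meta_path here ...
  (void)meta_path;
  return 0;
}

int main() {
  if (FlushTabletMetadata("/data/wal/tablet-1/wal-000001", "/data/meta/tablet-1") != 0) {
    perror("flush");
    return 1;
  }
  return 0;
}
{code}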
[jira] [Assigned] (KUDU-2195) Enforce durability happened before relationships on multiple disks
[ https://issues.apache.org/jira/browse/KUDU-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-2195: Assignee: Adar Dembo > Enforce durability happened before relationships on multiple disks > -- > > Key: KUDU-2195 > URL: https://issues.apache.org/jira/browse/KUDU-2195 > Project: Kudu > Issue Type: Bug > Components: consensus, tablet >Affects Versions: 1.9.0 >Reporter: David Alves >Assignee: Adar Dembo >Priority: Major > > When using weaker durability semantics (e.g. when log_force_fsync is off) we > should still enforce certain happened-before relationships which are not > currently being enforced when using different disks for the wal and data. > The two cases that come to mind where this is relevant are: > 1) cmeta (c) -> wal (w) : We flush cmeta before flushing the wal (for > instance on term change) with the intention that either {}, \{c} or \{c, w} > were made durable. > 2) wal (w) -> tablet meta (t): We flush the wal before tablet metadata to > make sure that all commit messages that refer to on disk row sets (and > deltas) are on disk before the row sets they point to, i.e. with the > intention that either {}, \{w} or \{w, t} were made durable. > With strong durability semantics these are always made durable in the right > order. With weaker semantics that is not the case though. If using the same > disk for both the wal and data then the invariants are still preserved, as > buffers will be flushed in the right order but if using different disks for > the wal and data (and because cmeta is stored with the data) that is not > always the case. > 1) in ext4 is actually safe, because we perform an fsync (indirect, rename() > implies fsync in ext4) when flushing cmeta. But it is not for xfs. > 2) Is not safe in either filesystem. > --- Possible solutions -- > For 1): Store cmeta with the wal; actually always fsync cmeta. > For 2): Store tablet meta with the wal; always fsync the wal before flushing > tablet meta. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (KUDU-2195) Enforce durability happened before relationships on multiple disks
[ https://issues.apache.org/jira/browse/KUDU-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo reassigned KUDU-2195: Assignee: (was: Adar Dembo) > Enforce durability happened before relationships on multiple disks > -- > > Key: KUDU-2195 > URL: https://issues.apache.org/jira/browse/KUDU-2195 > Project: Kudu > Issue Type: Bug > Components: consensus, tablet >Affects Versions: 1.9.0 >Reporter: David Alves >Priority: Major > > When using weaker durability semantics (e.g. when log_force_fsync is off) we > should still enforce certain happened-before relationships which are not > currently being enforced when using different disks for the wal and data. > The two cases that come to mind where this is relevant are: > 1) cmeta (c) -> wal (w) : We flush cmeta before flushing the wal (for > instance on term change) with the intention that either {}, \{c} or \{c, w} > were made durable. > 2) wal (w) -> tablet meta (t): We flush the wal before tablet metadata to > make sure that all commit messages that refer to on disk row sets (and > deltas) are on disk before the row sets they point to, i.e. with the > intention that either {}, \{w} or \{w, t} were made durable. > With strong durability semantics these are always made durable in the right > order. With weaker semantics that is not the case though. If using the same > disk for both the wal and data then the invariants are still preserved, as > buffers will be flushed in the right order but if using different disks for > the wal and data (and because cmeta is stored with the data) that is not > always the case. > 1) in ext4 is actually safe, because we perform an fsync (indirect, rename() > implies fsync in ext4) when flushing cmeta. But it is not for xfs. > 2) Is not safe in either filesystem. > --- Possible solutions -- > For 1): Store cmeta with the wal; actually always fsync cmeta. > For 2): Store tablet meta with the wal; always fsync the wal before flushing > tablet meta. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KUDU-2615) Add more tools for dealing with the WAL
[ https://issues.apache.org/jira/browse/KUDU-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-2615: - Fix Version/s: (was: 1.9.0) > Add more tools for dealing with the WAL > --- > > Key: KUDU-2615 > URL: https://issues.apache.org/jira/browse/KUDU-2615 > Project: Kudu > Issue Type: Improvement > Components: log >Affects Versions: 1.8.0 >Reporter: Will Berkeley >Assignee: Will Berkeley >Priority: Major > > There's only one tool for dealing with WALs, the dump tool, which lets you > see what's in the WAL by reading it like a Kudu server would and then dumping > it as debug strings. > We could probably use a few more, though. This JIRA serves as a tracking JIRA > for new WAL tools and as a parent JIRA for some existing WAL tool JIRAs. > Tools we could add or enhancements we could make: > * 'kudu wal dump --debug' with batch offsets. This would have been useful to > me a couple times when I've done WAL surgery. > * 'kudu wal dump repair' maybe with truncate capability (KUDU-1503) or the > capability to 'punch out' an entry. > * A 'kudu wal compare' tool that can compare two WALs and show their > differences, so the WALs of two replicas of the same tablet could be compared. > * A 'kudu wal edit' tool that can edit some kinds of wal entries-- the use > case for this is editing the hostports of peers in config change operations > (KUDU-2396). -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KUDU-2195) Enforce durability happened before relationships on multiple disks
[ https://issues.apache.org/jira/browse/KUDU-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-2195: - Affects Version/s: 1.9.0 > Enforce durability happened before relationships on multiple disks > -- > > Key: KUDU-2195 > URL: https://issues.apache.org/jira/browse/KUDU-2195 > Project: Kudu > Issue Type: Bug > Components: consensus, tablet >Affects Versions: 1.9.0 >Reporter: David Alves >Priority: Major > Fix For: 1.9.0 > > > When using weaker durability semantics (e.g. when log_force_fsync is off) we > should still enforce certain happened-before relationships which are not > currently being enforced when using different disks for the wal and data. > The two cases that come to mind where this is relevant are: > 1) cmeta (c) -> wal (w) : We flush cmeta before flushing the wal (for > instance on term change) with the intention that either {}, \{c} or \{c, w} > were made durable. > 2) wal (w) -> tablet meta (t): We flush the wal before tablet metadata to > make sure that all commit messages that refer to on disk row sets (and > deltas) are on disk before the row sets they point to, i.e. with the > intention that either {}, \{w} or \{w, t} were made durable. > With strong durability semantics these are always made durable in the right > order. With weaker semantics that is not the case though. If using the same > disk for both the wal and data then the invariants are still preserved, as > buffers will be flushed in the right order but if using different disks for > the wal and data (and because cmeta is stored with the data) that is not > always the case. > 1) in ext4 is actually safe, because we perform an fsync (indirect, rename() > implies fsync in ext4) when flushing cmeta. But it is not for xfs. > 2) Is not safe in either filesystem. > --- Possible solutions -- > For 1): Store cmeta with the wal; actually always fsync cmeta. > For 2): Store tablet meta with the wal; always fsync the wal before flushing > tablet meta. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KUDU-2811) Fuzz test needed for backup-restore
[ https://issues.apache.org/jira/browse/KUDU-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adar Dembo updated KUDU-2811: - Fix Version/s: (was: 1.10.0) > Fuzz test needed for backup-restore > --- > > Key: KUDU-2811 > URL: https://issues.apache.org/jira/browse/KUDU-2811 > Project: Kudu > Issue Type: Bug > Components: backup >Affects Versions: 1.9.0 >Reporter: Will Berkeley >Priority: Major > Labels: backup > > We need to fuzz test backup-restore by having a test that creates a table > through a random sequence of operations while also randomly doing incremental > backups. We should then check the restored table against the original table. > This would have caught KUDU-2809. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (KUDU-2618) Factor the amount of data into time-based flush decisions
[ https://issues.apache.org/jira/browse/KUDU-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Will Berkeley reassigned KUDU-2618: --- Assignee: (was: Will Berkeley) > Factor the amount of data into time-based flush decisions > - > > Key: KUDU-2618 > URL: https://issues.apache.org/jira/browse/KUDU-2618 > Project: Kudu > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Will Berkeley >Priority: Major > > Pure time-based flush can cause small rowset problems when the rate of > inserts is so low that hardly any data accumulates before it is flushed. > On the other hand, cribbing an example from Todd from the KUDU-1400 design > doc: > bq. if you configure your TS to allow 100G of heap, and insert 30G of data > spread across 30 tablets (1G each tablet being lower than the default > size-based threshold), would you want it to ever flush to disk? or just sit > there in RAM? The restart could be relatively slow if it never flushed, and > also scans of MRS are slower than DRS. > As Todd goes on to say > bq. That said, we could probably make the "time-based flush" somehow related > to the amount of data, so that we wait a long time to flush if it's only > 10kb, but still flush relatively quickly if it's many MB. > We should tune time-based flush to wait on average a shorter time to flush if > the amount to flush is enough for 1 or more "full-sized" diskrowsets than if > the flush is of less data than a full diskrowset. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
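A minimal sketch of the tuning suggested above: scale the time-based flush urgency by how close the accumulated in-memory data is to a full-sized diskrowset, so a tiny MRS waits longer while a full one flushes promptly. The constants and names are assumptions, not Kudu's maintenance manager code.

{code:cpp}
// Illustrative sketch only: combine elapsed time and accumulated bytes into a
// single flush-urgency score (higher means flush sooner).
#include <algorithm>
#include <cstdint>
#include <iostream>

constexpr int64_t kTargetDiskRowSetBytes = 32LL << 20;  // assumed 32 MiB target
constexpr double kBaseSecsToFullUrgency = 600.0;        // 10 minutes for a "full" MRS

double TimeBasedFlushScore(double secs_since_first_write, int64_t bytes_in_memory) {
  // Nearly empty rowsets decay toward a very long effective flush interval;
  // rowsets holding a full diskrowset's worth of data use the base interval.
  double fullness =
      std::min(1.0, static_cast<double>(bytes_in_memory) / kTargetDiskRowSetBytes);
  return (secs_since_first_write / kBaseSecsToFullUrgency) * fullness;
}

int main() {
  std::cout << "10 KiB after 10 min: " << TimeBasedFlushScore(600, 10 << 10) << "\n"
            << "64 MiB after 10 min: " << TimeBasedFlushScore(600, 64LL << 20) << "\n";
  return 0;
}
{code}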