[jira] [Resolved] (KUDU-1929) [rpc] Allow using encrypted private keys for TLS
[ https://issues.apache.org/jira/browse/KUDU-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sailesh Mukil resolved KUDU-1929.

Resolution: Fixed
Fix Version/s: 1.5.0

Commit in: https://github.com/apache/kudu/commit/57a07ae7217b63c51651611427f1af029d54d4fe

> [rpc] Allow using encrypted private keys for TLS
>
> Key: KUDU-1929
> URL: https://issues.apache.org/jira/browse/KUDU-1929
> Project: Kudu
> Issue Type: Improvement
> Components: rpc
> Reporter: Sailesh Mukil
> Assignee: Sailesh Mukil
> Fix For: 1.5.0
>
> Currently, for internal RPC communication, we aren't able to handle encrypted
> private keys. This can be done by using the OpenSSL APIs:
> SSL_CTX_set_default_passwd_cb()
> SSL_CTX_set_default_passwd_cb_userdata()

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
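A minimal sketch of the callback side of the approach named in KUDU-1929, for illustration only. It is not the code from the referenced commit. The function name KeyPasswordCb and the choice to pass the passphrase as a std::string through the userdata pointer are assumptions made for this example; only the callback signature and the two SSL_CTX_set_default_passwd_cb* calls come from OpenSSL. The callback is kept free of OpenSSL headers so it compiles on its own, and the registration calls are shown in a comment.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Has the shape of OpenSSL's pem_password_cb:
//   int cb(char* buf, int size, int rwflag, void* userdata);
// OpenSSL invokes it when it needs the passphrase of an encrypted PEM key.
// ASSUMPTION: userdata points to a std::string holding the passphrase.
extern "C" int KeyPasswordCb(char* buf, int size, int /*rwflag*/,
                             void* userdata) {
  if (buf == nullptr || size <= 0 || userdata == nullptr) {
    return -1;  // Signal an error to the caller.
  }
  const std::string* password = static_cast<const std::string*>(userdata);
  // Refuse rather than silently truncate a passphrase that does not fit.
  if (password->size() > static_cast<std::size_t>(size)) {
    return -1;
  }
  std::memcpy(buf, password->data(), password->size());
  // Report the number of passphrase bytes written into buf.
  return static_cast<int>(password->size());
}

// Registration, given a real SSL_CTX* ctx and <openssl/ssl.h>:
//   SSL_CTX_set_default_passwd_cb(ctx, &KeyPasswordCb);
//   SSL_CTX_set_default_passwd_cb_userdata(ctx, &password_string);
//   SSL_CTX_use_PrivateKey_file(ctx, key_path, SSL_FILETYPE_PEM);
```

The string passed as userdata has to outlive every key-loading call made on that context, since OpenSSL only stores the pointer.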
[jira] [Updated] (KUDU-2033) Add a 'torture' scenario to verify Java client's behavior during fail-over
[ https://issues.apache.org/jira/browse/KUDU-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Serbin updated KUDU-2033:

Description:
For the Kudu Java client we have the {{TestLeaderFailover}} test, which verifies how the client handles the tablet server fail-over scenario. However, the test covers only one fail-over event and mainly performs write operations while the backend handles the 'unexpected crash' of the tablet server.

It would be nice to add more tests which cover the client's fail-over behavior:
* add the mixed workload scenario, i.e. combine inserts/scans during the fail-over
* induce more fail-over events while running the scenario, i.e. pause and then resume the tserver processes many more times and run the test longer
* add the multi-master scenario, where both the leader tserver and leader master 'unexpectedly crash' during the run
* in the mixed workload scenarios, run scan operations in READ_AT_SNAPSHOT mode to exercise RYW (Read-Your-Writes) behavior and add assertions to make sure the RYW behavior is observed as expected

was:
For the Kudu Java client we have the {{TestLeaderFailover}} test, which verifies how the client handles the tablet server fail-over scenario. However, the test covers only one fail-over event and mainly performs write operations while the backend handles the 'unexpected crash' of the tablet server.

It would be nice to add more tests which cover the client's fail-over behavior:
* add the mixed workload scenario, i.e. combine inserts/scans during the fail-over
* induce more fail-over events while running the scenario, i.e. pause and then resume the tserver processes many more times and run the test longer
* add the multi-master scenario, where both the leader tserver and leader master 'unexpectedly crash' during the run
* in the mixed workload scenarios, run scan operations in READ_AT_TIMESTAMP mode to exercise RYW (Read-Your-Writes) behavior and add assertions to make sure the RYW behavior is observed as expected

> Add a 'torture' scenario to verify Java client's behavior during fail-over
> ---
>
> Key: KUDU-2033
> URL: https://issues.apache.org/jira/browse/KUDU-2033
> Project: Kudu
> Issue Type: Test
> Components: client, java
> Reporter: Alexey Serbin
> Assignee: Edward Fancher
> Labels: newbie, newbie++
>
> For the Kudu Java client we have the {{TestLeaderFailover}} test, which verifies
> how the client handles the tablet server fail-over scenario. However, the
> test covers only one fail-over event and mainly performs write operations
> while the backend handles the 'unexpected crash' of the tablet server.
> It would be nice to add more tests which cover the client's fail-over
> behavior:
> * add the mixed workload scenario, i.e. combine inserts/scans during the
> fail-over
> * induce more fail-over events while running the scenario, i.e. pause and
> then resume the tserver processes many more times and run the test longer
> * add the multi-master scenario, where both the leader tserver and leader
> master 'unexpectedly crash' during the run
> * in the mixed workload scenarios, run scan operations in READ_AT_SNAPSHOT
> mode to exercise RYW (Read-Your-Writes) behavior and add assertions to make
> sure the RYW behavior is observed as expected
[jira] [Commented] (KUDU-694) Re-visit C++ client scan retry logic
[ https://issues.apache.org/jira/browse/KUDU-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078475#comment-16078475 ]

Alexey Serbin commented on KUDU-694:

As a side note, it would be nice to understand whether some of our tests cover those issues at all. Probably the best way to categorize and address the scan retry logic would be to put together a set of use cases and create a set of tests asserting the desired behavior. Most likely, more than 50% of that is already covered by existing tests, but I'm not sure it's anywhere close to the 100% mark.

> Re-visit C++ client scan retry logic
>
> Key: KUDU-694
> URL: https://issues.apache.org/jira/browse/KUDU-694
> Project: Kudu
> Issue Type: Bug
> Components: client
> Affects Versions: Private Beta
> Reporter: Andrew Wang
>
> There are a number of remaining issues with scanner robustness, even after
> KUDU-597:
> * Once a node is marked as failed, it will not be used again in the call.
> This is more of an issue with longer timeouts (since the node is more likely
> to come back), or if the scan is LEADER_ONLY (since only one node being down
> leads to unavailability).
> * In the LEADER_ONLY case, since we don't refresh quorum information within
> the call, we won't recover when a failover happens.
> * The scanner code calls a number of other RPCs that are not retried on
> error, i.e. LookupTabletByKey or RefreshProxy's DNS resolution in
> GetTabletServer.
[jira] [Comment Edited] (KUDU-694) Re-visit C++ client scan retry logic
[ https://issues.apache.org/jira/browse/KUDU-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078446#comment-16078446 ]

Alexey Serbin edited comment on KUDU-694 at 7/7/17 6:08 PM:

An update to summarize the current state of affairs (as far as I could see):
* The first item still holds true. Marking a server as failed is specific to the tablet, so querying some other tablet on the same server will not be affected by the mark made for the prior one. But it still affects scans with the LEADER_ONLY selector.
* Not failing over to another leader during the call is addressed: if there was an error from the server hosting the leader tablet (or any other tablet), {{LookupRpc::SendRpc()}} will not use the 'fast path' and will do server resolution again by calling {{MasterServerProxy::GetTableLocationsAsync()}}
* {{GetTabletServer()}} is retried from the upper level (i.e. in KuduScanner::Data::OpenTablet()), but a failure of DNS resolution in the path of {{KuduClient::Data::GetTabletServer()}} will result in a non-retriable error returned to the top level from {{KuduScanner::Data::OpenTablet()}}.

Also, I suspect there are other places like that -- an additional revision is needed. Besides, we need to understand whether it makes sense to retry in such cases.

was (Author: aserbin):
An update to summarize the current state of affairs (as far as I could see):
* The first item still holds. Marking a server as failed is specific to the tablet, so querying some other tablet on the same server will not be affected by the mark made for the prior one. But it still affects scans with the LEADER_ONLY selector.
* Not failing over to another leader during the call is addressed: if there was an error from the server hosting the leader tablet (or any other tablet), {{LookupRpc::SendRpc()}} will not use the 'fast path' and will do server resolution again by calling {{MasterServerProxy::GetTableLocationsAsync()}}
* The non-retried {{GetTabletServer()}} is retried from the upper level (i.e. in KuduScanner::Data::OpenTablet()), but a failure of DNS resolution in the path of {{KuduClient::Data::GetTabletServer()}} will result in a non-retriable error returned to the top level from {{KuduScanner::Data::OpenTablet()}}.

Also, I suspect there are other places like that -- an additional revision is needed. Besides, we need to understand whether it makes sense to retry in such cases.

> Re-visit C++ client scan retry logic
>
> Key: KUDU-694
> URL: https://issues.apache.org/jira/browse/KUDU-694
> Project: Kudu
> Issue Type: Bug
> Components: client
> Affects Versions: Private Beta
> Reporter: Andrew Wang
>
> There are a number of remaining issues with scanner robustness, even after
> KUDU-597:
> * Once a node is marked as failed, it will not be used again in the call.
> This is more of an issue with longer timeouts (since the node is more likely
> to come back), or if the scan is LEADER_ONLY (since only one node being down
> leads to unavailability).
> * In the LEADER_ONLY case, since we don't refresh quorum information within
> the call, we won't recover when a failover happens.
> * The scanner code calls a number of other RPCs that are not retried on
> error, i.e. LookupTabletByKey or RefreshProxy's DNS resolution in
> GetTabletServer.
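The fail-over behavior described in the KUDU-694 comment (skip the cached 'fast path' after an error and resolve the leader again) can be illustrated with a small stand-alone sketch. Everything in it is hypothetical: LocationCache, MarkStale, and ScanWithFailover are names invented for this example and are not Kudu classes or functions, and the resolver callback merely stands in for a location lookup against the master.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a tablet-location cache (not Kudu's real one).
class LocationCache {
 public:
  using Resolver = std::function<std::string(const std::string& tablet_id)>;

  explicit LocationCache(Resolver resolve_from_master)
      : resolve_(std::move(resolve_from_master)) {}

  // Fast path: return the cached leader unless the entry was marked stale.
  // Slow path: ask the resolver again and cache the fresh answer.
  std::string LookupLeader(const std::string& tablet_id) {
    auto it = cache_.find(tablet_id);
    if (it != cache_.end() && !it->second.stale) {
      return it->second.leader;
    }
    Entry fresh{resolve_(tablet_id), false};
    cache_[tablet_id] = fresh;
    return fresh.leader;
  }

  // Called after an error from the server believed to host the leader.
  // Only this tablet's entry is affected, mirroring the per-tablet marking.
  void MarkStale(const std::string& tablet_id) {
    auto it = cache_.find(tablet_id);
    if (it != cache_.end()) {
      it->second.stale = true;
    }
  }

 private:
  struct Entry {
    std::string leader;
    bool stale;
  };
  Resolver resolve_;
  std::map<std::string, Entry> cache_;
};

// Tries the scan against the current leader. On failure, marks the cached
// location stale so the next attempt re-resolves instead of reusing it.
bool ScanWithFailover(
    LocationCache* cache, const std::string& tablet_id,
    const std::function<bool(const std::string& server)>& try_scan,
    int max_attempts) {
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    const std::string server = cache->LookupLeader(tablet_id);
    if (try_scan(server)) {
      return true;
    }
    cache->MarkStale(tablet_id);
  }
  return false;
}
```

The sketch deliberately leaves out the open question raised in the comment: a failure inside the resolver itself (for example DNS resolution) is not retried here, and whether it should be is a separate decision.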
[jira] [Commented] (KUDU-694) Re-visit C++ client scan retry logic
[ https://issues.apache.org/jira/browse/KUDU-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16078446#comment-16078446 ]

Alexey Serbin commented on KUDU-694:

An update to summarize the current state of affairs (as far as I could see):
* The first item still holds. Marking a server as failed is specific to the tablet, so querying some other tablet on the same server will not be affected by the mark made for the prior one. But it still affects scans with the LEADER_ONLY selector.
* Not failing over to another leader during the call is addressed: if there was an error from the server hosting the leader tablet (or any other tablet), {{LookupRpc::SendRpc()}} will not use the 'fast path' and will do server resolution again by calling {{MasterServerProxy::GetTableLocationsAsync()}}
* The non-retried {{GetTabletServer()}} is retried from the upper level (i.e. in KuduScanner::Data::OpenTablet()), but a failure of DNS resolution in the path of {{KuduClient::Data::GetTabletServer()}} will result in a non-retriable error returned to the top level from {{KuduScanner::Data::OpenTablet()}}.

Also, I suspect there are other places like that -- an additional revision is needed. Besides, we need to understand whether it makes sense to retry in such cases.

> Re-visit C++ client scan retry logic
>
> Key: KUDU-694
> URL: https://issues.apache.org/jira/browse/KUDU-694
> Project: Kudu
> Issue Type: Bug
> Components: client
> Affects Versions: Private Beta
> Reporter: Andrew Wang
>
> There are a number of remaining issues with scanner robustness, even after
> KUDU-597:
> * Once a node is marked as failed, it will not be used again in the call.
> This is more of an issue with longer timeouts (since the node is more likely
> to come back), or if the scan is LEADER_ONLY (since only one node being down
> leads to unavailability).
> * In the LEADER_ONLY case, since we don't refresh quorum information within
> the call, we won't recover when a failover happens.
> * The scanner code calls a number of other RPCs that are not retried on
> error, i.e. LookupTabletByKey or RefreshProxy's DNS resolution in
> GetTabletServer.
[jira] [Assigned] (KUDU-2033) Add a 'torture' scenario to verify Java client's behavior during fail-over
[ https://issues.apache.org/jira/browse/KUDU-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Fancher reassigned KUDU-2033:

Assignee: Edward Fancher

> Add a 'torture' scenario to verify Java client's behavior during fail-over
> ---
>
> Key: KUDU-2033
> URL: https://issues.apache.org/jira/browse/KUDU-2033
> Project: Kudu
> Issue Type: Test
> Components: client, java
> Reporter: Alexey Serbin
> Assignee: Edward Fancher
> Labels: newbie, newbie++
>
> For the Kudu Java client we have the {{TestLeaderFailover}} test, which verifies
> how the client handles the tablet server fail-over scenario. However, the
> test covers only one fail-over event and mainly performs write operations
> while the backend handles the 'unexpected crash' of the tablet server.
> It would be nice to add more tests which cover the client's fail-over
> behavior:
> * add the mixed workload scenario, i.e. combine inserts/scans during the
> fail-over
> * induce more fail-over events while running the scenario, i.e. pause and
> then resume the tserver processes many more times and run the test longer
> * add the multi-master scenario, where both the leader tserver and leader
> master 'unexpectedly crash' during the run
> * in the mixed workload scenarios, run scan operations in READ_AT_TIMESTAMP
> mode to exercise RYW (Read-Your-Writes) behavior and add assertions to make
> sure the RYW behavior is observed as expected