[jira] [Resolved] (KUDU-1929) [rpc] Allow using encrypted private keys for TLS

2017-07-07 Thread Sailesh Mukil (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailesh Mukil resolved KUDU-1929.
-
   Resolution: Fixed
Fix Version/s: 1.5.0

Commit in:
https://github.com/apache/kudu/commit/57a07ae7217b63c51651611427f1af029d54d4fe

> [rpc] Allow using encrypted private keys for TLS
> 
>
> Key: KUDU-1929
> URL: https://issues.apache.org/jira/browse/KUDU-1929
> Project: Kudu
> Issue Type: Improvement
> Components: rpc
> Reporter: Sailesh Mukil
> Assignee: Sailesh Mukil
> Fix For: 1.5.0
>
>
> Currently, internal RPC communication cannot handle encrypted private keys 
> for TLS. Support can be added by using the following OpenSSL APIs:
> SSL_CTX_set_default_passwd_cb()
> SSL_CTX_set_default_passwd_cb_userdata()
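
A minimal sketch of how such a password callback could look, assuming the passphrase is kept in a std::string that is handed to OpenSSL as the callback's userdata. The name KeyPasswordCb is hypothetical and is not taken from the Kudu commit; the registration calls are shown only as comments so that the sketch compiles without OpenSSL headers.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Same shape as OpenSSL's pem_password_cb. OpenSSL calls it when it needs the
// passphrase for an encrypted PEM key: copy the passphrase into 'buf' (at most
// 'size' bytes) and return its length, or -1 to signal an error.
// ASSUMPTION: 'userdata' points to a std::string holding the passphrase.
extern "C" int KeyPasswordCb(char* buf, int size, int /*rwflag*/,
                             void* userdata) {
  assert(buf != nullptr);
  const std::string* password = static_cast<const std::string*>(userdata);
  if (password == nullptr || size <= 0) return -1;
  // Refuse to truncate: a cut-off passphrase would only fail later, obscurely.
  if (password->size() > static_cast<size_t>(size)) return -1;
  std::memcpy(buf, password->data(), password->size());
  return static_cast<int>(password->size());
}

// Registration, to be done before loading the key (needs <openssl/ssl.h>):
//   SSL_CTX_set_default_passwd_cb(ctx, &KeyPasswordCb);
//   SSL_CTX_set_default_passwd_cb_userdata(ctx, &password);
//   SSL_CTX_use_PrivateKey_file(ctx, key_path, SSL_FILETYPE_PEM);
```

The std::string passed as userdata has to outlive every key-loading call made on the context, since OpenSSL stores only the pointer.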



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KUDU-2033) Add a 'torture' scenario to verify Java client's behavior during fail-over

2017-07-07 Thread Alexey Serbin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2033:

Description: 
For the Kudu Java client we have the {{TestLeaderFailover}} test, which 
verifies how the client handles the tablet server fail-over scenario.  However, 
the test covers only one fail-over event and mainly performs write operations 
while the backend handles the 'unexpected crash' of the tablet server.

It would be nice to add more tests which cover the client's fail-over behavior:
  * add a mixed workload scenario, i.e. combine inserts and scans during the 
fail-over
  * induce more fail-over events while running the scenario, i.e. pause and 
then resume the tserver processes many more times and run the test longer
  * add a multi-master scenario, where both the leader tserver and the leader 
master 'unexpectedly crash' during the run
  * in the mixed workload scenarios, run scan operations in READ_AT_SNAPSHOT 
mode to exercise RYW (Read-Your-Writes) behavior and add assertions to make 
sure the RYW behavior is observed as expected
   

  was:
For the Kudu Java client we have {{TestLeaderFailover}} test which verifies how 
the client handles the tablet server fail-over scenario.  However, the test 
covers only one fail-over event and mainly performs write operations while the 
backend handles the 'unexpected crash' of the tablet server.

It would be nice to add more tests which cover the client's fail-over behavior:
  * add the mixed workload scenario, i.e. combine inserts/scans during the 
fail-over
  * induce more fail-over events while running the scenario, i.e. pause and 
then resume the tservers processes many more times and run the test longer
  * add the multi-master scenario, where both the leader tserver and leader 
master 'unexpectedly crash' during the run
  * in the mixed workload scenarios, run scan operations in READ_AT_TIMESTAMP 
mode to exercise RYW (Read-Your-Writes) behavior and add assertions to make 
sure the RYW behavior is observed as expected
   


> Add a 'torture' scenario to verify Java client's behavior during fail-over 
> ---
>
> Key: KUDU-2033
> URL: https://issues.apache.org/jira/browse/KUDU-2033
> Project: Kudu
> Issue Type: Test
> Components: client, java
> Reporter: Alexey Serbin
> Assignee: Edward Fancher
> Labels: newbie, newbie++
>
> For the Kudu Java client we have the {{TestLeaderFailover}} test, which 
> verifies how the client handles the tablet server fail-over scenario.  
> However, the test covers only one fail-over event and mainly performs write 
> operations while the backend handles the 'unexpected crash' of the tablet 
> server.
> It would be nice to add more tests which cover the client's fail-over 
> behavior:
>   * add a mixed workload scenario, i.e. combine inserts and scans during the 
> fail-over
>   * induce more fail-over events while running the scenario, i.e. pause and 
> then resume the tserver processes many more times and run the test longer
>   * add a multi-master scenario, where both the leader tserver and the leader 
> master 'unexpectedly crash' during the run
>   * in the mixed workload scenarios, run scan operations in READ_AT_SNAPSHOT 
> mode to exercise RYW (Read-Your-Writes) behavior and add assertions to make 
> sure the RYW behavior is observed as expected
>





[jira] [Commented] (KUDU-694) Re-visit C++ client scan retry logic

2017-07-07 Thread Alexey Serbin (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078475#comment-16078475
 ] 

Alexey Serbin commented on KUDU-694:


As a side note, it would be nice to understand whether any of our tests cover 
those issues at all.

Probably the best way to categorize and address the scan retry issues would be 
to put together a set of use cases and create a set of tests asserting the 
desired behavior.  Most likely, more than 50% of that is already covered by 
existing tests, but I'm not sure it's anywhere close to the 100% mark.

> Re-visit C++ client scan retry logic
> 
>
> Key: KUDU-694
> URL: https://issues.apache.org/jira/browse/KUDU-694
> Project: Kudu
> Issue Type: Bug
> Components: client
> Affects Versions: Private Beta
> Reporter: Andrew Wang
>
> There are a number of remaining issues with scanner robustness, even after 
> KUDU-597:
> * Once a node is marked as failed, it will not be used again in the call. 
> This is more of an issue with longer timeouts (since the node is more likely 
> to come back), or if the scan is LEADER_ONLY (since only one node being down 
> leads to unavailability).
> * In the LEADER_ONLY case, since we don't refresh quorum information within 
> the call, we won't recover when a failover happens.
> * The scanner code calls a number of other RPCs that are not retried on 
> error, i.e. LookupTabletByKey or RefreshProxy's DNS resolution in 
> GetTabletServer.





[jira] [Comment Edited] (KUDU-694) Re-visit C++ client scan retry logic

2017-07-07 Thread Alexey Serbin (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078446#comment-16078446
 ] 

Alexey Serbin edited comment on KUDU-694 at 7/7/17 6:08 PM:


An update to summarize the current state of affairs (as far as I could see):
* The first item still holds true.  Marking a server as failed is specific to 
the tablet, so querying some other tablet on the same server is not affected 
by the mark made for the prior one.  But it still affects scans with the 
LEADER_ONLY selector.
* Not failing over to another leader during the call is addressed: if there was 
an error from the server hosting the leader tablet (or any other tablet), 
{{LookupRpc::SendRpc()}} will not use the 'fast path' and will do server 
resolution again by calling {{MasterServerProxy::GetTableLocationsAsync()}}.
* {{GetTabletServer()}} is retried from the upper level (i.e. in 
{{KuduScanner::Data::OpenTablet()}}), but a failure of DNS resolution in the 
path of {{KuduClient::Data::GetTabletServer()}} will result in a non-retriable 
error returned to the top level from {{KuduScanner::Data::OpenTablet()}}.  
Also, I suspect there are other places like that -- an additional review is 
needed.  Besides, we need to understand whether it makes sense to retry in 
such cases.
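
To illustrate the second item, here is a rough sketch of the general pattern: reuse a cached leader location on the fast path, and drop it after any error so that the next attempt re-resolves the leader. This is not the Kudu client code. LeaderCache, ScanWithRetry, and ScanResult are hypothetical names, and the lookup function merely stands in for a round trip to the master.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration; none of these are Kudu APIs.
struct ScanResult {
  bool ok;
  std::string server;
};

class LeaderCache {
 public:
  explicit LeaderCache(std::function<std::string()> lookup)
      : lookup_(std::move(lookup)), has_cached_(false) {}

  // Fast path: return the cached leader. Slow path: call the lookup function
  // (standing in for a master round trip) and cache its answer.
  const std::string& Get() {
    if (!has_cached_) {
      cached_ = lookup_();
      has_cached_ = true;
    }
    return cached_;
  }

  // Forget the cached location so that the next Get() resolves it again.
  void Invalidate() { has_cached_ = false; }

 private:
  std::function<std::string()> lookup_;
  bool has_cached_;
  std::string cached_;
};

// Tries the scan up to 'max_attempts' times. After every failed attempt the
// cached location is dropped, so a leader change can be picked up in-call.
inline ScanResult ScanWithRetry(
    LeaderCache& cache,
    const std::function<bool(const std::string&)>& try_scan,
    int max_attempts) {
  assert(max_attempts > 0);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    const std::string server = cache.Get();
    if (try_scan(server)) return {true, server};
    cache.Invalidate();
  }
  return {false, ""};
}
```

Whether every kind of error should trigger a retry (for example, a DNS resolution failure as in the third item) is exactly the open question; the sketch retries unconditionally only to keep the example short.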


was (Author: aserbin):
An update to summarize current state of affairs (as far as I could see):
* The first item still holds Marking the server failed is specific for the 
tablet, so if querying some other tablet on the same server will not be 
affected by the mark done for prior one.  But it still affects the scans with 
the LEADER_ONLY selector.
* Not failing-over to another leader during the call is addressed: if there was 
an error from the server hosting the leader tablet (or any other tablet), the 
{{LookupRpc::SendRpc()}} will not use the 'fast path' and do server resolution 
again calling {{MasterServerProxy::GetTableLocationsAsync()}}
* The non-retried {{GetTabletServer()}} is retried from the upper level (i.e. 
in KuduScanner::Data::OpenTablet()), but a failure of DNS resolution in the 
path of {{KuduClient::Data::GetTabletServer()}} will result in a non-retriable 
error returned to the top-level from {{KuduScanner::Data::OpenTablet()}}.  
Also, I suspect there other places like that -- an additional revision is 
needed.  Besides, we need to understand whether it makes sense to retry in such 
cases.

> Re-visit C++ client scan retry logic
> 
>
> Key: KUDU-694
> URL: https://issues.apache.org/jira/browse/KUDU-694
> Project: Kudu
> Issue Type: Bug
> Components: client
> Affects Versions: Private Beta
> Reporter: Andrew Wang
>
> There are a number of remaining issues with scanner robustness, even after 
> KUDU-597:
> * Once a node is marked as failed, it will not be used again in the call. 
> This is more of an issue with longer timeouts (since the node is more likely 
> to come back), or if the scan is LEADER_ONLY (since only one node being down 
> leads to unavailability).
> * In the LEADER_ONLY case, since we don't refresh quorum information within 
> the call, we won't recover when a failover happens.
> * The scanner code calls a number of other RPCs that are not retried on 
> error, i.e. LookupTabletByKey or RefreshProxy's DNS resolution in 
> GetTabletServer.





[jira] [Commented] (KUDU-694) Re-visit C++ client scan retry logic

2017-07-07 Thread Alexey Serbin (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16078446#comment-16078446
 ] 

Alexey Serbin commented on KUDU-694:


An update to summarize the current state of affairs (as far as I could see):
* The first item still holds.  Marking a server as failed is specific to the 
tablet, so querying some other tablet on the same server is not affected by 
the mark made for the prior one.  But it still affects scans with the 
LEADER_ONLY selector.
* Not failing over to another leader during the call is addressed: if there was 
an error from the server hosting the leader tablet (or any other tablet), 
{{LookupRpc::SendRpc()}} will not use the 'fast path' and will do server 
resolution again by calling {{MasterServerProxy::GetTableLocationsAsync()}}.
* The non-retried {{GetTabletServer()}} is retried from the upper level (i.e. 
in {{KuduScanner::Data::OpenTablet()}}), but a failure of DNS resolution in the 
path of {{KuduClient::Data::GetTabletServer()}} will result in a non-retriable 
error returned to the top level from {{KuduScanner::Data::OpenTablet()}}.  
Also, I suspect there are other places like that -- an additional review is 
needed.  Besides, we need to understand whether it makes sense to retry in 
such cases.

> Re-visit C++ client scan retry logic
> 
>
> Key: KUDU-694
> URL: https://issues.apache.org/jira/browse/KUDU-694
> Project: Kudu
> Issue Type: Bug
> Components: client
> Affects Versions: Private Beta
> Reporter: Andrew Wang
>
> There are a number of remaining issues with scanner robustness, even after 
> KUDU-597:
> * Once a node is marked as failed, it will not be used again in the call. 
> This is more of an issue with longer timeouts (since the node is more likely 
> to come back), or if the scan is LEADER_ONLY (since only one node being down 
> leads to unavailability).
> * In the LEADER_ONLY case, since we don't refresh quorum information within 
> the call, we won't recover when a failover happens.
> * The scanner code calls a number of other RPCs that are not retried on 
> error, i.e. LookupTabletByKey or RefreshProxy's DNS resolution in 
> GetTabletServer.





[jira] [Assigned] (KUDU-2033) Add a 'torture' scenario to verify Java client's behavior during fail-over

2017-07-07 Thread Edward Fancher (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Fancher reassigned KUDU-2033:


Assignee: Edward Fancher

> Add a 'torture' scenario to verify Java client's behavior during fail-over 
> ---
>
> Key: KUDU-2033
> URL: https://issues.apache.org/jira/browse/KUDU-2033
> Project: Kudu
> Issue Type: Test
> Components: client, java
> Reporter: Alexey Serbin
> Assignee: Edward Fancher
> Labels: newbie, newbie++
>
> For the Kudu Java client we have the {{TestLeaderFailover}} test, which 
> verifies how the client handles the tablet server fail-over scenario.  
> However, the test covers only one fail-over event and mainly performs write 
> operations while the backend handles the 'unexpected crash' of the tablet 
> server.
> It would be nice to add more tests which cover the client's fail-over 
> behavior:
>   * add a mixed workload scenario, i.e. combine inserts and scans during the 
> fail-over
>   * induce more fail-over events while running the scenario, i.e. pause and 
> then resume the tserver processes many more times and run the test longer
>   * add a multi-master scenario, where both the leader tserver and the leader 
> master 'unexpectedly crash' during the run
>   * in the mixed workload scenarios, run scan operations in READ_AT_TIMESTAMP 
> mode to exercise RYW (Read-Your-Writes) behavior and add assertions to make 
> sure the RYW behavior is observed as expected
>


