CurtHagenlocher commented on code in PR #2610: URL: https://github.com/apache/arrow-adbc/pull/2610#discussion_r2010810377
########## csharp/src/Drivers/Apache/Hive2/README.md: ########## @@ -35,11 +35,15 @@ but can also be passed in the call to `AdbcDatabase.Connect`. | `username` | The user name used for basic authentication | | | `password` | The password for the user name used for basic authentication. | | | `adbc.hive.data_type_conv` | Comma-separated list of data conversion options. Each option indicates the type of conversion to perform on data returned from the Hive server. <br><br>Allowed values: `none`, `scalar`. <br><br>Option `none` indicates there is no conversion from Hive type to native type (i.e., no conversion from String to Timestamp for Apache Hive over HTTP). Example `adbc.hive.conv_data_type=none`. <br><br>Option `scalar` will perform conversion (if necessary) from the Hive data type to corresponding Arrow data types for types `DATE/Date32/DateTime`, `DECIMAL/Decimal128/SqlDecimal`, and `TIMESTAMP/Timestamp/DateTimeOffset`. Example `adbc.hive.conv_data_type=scalar` | `scalar` | -| `adbc.hive.tls_options` | Comma-separated list of TLS/SSL options. Each option indicates the TLS/SSL option when connecting to a Hive server. <br><br>Allowed values: `allow_self_signed`, `allow_hostname_mismatch`. <br><br>Option `allow_self_signed` allows certificate errors due to an unknown certificate authority, typically when using a self-signed certificate. Option `allow_hostname_mismatch` allow certificate errors due to a mismatch of the hostname. (e.g., when connecting through an SSH tunnel). Example `adbc.hive.tls_options=allow_self_signed` | | | `adbc.hive.connect_timeout_ms` | Sets the timeout (in milliseconds) to open a new session. Values can be 0 (infinite) or greater than zero. | `30000` | | `adbc.apache.statement.batch_size` | Sets the maximum number of rows to retrieve in a single batch request. | `50000` | | `adbc.apache.statement.polltime_ms` | If polling is necessary to get a result, this option sets the length of time (in milliseconds) to wait between polls. | `500` | | `adbc.apache.statement.query_timeout_s` | Sets the maximum time (in seconds) for a query to complete. Values can be 0 (infinite) or greater than zero. | `60` | +| `adbc.apache.ssl` | If ssl needs to enabled or not. One of `True`, `False` | `False` | Review Comment: Two high-level comments on the proposed API changes: 1) I think it would good to group these together logically as they're logically related, so it would be good if they all started with the same distinct prefix. 2) While "SSL" was the original protocol developed for security HTTP, it has since been succeeded by "TLS" and so I think it's broadly preferable to use "TLS" instead of "SSL". Might I propose the following set of options? `adbc.http_options.tls.enabled`: Whether or not to use TLS, One of `True`, `False` `adbc.http_options.tls.allow_self_signed`: ... `adbc.http_options.tls.trusted_certificate_path`: etc. ########## csharp/src/Drivers/Apache/Spark/README.md: ########## @@ -36,11 +36,15 @@ but can also be passed in the call to `AdbcDatabase.Connect`. | `username` | The user name used for basic authentication | | | `password` | The password for the user name used for basic authentication. | | | `adbc.spark.data_type_conv` | Comma-separated list of data conversion options. Each option indicates the type of conversion to perform on data returned from the Spark server. <br><br>Allowed values: `none`, `scalar`. <br><br>Option `none` indicates there is no conversion from Spark type to native type (i.e., no conversion from String to Timestamp for Apache Spark over HTTP). Example `adbc.spark.conv_data_type=none`. <br><br>Option `scalar` will perform conversion (if necessary) from the Spark data type to corresponding Arrow data types for types `DATE/Date32/DateTime`, `DECIMAL/Decimal128/SqlDecimal`, and `TIMESTAMP/Timestamp/DateTimeOffset`. Example `adbc.spark.conv_data_type=scalar` | `scalar` | -| `adbc.spark.tls_options` | Comma-separated list of TLS/SSL options. Each option indicates the TLS/SSL option when connecting to a Spark server. <br><br>Allowed values: `allow_self_signed`, `allow_hostname_mismatch`. <br><br>Option `allow_self_signed` allows certificate errors due to an unknown certificate authority, typically when using a self-signed certificate. Option `allow_hostname_mismatch` allow certificate errors due to a mismatch of the hostname. (e.g., when connecting through an SSH tunnel). Example `adbc.spark.tls_options=allow_self_signed` | | | `adbc.spark.connect_timeout_ms` | Sets the timeout (in milliseconds) to open a new session. Values can be 0 (infinite) or greater than zero. | `30000` | | `adbc.apache.statement.batch_size` | Sets the maximum number of rows to retrieve in a single batch request. | `50000` | | `adbc.apache.statement.polltime_ms` | If polling is necessary to get a result, this option sets the length of time (in milliseconds) to wait between polls. | `500` | | `adbc.apache.statement.query_timeout_s` | Sets the maximum time (in seconds) for a query to complete. Values can be 0 (infinite) or greater than zero. | `60` | +| `adbc.apache.ssl` | If ssl needs to enabled or not. One of `True`, `False` | `False` | Review Comment: (On the whole, the comments made on Hive2/README.md apply to Spark as well. ########## csharp/src/Drivers/Apache/Hive2/HiveServer2TlsImpl.cs: ########## @@ -0,0 +1,106 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.IO; +using System.Collections.Generic; +using System.Net.Security; +using System.Security.Cryptography.X509Certificates; +using System.Net.Http; + +namespace Apache.Arrow.Adbc.Drivers.Apache.Hive2 +{ + class TlsProperties + { + public bool IsSslEnabled { get; set; } Review Comment: Consider `IsTlsEnabled` ########## csharp/src/Drivers/Apache/Hive2/HiveServer2TlsImpl.cs: ########## @@ -0,0 +1,106 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.IO; +using System.Collections.Generic; +using System.Net.Security; +using System.Security.Cryptography.X509Certificates; +using System.Net.Http; + +namespace Apache.Arrow.Adbc.Drivers.Apache.Hive2 +{ + class TlsProperties + { + public bool IsSslEnabled { get; set; } + public bool EnableServerCertificateValidation { get; set; } + public bool AllowHostnameMismatch { get; set; } + public bool AllowSelfSigned { get; set; } + public string? TrustedCertificatePath { get; set; } + } + static class HiveServer2TlsImpl Review Comment: nit: insert blank line before `class`. ########## csharp/src/Drivers/Apache/Hive2/README.md: ########## @@ -35,11 +35,15 @@ but can also be passed in the call to `AdbcDatabase.Connect`. | `username` | The user name used for basic authentication | | | `password` | The password for the user name used for basic authentication. | | | `adbc.hive.data_type_conv` | Comma-separated list of data conversion options. Each option indicates the type of conversion to perform on data returned from the Hive server. <br><br>Allowed values: `none`, `scalar`. <br><br>Option `none` indicates there is no conversion from Hive type to native type (i.e., no conversion from String to Timestamp for Apache Hive over HTTP). Example `adbc.hive.conv_data_type=none`. <br><br>Option `scalar` will perform conversion (if necessary) from the Hive data type to corresponding Arrow data types for types `DATE/Date32/DateTime`, `DECIMAL/Decimal128/SqlDecimal`, and `TIMESTAMP/Timestamp/DateTimeOffset`. Example `adbc.hive.conv_data_type=scalar` | `scalar` | -| `adbc.hive.tls_options` | Comma-separated list of TLS/SSL options. Each option indicates the TLS/SSL option when connecting to a Hive server. <br><br>Allowed values: `allow_self_signed`, `allow_hostname_mismatch`. <br><br>Option `allow_self_signed` allows certificate errors due to an unknown certificate authority, typically when using a self-signed certificate. Option `allow_hostname_mismatch` allow certificate errors due to a mismatch of the hostname. (e.g., when connecting through an SSH tunnel). Example `adbc.hive.tls_options=allow_self_signed` | | Review Comment: It would be nice and relatively low-cost to stay backwards-compatible by continuing to support the existing option for another release before removing it. (We haven't declared "1.0" yet so I don't think it's strictly necessary, but I think erring on the side of accommodating consumption scenarios is broadly good.) ########## csharp/src/Drivers/Apache/Hive2/HiveServer2TlsImpl.cs: ########## @@ -0,0 +1,106 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.IO; +using System.Collections.Generic; +using System.Net.Security; +using System.Security.Cryptography.X509Certificates; +using System.Net.Http; + +namespace Apache.Arrow.Adbc.Drivers.Apache.Hive2 +{ + class TlsProperties + { + public bool IsSslEnabled { get; set; } + public bool EnableServerCertificateValidation { get; set; } + public bool AllowHostnameMismatch { get; set; } + public bool AllowSelfSigned { get; set; } + public string? TrustedCertificatePath { get; set; } + } + static class HiveServer2TlsImpl + { + static internal TlsProperties GetTlsOptions(IReadOnlyDictionary<string, string> properties) + { + TlsProperties tlsProperties = new TlsProperties(); + properties.TryGetValue(TlsOptions.IsSslEnabled, out string? isSslEnabled); + properties.TryGetValue(AdbcOptions.Uri, out string? uri); + if (!string.IsNullOrWhiteSpace(uri)) Review Comment: This generally assumes that the `out` parameter has a specific well-defined value even when the `Try` returns false. I think it's a better practice to structure the condition as `if (properties.TryGetValue(AdbcOptions.Uri, out string? uri) && !string.IsNullOrWhiteSpace(uri))`. (Same applies to all `TryGetValue` calls in this method.) ########## csharp/src/Drivers/Apache/Hive2/README.md: ########## @@ -35,11 +35,15 @@ but can also be passed in the call to `AdbcDatabase.Connect`. | `username` | The user name used for basic authentication | | | `password` | The password for the user name used for basic authentication. | | | `adbc.hive.data_type_conv` | Comma-separated list of data conversion options. Each option indicates the type of conversion to perform on data returned from the Hive server. <br><br>Allowed values: `none`, `scalar`. <br><br>Option `none` indicates there is no conversion from Hive type to native type (i.e., no conversion from String to Timestamp for Apache Hive over HTTP). Example `adbc.hive.conv_data_type=none`. <br><br>Option `scalar` will perform conversion (if necessary) from the Hive data type to corresponding Arrow data types for types `DATE/Date32/DateTime`, `DECIMAL/Decimal128/SqlDecimal`, and `TIMESTAMP/Timestamp/DateTimeOffset`. Example `adbc.hive.conv_data_type=scalar` | `scalar` | -| `adbc.hive.tls_options` | Comma-separated list of TLS/SSL options. Each option indicates the TLS/SSL option when connecting to a Hive server. <br><br>Allowed values: `allow_self_signed`, `allow_hostname_mismatch`. <br><br>Option `allow_self_signed` allows certificate errors due to an unknown certificate authority, typically when using a self-signed certificate. Option `allow_hostname_mismatch` allow certificate errors due to a mismatch of the hostname. (e.g., when connecting through an SSH tunnel). Example `adbc.hive.tls_options=allow_self_signed` | | | `adbc.hive.connect_timeout_ms` | Sets the timeout (in milliseconds) to open a new session. Values can be 0 (infinite) or greater than zero. | `30000` | | `adbc.apache.statement.batch_size` | Sets the maximum number of rows to retrieve in a single batch request. | `50000` | | `adbc.apache.statement.polltime_ms` | If polling is necessary to get a result, this option sets the length of time (in milliseconds) to wait between polls. | `500` | | `adbc.apache.statement.query_timeout_s` | Sets the maximum time (in seconds) for a query to complete. Values can be 0 (infinite) or greater than zero. | `60` | +| `adbc.apache.ssl` | If ssl needs to enabled or not. One of `True`, `False` | `False` | +| `adbc.apache.enable_server_certificate_validation` | If ssl server certificate validation needs to enabled or not. One of `True`, `False`. If set to False, all certificate validation errors are ignored | `True` | Review Comment: The option name `enable_server_certificate_validation` implies that we're opting into enabling it. I think that it's more semantically clear to have an option `disable_server_certificate_validation` with a default value of false to give the correct impression that the settings are secure by default and that you have to opt into less security. ########## csharp/src/Drivers/Apache/Hive2/HiveServer2TlsImpl.cs: ########## @@ -0,0 +1,106 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.IO; +using System.Collections.Generic; +using System.Net.Security; +using System.Security.Cryptography.X509Certificates; +using System.Net.Http; + +namespace Apache.Arrow.Adbc.Drivers.Apache.Hive2 +{ + class TlsProperties + { + public bool IsSslEnabled { get; set; } + public bool EnableServerCertificateValidation { get; set; } + public bool AllowHostnameMismatch { get; set; } + public bool AllowSelfSigned { get; set; } + public string? TrustedCertificatePath { get; set; } + } + static class HiveServer2TlsImpl + { + static internal TlsProperties GetTlsOptions(IReadOnlyDictionary<string, string> properties) + { + TlsProperties tlsProperties = new TlsProperties(); + properties.TryGetValue(TlsOptions.IsSslEnabled, out string? isSslEnabled); + properties.TryGetValue(AdbcOptions.Uri, out string? uri); + if (!string.IsNullOrWhiteSpace(uri)) + { + var uriValue = new Uri(uri); + tlsProperties.IsSslEnabled = uriValue.Scheme == Uri.UriSchemeHttps || (bool.TryParse(isSslEnabled, out bool isSslEnabledBool) && isSslEnabledBool); + } + else + { + tlsProperties.IsSslEnabled = bool.TryParse(isSslEnabled, out bool isSslEnabledBool) && isSslEnabledBool; + } + if (!tlsProperties.IsSslEnabled) + { + return tlsProperties; + } + tlsProperties.IsSslEnabled = true; + properties.TryGetValue(TlsOptions.EnableServerCertificateValidation, out string? enableServerCertificateValidation); + if (bool.TryParse(enableServerCertificateValidation, out bool enableServerCertificateValidationBool) && !enableServerCertificateValidationBool) + { + tlsProperties.EnableServerCertificateValidation = false; + return tlsProperties; + } + tlsProperties.EnableServerCertificateValidation = true; + properties.TryGetValue(TlsOptions.AllowHostnameMismatch, out string? allowHostnameMismatch); + tlsProperties.AllowHostnameMismatch = bool.TryParse(allowHostnameMismatch, out bool allowHostnameMismatchBool) && allowHostnameMismatchBool; + properties.TryGetValue(TlsOptions.AllowSelfSigned, out string? allowSelfSigned); + tlsProperties.AllowSelfSigned = bool.TryParse(allowSelfSigned, out bool allowSelfSignedBool) && allowSelfSignedBool; + if (tlsProperties.AllowSelfSigned) + { + properties.TryGetValue(TlsOptions.TrustedCertificatePath, out string? trustedCertificatePath); + if (trustedCertificatePath == null) return tlsProperties; + tlsProperties.TrustedCertificatePath = trustedCertificatePath != "" && File.Exists(trustedCertificatePath) ? trustedCertificatePath : throw new FileNotFoundException("Trusted certificate path is invalid or file does not exist."); + } + return tlsProperties; + } + + static internal HttpClientHandler NewHttpClientHandler(TlsProperties tlsProperties) + { + HttpClientHandler httpClientHandler = new(); + if (tlsProperties.IsSslEnabled) + { + httpClientHandler.ServerCertificateCustomValidationCallback = (request, certificate, chain, policyErrors) => + { + if (policyErrors == SslPolicyErrors.None || !tlsProperties.EnableServerCertificateValidation) return true; + if (string.IsNullOrEmpty(tlsProperties.TrustedCertificatePath)) + { + return + (!policyErrors.HasFlag(SslPolicyErrors.RemoteCertificateChainErrors) || tlsProperties.AllowSelfSigned) + && (!policyErrors.HasFlag(SslPolicyErrors.RemoteCertificateNameMismatch) || tlsProperties.AllowHostnameMismatch); + } + if (certificate == null) + return false; + X509Certificate2 customCertificate = new X509Certificate2(tlsProperties.TrustedCertificatePath); Review Comment: What's the expected format of this file? Does it align with existing standards for supplying certificates to drivers? It would be good to update the README.md documentation with this information. ########## csharp/src/Drivers/Apache/Hive2/README.md: ########## @@ -35,11 +35,15 @@ but can also be passed in the call to `AdbcDatabase.Connect`. | `username` | The user name used for basic authentication | | | `password` | The password for the user name used for basic authentication. | | | `adbc.hive.data_type_conv` | Comma-separated list of data conversion options. Each option indicates the type of conversion to perform on data returned from the Hive server. <br><br>Allowed values: `none`, `scalar`. <br><br>Option `none` indicates there is no conversion from Hive type to native type (i.e., no conversion from String to Timestamp for Apache Hive over HTTP). Example `adbc.hive.conv_data_type=none`. <br><br>Option `scalar` will perform conversion (if necessary) from the Hive data type to corresponding Arrow data types for types `DATE/Date32/DateTime`, `DECIMAL/Decimal128/SqlDecimal`, and `TIMESTAMP/Timestamp/DateTimeOffset`. Example `adbc.hive.conv_data_type=scalar` | `scalar` | -| `adbc.hive.tls_options` | Comma-separated list of TLS/SSL options. Each option indicates the TLS/SSL option when connecting to a Hive server. <br><br>Allowed values: `allow_self_signed`, `allow_hostname_mismatch`. <br><br>Option `allow_self_signed` allows certificate errors due to an unknown certificate authority, typically when using a self-signed certificate. Option `allow_hostname_mismatch` allow certificate errors due to a mismatch of the hostname. (e.g., when connecting through an SSH tunnel). Example `adbc.hive.tls_options=allow_self_signed` | | | `adbc.hive.connect_timeout_ms` | Sets the timeout (in milliseconds) to open a new session. Values can be 0 (infinite) or greater than zero. | `30000` | | `adbc.apache.statement.batch_size` | Sets the maximum number of rows to retrieve in a single batch request. | `50000` | | `adbc.apache.statement.polltime_ms` | If polling is necessary to get a result, this option sets the length of time (in milliseconds) to wait between polls. | `500` | | `adbc.apache.statement.query_timeout_s` | Sets the maximum time (in seconds) for a query to complete. Values can be 0 (infinite) or greater than zero. | `60` | +| `adbc.apache.ssl` | If ssl needs to enabled or not. One of `True`, `False` | `False` | Review Comment: (We even use "Tls" in most other places in this codebase.) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
