gemini-code-assist[bot] commented on code in PR #37290: URL: https://github.com/apache/beam/pull/37290#discussion_r2688204610
########## CHANGES.md: ########## @@ -2349,4 +2350,4 @@ Schema Options, it will be removed in version `2.23.0`. ([BEAM-9704](https://iss ## Highlights -- For versions 2.19.0 and older release notes are available on [Apache Beam Blog](https://beam.apache.org/blog/). +- - For versions 2.19.0 and older release notes are available on [Apache Beam Blog](https://beam.apache.org/blog/). Review Comment:  It seems like there's an extra hyphen `-` here, which will render as a nested list item. Was this intentional? If not, it should probably be removed to maintain consistent formatting. ```suggestion - For versions 2.19.0 and older release notes are available on [Apache Beam Blog](https://beam.apache.org/blog/). ``` ########## sdks/python/apache_beam/io/hadoopfilesystem.py: ########## @@ -139,8 +146,25 @@ def __init__(self, pipeline_options): if not isinstance(self._full_urls, bool): raise ValueError( 'hdfs_full_urls should be bool, got: %s', self._full_urls) - self._hdfs_client = hdfs.InsecureClient( - 'http://%s:%s' % (hdfs_host, str(hdfs_port)), user=hdfs_user) + + # Create HDFS client based on authentication type + url = 'http://%s:%s' % (hdfs_host, str(hdfs_port)) + if hdfs_client == 'KERBEROS': + if KerberosClient is None: + raise ImportError( + 'Kerberos authentication requires the requests-kerberos library. ' + 'Install it with: pip install requests-kerberos') + _LOGGER.info('Using KerberosClient for HDFS authentication') + try: + self._hdfs_client = KerberosClient(url) + except Exception as e: + raise RuntimeError( + 'Failed to create KerberosClient. Ensure you have valid Kerberos ' + 'credentials (run kinit) or have configured a keytab. ' + 'Error: %s' % str(e)) Review Comment:  When re-raising the exception, it's better to use `raise from` to preserve the original exception's traceback. This is very helpful for debugging. The current implementation hides the original exception type and stack trace. Also, consider using an f-string for formatting the error message for better readability, as the project seems to be on a modern Python version. ```suggestion except Exception as e: raise RuntimeError( 'Failed to create KerberosClient. Ensure you have valid Kerberos ' 'credentials (run kinit) or have configured a keytab. ' f'Error: {e}') from e ``` ########## sdks/python/apache_beam/io/hadoopfilesystem.py: ########## @@ -139,8 +146,25 @@ def __init__(self, pipeline_options): if not isinstance(self._full_urls, bool): raise ValueError( 'hdfs_full_urls should be bool, got: %s', self._full_urls) - self._hdfs_client = hdfs.InsecureClient( - 'http://%s:%s' % (hdfs_host, str(hdfs_port)), user=hdfs_user) + + # Create HDFS client based on authentication type + url = 'http://%s:%s' % (hdfs_host, str(hdfs_port)) + if hdfs_client == 'KERBEROS': Review Comment:  To make the client selection more robust, it would be good to perform a case-insensitive comparison. While `argparse` enforces the `choices` case-sensitively, options can also be passed as a dictionary where case might not be consistent. Using `.upper()` would handle `KERBEROS`, `kerberos`, etc. ```suggestion if hdfs_client.upper() == 'KERBEROS': ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
