gemini-code-assist[bot] commented on code in PR #37290:
URL: https://github.com/apache/beam/pull/37290#discussion_r2688204610


##########
CHANGES.md:
##########
@@ -2349,4 +2350,4 @@ Schema Options, it will be removed in version `2.23.0`. 
([BEAM-9704](https://iss
 
 ## Highlights
 
-- For versions 2.19.0 and older release notes are available on [Apache Beam 
Blog](https://beam.apache.org/blog/).
+- - For versions 2.19.0 and older release notes are available on [Apache Beam 
Blog](https://beam.apache.org/blog/).

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   It seems like there's an extra hyphen `-` here, which will render as a 
nested list item. Was this intentional? If not, it should probably be removed 
to maintain consistent formatting.
   
   ```suggestion
   - For versions 2.19.0 and older release notes are available on [Apache Beam 
Blog](https://beam.apache.org/blog/).
   ```



##########
sdks/python/apache_beam/io/hadoopfilesystem.py:
##########
@@ -139,8 +146,25 @@ def __init__(self, pipeline_options):
     if not isinstance(self._full_urls, bool):
       raise ValueError(
           'hdfs_full_urls should be bool, got: %s', self._full_urls)
-    self._hdfs_client = hdfs.InsecureClient(
-        'http://%s:%s' % (hdfs_host, str(hdfs_port)), user=hdfs_user)
+
+    # Create HDFS client based on authentication type
+    url = 'http://%s:%s' % (hdfs_host, str(hdfs_port))
+    if hdfs_client == 'KERBEROS':
+      if KerberosClient is None:
+        raise ImportError(
+            'Kerberos authentication requires the requests-kerberos library. '
+            'Install it with: pip install requests-kerberos')
+      _LOGGER.info('Using KerberosClient for HDFS authentication')
+      try:
+        self._hdfs_client = KerberosClient(url)
+      except Exception as e:
+        raise RuntimeError(
+            'Failed to create KerberosClient. Ensure you have valid Kerberos '
+            'credentials (run kinit) or have configured a keytab. '
+            'Error: %s' % str(e))

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   When re-raising the exception, it's better to use `raise from` to preserve 
the original exception's traceback. This is very helpful for debugging. The 
current implementation hides the original exception type and stack trace.
   
   Also, consider using an f-string for formatting the error message for better 
readability, as the project seems to be on a modern Python version.
   
   ```suggestion
         except Exception as e:
           raise RuntimeError(
               'Failed to create KerberosClient. Ensure you have valid Kerberos 
'
               'credentials (run kinit) or have configured a keytab. '
               f'Error: {e}') from e
   ```



##########
sdks/python/apache_beam/io/hadoopfilesystem.py:
##########
@@ -139,8 +146,25 @@ def __init__(self, pipeline_options):
     if not isinstance(self._full_urls, bool):
       raise ValueError(
           'hdfs_full_urls should be bool, got: %s', self._full_urls)
-    self._hdfs_client = hdfs.InsecureClient(
-        'http://%s:%s' % (hdfs_host, str(hdfs_port)), user=hdfs_user)
+
+    # Create HDFS client based on authentication type
+    url = 'http://%s:%s' % (hdfs_host, str(hdfs_port))
+    if hdfs_client == 'KERBEROS':

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   To make the client selection more robust, it would be good to perform a 
case-insensitive comparison. While `argparse` enforces the `choices` 
case-sensitively, options can also be passed as a dictionary where case might 
not be consistent. Using `.upper()` would handle `KERBEROS`, `kerberos`, etc.
   
   ```suggestion
       if hdfs_client.upper() == 'KERBEROS':
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to