mnzpk opened a new issue, #1744:
URL: https://github.com/apache/iceberg-python/issues/1744

   ### Apache Iceberg version
   
   None
   
   ### Please describe the bug 🐞
   
   ## Bug Description
   Invoking multiple methods (or the same method multiple times) on an instance 
of `pyiceberg.catalog.hive.HiveCatalog` against a kerberized Hive Metastore (HMS) 
fails during SASL negotiation on the second call.
   
   ## Steps to reproduce
   1. Install `pyiceberg` and the `kerberos` Python wrapper:
   ```bash
   $ pip install "pyiceberg[hive-kerberos,pyarrow]==0.9.0rc3"
   $ pip install "kerberos>=1.3.0"
   ```
   2. Initialize `HiveCatalog`:
   ```python
   from pyiceberg.catalog.hive import HiveCatalog
   
   catalog = HiveCatalog(
       name="hive",
       **{
           "uri": "thrift://hms:9083",
           "hive.kerberos-authentication": "true"
       },
   )
   ```
   3. Invoke multiple methods (or the same method multiple times) that use 
the `_HiveClient` via a context manager:
   Specifically: 
https://github.com/apache/iceberg-python/blob/8bfb16cf063d121a177d43fec01620e1a5e6d84a/pyiceberg/catalog/hive.py#L701-L702
   ```python
   catalog.list_namespaces()
   catalog.load_table("db.iceberg_table")
   ```
   ## Expected
   Namespaces and tables can be loaded successfully.
   
   ## Actual
   Listing namespaces succeeds but loading the table results in:
   ```
   ---------------------------------------------------------------------------
   TypeError                                 Traceback (most recent call last)
   ----> 1 catalog.load_table("db.iceberg_table")
   
   File 
~/.conda/envs/iceberg-env/lib/python3.10/site-packages/pyiceberg/catalog/hive.py:573,
 in HiveCatalog.load_table(self, identifier)
       557 """Load the table's metadata and return the table instance.
       558 
       559 You can also use this method to check for table existence using 'try 
catalog.table() except TableNotFoundError'.
      (...)
       569     NoSuchTableError: If a table with the name does not exist, or 
the identifier is invalid.
       570 """
       571 database_name, table_name = 
self.identifier_to_database_and_table(identifier, NoSuchTableError)
   --> 573 with self._client as open_client:
       574     hive_table = self._get_hive_table(open_client, database_name, 
table_name)
       576 return self._convert_hive_into_iceberg(hive_table)
   
   File 
~/.conda/envs/iceberg-env/lib/python3.10/site-packages/pyiceberg/catalog/hive.py:170,
 in _HiveClient.__enter__(self)
       169 def __enter__(self) -> Client:
   --> 170     self._transport.open()
       171     if self._ugi:
       172         self._client.set_ugi(*self._ugi)
   
   File 
~/.conda/envs/iceberg-env/lib/python3.10/site-packages/thrift/transport/TTransport.py:381,
 in TSaslClientTransport.open(self)
       378     self.transport.open()
       380 self.send_sasl_msg(self.START, bytes(self.sasl.mechanism, 'ascii'))
   --> 381 self.send_sasl_msg(self.OK, self.sasl.process())
       383 while True:
       384     status, challenge = self.recv_sasl_msg()
   
   File 
~/.conda/envs/iceberg-env/lib/python3.10/site-packages/puresasl/client.py:16, 
in _require_mech.<locals>.wrapped(self, *args, **kwargs)
        14 if not self._chosen_mech:
        15     raise SASLError("A mechanism has not been chosen yet")
   ---> 16 return f(self, *args, **kwargs)
   
   File 
~/.conda/envs/iceberg-env/lib/python3.10/site-packages/puresasl/client.py:148, 
in SASLClient.process(self, challenge)
       137 @_require_mech
       138 def process(self, challenge=None):
       139     """
       140     Process a challenge from the server during SASL negotiation.
       141     A response will be returned which should typically be sent to the
      (...)
       146     to be sent to the server.
       147     """
   --> 148     return self._chosen_mech.process(challenge)
   
   File 
~/.conda/envs/iceberg-env/lib/python3.10/site-packages/puresasl/mechanisms.py:510,
 in GSSAPIMechanism.process(self, challenge)
       507     self._have_negotiated_details = True
       508     return base64.b64decode(_negotiated_details)
   --> 510 challenge = base64.b64encode(challenge).decode('ascii')  # kerberos 
methods expect strings, not bytes
       511 if self.user is None:
       512     ret = kerberos.authGSSClientStep(self.context, challenge)
   
   File ~/.conda/envs/iceberg-env/lib/python3.10/base64.py:58, in b64encode(s, 
altchars)
        51 def b64encode(s, altchars=None):
        52     """Encode the bytes-like object s using Base64 and return a 
bytes object.
        53 
        54     Optional altchars should be a byte string of length 2 which 
specifies an
        55     alternative alphabet for the '+' and '/' characters.  This 
allows an
        56     application to e.g. generate url or filesystem safe Base64 
strings.
        57     """
   ---> 58     encoded = binascii.b2a_base64(s, newline=False)
        59     if altchars is not None:
        60         assert len(altchars) == 2, repr(altchars)
   
   TypeError: a bytes-like object is required, not 'NoneType'
   ```
   
   ## Additional comments
   This seems to happen because the transport is closed every time we exit the 
context manager for `_HiveClient`, and 
`thrift.transport.TTransport.TSaslClientTransport` doesn't seem to support 
re-opening. The error can also be reproduced outside of `pyiceberg` with:
   ```python
   from thrift.transport import TSocket, TTransport
   from urllib.parse import urlparse
   
   uri = "thrift://hms:9083"
   url_parts = urlparse(uri)
   socket = TSocket.TSocket(url_parts.hostname, url_parts.port)
   transport = TTransport.TSaslClientTransport(
       socket, host=url_parts.hostname, service="hive"
   )
   
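   transport.open()   # first open: SASL negotiation succeeds
   transport.close()
   transport.open()   # second open on the same object: fails with the TypeError above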
   ```
   So it looks like the transport needs to be re-created instead of re-opened in 
`_HiveClient.__enter__`?
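
To illustrate the idea, here is a minimal sketch. It is not the actual `pyiceberg` code and not a proposed patch. `OneShotTransport` and `ReconnectingClient` are hypothetical names; the stand-in transport only imitates an object that cannot be re-opened after `close()`, so the sketch runs without thrift or a Kerberos environment. The point is that `__enter__` builds a fresh transport from a factory instead of calling `open()` again on the old one.

```python
from typing import Callable, Optional


class OneShotTransport:
    """Hypothetical stand-in for a transport that cannot be re-opened after close()."""

    def __init__(self) -> None:
        self._used = False
        self.is_open = False

    def open(self) -> None:
        if self._used:
            # Imitates the failure mode described above: a second open()
            # on the same object does not work.
            raise RuntimeError("transport cannot be re-opened")
        self._used = True
        self.is_open = True

    def close(self) -> None:
        self.is_open = False


class ReconnectingClient:
    """Context manager that creates a new transport on every entry."""

    def __init__(self, transport_factory: Callable[[], OneShotTransport]) -> None:
        self._transport_factory = transport_factory
        self._transport: Optional[OneShotTransport] = None

    def __enter__(self) -> OneShotTransport:
        # Re-create rather than re-open, so no state from the previous
        # session is carried over.
        self._transport = self._transport_factory()
        self._transport.open()
        return self._transport

    def __exit__(self, exc_type, exc, tb) -> None:
        if self._transport is not None:
            self._transport.close()
            self._transport = None


client = ReconnectingClient(OneShotTransport)
for _ in range(2):
    # Each iteration gets its own transport, so the second entry succeeds.
    with client as transport:
        assert transport.is_open
```

In `_HiveClient` the factory would presumably wrap whatever currently builds the `TSocket` and `TSaslClientTransport`, but that mapping is an assumption on my side.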
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

