kardaj opened a new issue, #37355:
URL: https://github.com/apache/arrow/issues/37355

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   From what I can tell, a timezone-aware `datetime.datetime` used in a filter is cast to a naive timestamp (`timestamp[s]` in the error below) when its `microsecond` is 0.
   
   
   I was able to reproduce the error with this snippet:
   ```python
   import io
   import pytz
   import datetime
   import pyarrow as pa
   import pyarrow.parquet as pq
   import pyarrow.compute as pc
   
   timezone = "Europe/Paris"
   field_name = "timestamp"
   
   table = pa.Table.from_pydict(
       {field_name: []},
       schema=pa.schema(
           [
               pa.field(
                   field_name,
                   pa.timestamp("ns", tz=timezone),
                   nullable=False,
               )
           ]
       ),
   )
   print(table)
   buffer = io.BytesIO()
   pq.write_table(table, buffer)
   
   filters = None
   table = pq.read_table(buffer, filters=filters)
   assert len(table.to_pylist()) == 0
   print(f"filters={filters}", "ok")
   
   for microsecond in [1, 0]:
       timestamp = pytz.timezone(timezone).localize(
           datetime.datetime.combine(
               datetime.date.today(),
               datetime.time(hour=12, microsecond=microsecond),
           )
       )
       filters = pc.field("timestamp") <= timestamp
       table = pq.read_table(buffer, filters=filters)
       assert len(table.to_pylist()) == 0
       print(f"filters={filters}", "ok")
   
   ```
   With pyarrow<13.0.0, I get the following output:
   ```
   pyarrow.Table
   timestamp: timestamp[ns, tz=Europe/Paris] not null
   ----
   timestamp: [[]]
   filters=None ok
   filters=(timestamp <= 2023-08-24 10:00:00.000001) ok
   filters=(timestamp <= 2023-08-24 10:00:00.000000) ok
   terminate called without an active exception
   Aborted (core dumped)
   ```
   
   With pyarrow==13.0.0, I get the following output:
   ```
   pyarrow.Table
   timestamp: timestamp[ns, tz=Europe/Paris] not null
   ----
   timestamp: [[]]
   filters=None ok
   filters=(timestamp <= 2023-08-24 10:00:00.000001) ok
   Traceback (most recent call last):
     File "/workspaces/mapping-tools/broken_pyarrow.py", line 43, in <module>
       table = pq.read_table(buffer, filters=filters)
     File "/workspaces/mapping-tools/env/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 3002, in read_table
       return dataset.read(columns=columns, use_threads=use_threads,
     File "/workspaces/mapping-tools/env/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 2630, in read
       table = self._dataset.to_table(
     File "pyarrow/_dataset.pyx", line 547, in pyarrow._dataset.Dataset.to_table
     File "pyarrow/_dataset.pyx", line 393, in pyarrow._dataset.Dataset.scanner
     File "pyarrow/_dataset.pyx", line 3391, in pyarrow._dataset.Scanner.from_dataset
     File "pyarrow/_dataset.pyx", line 3309, in pyarrow._dataset.Scanner._make_scan_options
     File "pyarrow/_dataset.pyx", line 3243, in pyarrow._dataset._populate_builder
     File "pyarrow/_compute.pyx", line 2595, in pyarrow._compute._bind
     File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 121, in pyarrow.lib.check_status
   pyarrow.lib.ArrowNotImplementedError: Function 'less_equal' has no kernel matching input types (timestamp[ns, tz=Europe/Paris], timestamp[s])
   ```
   
   ### Component(s)
   
   Parquet, Python

