[ https://issues.apache.org/jira/browse/ARROW-11379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ben Kietzman updated ARROW-11379:
---------------------------------
    Fix Version/s: 4.0.0

> [C++][Dataset] Reading dataset with filtering on timestamp partition field crashes
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-11379
>                 URL: https://issues.apache.org/jira/browse/ARROW-11379
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Assignee: Ben Kietzman
>            Priority: Major
>              Labels: dataset
>             Fix For: 4.0.0
>
>
> {code}
> In [1]: df = pd.DataFrame({"dates": list(pd.date_range("2012-01-01", periods=2, freq="D")) * 5, "col": range(10)})
> In [2]: df.to_parquet("test_partition_timestamps", partition_cols=["dates"])
> In [3]: !ls test_partition_timestamps/
> 'dates=2012-01-01 00:00:00' 'dates=2012-01-02 00:00:00'
> In [4]: import pyarrow.dataset as ds
> In [6]: part = ds.partitioning(pa.schema([("dates", pa.timestamp("s"))]), flavor="hive")
> In [7]: dataset = ds.dataset("test_partition_timestamps/", format="parquet", partitioning=part)
> {code}
> Reading the dataset is fine and gives the correct types:
> {code}
> In [10]: dataset.to_table()
> Out[10]:
> pyarrow.Table
> col: int64
> dates: timestamp[s]
> {code}
> but filtering on the timestamp column segfaults:
> {code}
> In [11]: dataset.to_table(filter=ds.field("dates") > pd.Timestamp("2012-01-01"))
> ../src/arrow/compute/kernels/scalar_cast_temporal.cc:129: Check failed: (batch[0].kind()) == (Datum::ARRAY)
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xc2224a)[0x7f68d2ccf24a]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xc221c8)[0x7f68d2ccf1c8]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xc221ea)[0x7f68d2ccf1ea]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow4util8ArrowLogD1Ev+0x47)[0x7f68d2ccf549]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xf0252a)[0x7f68d2faf52a]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZNSt17_Function_handlerIFvPN5arrow7compute13KernelContextERKNS1_9ExecBatchEPNS0_5DatumEEPS9_E9_M_invokeERKSt9_Any_dataOS3_S6_OS8_+0x69)[0x7f68d2e8ab86]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZNKSt8functionIFvPN5arrow7compute13KernelContextERKNS1_9ExecBatchEPNS0_5DatumEEEclES3_S6_S8_+0x7a)[0x7f68d2deec04]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xd3d6f9)[0x7f68d2dea6f9]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xd3cd5b)[0x7f68d2de9d5b]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZNK5arrow7compute8Function7ExecuteERKSt6vectorINS_5DatumESaIS3_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0x8c7)[0x7f68d2df9963]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(+0xd2eed2)[0x7f68d2ddbed2]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZNK5arrow7compute12MetaFunction7ExecuteERKSt6vectorINS_5DatumESaIS3_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0x15d)[0x7f68d2dfac8f]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow7compute12CallFunctionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorINS_5DatumESaISA_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0x26c)[0x7f68d2dedc6f]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow7compute12CallFunctionERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKSt6vectorINS_5DatumESaISA_EEPKNS0_15FunctionOptionsEPNS0_11ExecContextE+0x93)[0x7f68d2deda96]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow7compute4CastERKNS_5DatumERKNS0_11CastOptionsEPNS0_11ExecContextE+0xf7)[0x7f68d2ddd493]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow.so.300(_ZN5arrow7compute4CastERKNS_5DatumESt10shared_ptrINS_8DataTypeEERKNS0_11CastOptionsEPNS0_11ExecContextE+0x77)[0x7f68d2ddd6e2]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_dataset.so.300(+0x1c5c21)[0x7f68b30cfc21]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_dataset.so.300(+0x1c6789)[0x7f68b30d0789]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_dataset.so.300(+0x1c5097)[0x7f68b30cf097]
> /home/joris/miniconda3/envs/arrow-dev/lib/libarrow_dataset.so.300(_ZNK5arrow7dataset10Expression4BindENS_10ValueDescrEPNS_7compute11ExecContextE+0x732)[0x7f68b30d22e8]
> ...
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
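
For reference, the result the filter in the report is expected to produce can be sketched with the Python standard library alone. This is only an illustration of the intended semantics (keep partitions whose `dates` value is strictly later than 2012-01-01), not Arrow's implementation. The helpers `parse_hive_segment` and `filter_partitions` are hypothetical names introduced here, and the directory names are the ones shown by `ls` in the report.

```python
from datetime import datetime


def parse_hive_segment(segment):
    """Split a hive-style directory name 'key=value' into (key, value)."""
    key, _, value = segment.partition("=")
    return key, value


def filter_partitions(segments, field, lower_bound):
    """Return the segments whose `field` timestamp is strictly after `lower_bound`.

    Hypothetical helper: assumes values are formatted as 'YYYY-MM-DD HH:MM:SS',
    which matches the directory names in the report.
    """
    selected = []
    for segment in segments:
        key, value = parse_hive_segment(segment)
        if key != field:
            continue  # not the partition key being filtered on
        timestamp = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        if timestamp > lower_bound:
            selected.append(segment)
    return selected


# Directory names as listed in the report.
segments = ["dates=2012-01-01 00:00:00", "dates=2012-01-02 00:00:00"]
result = filter_partitions(segments, "dates", datetime(2012, 1, 1))
print(result)
```

Under these assumptions only the `dates=2012-01-02 00:00:00` partition passes the filter, so a correct read would return just the rows stored under that directory.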