[ https://issues.apache.org/jira/browse/ARROW-13649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507043#comment-17507043 ]
Joris Van den Bossche commented on ARROW-13649:
-----------------------------------------------

OK, thanks for the update!

> [C++][Python] SIGSEGV inside datasets or compute kernel (was: pyarrow is causing segfault randomly)
> ---------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13649
>                 URL: https://issues.apache.org/jira/browse/ARROW-13649
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 5.0.0
>       Environment: openSUSE Leap 15.2
>                    conda python3.9 env
>            Reporter: krishna deepak
>          Priority: Critical
>
> I'm using pyarrow to read feather files. I'm randomly getting the following segfault:
>
> *** SIGSEGV received at time=1629226305 on cpu 3 ***
> PC: @ 0x7fa9e177272a (unknown) arrow::BitUtil::SetBitmap()
> @ 0x7fa9f5dec2d0 (unknown) (unknown)
> Segmentation fault (core dumped)
>
> I initially thought it was caused by some bug in my Cython code, but even after removing all Cython calls I still get this error randomly.
>
> The Python code is a very simple read:
> {quote}
> index_data = ds.dataset(INDEX_DATA_PATH / self.ticker / str(year) / 'indexed_table.feather',
>                         format='feather')
> index_data = index_data.to_table()
> trade_days = self.get_trading_days(year)
>
> options_data = ds.dataset(OPTIONS_DATA_PATH / self.ticker / self.expiry_type / str(year), format='feather')
> options_data = options_data.to_table(
>     filter=(
>         (ds.field('dt') >= trade_days[0]) & (ds.field('dt') <= trade_days[-1])
>     ),
>     columns=options_data_columns
> )
>
> expiry_dts = [x.as_py() for x in pc.unique(options_data['expiry_dt'])]
> expiry_dts.sort()
> {quote}
>
> The error only happens randomly, roughly 1 time out of 5.
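For context, a minimal self-contained sketch of the same read pattern is below. The file paths, column names, and date bounds are placeholder assumptions (the reporter's INDEX_DATA_PATH, ticker, and helper methods are not available here); it only illustrates the dataset scan with a pushed-down filter that is running when the crash occurs.

{code:python}
import datetime

import pyarrow.dataset as ds
import pyarrow.compute as pc

# Placeholder paths and column names, standing in for the reporter's
# INDEX_DATA_PATH / OPTIONS_DATA_PATH layout (assumptions, not from the report).
index_table = ds.dataset('data/indexed_table.feather', format='feather').to_table()

options = ds.dataset('data/options/2020', format='feather')
options_table = options.to_table(
    # Date-range filter pushed down into the dataset scan.
    filter=(
        (ds.field('dt') >= datetime.date(2020, 1, 1))
        & (ds.field('dt') <= datetime.date(2020, 12, 31))
    ),
    columns=['dt', 'expiry_dt'],
)

# Collect the distinct expiry dates as Python objects and sort them.
expiry_dts = sorted(x.as_py() for x in pc.unique(options_table['expiry_dt']))
{code}

In the reporter's runs, the crash surfaces inside arrow::BitUtil::SetBitmap() during this kind of filtered scan, intermittently (about 1 in 5 runs).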