Re: RocksDB segfault on state restore
Hello,

A couple of potentially relevant pieces of information:

1. https://issues.apache.org/jira/browse/FLINK-16686
2. https://stackoverflow.com/a/64721838/5793905 (the question was about schema evolution, but the answer is more generally applicable)

Regards,
Alexis.
Re: RocksDB segfault on state restore
Hi!

In our case, no schema evolution was triggered; only the TTL was set from the beginning, as far as I remember.

I will double check.

Gyula
Re: RocksDB segfault on state restore
Hi, Gyula.

It seems related to https://issues.apache.org/jira/browse/FLINK-23346. We also saw a core dump while using list state after triggering state migration and the TTL compaction filter. Have you triggered schema evolution?

It seems to be a bug in the RocksDB list state together with the TTL compaction filter.

--
Best,
Hangxiang.
RocksDB segfault on state restore
Hi All!

We are encountering an error on a larger stateful job (around 1 TB+ of state) on restore from a RocksDB checkpoint. The taskmanagers keep crashing with a segfault coming from the RocksDB native logic, which seems to be related to the FlinkCompactionFilter mechanism.

The gist with the full error report:
https://gist.github.com/gyfora/f307aa570d324d063e0ade9810f8bb25

The core part is here:

V [libjvm.so+0x79478f] Exceptions:: (Thread*, char const*, int, oopDesc*)+0x15f
V [libjvm.so+0x960a68] jni_Throw+0x88
C [librocksdbjni-linux64.so+0x222aa1] JavaListElementFilter::NextUnexpiredOffset(rocksdb::Slice const&, long, long) const+0x121
C [librocksdbjni-linux64.so+0x6486c1] rocksdb::flink::FlinkCompactionFilter::ListDecide(rocksdb::Slice const&, std::string*) const+0x81
C [librocksdbjni-linux64.so+0x648bea] rocksdb::flink::FlinkCompactionFilter::FilterV2(int, rocksdb::Slice const&, rocksdb::CompactionFilter::ValueType, rocksdb::Slice const&, std::string*, std::string*) const+0x14a

Has anyone encountered a similar issue before?

Thanks
Gyula
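For readers trying to follow the trace: the frame names suggest that the compaction filter, when deciding what to keep from a list-state value, asks for the offset of the next unexpired element, and that this lookup ended in jni_Throw. The Python sketch below is only a simplified model of that idea, not Flink's or RocksDB's actual code. The byte layout (8-byte big-endian timestamp, 4-byte length, payload) and the names `encode_element` and `next_unexpired_offset` are hypothetical and chosen purely for illustration.

```python
import struct

# Hypothetical element layout, assumed for this sketch only:
# 8-byte big-endian timestamp (ms), 4-byte payload length, payload bytes.
HEADER = ">qI"
HEADER_SIZE = struct.calcsize(HEADER)  # 12 bytes


def encode_element(timestamp_ms: int, payload: bytes) -> bytes:
    """Serialize one list element in the assumed layout."""
    return struct.pack(HEADER, timestamp_ms, len(payload)) + payload


def next_unexpired_offset(data: bytes, ttl_ms: int, now_ms: int) -> int:
    """Return the byte offset of the first element still within its TTL.

    Returns len(data) if every element has expired. Raises ValueError when
    the bytes do not match the assumed layout, which loosely mirrors a
    deserialization failure surfacing from inside the filter.
    """
    offset = 0
    while offset < len(data):
        if offset + HEADER_SIZE > len(data):
            raise ValueError(f"truncated header at offset {offset}")
        timestamp_ms, length = struct.unpack_from(HEADER, data, offset)
        end = offset + HEADER_SIZE + length
        if end > len(data):
            raise ValueError(f"truncated payload at offset {offset}")
        if timestamp_ms + ttl_ms > now_ms:
            return offset  # everything before this offset can be dropped
        offset = end
    return offset


if __name__ == "__main__":
    blob = encode_element(1000, b"a") + encode_element(5000, b"bc")
    # The first element expired at 3000 ms, the second is still live.
    print(next_unexpired_offset(blob, ttl_ms=2000, now_ms=4000))
```

In this model, a value whose bytes were written in a different layout than the reader expects fails inside the scan rather than returning an offset, which is one way to picture how a serializer mismatch could surface during compaction. Whether that is what happened in the job above is not established by this thread.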