Re: RocksDB segfault on state restore

2023-06-02 Thread Alexis Sarda-Espinosa
Hello,

A couple of potentially relevant pieces of information:

1. https://issues.apache.org/jira/browse/FLINK-16686
2. https://stackoverflow.com/a/64721838/5793905 (question was about schema
evolution, but the answer is more generally applicable)

Regards,
Alexis.



Re: RocksDB segfault on state restore

2023-06-01 Thread Gyula Fóra
Hi!


In our case, no schema evolution was triggered; as far as I remember, only
the TTL was set from the beginning.

I will double-check.

Gyula



Re: RocksDB segfault on state restore

2023-06-01 Thread Hangxiang Yu
Hi, Gyula.
It seems related to https://issues.apache.org/jira/browse/FLINK-23346.
We also saw a core dump while using list state after triggering state
migration with the TTL compaction filter enabled. Have you triggered
schema evolution?
It seems to be a bug in the RocksDB list state when combined with the TTL
compaction filter.


-- 
Best,
Hangxiang.


RocksDB segfault on state restore

2023-05-17 Thread Gyula Fóra
Hi All!

We are encountering an error on a large stateful job (around 1 TB+ of state)
when restoring from a RocksDB checkpoint. The TaskManagers keep crashing with
a segfault that comes from the RocksDB native code and seems to be related to
the FlinkCompactionFilter mechanism.

The gist with the full error report:
https://gist.github.com/gyfora/f307aa570d324d063e0ade9810f8bb25

The core part is here:
V  [libjvm.so+0x79478f]  Exceptions::(Thread*, char const*, int, oopDesc*)+0x15f
V  [libjvm.so+0x960a68]  jni_Throw+0x88
C  [librocksdbjni-linux64.so+0x222aa1]  JavaListElementFilter::NextUnexpiredOffset(rocksdb::Slice const&, long, long) const+0x121
C  [librocksdbjni-linux64.so+0x6486c1]  rocksdb::flink::FlinkCompactionFilter::ListDecide(rocksdb::Slice const&, std::string*) const+0x81
C  [librocksdbjni-linux64.so+0x648bea]  rocksdb::flink::FlinkCompactionFilter::FilterV2(int, rocksdb::Slice const&, rocksdb::CompactionFilter::ValueType, rocksdb::Slice const&, std::string*, std::string*) const+0x14a

Has anyone encountered a similar issue before?

Thanks
Gyula
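
A note for readers of the trace: the frames suggest that the native
FlinkCompactionFilter, while deciding what to do with a list-state value,
calls back into Java (JavaListElementFilter::NextUnexpiredOffset) to find the
first list element that has not yet expired, and that an exception was thrown
across JNI at that point. The following is only a self-contained sketch of
that kind of scan, not Flink's code. The class name ListTtlSketch, the helper
encode, and the byte layout (8-byte timestamp, 4-byte length, payload) are
assumptions made for illustration. It shows why bytes that do not match the
layout the scan expects might make it fail instead of returning an offset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ListTtlSketch {

    // Size of the hypothetical per-element header: timestamp (long) + payload length (int).
    private static final int HEADER_BYTES = Long.BYTES + Integer.BYTES;

    // Serializes elements in the assumed layout; this is NOT Flink's real list format.
    public static byte[] encode(long[] timestamps, String[] payloads) {
        byte[][] raw = new byte[payloads.length][];
        int size = 0;
        for (int i = 0; i < payloads.length; i++) {
            raw[i] = payloads[i].getBytes(StandardCharsets.UTF_8);
            size += HEADER_BYTES + raw[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (int i = 0; i < payloads.length; i++) {
            buf.putLong(timestamps[i]).putInt(raw[i].length).put(raw[i]);
        }
        return buf.array();
    }

    // Returns the byte offset of the first element that is still alive,
    // or list.length if every element has expired.
    // Throws IllegalStateException if the bytes do not fit the assumed layout.
    public static int nextUnexpiredOffset(byte[] list, long ttlMillis, long nowMillis) {
        ByteBuffer buf = ByteBuffer.wrap(list);
        while (buf.hasRemaining()) {
            int start = buf.position();
            if (buf.remaining() < HEADER_BYTES) {
                throw new IllegalStateException("Truncated header at offset " + start);
            }
            long timestamp = buf.getLong();
            int length = buf.getInt();
            if (length < 0 || length > buf.remaining()) {
                throw new IllegalStateException("Corrupt element at offset " + start);
            }
            if (timestamp + ttlMillis > nowMillis) {
                return start; // first unexpired element found
            }
            buf.position(buf.position() + length); // skip the expired payload
        }
        return list.length;
    }

    public static void main(String[] args) {
        byte[] list = encode(new long[] {1000L, 2000L, 9000L}, new String[] {"a", "bb", "ccc"});
        System.out.println("first unexpired offset: " + nextUnexpiredOffset(list, 5000L, 8000L));

        // Bytes written in some other layout make the scan fail rather than return an offset.
        byte[] foreign = "not-a-ttl-list".getBytes(StandardCharsets.UTF_8);
        try {
            nextUnexpiredOffset(foreign, 5000L, 8000L);
        } catch (IllegalStateException e) {
            System.out.println("scan failed: " + e.getMessage());
        }
    }
}
```

In this sketch the failure is an ordinary Java exception. In the trace above,
the exception appears to be raised from native code during compaction, which
may be why the process crashed instead of reporting a regular job failure.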