Re: IgniteTxStateImpl's non-threadsafe fields may cause crashes and/or data loss

2023-05-19 Thread Anton Vinogradov
>> This invariant is violated in many places.
At least in half of the messages:

org.apache.ignite.internal.processors.cache.distributed.GridDistributedLockRequest#partition
org.apache.ignite.internal.processors.cache.distributed.GridDistributedUnlockRequest#partition
org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxFinishResponse#partition
org.apache.ignite.internal.processors.cache.distributed.GridDistributedTxPrepareResponse#partition

+1 to use nearXidVersion.hash() as a stripe index, but, listed methods made
me doubt.

On Fri, May 19, 2023 at 7:32 PM Ivan Daschinsky  wrote:

> >> Tx processing is supposed to be thread bound by hashing the version to a
> partition
> This invariant is violated in many places. The most notorious example is tx
> recovery.
>
> Another example: I just added an assertion that checks tId of a creator
> thread with tId of an accessor thread.
> TxMultiCacheAsyncOpsTest fails immediately on processing of a tx prepare
> request. Looks like a big issue, IMO
>
>
> пт, 19 мая 2023 г. в 19:11, Alexei Scherbakov <
> alexey.scherbak...@gmail.com
> >:
>
> > Tx processing is supposed to be thread bound by hashing the version to a
> > partition, see methods like [1]
> > If for some cases this invariant is broken, this should be fixed.
> >
> > [1]
> >
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareRequest#partition
> >
> > пт, 19 мая 2023 г. в 15:57, Anton Vinogradov :
> >
> > > Igniters,
> > >
> > > My team was faced with node failure [1] because of non-threadsafe
> > > collections usage.
> > >
> > > IgniteTxStateImpl's fields
> > > - activeCacheIds
> > > - txMap
> > > are not thread safe, but are widely used from different threads without
> > the
> > > proper sync.
> > >
> > > The main question is ... why?
> > >
> > > According to the research, we have no guarantee that tx will be
> processed
> > > at the single thread.
> > > It may be processed at the several! threads at the striped pool and at
> > the
> > > tx recovery thread as well.
> > >
> > > Thread at the striped pool will be selected by the message's
> partition()
> > > method, which can be calculated like this:
> > > - return keys != null && !keys.isEmpty() ? keys.get(0).partition() :
> -1;
> > > - return U.safeAbs(version().hashCode());
> > > - ...,
> > > so, no guarantee it is processed at the same thread (proven by tests).
> > >
> > > Seems, we MAY lose the data.
> > > For example, ignoring some or all keys from txMap at commit.
> > >
> > > If anyone knows why this is not a problem (I mean sync lack, not data
> > loss)
> > > or how to fix this properly, please give me a hint, or correct my
> > > conclusions if necessary.
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-19445
> > >
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
> >
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>


Re: IgniteTxStateImpl's non-threadsafe fields may cause crashes and/or data loss

2023-05-19 Thread Ivan Daschinsky
>> Tx processing is supposed to be thread bound by hashing the version to a
partition
This invariant is violated in many places. The most notorious example is tx
recovery.

Another example: I just added an assertion that checks tId of a creator
thread with tId of an accessor thread.
TxMultiCacheAsyncOpsTest fails immediately on processing of a tx prepare
request. Looks like a big issue, IMO


пт, 19 мая 2023 г. в 19:11, Alexei Scherbakov :

> Tx processing is supposed to be thread bound by hashing the version to a
> partition, see methods like [1]
> If for some cases this invariant is broken, this should be fixed.
>
> [1]
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareRequest#partition
>
> пт, 19 мая 2023 г. в 15:57, Anton Vinogradov :
>
> > Igniters,
> >
> > My team was faced with node failure [1] because of non-threadsafe
> > collections usage.
> >
> > IgniteTxStateImpl's fields
> > - activeCacheIds
> > - txMap
> > are not thread safe, but are widely used from different threads without
> the
> > proper sync.
> >
> > The main question is ... why?
> >
> > According to the research, we have no guarantee that tx will be processed
> > at the single thread.
> > It may be processed at the several! threads at the striped pool and at
> the
> > tx recovery thread as well.
> >
> > Thread at the striped pool will be selected by the message's partition()
> > method, which can be calculated like this:
> > - return keys != null && !keys.isEmpty() ? keys.get(0).partition() : -1;
> > - return U.safeAbs(version().hashCode());
> > - ...,
> > so, no guarantee it is processed at the same thread (proven by tests).
> >
> > Seems, we MAY lose the data.
> > For example, ignoring some or all keys from txMap at commit.
> >
> > If anyone knows why this is not a problem (I mean sync lack, not data
> loss)
> > or how to fix this properly, please give me a hint, or correct my
> > conclusions if necessary.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-19445
> >
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 
Sincerely yours, Ivan Daschinskiy


Re: IgniteTxStateImpl's non-threadsafe fields may cause crashes and/or data loss

2023-05-19 Thread Alexei Scherbakov
Tx processing is supposed to be thread bound by hashing the version to a
partition, see methods like [1]
If for some cases this invariant is broken, this should be fixed.

[1] 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareRequest#partition

пт, 19 мая 2023 г. в 15:57, Anton Vinogradov :

> Igniters,
>
> My team was faced with node failure [1] because of non-threadsafe
> collections usage.
>
> IgniteTxStateImpl's fields
> - activeCacheIds
> - txMap
> are not thread safe, but are widely used from different threads without the
> proper sync.
>
> The main question is ... why?
>
> According to the research, we have no guarantee that tx will be processed
> at the single thread.
> It may be processed at the several! threads at the striped pool and at the
> tx recovery thread as well.
>
> Thread at the striped pool will be selected by the message's partition()
> method, which can be calculated like this:
> - return keys != null && !keys.isEmpty() ? keys.get(0).partition() : -1;
> - return U.safeAbs(version().hashCode());
> - ...,
> so, no guarantee it is processed at the same thread (proven by tests).
>
> Seems, we MAY lose the data.
> For example, ignoring some or all keys from txMap at commit.
>
> If anyone knows why this is not a problem (I mean sync lack, not data loss)
> or how to fix this properly, please give me a hint, or correct my
> conclusions if necessary.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-19445
>


-- 

Best regards,
Alexei Scherbakov


IgniteTxStateImpl's non-threadsafe fields may cause crashes and/or data loss

2023-05-19 Thread Anton Vinogradov
Igniters,

My team was faced with node failure [1] because of non-threadsafe
collections usage.

IgniteTxStateImpl's fields
- activeCacheIds
- txMap
are not thread safe, but are widely used from different threads without the
proper sync.

The main question is ... why?

According to the research, we have no guarantee that tx will be processed
at the single thread.
It may be processed at the several! threads at the striped pool and at the
tx recovery thread as well.

Thread at the striped pool will be selected by the message's partition()
method, which can be calculated like this:
- return keys != null && !keys.isEmpty() ? keys.get(0).partition() : -1;
- return U.safeAbs(version().hashCode());
- ...,
so, no guarantee it is processed at the same thread (proven by tests).

Seems, we MAY lose the data.
For example, ignoring some or all keys from txMap at commit.

If anyone knows why this is not a problem (I mean sync lack, not data loss)
or how to fix this properly, please give me a hint, or correct my
conclusions if necessary.

[1] https://issues.apache.org/jira/browse/IGNITE-19445