Please run benchmarks after fixing the problem. For example, replacing HashMap with ConcurrentHashMap can significantly affect performance.
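As a rough illustration of the kind of check I mean, here is a minimal single-threaded timing sketch. The class and method names (MapBench, fillAndRead, timeNanos) are made up for this example and have nothing to do with Ignite code; the sizes are arbitrary. A hand-rolled loop like this only gives a first impression, so for numbers you intend to rely on, a proper harness such as JMH and a real Ignite workload would be the better choice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical helper, not part of Ignite: compares map implementations
// on a simple single-threaded put/get workload.
class MapBench {
    /** Puts n entries into the map, reads them back, and returns the sum of the values. */
    static long fillAndRead(Map<Integer, Integer> map, int n) {
        for (int i = 0; i < n; i++)
            map.put(i, i);

        long sum = 0;

        for (int i = 0; i < n; i++)
            sum += map.get(i);

        return sum;
    }

    /** Runs untimed warm-up rounds, then returns elapsed nanoseconds for the timed rounds. */
    static long timeNanos(Supplier<Map<Integer, Integer>> factory, int n, int rounds) {
        // Warm-up so that the JIT has a chance to compile the hot loop.
        for (int r = 0; r < rounds; r++)
            fillAndRead(factory.get(), n);

        long start = System.nanoTime();

        for (int r = 0; r < rounds; r++)
            fillAndRead(factory.get(), n);

        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 100_000;   // Arbitrary size, chosen only for illustration.
        int rounds = 20;   // Arbitrary round count.

        long plain = timeNanos(HashMap::new, n, rounds);
        long concurrent = timeNanos(ConcurrentHashMap::new, n, rounds);

        System.out.printf("HashMap: %d ms, ConcurrentHashMap: %d ms%n",
            plain / 1_000_000, concurrent / 1_000_000);
    }
}
```

The results depend heavily on the JVM, the hardware, and the access pattern, so I would treat the output only as a hint that a closer look is needed.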
See, for example, the comments on the IGNITE-2968 issue ( https://issues.apache.org/jira/browse/IGNITE-2968?focusedCommentId=15415170&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15415170 ). I believe the invariants mentioned there were broken later, but in general I agree with Alexey: this state should be accessed mostly from one thread. Exceptional cases should be synchronized or redesigned. For example, if metrics read a transaction's state, I would rather remove these metrics or tolerate some inaccuracy than reduce performance.

On Fri, May 19, 2023 at 7:32 PM Ivan Daschinsky <ivanda...@gmail.com> wrote:

> >> Tx processing is supposed to be thread bound by hashing the version to a partition
> This invariant is violated in many places. The most notorious example is tx recovery.
>
> Another example: I just added an assertion that checks tId of a creator thread with tId of an accessor thread.
> TxMultiCacheAsyncOpsTest fails immediately on processing of a tx prepare request. Looks like a big issue, IMO
>
> Fri, May 19, 2023 at 19:11, Alexei Scherbakov <alexey.scherbak...@gmail.com>:
>
> > Tx processing is supposed to be thread bound by hashing the version to a partition, see methods like [1]
> > If for some cases this invariant is broken, this should be fixed.
> >
> > [1] org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareRequest#partition
> >
> > Fri, May 19, 2023 at 15:57, Anton Vinogradov <a...@apache.org>:
> >
> > > Igniters,
> > >
> > > My team was faced with node failure [1] because of non-threadsafe collections usage.
> > >
> > > IgniteTxStateImpl's fields
> > > - activeCacheIds
> > > - txMap
> > > are not thread safe, but are widely used from different threads without the proper sync.
> > >
> > > The main question is ... why?
> > >
> > > According to the research, we have no guarantee that tx will be processed at the single thread.
> > > It may be processed at the several! threads at the striped pool and at the tx recovery thread as well.
> > >
> > > Thread at the striped pool will be selected by the message's partition() method, which can be calculated like this:
> > > - return keys != null && !keys.isEmpty() ? keys.get(0).partition() : -1;
> > > - return U.safeAbs(version().hashCode());
> > > - ...,
> > > so, no guarantee it is processed at the same thread (proven by tests).
> > >
> > > Seems, we MAY lose the data.
> > > For example, ignoring some or all keys from txMap at commit.
> > >
> > > If anyone knows why this is not a problem (I mean sync lack, not data loss) or how to fix this properly, please give me a hint, or correct my conclusions if necessary.
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-19445
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
>
> --
> Sincerely yours, Ivan Daschinskiy