Hi, I forgot to mention that I enabled the ON_DISCONNECT_CLEAR_PDXTYPEIDS property. Also, I tried a different scenario which does not exactly involve local PdxType retention:
1. Start a cluster with 1 locator and 3 servers, with persistence disabled for PdxTypes.
2. Set up a region called "test-region" with persistence disabled. It doesn't matter whether it is replicated or partitioned.
3. In the client, create the client region with the PROXY region shortcut and establish the connection to the cluster.
4. In the client, create a PdxInstance.
5. At this point, the cluster is restarted, meaning that all data is lost, including PdxTypes.
6. In the client, the PdxInstance created in step 4 is put into "test-region" with key "test".
7. In the client, the following query is executed: "SELECT * FROM /test-region WHERE value = -1".

The outcome is the same: the query fails with the message "Unknown pdx type=<PdxType ID>", and it won't work until the corrupted entry is removed.

I don't know if you've seen this kind of scenario before. I am just wondering whether this is something that needs to be fixed.

Thanks,
Mario.

________________________________
From: Anthony Baker <bak...@vmware.com>
Sent: Wednesday, May 5, 2021 1:06 AM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: Region data corruption due to missing PdxTypes

Retaining local pdx types in the client after a disconnect will cause problems, as you observed. Take a look at the “ON_DISCONNECT_CLEAR_PDXTYPEIDS” property to improve this.

Anthony

> On May 4, 2021, at 4:36 AM, Mario Salazar de Torres <mario.salazar.de.tor...@est.tech> wrote:
>
> Hi everyone,
>
> While debugging some coredumps in the native client related to PdxTypeRegistry cleanup, I tried to reproduce the scenario with the Java client API to see how it was handled.
> The thing is, I've noticed that this scenario in the Java client might lead to Geode storing a corrupted entry, meaning that queries won't work on regions containing corrupted entries.
> By corrupted entries, I mean entries that use a missing PdxType. The scenario involves a cluster restart. It's described below:
>
> 1. Start a cluster with 1 locator and 3 servers, with persistence disabled for PdxTypes.
> 2. Set up a region called "test-region" with persistence disabled. It doesn't matter whether it is replicated or partitioned.
> 3. In the client, create the client region with the PROXY region shortcut and establish the connection to the cluster.
> 4. In the client, create a PdxInstance and put it into "test-region" with key "test".
> 5. In the client, get the entry whose key is "test", which is the PdxInstance inserted in step 4.
> 6. At this point, the cluster is restarted, meaning that all data is lost, including PdxTypes.
> 7. In the client, the PdxInstance obtained in step 5 is put into "test-region" with key "test2".
> 8. In the client, the following query is executed: "SELECT * FROM /test-region WHERE value = -1".
>
> The query fails with the message "Unknown pdx type=<PdxType ID>", and it won't work until the corrupted entry is removed.
>
> The above scenario could be solved by enabling persistence for PdxTypes, but if you have an unrecoverable issue in your cluster and need to spin up a backup, the PdxType of the PdxInstance obtained in step 5 might not be present in the backup. The entry would then be inserted but, yet again, with its PdxType missing.
>
> It's worth mentioning that in the native client this scenario currently results in a coredump, but no data corruption, because after losing the connection to the cluster the PdxTypeRegistry is cleaned up and PdxTypes are looked up by their ID rather than taken directly from the object.
>
> My questions here are:
>
> * Have you seen this issue before?
> * Is there a way to verify that PdxTypes are present in the cluster before writing an entry that holds some PdxInstances?
>
> Thanks,
> Mario.
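To make the reported behaviour easier to reason about, here is a small stdlib-only Java model of the second scenario. It is not Geode code: `PdxStaleTypeModel`, `Cluster`, `Client`, `Instance` and `putVerified` are hypothetical names. The model assumes, as described in this thread, that a PdxInstance keeps the type ID it was created with and that the server accepts a put without checking that this ID is still registered. `putVerified` only illustrates the kind of check asked about in the second question; it is not an existing API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, stdlib-only model of the scenario in this thread; none of these classes are Geode APIs.
// Assumed: a PdxInstance keeps its creation-time type ID, and puts are not checked against the registry.
final class PdxStaleTypeModel {
  // A client-side value that remembers the type ID assigned when it was created.
  static final class Instance {
    final String className;
    final int typeId;
    Instance(String className, int typeId) { this.className = className; this.typeId = typeId; }
  }

  // Stands in for the servers: a non-persistent PdxType registry plus "test-region".
  static final class Cluster {
    private final Map<Integer, String> typeRegistry = new HashMap<>();
    private final Map<String, Integer> region = new HashMap<>(); // key -> type ID of the value
    private int nextTypeId = 1; // not reset on restart, so a stale ID never matches a new type
    int defineType(String className) {
      int id = nextTypeId++;
      typeRegistry.put(id, className);
      return id;
    }
    boolean hasType(int typeId) { return typeRegistry.containsKey(typeId); }
    void put(String key, int typeId) { region.put(key, typeId); } // no registration check
    void remove(String key) { region.remove(key); }
    // "Deserializes" every value and fails on the first unknown type ID, like the query in the thread.
    List<String> query() {
      List<String> keys = new ArrayList<>();
      for (Map.Entry<String, Integer> entry : region.entrySet()) {
        if (!typeRegistry.containsKey(entry.getValue())) {
          throw new IllegalStateException("Unknown pdx type=" + entry.getValue());
        }
        keys.add(entry.getKey());
      }
      return keys;
    }
    // Restart without persistence: entries and PdxTypes are both lost.
    void restart() { typeRegistry.clear(); region.clear(); }
  }

  static final class Client {
    private final Cluster cluster;
    private final boolean clearTypeIdsOnDisconnect; // plays the role of ON_DISCONNECT_CLEAR_PDXTYPEIDS
    private final Map<String, Integer> localTypeIds = new HashMap<>(); // class name -> type ID
    Client(Cluster cluster, boolean clearTypeIdsOnDisconnect) {
      this.cluster = cluster;
      this.clearTypeIdsOnDisconnect = clearTypeIdsOnDisconnect;
    }
    Instance createInstance(String className) {
      Integer id = localTypeIds.get(className);
      if (id == null) {
        id = cluster.defineType(className);
        localTypeIds.put(className, id);
      }
      return new Instance(className, id);
    }

    // Clears the client's cached IDs, but not the IDs already held by existing instances.
    void onDisconnect() { if (clearTypeIdsOnDisconnect) localTypeIds.clear(); }

    // Plain put: the instance's own type ID is sent as-is.
    void put(String key, Instance value) { cluster.put(key, value.typeId); }

    // Hypothetical guard: if the cluster no longer knows the ID, define the type again and write
    // with the fresh ID (a real client would also have to re-serialize the value).
    void putVerified(String key, Instance value) {
      int id = value.typeId;
      if (!cluster.hasType(id)) {
        localTypeIds.remove(value.className);
        id = createInstance(value.className).typeId;
      }
      cluster.put(key, id);
    }
  }

  public static void main(String[] args) {
    Cluster cluster = new Cluster();
    Client client = new Client(cluster, true);       // property enabled, as in the follow-up mail
    Instance pdx = client.createInstance("Example"); // step 4
    cluster.restart();                               // step 5
    client.onDisconnect();
    client.put("test", pdx);                         // step 6
    try {
      cluster.query();                               // step 7
    } catch (IllegalStateException e) {
      System.out.println("plain put: " + e.getMessage());
    }
    cluster.remove("test");                          // only removing the entry makes queries work again
    client.putVerified("test", pdx);
    System.out.println("verified put: " + cluster.query());
  }
}
```

Running `main` prints `plain put: Unknown pdx type=1` followed by `verified put: [test]`. The point of the model is that clearing the client's cached IDs on disconnect does not help when a value created before the restart still carries its old type ID, which matches the outcome reported above with the property enabled.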