Hi everyone,

While debugging some coredumps in the native client related to PdxTypeRegistry 
cleanup, I tried to reproduce the scenario with the Java client API to see how 
it is handled there.
I noticed that in the Java client this scenario can lead to Geode storing a 
corrupted entry, which means queries no longer work on regions containing such 
entries.
By corrupted entries I mean entries that reference a missing PdxType. The 
scenario involves a cluster restart and is described below:

  1.  Start a cluster with 1 locator and 3 servers, with persistence disabled 
for PdxTypes.
  2.  Set up a region called "test-region" with persistence disabled. It 
doesn't matter whether it is replicated or partitioned.
  3.  In the client, create the client region with the PROXY region shortcut 
and establish the connection to the cluster.
  4.  In the client, create a PdxInstance and put it into "test-region" with 
key "test".
  5.  In the client, get the entry whose key is "test", which is the 
PdxInstance inserted in step 4.
  6.  At this point, restart the cluster, so that all the data is lost, 
including PdxTypes.
  7.  In the client, put the PdxInstance obtained in step 5 into 
"test-region" with key "test2".
  8.  In the client, execute the following query: "SELECT * FROM 
/test-region WHERE value = -1".
This query fails with the message "Unknown pdx type=<PdxType ID>", and it keeps 
failing until the corrupted entry is removed.
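
To make the sequence easier to follow, here is a minimal toy model of steps 4 
to 8 in plain Java. It is only a sketch of my understanding of the mechanism: 
ToyCluster, ToyPdxInstance and all their methods are hypothetical stand-ins 
that I made up for illustration, not Geode classes, and the real client/server 
behaviour may differ in the details.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: every class here is a hypothetical stand-in, not Geode's API.
class PdxTypeLossDemo {

    // Stand-in for a serialized PdxInstance: field data plus the ID of its type.
    static final class ToyPdxInstance {
        final int typeId;
        final Map<String, Integer> fields;

        ToyPdxInstance(int typeId, Map<String, Integer> fields) {
            this.typeId = typeId;
            this.fields = fields;
        }
    }

    // Stand-in for a cluster with persistence disabled for types and data.
    static final class ToyCluster {
        private final Map<Integer, String> typeRegistry = new HashMap<>();
        private final Map<String, ToyPdxInstance> region = new HashMap<>();
        private int nextTypeId = 1;

        int registerType(String typeName) {
            int id = nextTypeId++;
            typeRegistry.put(id, typeName);
            return id;
        }

        // The put stores the entry as-is; the type ID is not validated here.
        void put(String key, ToyPdxInstance value) {
            region.put(key, value);
        }

        ToyPdxInstance get(String key) {
            return region.get(key);
        }

        // Restart without persistence: entries and types are both lost.
        void restart() {
            typeRegistry.clear();
            region.clear();
        }

        // Mimics "SELECT * FROM /test-region WHERE value = <wanted>".
        // Reading a field requires the entry's type, so a missing type fails.
        int countWhereValueEquals(int wanted) {
            int matches = 0;
            for (ToyPdxInstance entry : region.values()) {
                if (!typeRegistry.containsKey(entry.typeId)) {
                    throw new IllegalStateException("Unknown pdx type=" + entry.typeId);
                }
                if (entry.fields.getOrDefault("value", 0) == wanted) {
                    matches++;
                }
            }
            return matches;
        }
    }

    // Runs steps 4 to 8 and returns the query outcome as a message.
    static String runScenario() {
        ToyCluster cluster = new ToyCluster();

        int typeId = cluster.registerType("TestType");           // step 4
        Map<String, Integer> fields = new HashMap<>();
        fields.put("value", 1);
        cluster.put("test", new ToyPdxInstance(typeId, fields));

        ToyPdxInstance fetched = cluster.get("test");             // step 5
        cluster.restart();                                        // step 6
        cluster.put("test2", fetched);                            // step 7

        try {
            cluster.countWhereValueEquals(-1);                    // step 8
            return "query succeeded";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // prints "Unknown pdx type=1"
    }
}
```

In this model the put in step 7 succeeds because nothing checks the type ID 
when the entry is written, and the problem only shows up later when the query 
has to read a field of the entry.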

The above scenario can be avoided by enabling persistence for PdxTypes. 
However, if you hit an unrecoverable issue in your cluster and need to spin up 
a backup, it could happen that the PdxType of the PdxInstance obtained in 
step 5 is not present in the backup, so the entry is inserted but, yet again, 
its PdxType is missing.

It's worth mentioning that in the native client this scenario currently 
results in a coredump, but no data corruption, given that after losing the 
connection to the cluster the PdxTypeRegistry is cleaned up and PdxTypes are 
looked up by their ID rather than by using the object directly.

My questions are:

  *   Have you seen this issue before?
  *   Is there a way to verify that the PdxTypes are present in the cluster 
before writing an entry that holds PdxInstances?

Thanks,
Mario.
