[ 
https://issues.apache.org/jira/browse/GEODE-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiaojian zhou updated GEODE-6973:
---------------------------------
    Labels: GeodeCommons  (was: )

> getExistingIdForType should not compare all entries in idToType region
> ----------------------------------------------------------------------
>
>                 Key: GEODE-6973
>                 URL: https://issues.apache.org/jira/browse/GEODE-6973
>             Project: Geode
>          Issue Type: Bug
>            Reporter: xiaojian zhou
>            Priority: Major
>              Labels: GeodeCommons
>
> We found that PeerTypeRegistration's getExistingIdForType() iterates 
> through the entries of the idToType region to check whether the incoming 
> newType is already there. 
> If the idToType region contains 20K or 100K entries, this hurts put 
> throughput (customers did notice the performance degradation when there 
> are many pdxTypes). 
> To make things worse, the comparison covers the whole object, field by 
> field. If the JSON object (which will be converted to a pdxType) contains 
> 30 fields, the comparison may have to compare up to 30 fields. If the 
> idToType region contains 20K entries, a new pdxType can require up to 
> 20K x 30 = 600K string comparisons before it is registered. 
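
The worst-case arithmetic above can be reproduced with a small sketch. It is only an illustration, not the PeerTypeRegistration code: `ScanCostSketch`, `makeType`, and `countScanComparisons` are hypothetical names, and a pdxType is modeled as a plain list of field names whose last field is the only one that differs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScanCostSketch {
  // Hypothetical stand-in for a pdxType: an ordered list of field names.
  static List<String> makeType(int typeIndex, int fieldCount) {
    List<String> fields = new ArrayList<>();
    for (int f = 0; f < fieldCount; f++) {
      fields.add("field" + f);
    }
    // Only the last field differs, so a mismatch shows up on the final comparison.
    fields.set(fieldCount - 1, "last" + typeIndex);
    return fields;
  }

  // Linear scan: compare the new type against every stored type, field by field.
  static long countScanComparisons(List<List<String>> stored, List<String> newType) {
    long comparisons = 0;
    for (List<String> existing : stored) {
      for (int f = 0; f < newType.size(); f++) {
        comparisons++;
        if (!existing.get(f).equals(newType.get(f))) {
          break; // this stored type is different, move on to the next one
        }
      }
    }
    return comparisons;
  }

  public static void main(String[] args) {
    int typeCount = 20_000;
    int fieldCount = 30;
    List<List<String>> stored = new ArrayList<>();
    Map<List<String>, Integer> typeToId = new HashMap<>();
    for (int i = 0; i < typeCount; i++) {
      List<String> type = makeType(i, fieldCount);
      stored.add(type);
      typeToId.put(type, i);
    }
    List<String> newType = makeType(typeCount, fieldCount);
    System.out.println("scan comparisons: " + countScanComparisons(stored, newType));
    // A hash lookup answers the same question without walking every entry.
    System.out.println("found in map: " + typeToId.containsKey(newType));
  }
}
```

With 20,000 stored types of 30 fields each, the scan performs 600,000 string comparisons for one unknown type, while the map lookup does not depend on walking the stored entries.
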
> We also found that each server maintains a typeToId map, which is used to 
> check whether the pdxType exists. If it does, the type id is returned 
> without checking the idToType region, so the total number of pdxTypes 
> does not affect put performance when the pdxType already exists. 
> The typeToId map is maintained under a d-lock: each time a new pdxType is 
> added, the map is updated while the d-lock is still held. So we believe 
> that the map should have the same content as the region. If we cannot 
> find the pdxType in the map, it should not be in the region either, and 
> we can skip the iteration over the region (which is the root cause of the 
> performance issue). 
> Another issue in the current code: each time a new type arrives, the map 
> is recreated. This is unnecessary and contributes to the slowness too. 
> We should only create the map during initialize(). 
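
The proposal above (build the reverse map once, then keep it in sync while holding the lock) could look roughly like the following. This is a minimal sketch under stated assumptions, not the actual Geode implementation: `TypeRegistrySketch`, `getOrCreateId`, and `rebuildCount` are hypothetical, a pdxType is reduced to a String, the region is a local HashMap, and a `ReentrantLock` stands in for the d-lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class TypeRegistrySketch {
  // Stand-in for the idToType region (a real region is distributed).
  private final Map<Integer, String> idToType = new HashMap<>();
  // Reverse map, built once and then updated under the lock.
  private final Map<String, Integer> typeToId = new HashMap<>();
  // Local stand-in for the distributed lock (d-lock).
  private final ReentrantLock lock = new ReentrantLock();
  private int nextId = 1;
  // Counts how often the reverse map is rebuilt; it should stay at 1.
  int rebuildCount = 0;

  // Build the reverse map a single time from what the region already holds.
  void initialize(Map<Integer, String> existing) {
    lock.lock();
    try {
      idToType.putAll(existing);
      typeToId.clear();
      for (Map.Entry<Integer, String> e : idToType.entrySet()) {
        typeToId.put(e.getValue(), e.getKey());
        nextId = Math.max(nextId, e.getKey() + 1);
      }
      rebuildCount++;
    } finally {
      lock.unlock();
    }
  }

  // Look up the type in the map only; never iterate over the region.
  int getOrCreateId(String type) {
    lock.lock();
    try {
      Integer existing = typeToId.get(type);
      if (existing != null) {
        return existing;
      }
      int id = nextId++;
      // Region and map are updated together while the lock is held.
      idToType.put(id, type);
      typeToId.put(type, id);
      return id;
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) {
    TypeRegistrySketch registry = new TypeRegistrySketch();
    Map<Integer, String> existing = new HashMap<>();
    existing.put(1, "A");
    existing.put(2, "B");
    registry.initialize(existing);
    System.out.println(registry.getOrCreateId("B")); // existing type
    System.out.println(registry.getOrCreateId("C")); // new type
    System.out.println(registry.rebuildCount);
  }
}
```

Because both structures change only inside the locked section, a miss in the map implies a miss in the region, which is the invariant the description relies on.
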
> Here are the tests we want to introduce:
> 1) a junit test to prove that reordering fields in a big JSON file will 
> not cause significant hashcode conflicts (<1%) 
> 2) a junit test to prove that adding an index to a field in a big JSON 
> file will hardly cause hashcode conflicts. 
> These 2 tests are to prove that hashcode conflicts are not the root cause 
> of linear probing for PDXTypeId. 
> 3) a junit test to prove that, for the cases where hashcode conflicts are 
> caused by reordered fields, there will be no hashcode conflicts when using 
> SORT_JSON_FIELD_NAMES_PROPERTY=true. 
> 4) a dunit test to prove that SORT_JSON_FIELD_NAMES_PROPERTY=true or false 
> does not affect the performance of adding a new pdxType. 
> 5) a dunit test to create a new pdxType from 2 peer servers at the same 
> time. The test is to prove that the d-lock takes effect: one server 
> creates the pdxType, and the other server should find that the pdxType 
> already exists. 
> Run this test both from the servers directly and from clients. 
> 6) Create 2 different objects which end up with the same hashcode (we can 
> get the 2 objects from test 1), and try to put the 2 objects to create new 
> pdxTypes. The 2nd one should also create a new type. It should not be 
> treated as "found an existing pdxType". 
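
The expectation in test 6 matches how a hash map behaves: keys with equal hashcodes are still told apart by equals(). The sketch below illustrates this with plain Strings standing in for pdxTypes. `HashCollisionSketch` and `register` are hypothetical names; "Aa" and "BB" are used because these two Java strings share the same hashCode().

```java
import java.util.HashMap;
import java.util.Map;

public class HashCollisionSketch {
  // Return the existing id for a type, or assign the next id if it is new.
  static int register(Map<String, Integer> typeToId, String type) {
    Integer existing = typeToId.get(type);
    if (existing != null) {
      return existing;
    }
    int id = typeToId.size() + 1;
    typeToId.put(type, id);
    return id;
  }

  public static void main(String[] args) {
    Map<String, Integer> typeToId = new HashMap<>();
    // Two different keys that land in the same hash bucket.
    System.out.println("Aa".hashCode() == "BB".hashCode());
    int first = register(typeToId, "Aa");
    int second = register(typeToId, "BB");
    // The second key is not mistaken for the first one.
    System.out.println(first != second);
    // Registering the first key again finds the existing id.
    System.out.println(register(typeToId, "Aa") == first);
  }
}
```
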



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
