[ https://issues.apache.org/jira/browse/GEODE-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xiaojian zhou updated GEODE-6973: --------------------------------- Labels: GeodeCommons (was: ) > getExistingIdForType should not compare all entries in idToType region > ---------------------------------------------------------------------- > > Key: GEODE-6973 > URL: https://issues.apache.org/jira/browse/GEODE-6973 > Project: Geode > Issue Type: Bug > Reporter: xiaojian zhou > Priority: Major > Labels: GeodeCommons > > We found the PeerTypeRegistration's getExistingIdForType() will iterate > through the idToType region's entries to find if the incoming newType is > there. > If idToType region contains 20K or 100K entries, this will impact the put > throughput (customers did notice the performance downgrade when there're many > pdxTypes). > To make the things worse, the comparison is to compare the whole object, > field to field. If the json object (which will be converted to pdxType) > contains 30 fields, the comparison will have to compare up to 30 fields. If > the idToType region contains 20K entries, A new pdxType will do 20K x 30 > string comparisons before register it. > We found each server maintained a typeToId map, this map is used to check if > the pdxType exists. If exists, it will return the type id without check the > IdToType region. The total number of pdxType did not impact the put > performance if the pdxTypd exists. > The typeToId map is maintained with a d-lock, each time we added a new > pdxType, it will update into the map while still holding the d-lok. So we > believe that the map should be the same as the region in content. If we > cannot find the pdxType in the map, it should not be in the region. We can > skip the iteration of region (which is the root cause of the performance > issue). > Another issue in current code is: when each time a new type come, it will > recreate the map. This is unnecessary and contributes to the slowness too. > We should only create the map during initialize(). > Here are the tests we want to introduce: > 1) a junit test to prove that reorder fields in a big JSON file will not > cause significant hashcode conflicts (<1%) > 2) a junit test to prove that add a index to a field in a big JSON file will > hardly cause hashcode conflicts. > This 2 tests are to prove that hashcode conflict is not the root cause of > linear probing for PDXTypeId. > 3) a junit test to prove that for the cases that hashcode conflict caused by > reordered fields, there will be no hashcode conflicts if using > SORT_JSON_FIELD_NAMES_PROPERTY=true. > 4) a dunit test to prove that SORT_JSON_FIELD_NAMES_PROPERTY=true or false > did not impact the performance to add a new pdxType. > 5) a dunit test to create a new pdxType from 2 peer server at the same time. > The test is to prove that the d-lock take effect, one server create the > pdxType, and another server should find the pdxType exists. > Do this test both from server directly and from clients. > 6) Create 2 different objects which ends up with the same hashcode (we can > get the 2 objects from test-1), try to put the 2 objects to create new > pdxType. The 2nd one should also create a new type. It should not be treated > as "found an existing pdxType". -- This message was sent by Atlassian JIRA (v7.6.14#76016)