[
https://issues.apache.org/jira/browse/GEODE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129528#comment-17129528
]
ASF GitHub Bot commented on GEODE-8213:
---------------------------------------
albertogpz commented on a change in pull request #611:
URL: https://github.com/apache/geode-native/pull/611#discussion_r436629331
##########
File path: cppcache/src/SerializationRegistry.cpp
##########
@@ -355,45 +363,39 @@ std::shared_ptr<Serializable>
SerializationRegistry::GetEnum(
}
void TheTypeMap::clear() {
- std::lock_guard<util::concurrent::spinlock_mutex> guard(
- m_dataSerializableMapLock);
- m_dataSerializableMap.clear();
+ auto tempDataSerializableMap =
+ std::make_shared<std::unordered_map<int32_t, TypeFactoryMethod>>();
+ std::atomic_store(&m_dataSerializableMap, tempDataSerializableMap);
- std::lock_guard<util::concurrent::spinlock_mutex> guard2(
- m_dataSerializableFixedIdMapLock);
- m_dataSerializableFixedIdMap.clear();
+ auto tempDataSerializableFixedId = std::make_shared<
+ std::unordered_map<internal::DSFid, TypeFactoryMethod>>();
+ std::atomic_store(&m_dataSerializableFixedIdMap,
tempDataSerializableFixedId);
- std::lock_guard<util::concurrent::spinlock_mutex> guard3(
- m_pdxSerializableMapLock);
- m_pdxSerializableMap.clear();
+ auto tempPdxSerializableMap =
+ std::make_shared<std::unordered_map<std::string,
TypeFactoryMethodPdx>>();
+ std::atomic_store(&m_pdxSerializableMap, tempPdxSerializableMap);
}
void TheTypeMap::findDataSerializable(int32_t id,
TypeFactoryMethod& func) const {
- std::lock_guard<util::concurrent::spinlock_mutex> guard(
- m_dataSerializableMapLock);
- const auto& found = m_dataSerializableMap.find(id);
- if (found != m_dataSerializableMap.end()) {
+ const auto& found = m_dataSerializableMap->find(id);
+ if (found != m_dataSerializableMap->end()) {
Review comment:
You are right
##########
File path: cppcache/src/SerializationRegistry.cpp
##########
@@ -409,44 +411,56 @@ void TheTypeMap::bindDataSerializable(TypeFactoryMethod
func, int32_t id) {
"TheTypeMap::bind: Serialization type not implemented.");
}
- std::lock_guard<util::concurrent::spinlock_mutex> guard(
- m_dataSerializableMapLock);
- const auto& result = m_dataSerializableMap.emplace(id, func);
+ auto tempDataSerializableMap =
+ std::make_shared<std::unordered_map<int32_t, TypeFactoryMethod>>(
+ *m_dataSerializableMap);
+ const auto& result = tempDataSerializableMap->emplace(id, func);
if (!result.second) {
LOGERROR("A class with ID %d is already registered.", id);
throw IllegalStateException("A class with given ID is already
registered.");
}
+ std::atomic_store(&m_dataSerializableMap, tempDataSerializableMap);
Review comment:
I was assuming that the map would only be updated from one thread. Is it
reasonable?
If several threads could update the map then I do not see how we could make
this work consistently and being thread safe.
In your draft code using compare_and_exchange you need to clone the map and
I think that should be done atomically. Otherwise, the data could be modified
by another thread in the mean time. To do it atomically we would need blocking
Besides, to use compare_and_exchange on the map we would need it to be
atomic. We could change the type from shared_ptr to a raw pointer but in that
case the memory management would not be obvious at all. How could we know when
you could free the memory map after exchanging the pointer? It might be that
some other thread is using that memory at the same time and we would get a
segmentation violation.
Please, let me know if I am missing anything.
Otherwise, I think it would be better to go for a solution based on shared
locks.
##########
File path: cppcache/src/SerializationRegistry.cpp
##########
@@ -409,44 +411,56 @@ void TheTypeMap::bindDataSerializable(TypeFactoryMethod
func, int32_t id) {
"TheTypeMap::bind: Serialization type not implemented.");
}
- std::lock_guard<util::concurrent::spinlock_mutex> guard(
- m_dataSerializableMapLock);
- const auto& result = m_dataSerializableMap.emplace(id, func);
+ auto tempDataSerializableMap =
+ std::make_shared<std::unordered_map<int32_t, TypeFactoryMethod>>(
+ *m_dataSerializableMap);
+ const auto& result = tempDataSerializableMap->emplace(id, func);
if (!result.second) {
LOGERROR("A class with ID %d is already registered.", id);
throw IllegalStateException("A class with given ID is already
registered.");
}
+ std::atomic_store(&m_dataSerializableMap, tempDataSerializableMap);
Review comment:
I changed the implementation to boost::shared_mutex and the performance
measured by the benchmark introduced was way worse than with the spinlock. I
had read that the implementation of the boost::shared_lock was suboptimal on
Linux.
These are the figures:
Benchmark
Time CPU Time Old Time New CPU Old
CPU New
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:1
+1.0440 +1.0438 213 435 213
434
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:2
+0.9090 +0.9091 156 297 312
595
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:4
+1.0325 +1.0136 65 132 254
512
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:8
+0.7878 +0.7826 43 78 348
620
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:16
+0.9026 +1.0066 31 59 357
717
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:32
+0.9935 +0.9967 28 57 365
729
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:64
+1.0535 +1.0519 26 54 358
735
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:96
+1.1182 +1.0243 25 52 360
729
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:1
+0.7410 +0.7410 231 403 231
403
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:2
+1.4190 +1.3271 114 276 228
531
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:4
+0.9838 +0.9709 59 116 234
462
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:8
+0.9452 +0.9369 42 82 335
649
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:16
+1.0843 +0.9737 30 62 368
725
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:32
+1.0204 +1.0018 28 57 367
735
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:64
+0.9670 +0.9939 28 55 368
734
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:96
+0.8644 +1.0207 29 54 362
732
SerializationRegistryBM_findDataSerializable/real_time/threads:1
+0.9817 +0.9817 211 418 211
418
SerializationRegistryBM_findDataSerializable/real_time/threads:2
+0.3399 +0.3401 157 211 314
421
SerializationRegistryBM_findDataSerializable/real_time/threads:4
+1.2335 +1.2339 59 133 237
530
SerializationRegistryBM_findDataSerializable/real_time/threads:8
+0.9273 +0.9344 42 81 335
648
SerializationRegistryBM_findDataSerializable/real_time/threads:16
+1.1570 +1.0856 29 63 360
750
SerializationRegistryBM_findDataSerializable/real_time/threads:32
+1.2448 +1.0935 28 62 364
762
SerializationRegistryBM_findDataSerializable/real_time/threads:64
+1.0317 +1.0715 27 56 363
752
SerializationRegistryBM_findDataSerializable/real_time/threads:96
+1.0391 +1.0796 26 54 361
751
SerializationRegistryBM_findPdxSerializable/real_time/threads:1
+0.8144 +0.8144 234 425 234
425
SerializationRegistryBM_findPdxSerializable/real_time/threads:2
+0.4180 +0.4181 154 218 308
436
SerializationRegistryBM_findPdxSerializable/real_time/threads:4
+0.6402 +0.6284 73 120 293
476
SerializationRegistryBM_findPdxSerializable/real_time/threads:8
+1.3334 +1.3390 39 90 307
717
SerializationRegistryBM_findPdxSerializable/real_time/threads:16
+0.9053 +0.9381 34 65 407
789
SerializationRegistryBM_findPdxSerializable/real_time/threads:32
+1.0597 +0.9980 31 64 414
828
SerializationRegistryBM_findPdxSerializable/real_time/threads:64
+0.9588 +0.9441 32 63 413
803
SerializationRegistryBM_findPdxSerializable/real_time/threads:96
+0.7951 +0.9451 32 57 411
800
I tried with C++14 std::shared_timed_mutex and the performance was better in
most cases although the improvement was not spectacular:
Benchmark
Time CPU Time Old Time New CPU Old
CPU New
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:1
-0.0320 -0.0320 213 206 213
206
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:2
-0.2957 -0.2957 156 110 312
219
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:4
-0.1130 -0.0923 65 58 254
231
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:8
-0.0796 -0.0811 43 40 348
319
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:16
-0.0082 +0.0533 31 31 357
376
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:32
+0.0372 +0.0424 28 30 365
381
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:64
+0.0669 +0.0692 26 28 358
383
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:96
+0.1285 +0.0457 25 28 360
377
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:1
-0.0929 -0.0929 231 210 231
210
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:2
-0.0190 -0.0191 114 112 228
224
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:4
-0.0091 -0.0092 59 58 234
232
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:8
-0.1483 -0.1501 42 36 335
285
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:16
+0.0480 +0.0159 30 31 368
373
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:32
+0.0776 +0.0321 28 30 367
379
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:64
+0.0035 +0.0254 28 28 368
378
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:96
+0.0029 +0.0447 29 29 362
378
SerializationRegistryBM_findDataSerializable/real_time/threads:1
+0.0412 +0.0412 211 219 211
219
SerializationRegistryBM_findDataSerializable/real_time/threads:2
-0.3122 -0.3121 157 108 314
216
SerializationRegistryBM_findDataSerializable/real_time/threads:4
-0.0301 -0.0299 59 58 237
230
SerializationRegistryBM_findDataSerializable/real_time/threads:8
-0.0128 -0.0112 42 42 335
331
SerializationRegistryBM_findDataSerializable/real_time/threads:16
+0.0404 +0.0318 29 30 360
371
SerializationRegistryBM_findDataSerializable/real_time/threads:32
-0.0093 +0.0292 28 27 364
374
SerializationRegistryBM_findDataSerializable/real_time/threads:64
-0.0119 +0.0465 27 27 363
380
SerializationRegistryBM_findDataSerializable/real_time/threads:96
-0.0511 +0.0467 26 25 361
378
SerializationRegistryBM_findPdxSerializable/real_time/threads:1
-0.0410 -0.0410 234 225 234
225
SerializationRegistryBM_findPdxSerializable/real_time/threads:2
-0.2185 -0.2185 154 120 308
240
SerializationRegistryBM_findPdxSerializable/real_time/threads:4
-0.1662 -0.1662 73 61 293
244
SerializationRegistryBM_findPdxSerializable/real_time/threads:8
+0.0004 -0.0001 39 39 307
306
SerializationRegistryBM_findPdxSerializable/real_time/threads:16
-0.0075 +0.0106 34 34 407
412
SerializationRegistryBM_findPdxSerializable/real_time/threads:32
+0.0701 +0.0034 31 33 414
416
SerializationRegistryBM_findPdxSerializable/real_time/threads:64
-0.0072 +0.0068 32 32 413
416
SerializationRegistryBM_findPdxSerializable/real_time/threads:96
-0.0614 +0.0182 32 30 411
419
As a result, I do not see the point in changing the implementation until
Geode is upgraded to a later C++ version. Do you know if there are plans for it?
I think at least we should leave the benchmark tests for future improvements.
##########
File path: cppcache/src/SerializationRegistry.cpp
##########
@@ -409,44 +411,56 @@ void TheTypeMap::bindDataSerializable(TypeFactoryMethod
func, int32_t id) {
"TheTypeMap::bind: Serialization type not implemented.");
}
- std::lock_guard<util::concurrent::spinlock_mutex> guard(
- m_dataSerializableMapLock);
- const auto& result = m_dataSerializableMap.emplace(id, func);
+ auto tempDataSerializableMap =
+ std::make_shared<std::unordered_map<int32_t, TypeFactoryMethod>>(
+ *m_dataSerializableMap);
+ const auto& result = tempDataSerializableMap->emplace(id, func);
if (!result.second) {
LOGERROR("A class with ID %d is already registered.", id);
throw IllegalStateException("A class with given ID is already
registered.");
}
+ std::atomic_store(&m_dataSerializableMap, tempDataSerializableMap);
Review comment:
I changed the implementation to boost::shared_mutex and the performance
measured by the benchmark introduced was way worse than with the spinlock. I
had read that the implementation of the boost::shared_lock was suboptimal on
Linux.
These are the figures:
Benchmark
Time CPU Time Old Time New CPU Old
CPU New
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:1
+1.0440 +1.0438 213 435 213
434
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:2
+0.9090 +0.9091 156 297 312
595
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:4
+1.0325 +1.0136 65 132 254
512
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:8
+0.7878 +0.7826 43 78 348
620
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:16
+0.9026 +1.0066 31 59 357
717
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:32
+0.9935 +0.9967 28 57 365
729
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:64
+1.0535 +1.0519 26 54 358
735
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:96
+1.1182 +1.0243 25 52 360
729
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:1
+0.7410 +0.7410 231 403 231
403
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:2
+1.4190 +1.3271 114 276 228
531
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:4
+0.9838 +0.9709 59 116 234
462
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:8
+0.9452 +0.9369 42 82 335
649
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:16
+1.0843 +0.9737 30 62 368
725
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:32
+1.0204 +1.0018 28 57 367
735
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:64
+0.9670 +0.9939 28 55 368
734
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:96
+0.8644 +1.0207 29 54 362
732
SerializationRegistryBM_findDataSerializable/real_time/threads:1
+0.9817 +0.9817 211 418 211
418
SerializationRegistryBM_findDataSerializable/real_time/threads:2
+0.3399 +0.3401 157 211 314
421
SerializationRegistryBM_findDataSerializable/real_time/threads:4
+1.2335 +1.2339 59 133 237
530
SerializationRegistryBM_findDataSerializable/real_time/threads:8
+0.9273 +0.9344 42 81 335
648
SerializationRegistryBM_findDataSerializable/real_time/threads:16
+1.1570 +1.0856 29 63 360
750
SerializationRegistryBM_findDataSerializable/real_time/threads:32
+1.2448 +1.0935 28 62 364
762
SerializationRegistryBM_findDataSerializable/real_time/threads:64
+1.0317 +1.0715 27 56 363
752
SerializationRegistryBM_findDataSerializable/real_time/threads:96
+1.0391 +1.0796 26 54 361
751
SerializationRegistryBM_findPdxSerializable/real_time/threads:1
+0.8144 +0.8144 234 425 234
425
SerializationRegistryBM_findPdxSerializable/real_time/threads:2
+0.4180 +0.4181 154 218 308
436
SerializationRegistryBM_findPdxSerializable/real_time/threads:4
+0.6402 +0.6284 73 120 293
476
SerializationRegistryBM_findPdxSerializable/real_time/threads:8
+1.3334 +1.3390 39 90 307
717
SerializationRegistryBM_findPdxSerializable/real_time/threads:16
+0.9053 +0.9381 34 65 407
789
SerializationRegistryBM_findPdxSerializable/real_time/threads:32
+1.0597 +0.9980 31 64 414
828
SerializationRegistryBM_findPdxSerializable/real_time/threads:64
+0.9588 +0.9441 32 63 413
803
SerializationRegistryBM_findPdxSerializable/real_time/threads:96
+0.7951 +0.9451 32 57 411
800
I tried with C++14 std::shared_timed_mutex and the performance was better in
most cases although the improvement was not spectacular:
Benchmark
Time CPU Time Old Time New CPU Old
CPU New
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:1
-0.0320 -0.0320 213 206 213
206
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:2
-0.2957 -0.2957 156 110 312
219
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:4
-0.1130 -0.0923 65 58 254
231
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:8
-0.0796 -0.0811 43 40 348
319
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:16
-0.0082 +0.0533 31 31 357
376
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:32
+0.0372 +0.0424 28 30 365
381
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:64
+0.0669 +0.0692 26 28 358
383
SerializationRegistryBM_findDataSerializablePrimitive/real_time/threads:96
+0.1285 +0.0457 25 28 360
377
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:1
-0.0929 -0.0929 231 210 231
210
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:2
-0.0190 -0.0191 114 112 228
224
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:4
-0.0091 -0.0092 59 58 234
232
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:8
-0.1483 -0.1501 42 36 335
285
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:16
+0.0480 +0.0159 30 31 368
373
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:32
+0.0776 +0.0321 28 30 367
379
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:64
+0.0035 +0.0254 28 28 368
378
SerializationRegistryBM_findDataSerializableFixedId/real_time/threads:96
+0.0029 +0.0447 29 29 362
378
SerializationRegistryBM_findDataSerializable/real_time/threads:1
+0.0412 +0.0412 211 219 211
219
SerializationRegistryBM_findDataSerializable/real_time/threads:2
-0.3122 -0.3121 157 108 314
216
SerializationRegistryBM_findDataSerializable/real_time/threads:4
-0.0301 -0.0299 59 58 237
230
SerializationRegistryBM_findDataSerializable/real_time/threads:8
-0.0128 -0.0112 42 42 335
331
SerializationRegistryBM_findDataSerializable/real_time/threads:16
+0.0404 +0.0318 29 30 360
371
SerializationRegistryBM_findDataSerializable/real_time/threads:32
-0.0093 +0.0292 28 27 364
374
SerializationRegistryBM_findDataSerializable/real_time/threads:64
-0.0119 +0.0465 27 27 363
380
SerializationRegistryBM_findDataSerializable/real_time/threads:96
-0.0511 +0.0467 26 25 361
378
SerializationRegistryBM_findPdxSerializable/real_time/threads:1
-0.0410 -0.0410 234 225 234
225
SerializationRegistryBM_findPdxSerializable/real_time/threads:2
-0.2185 -0.2185 154 120 308
240
SerializationRegistryBM_findPdxSerializable/real_time/threads:4
-0.1662 -0.1662 73 61 293
244
SerializationRegistryBM_findPdxSerializable/real_time/threads:8
+0.0004 -0.0001 39 39 307
306
SerializationRegistryBM_findPdxSerializable/real_time/threads:16
-0.0075 +0.0106 34 34 407
412
SerializationRegistryBM_findPdxSerializable/real_time/threads:32
+0.0701 +0.0034 31 33 414
416
SerializationRegistryBM_findPdxSerializable/real_time/threads:64
-0.0072 +0.0068 32 32 413
416
SerializationRegistryBM_findPdxSerializable/real_time/threads:96
-0.0614 +0.0182 32 30 411
419
As a result, I do not see the point in changing the implementation until
Geode is upgraded to a later C++ version. Do you know if there are plans for it?
I think at least we should leave the benchmark tests for future improvements.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> C++ native client performance bottleneck in access to serialization registry
> ----------------------------------------------------------------------------
>
> Key: GEODE-8213
> URL: https://issues.apache.org/jira/browse/GEODE-8213
> Project: Geode
> Issue Type: Improvement
> Components: native client
> Reporter: Alberto Gomez
> Assignee: Alberto Gomez
> Priority: Major
> Fix For: 1.14.0
>
>
> It's been observed that when the number of threads used in a Geode client
> using PdxSerialization is greater than 8, there is an important drop in
> performance.
> Analysing the client process behavior with perf, it has been observed a very
> high CPU consumption by a spinlock
> (apache::geode::util::concurrent::spinlock_mutex::lock) used when accessing
> the serialization registry .
--
This message was sent by Atlassian Jira
(v8.3.4#803005)