-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48459/
-----------------------------------------------------------
(Updated June 12, 2016, 11:54 p.m.)
Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Navina
Ramesh, Jagadish Venkatraman, Xinyu Liu, and Yi Pan (Data Infrastructure).
Bugs: SAMZA-964
https://issues.apache.org/jira/browse/SAMZA-964
Repository: samza
Description
-------
SAMZA-964 Improve the performance of the continuous OFFSET checkpointing for
logged stores
1. Cache metadata more aggressively. Only expire metadata if we get Kafka
exceptions. This applies to all cases EXCEPT the partition count monitor,
which uses the TTL from the StreamMetadataCache.
2. Reduce excessive offset fetching.
3. Do not allow unbounded exponential backoff for offset checkpointing; if the
retries fail, just skip the offset file. Unbounded exponential backoff can
balloon the commit time and stall the event loop, so we now retry at most 3
times, for a max delay of 400ms.
4. Add some trace log messages to help track/time KV Store flushes (the other
culprit in the slowdown).
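
For reviewers who want a concrete picture of item 3, here is a minimal sketch of a bounded retry with exponential backoff. It is not the code in this diff: BoundedRetry, its parameters, and the doubling schedule are hypothetical, and treating the 400ms figure as a cap on total sleep time is an assumption, since the description does not say how that bound is reached.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration only; not part of the Samza code base. */
public final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * Calls op once, then retries up to maxRetries times if it throws.
     * The sleep before each retry doubles, starting at initialDelayMs, and the
     * total time slept never exceeds maxTotalDelayMs (assumed meaning of the cap).
     * Returns Optional.empty() when every attempt fails, so the caller can skip
     * the work instead of blocking. A null result from op also maps to empty.
     */
    public static <T> Optional<T> call(Callable<T> op, int maxRetries,
                                       long initialDelayMs, long maxTotalDelayMs)
            throws InterruptedException {
        long delayMs = initialDelayMs;
        long sleptMs = 0;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return Optional.ofNullable(op.call());
            } catch (InterruptedException e) {
                throw e; // never swallow interruption of the calling thread
            } catch (Exception e) {
                if (attempt == maxRetries) {
                    break; // retries exhausted: give up without sleeping again
                }
                // Sleep for the current backoff, clipped to the remaining budget.
                long sleepMs = Math.min(delayMs, maxTotalDelayMs - sleptMs);
                if (sleepMs > 0) {
                    Thread.sleep(sleepMs);
                }
                sleptMs += sleepMs;
                delayMs *= 2;
            }
        }
        return Optional.empty();
    }
}
```

Under these assumptions, a caller would wrap the offset fetch as BoundedRetry.call(fetchOffset, 3, 100, 400), where fetchOffset is a placeholder Callable, and skip writing the offset file when the result is empty. The worst case then adds at most 400ms to a commit rather than growing without bound.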
Diffs (updated)
-----
samza-api/src/main/java/org/apache/samza/system/ExtendedSystemAdmin.java
daa2212cf1d54e90861657fab86b2e780d7e89e2
samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamSystemConsumer.java
0a6661c423a09944aa211223cad205958d3b1fee
samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala
c7b05203a1958a62af9dec04b215d985c4646dc4
samza-core/src/main/scala/org/apache/samza/system/StreamMetadataCache.scala
18b47ec3393978e403cadd8754f3fa5fd68654e9
samza-core/src/test/scala/org/apache/samza/coordinator/TestJobCoordinator.scala
110c3a910aa0bae77dfe5eebbf82286b56dc4654
samza-core/src/test/scala/org/apache/samza/storage/TestTaskStorageManager.scala
c8ea64c7c67dd6bf789d2a3445d620ccef1beac0
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala
23aa58dff6b5e282bb634d3913cacd73003402ea
samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemAdmin.scala
6c292234dcdd54eaca05f3e1a3fc401e205d6066
samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
f0965aec5f3ec2a214dc40c70832c58273623749
samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala
c28f8db8cb59bd5415e78535877acc1e5bee0f67
samza-kv/src/main/scala/org/apache/samza/storage/kv/LoggedStore.scala
7bba6ff37d8266674e7f15c10c7c146f4a41fc91
samza-kv/src/main/scala/org/apache/samza/storage/kv/SerializedKeyValueStore.scala
8e183efcdec6fd3f921fc2bfe1971c95715930ed
Diff: https://reviews.apache.org/r/48459/diff/
Testing
-------
New unit tests.
./gradlew clean build
Manual testing with 2 test jobs and the big job that had the performance issue.
Thanks,
Jake Maes