GitHub user rmatharu opened a pull request:
https://github.com/apache/samza/pull/796
Enabling auto-discovery of regex input topics
This PR makes the following changes:
* Enriches StreamPartitionCountMonitor to periodically match the input
regexes against the actual streams, and to stop the job when a new input
stream is discovered.
* Adds a new API to SysAdmin to allow listing all streams (e.g., Kafka
topics). The KafkaSysAdmin implementation uses KafkaConsumer's listTopics
API. (Even with 1 million topics at roughly 100 bytes per topic, the
temporary memory overhead would be about 100 MB.)
* Adds the config job.coordinator.monitor-input-regex.frequency.ms to set
the monitoring frequency, and job.coordinator.monitor-input-regex.%s to set
a regex for each input system. Users can then choose the desired regex per
input system, e.g., job.coordinator.monitor-input-regex.kafka=test-.*.
* Later, we can enrich the RegexTopicGen rewriter to add a
monitor-input-regex config, allowing periodic monitoring.
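A minimal sketch of the monitoring logic described above, assuming a flat config map and a plain callback for "stop the job". This is not the code from the PR: the class RegexInputMonitorSketch and its methods are hypothetical, the full stream list (which the PR obtains through the new SysAdmin listing API) is modeled as a Supplier, and full-string matching (Matcher.matches) rather than partial matching is an assumption. Only the config key names come from the PR description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;
import java.util.regex.Pattern;

// Hypothetical sketch of regex-based input monitoring; not the PR's implementation.
final class RegexInputMonitorSketch {
  // Key names taken from the PR description.
  static final String PREFIX = "job.coordinator.monitor-input-regex.";
  // The frequency key shares the per-system prefix, so it must be skipped
  // when collecting the per-system regexes.
  static final String FREQUENCY_KEY = PREFIX + "frequency.ms";

  private RegexInputMonitorSketch() {}

  /** Builds a map of input system name to compiled regex from a flat config map. */
  static Map<String, Pattern> regexesBySystem(Map<String, String> config) {
    Map<String, Pattern> result = new HashMap<>();
    for (Map.Entry<String, String> entry : config.entrySet()) {
      String key = entry.getKey();
      if (key.startsWith(PREFIX) && !key.equals(FREQUENCY_KEY)) {
        result.put(key.substring(PREFIX.length()), Pattern.compile(entry.getValue()));
      }
    }
    return result;
  }

  /**
   * Returns the streams that fully match the regex but are not yet job inputs.
   * Assumption: full match (Matcher.matches), not partial match (Matcher.find).
   */
  static Set<String> newlyMatchedStreams(
      Pattern regex, Set<String> allStreams, Set<String> currentInputs) {
    Set<String> discovered = new TreeSet<>();
    for (String stream : allStreams) {
      if (regex.matcher(stream).matches() && !currentInputs.contains(stream)) {
        discovered.add(stream);
      }
    }
    return discovered;
  }

  /**
   * Re-lists the streams every frequencyMs and calls onNewStreams on each run that
   * finds matching streams outside currentInputs (in the PR, this leads to stopping
   * the job). If a run throws, scheduleAtFixedRate cancels later runs.
   * The caller owns the returned executor and must shut it down.
   */
  static ScheduledExecutorService schedule(
      long frequencyMs,
      Pattern regex,
      Supplier<Set<String>> listStreams,
      Set<String> currentInputs,
      Consumer<Set<String>> onNewStreams) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleAtFixedRate(() -> {
      Set<String> discovered = newlyMatchedStreams(regex, listStreams.get(), currentInputs);
      if (!discovered.isEmpty()) {
        onNewStreams.accept(discovered);
      }
    }, frequencyMs, frequencyMs, TimeUnit.MILLISECONDS);
    return executor;
  }
}
```

With the PR's example config, job.coordinator.monitor-input-regex.kafka=test-.* yields a single "kafka" entry while the frequency key is ignored. For Kafka, the stream list passed to the Supplier could come from KafkaConsumer#listTopics().keySet().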
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rmatharu/samza newtopic-test
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/samza/pull/796.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #796
----
commit f33839e9b7eae354a790d0002352c732c5f6868f
Author: Ray Matharu <rmatharu@...>
Date: 2018-11-06T01:58:13Z
Full-working logic for new topic discovery
----