Hello everyone,

I would like to bring up an issue with how Pulsar's containers override
configuration. For instance, the Apache Pulsar Helm chart runs
"bin/apply-config-from-env.py conf/broker.conf" and
"bin/gen-yml-from-env.py conf/functions_worker.yml" [1] to apply
configuration passed in environment variables to the configuration files
in the container's root file system.
This approach fails when the container's root file system is read-only due to
strict security policies (`readOnlyRootFilesystem` in
`securityContext`). This issue has been reported as #22088 [2].

A short-term fix could involve using a temporary file to modify the
configuration file when the file system is read-only. However, the Python
script solution is not ideal, and we should consider eliminating it. In
the long term, it would also be beneficial to remove the need for a
shell script to start Pulsar, but that's a separate issue.

For configuration handling, we need a solution that can apply overrides
in memory, eliminating the need to modify on-disk files. Modern
configuration frameworks can do this out-of-the-box. Currently, Pulsar
uses a homegrown configuration framework. Instead of extending this
framework, I propose we discuss replacing it with the Gestalt Config
library [3]. This library, licensed under Apache-2.0, is a mature,
well-established solution for configuration handling.
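
To make "overrides in memory" concrete, here is a minimal sketch using
only the JDK. It is not Gestalt Config's API and not Pulsar's current
code. The class and method names are hypothetical, and the
`PULSAR_PREFIX_` convention for introducing new keys is my assumption
about how the existing script behaves.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;

public class InMemoryOverrides {

    /**
     * Returns a new Properties object holding the base values plus overrides.
     * Neither the base object nor the file it was loaded from is modified.
     */
    public static Properties applyOverrides(Properties base, Map<String, String> env, String prefix) {
        Properties merged = new Properties();
        merged.putAll(base);
        for (Map.Entry<String, String> e : env.entrySet()) {
            String name = e.getKey();
            if (name.startsWith(prefix) && name.length() > prefix.length()) {
                // Prefixed variables may introduce keys that are absent from the file.
                merged.setProperty(name.substring(prefix.length()), e.getValue());
            } else if (base.containsKey(name)) {
                // Unprefixed variables only override keys that already exist.
                merged.setProperty(name, e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for conf/broker.conf; a real broker would read the file from disk.
        Properties base = new Properties();
        base.load(new StringReader("brokerServicePort=6650\nclusterName=\n"));

        // Stand-in for System.getenv().
        Map<String, String> env = Map.of(
                "clusterName", "pulsar-dev",
                "PULSAR_PREFIX_exampleSetting", "true",
                "HOME", "/root");

        Properties effective = applyOverrides(base, env, "PULSAR_PREFIX_");
        effective.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Because the merge happens after the file is read, nothing is written
back to disk, so a read-only root file system would no longer matter.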

Switching to Gestalt Config would allow us to move towards a more
structured and modular configuration in Pulsar. Our current
configuration is not modular, as it relies on a "god object" for
configuration, which collects all possible configuration options.
Gestalt Config offers modular usage patterns similar to those of
Spring Boot's external configuration [4] and MicroProfile Config [5]
in Quarkus. However, Gestalt Config does not pull in other dependencies,
giving it an advantage over Spring Boot and Quarkus configuration solutions.
Other libraries in this category include the Typesafe config library [6]
from Lightbend with HOCON [7], commonly used in Scala and Akka-based
applications.
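
As a rough illustration of what "modular" could mean compared to a
single "god object", the sketch below binds two independent slices of
one property set to small immutable types. It uses only the JDK
(records, so Java 16 or later) and is not Gestalt Config's API. All
class names, key names, and default values are hypothetical.

```java
import java.util.Properties;

public class ModularConfigSketch {

    /** Hypothetical settings owned by one module. */
    public record NetworkConfig(int servicePort, int webPort) {}

    /** Hypothetical settings owned by a second, independent module. */
    public record StorageConfig(int ensembleSize, int writeQuorum) {}

    // Reads an integer key, falling back to the default when it is missing or blank.
    private static int intValue(Properties p, String key, int defaultValue) {
        String raw = p.getProperty(key);
        return raw == null || raw.isBlank() ? defaultValue : Integer.parseInt(raw.trim());
    }

    /** Binds only the "network." keys; the module never sees other settings. */
    public static NetworkConfig network(Properties p) {
        return new NetworkConfig(
                intValue(p, "network.servicePort", 6650),
                intValue(p, "network.webPort", 8080));
    }

    /** Binds only the "storage." keys. */
    public static StorageConfig storage(Properties p) {
        return new StorageConfig(
                intValue(p, "storage.ensembleSize", 2),
                intValue(p, "storage.writeQuorum", 2));
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("network.servicePort", "6651");
        p.setProperty("storage.ensembleSize", "3");
        System.out.println(network(p));
        System.out.println(storage(p));
    }
}
```

Each module depends only on its own small type, so adding an option to
one module does not touch a shared class that every component imports.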

Gestalt Config supports many configuration file formats, including flat
properties files, YAML, JSON, TOML, and even HOCON. It also offers
security features for reading secrets directly from Vault, AWS Secrets
Manager, and GCP Secret Manager, without the need to use the file system
or environment variables to inject secrets into the application
configuration. This could significantly improve Pulsar's security
posture.

Pulsar's current "homegrown configuration framework" is quite simple. It
is implemented in a few classes, with the main logic in the
PulsarConfigurationLoader [8] and FieldParser [9] classes, called from
the PulsarBrokerStarter class [10].

The main question is: should we continue extending Pulsar's homegrown
configuration framework, or should we adopt a library like Gestalt
Config to support future improvements in modularity, structured
configuration, and security?

Best regards,

Lari

References:
1 - https://github.com/apache/pulsar-helm-chart/blob/29ea17b3fceef65160620b9018d0dd0449a168c5/charts/pulsar/templates/broker-statefulset.yaml#L210-L221
2 - https://github.com/apache/pulsar/issues/22088
3 - https://github.com/gestalt-config/gestalt
4 - https://docs.spring.io/spring-boot/docs/current/reference/html/features.html#features.external-config
5 - https://microprofile.io/specifications/microprofile-config/
6 - https://github.com/lightbend/config
7 - https://github.com/lightbend/config/blob/main/HOCON.md
8 - https://github.com/apache/pulsar/blob/master/pulsar-broker-common/src/main/java/org/apache/pulsar/common/configuration/PulsarConfigurationLoader.java
9 - https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/java/org/apache/pulsar/common/util/FieldParser.java
10 - https://github.com/apache/pulsar/blob/db79096baaa3d7118aa026978a615ddc576f9183/pulsar-broker/src/main/java/org/apache/pulsar/PulsarBrokerStarter.java#L69-L76
