+1 to removing this. I don't believe users depend heavily on it, but let's
keep this thread open for at least a few days to collect feedback.

Thanks
Weiwei

On Mon, Dec 19, 2022 at 4:21 PM Craig Condit <ccon...@apache.org> wrote:

> Re-adding existing YUNIKORN-1483 text as formatting broke badly. I’m not
> proposing this is the way to go, just referencing the JIRA for discussion:
>
>
> <SNIP>
>
> The current support for generating periodic state dumps implemented in
> YUNIKORN-940 has several warts:
>
> 1. The configuration in YUNIKORN-949 is done via the core scheduler
> configuration, leading to a random option on partitions which doesn't
> belong there and has nothing to do with scheduling.
>
> 2. Changing the frequency of the state dumps is done via the unsecured
> REST API. This is a potential denial-of-service vector.
>
> 3. Configuration V2 is now complete, which standardizes on using a
> ConfigMap to configure all YuniKorn options that make sense to be
> reconfigured. However, allowing the location to be changed at runtime makes
> no sense in a containerized environment.
>
> 4. Retrieving the state dumps requires mounting of external storage. This
> is necessarily a site-specific configuration and currently requires a
> custom Helm deployment.
>
> 5. The state dumps, though JSON, are emitted as text files with JSON
> appended to them, making parsing difficult.
>
> To address these issues:
>
> 1. Deprecate existing REST API configuration for frequency, and make it a
> no-op now for security reasons. We can remove it completely in 2.0.
>
> 2. Deprecate the statedumpfilepath option on partitions. Ignore it for
> security reasons now (and warn if found), and remove completely in 2.0.
>
> 3. Disable the feature by default. To enable it, we should require setting
> a specific environment variable:
>    - YUNIKORN_STATE_DUMP_LOCATION=/path/to/dir : This would be required to
> enable the feature at all. Making it an env var makes sense as it is not an
> option that should be reconfigured (or even visible) in configuration.
>
> 4. Via configmap, we should allow the feature to be enabled / disabled and
> its frequency set. These options would have no effect if
> YUNIKORN_STATE_DUMP_LOCATION is not defined:
>   - periodicStateDump.enabled: "true" | "false" (default "false")
>   - periodicStateDump.frequency: "15m" (default value, do not allow more
> frequently than 1m intervals)
>   - periodicStateDump.count: 10 (default value)
>
> 5. Create an empty directory /yunikorn-state in the Docker image to store
> state dumps.
>
> 6. Add support to Helm for enabling state dump support as well as setting
> custom mount options (including quota). Enabling support should set the env
> var YUNIKORN_STATE_DUMP_LOCATION=/yunikorn-state and mount this directory
> via the options specified.
>
> 7. Output a single JSON file per dump, named
> yunikorn-state-dump-YYYYMMDD-HHMM.json, and remove the oldest files until at
> most periodicStateDump.count remain.
>
> </SNIP>
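
The enablement rules in items 3 and 4 above can be sketched roughly as
follows. This is a minimal Python sketch under stated assumptions, not
YuniKorn code: the function and class names are hypothetical, only the s/m/h
duration suffixes are handled, and clamping a too-small frequency up to 1m
(rather than rejecting it) is one possible reading of "do not allow more
frequently than 1m intervals".

```python
import re
from dataclasses import dataclass
from datetime import timedelta
from typing import Mapping, Optional

# Only the env var name and the three periodicStateDump.* keys come from the
# proposal; every other name in this sketch is hypothetical.
ENV_LOCATION = "YUNIKORN_STATE_DUMP_LOCATION"
MIN_FREQUENCY = timedelta(minutes=1)
DEFAULT_FREQUENCY = timedelta(minutes=15)
DEFAULT_COUNT = 10

_DURATION = re.compile(r"^(\d+)([smh])$")
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


@dataclass(frozen=True)
class StateDumpSettings:
    location: str
    frequency: timedelta
    count: int


def parse_duration(text: str) -> timedelta:
    """Parse a simple duration such as "15m" (s, m and h suffixes only)."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def resolve_settings(
    env: Mapping[str, str], config: Mapping[str, str]
) -> Optional[StateDumpSettings]:
    """Return the effective settings, or None when the feature stays off."""
    location = env.get(ENV_LOCATION, "")
    if not location:
        # Without the env var the ConfigMap options have no effect.
        return None
    if config.get("periodicStateDump.enabled", "false").lower() != "true":
        return None
    raw = config.get("periodicStateDump.frequency")
    frequency = parse_duration(raw) if raw is not None else DEFAULT_FREQUENCY
    # Assumption: clamp to the 1m floor instead of raising an error.
    frequency = max(frequency, MIN_FREQUENCY)
    count = int(config.get("periodicStateDump.count", DEFAULT_COUNT))
    return StateDumpSettings(location, frequency, count)
```

With this shape, a ConfigMap that sets periodicStateDump.enabled to "true"
still yields no dumps unless the deployment also sets
YUNIKORN_STATE_DUMP_LOCATION, which matches the intent of keeping the
location out of the runtime configuration.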
>
> > On Dec 19, 2022, at 6:18 PM, Craig Condit <ccon...@apache.org> wrote:
> >
> > All,
> >
> > I’d like to open a discussion about the future of the periodic state
> > dump feature. To jumpstart the discussion, I opened
> > https://issues.apache.org/jira/browse/YUNIKORN-1483, which is copied
> > below for context. In the process of writing this up, it seems to me that
> > we might actually be better off simply removing the feature, and relying
> > solely on the REST API to retrieve state dumps on demand.
> >
> > In the current state, periodic state dumps need to be enabled, at which
> > point they write to a local filesystem within the YuniKorn scheduler. This
> > maps onto ephemeral storage, so to avoid out-of-space scenarios, an
> > administrator needs to customize the YK Helm deployment with additional
> > resource quota. Additionally, to even access the dumps, the filesystem
> > needs to be mounted as a persistent volume and external code written to
> > interact with the saved dumps. Given the mixed text-and-json format of
> > these dumps, this can be rather complicated.
> >
> > Alternatively, users could simply deploy a cron container which pulls
> > the state dump on-demand from the existing REST API. This ends up being
> > considerably simpler.
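
A cron-driven pull of this kind might look roughly like the sketch below.
It is a minimal Python sketch under stated assumptions: the service address
and the /ws/v1/fullstatedump path are assumptions to be checked against the
actual deployment and REST API documentation, the function names are
hypothetical, and the file name simply borrows the pattern suggested in the
YUNIKORN-1483 write-up.

```python
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Optional

# Assumption: the scheduler's REST service is reachable at this address and
# serves the on-demand state dump at this path. Verify both before use.
STATE_DUMP_URL = "http://yunikorn-service:9080/ws/v1/fullstatedump"


def http_get(url: str, timeout: float = 30.0) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def pull_state_dump(
    target_dir: Path,
    fetch: Callable[[str], bytes] = http_get,
    url: str = STATE_DUMP_URL,
    now: Optional[datetime] = None,
) -> Path:
    """Fetch one state dump and store it as a timestamped JSON file."""
    body = fetch(url)
    json.loads(body)  # fail early if the response is not valid JSON
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d-%H%M")
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"yunikorn-state-dump-{stamp}.json"
    path.write_bytes(body)
    return path
```

A cron entry or Kubernetes CronJob would call pull_state_dump(Path(...))
once per run, with the target directory on whatever storage the site
prefers. The fetch parameter exists only so the function can be exercised
without a live scheduler.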
> >
> > Are there objections to removing the existing periodic state dump
> > functionality? Existing users who would be impacted greatly? To be clear,
> > I’m not proposing removing the state dump itself; the version available via
> > the REST API has proven extremely valuable. All that is on the table is
> > removal of the automatic, periodic state dump which writes to local files.
> >
> > Looking forward to feedback,
> >
> > Craig
> >
> >
> >
> > ------------------------------------
> > YUNIKORN-1483 write-up:
> >
> > The current support for generating periodic state dumps implemented in
> > YUNIKORN-940 <https://issues.apache.org/jira/browse/YUNIKORN-940> has
> > several warts:
> >
> > 1. The configuration in YUNIKORN-949
> > <https://issues.apache.org/jira/browse/YUNIKORN-949> is done via the core
> > scheduler configuration, leading to a random option on partitions which
> > doesn't belong there and has nothing to do with scheduling.
> >
> > 2. Changing the frequency of the state dumps is done via the unsecured
> > REST API. This is a potential denial-of-service vector.
> >
> > 3. Configuration V2 is now complete, which standardizes on using a
> > ConfigMap to configure all YuniKorn options that make sense to be
> > reconfigured. However, allowing the location to be changed at runtime
> > makes no sense in a containerized environment.
> >
> > 4. Retrieving the state dumps requires mounting of external storage. This
> > is necessarily a site-specific configuration and currently requires a
> > custom Helm deployment.
> >
> > 5. The state dumps, though JSON, are emitted as text files with JSON
> > appended to them, making parsing difficult.
> >
> > To address these issues:
> >
> > 1. Deprecate existing REST API configuration for frequency, and make it a
> > no-op now for security reasons. We can remove it completely in 2.0.
> >
> > 2. Deprecate the statedumpfilepath option on partitions. Ignore it for
> > security reasons now (and warn if found), and remove completely in 2.0.
> >
> > 3. Disable the feature by default. To enable it, we should require setting
> > a specific environment variable:
> >    - YUNIKORN_STATE_DUMP_LOCATION=/path/to/dir : This would be required to
> > enable the feature at all. Making it an env var makes sense as it is not
> > an option that should be reconfigured (or even visible) in configuration.
> >
> > 4. Via configmap, we should allow the feature to be enabled / disabled and
> > its frequency set. These options would have no effect if
> > YUNIKORN_STATE_DUMP_LOCATION is not defined:
> >   - periodicStateDump.enabled: "true" | "false" (default "false")
> >   - periodicStateDump.frequency: "15m" (default value, do not allow more
> > frequently than 1m intervals)
> >   - periodicStateDump.count: 10 (default value)
> >
> > 5. Create an empty directory /yunikorn-state in the Docker image to store
> > state dumps.
> >
> > 6. Add support to Helm for enabling state dump support as well as setting
> > custom mount options (including quota). Enabling support should set the
> > env var YUNIKORN_STATE_DUMP_LOCATION=/yunikorn-state and mount this
> > directory via the options specified.
> >
> > 7. Output a single JSON file per dump, named
> > yunikorn-state-dump-YYYYMMDD-HHMM.json, and remove the oldest files until
> > at most periodicStateDump.count remain.
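
The retention rule in the last item (keep at most periodicStateDump.count
files, dropping the oldest first) could be sketched as below. This is a
minimal Python sketch, not YuniKorn code: the function name is hypothetical,
and it assumes file age can be taken from the YYYYMMDD-HHMM stamp in the
name, which sorts lexicographically in chronological order.

```python
from pathlib import Path
from typing import List

# File name pattern taken from the write-up; the constants are hypothetical.
PREFIX = "yunikorn-state-dump-"
SUFFIX = ".json"


def prune_state_dumps(directory: Path, keep: int) -> List[Path]:
    """Delete the oldest dumps so at most `keep` remain; return deleted paths."""
    if keep < 0:
        raise ValueError("keep must not be negative")
    # Sorting by name orders the files oldest first because of the
    # fixed-width YYYYMMDD-HHMM stamp.
    dumps = sorted(p for p in directory.glob(f"{PREFIX}*{SUFFIX}") if p.is_file())
    excess = dumps[: max(0, len(dumps) - keep)]
    for path in excess:
        path.unlink()
    return excess
```

Calling prune_state_dumps(Path("/yunikorn-state"), 10) after each dump is
written would keep the directory bounded at the default count. Files that do
not match the naming pattern are left alone.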
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
> For additional commands, e-mail: dev-h...@yunikorn.apache.org
>
>
