I am in favor of just having the REST option, as per YUNIKORN-1500 [1].
Volume mounting, maintaining files, etc. is more involved than setting up
a simple job that pulls the detail on a schedule.
The code to support writing to a volume properly could become complex, as
state dumps might grow, and handling out-of-space conditions while still
within the configured parameters becomes a complex case.
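
To illustrate how small such a scheduled job can be, here is a minimal
sketch in Python. It is not taken from YUNIKORN-1500. The service address,
the output directory and the helper names are assumptions made up for this
example, and the endpoint path should be checked against the REST API
documentation for the release in use.

```python
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

# Assumption: address of the scheduler REST service and the state dump path.
STATE_DUMP_URL = "http://yunikorn-service:9080/ws/v1/fullstatedump"


def dump_filename(now: datetime) -> str:
    """Build a timestamped file name for one state dump."""
    return now.strftime("yunikorn-state-dump-%Y%m%d-%H%M.json")


def fetch_state_dump(url: str = STATE_DUMP_URL, timeout: float = 30.0) -> dict:
    """GET the state dump over REST and parse the body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def save_state_dump(state: dict, out_dir: Path, now: datetime) -> Path:
    """Write one dump as a standalone JSON file and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / dump_filename(now)
    target.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return target


def run_once(out_dir: Path) -> Path:
    """Fetch and store a single dump; the scheduler (cron) calls this."""
    return save_state_dump(fetch_state_dump(), out_dir, datetime.now(timezone.utc))
```

Calling run_once(Path("/dumps")) from cron or a Kubernetes CronJob would
cover the scheduling, and storage and quota handling stay entirely outside
the scheduler.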

I have approved the change for YUNIKORN-1500 and will commit soon.

Wilfred

[1] https://issues.apache.org/jira/browse/YUNIKORN-1500

On Tue, 20 Dec 2022 at 12:29, Weiwei Yang <w...@apache.org> wrote:

> +1 to remove this, I don't believe users heavily depend on this, but let's
> keep this thread for at least a few days to collect feedback.
>
> Thanks
> Weiwei
>
> On Mon, Dec 19, 2022 at 4:21 PM Craig Condit <ccon...@apache.org> wrote:
>
> > Re-adding existing YUNIKORN-1483 text as formatting broke badly. I’m not
> > proposing this is the way to go, just referencing the JIRA for
> > discussion:
> >
> >
> > <SNIP>
> >
> > The current support for generating periodic state dumps implemented in
> > YUNIKORN-940 has several warts:
> >
> > 1. The configuration in YUNIKORN-949 is done via the core scheduler
> > configuration, leading to a random option on partitions which doesn't
> > belong there and has nothing to do with scheduling.
> >
> > 2. Changing the frequency of the state dumps is done via the unsecured
> > REST API. This is a potential denial-of-service vector.
> >
> > 3. Configuration V2 is now complete, which standardizes on using a
> > ConfigMap to configure all YuniKorn options that make sense to be
> > reconfigured. However, allowing the location to be changed at runtime
> > makes no sense in a containerized environment.
> >
> > 4. Retrieving the state dumps requires mounting of external storage. This
> > is necessarily a site-specific configuration and currently requires a
> > custom Helm deployment.
> >
> > 5. The state dumps, though JSON, are emitted as text files with JSON
> > appended to them, making parsing difficult.
> >
> > To address these issues:
> >
> > 1. Deprecate existing REST API configuration for frequency, and make it a
> > no-op now for security reasons. We can remove it completely in 2.0.
> >
> > 2. Deprecate the statedumpfilepath option on partitions. Ignore it for
> > security reasons now (and warn if found), and remove completely in 2.0.
> >
> > 3. Disable the feature by default. To enable it, we should require
> > setting a specific environment variable:
> >    - YUNIKORN_STATE_DUMP_LOCATION=/path/to/dir : This would be required
> >      to enable the feature at all. Making it an env var makes sense as it
> >      is not an option that should be reconfigured (or even visible) in
> >      configuration.
> >
> > 4. Via configmap, we should allow the feature to be enabled / disabled
> > and its frequency set. These options would have no effect if
> > YUNIKORN_STATE_DUMP_LOCATION is not defined:
> >   - periodicStateDump.enabled: "true" | "false" (default "false")
> >   - periodicStateDump.frequency: "15m" (default value, do not allow more
> > frequently than 1m intervals)
> >   - periodicStateDump.count: 10 (default value)
> >
> > 5. Create an empty directory /yunikorn-state in the Docker image to store
> > state dumps.
> >
> > 6. Add support to Helm for enabling state dump support as well as setting
> > custom mount options (including quota). Enabling support should set the
> > env var YUNIKORN_STATE_DUMP_LOCATION=/yunikorn-state and mount this
> > directory via the options specified.
> >
> > 7. Output a single json file per dump and remove oldest files until count
> > <= periodicStateDump.count entries:
> > yunikorn-state-dump-YYYYMMDD-HHMM.json
> >
> > </SNIP>
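
For reference, the frequency floor (item 4) and the retention rule (item 7)
quoted above can be sketched in a few lines of Python. This is only an
illustration of the proposed rules, not code from YuniKorn. The function
names are hypothetical, and the duration parser is a simplification that
accepts a single unit such as "15m" rather than full Go-style durations.

```python
import re
from datetime import timedelta
from pathlib import Path

# Proposed lower bound: dumps no more often than once per minute.
MIN_FREQUENCY = timedelta(minutes=1)
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_frequency(value: str) -> timedelta:
    """Parse a value such as "15m" and enforce the 1m lower bound."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"unsupported frequency: {value!r}")
    freq = timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})
    if freq < MIN_FREQUENCY:
        raise ValueError(f"frequency {value!r} is below the 1m minimum")
    return freq


def prune_dumps(directory: Path, count: int) -> list[Path]:
    """Delete the oldest dump files until at most `count` remain."""
    # The YYYYMMDD-HHMM stamp sorts lexicographically in time order.
    dumps = sorted(directory.glob("yunikorn-state-dump-*.json"))
    excess = dumps[: max(0, len(dumps) - count)]
    for path in excess:
        path.unlink()
    return excess
```

Because the timestamp is part of the file name, ordering does not depend on
file modification times, which can change when a volume is copied or
restored.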
> >
> > > On Dec 19, 2022, at 6:18 PM, Craig Condit <ccon...@apache.org> wrote:
> > >
> > > All,
> > >
> > > I’d like to open a discussion about the future of the periodic state
> > > dump feature. To jumpstart the discussion, I opened
> > > https://issues.apache.org/jira/browse/YUNIKORN-1483, which is copied
> > > below for context. In the process of writing this up, it seems to me that
> > > we might actually be better off simply removing the feature, and relying
> > > solely on the REST API to retrieve state dumps on demand.
> > >
> > > In the current state, periodic state dumps need to be enabled, at which
> > > point they write to a local filesystem within the YuniKorn scheduler.
> > > This maps onto ephemeral storage, so to avoid out-of-space scenarios, an
> > > administrator needs to customize the YK Helm deployment with additional
> > > resource quota. Additionally, to even access the dumps, the filesystem
> > > needs to be mounted as a persistent volume and external code written to
> > > interact with the saved dumps. Given the mixed text-and-json format of
> > > these dumps, this can be rather complicated.
> > >
> > > Alternatively, users could simply deploy a cron container which pulls
> > > the state dump on-demand from the existing REST API. This ends up being
> > > considerably simpler.
> > >
> > > Are there objections to removing the existing periodic state dump
> > > functionality? Existing users who would be impacted greatly? To be clear,
> > > I’m not proposing removing the state dump itself; the version available
> > > via the REST API has proven extremely valuable. All that is on the table
> > > is removal of the automatic, periodic state dump which writes to local
> > > files.
> > >
> > > Looking forward to feedback,
> > >
> > > Craig
> > >
> > >
> > >
> > > ------------------------------------
> > > YUNIKORN-1483 write-up:
> > >
> > > The current support for generating periodic state dumps implemented in
> > > YUNIKORN-940 <https://issues.apache.org/jira/browse/YUNIKORN-940> has
> > > several warts:
> > > The configuration in YUNIKORN-949
> > > <https://issues.apache.org/jira/browse/YUNIKORN-949> is done via the core
> > > scheduler configuration, leading to a random option on partitions which
> > > doesn't belong there and has nothing to do with scheduling.
> > > Changing the frequency of the state dumps is done via the unsecured
> > > REST API. This is a potential denial-of-service vector.
> > > Configuration V2 is now complete, which standardizes on using a
> > > ConfigMap to configure all YuniKorn options that make sense to be
> > > reconfigured. However, allowing the location to be changed at runtime
> > > makes no sense in a containerized environment.
> > > Retrieving the state dumps requires mounting of external storage. This
> > > is necessarily a site-specific configuration and currently requires a
> > > custom Helm deployment.
> > > The state dumps, though JSON, are emitted as text files with JSON
> > > appended to them, making parsing difficult.
> > > To address these issues:
> > > Deprecate existing REST API configuration for frequency, and make it a
> > > no-op now for security reasons. We can remove it completely in 2.0.
> > > Deprecate the statedumpfilepath option on partitions. Ignore it for
> > > security reasons now (and warn if found), and remove completely in 2.0.
> > > Disable the feature by default. To enable it, we should require setting
> > > a specific environment variable:
> > > YUNIKORN_STATE_DUMP_LOCATION=/path/to/dir : This would be required to
> > > enable the feature at all. Making it an env var makes sense as it is not
> > > an option that should be reconfigured (or even visible) in configuration.
> > > Via configmap, we should allow the feature to be enabled / disabled and
> > > its frequency set. These options would have no effect if
> > > YUNIKORN_STATE_DUMP_LOCATION is not defined:
> > > periodicStateDump.enabled: "true" | "false" (default "false")
> > > periodicStateDump.frequency: "15m" (default value, do not allow more
> > > frequently than 1m intervals)
> > > periodicStateDump.count: 10 (default value)
> > > Create an empty directory /yunikorn-state in the Docker image to store
> > > state dumps.
> > > Add support to Helm for enabling state dump support as well as setting
> > > custom mount options (including quota). Enabling support should set the
> > > env var YUNIKORN_STATE_DUMP_LOCATION=/yunikorn-state and mount this
> > > directory via the options specified.
> > > Output a single json file per dump and remove oldest files until count
> > > <= periodicStateDump.count entries:
> > > yunikorn-state-dump-YYYYMMDD-HHMM.json
