+1 to removing this. I don't believe users depend heavily on it, but let's keep this thread open for at least a few days to collect feedback.

Thanks,
Weiwei

On Mon, Dec 19, 2022 at 4:21 PM Craig Condit <ccon...@apache.org> wrote:

> Re-adding existing YUNIKORN-1483 text as formatting broke badly. I’m not
> proposing this is the way to go, just referencing the JIRA for discussion:
>
> <SNIP>
>
> The current support for generating periodic state dumps implemented in
> YUNIKORN-940 has several warts:
>
> 1. The configuration in YUNIKORN-949 is done via the core scheduler
>    configuration, leading to a random option on partitions which doesn't
>    belong there and has nothing to do with scheduling.
>
> 2. Changing the frequency of the state dumps is done via the unsecured
>    REST API. This is a potential denial-of-service vector.
>
> 3. Configuration V2 is now complete, which standardizes on using a
>    ConfigMap to configure all YuniKorn options that make sense to be
>    reconfigured. However, allowing the location to be changed at runtime
>    makes no sense in a containerized environment.
>
> 4. Retrieving the state dumps requires mounting of external storage. This
>    is necessarily a site-specific configuration and currently requires a
>    custom Helm deployment.
>
> 5. The state dumps, though JSON, are emitted as text files with JSON
>    appended to them, making parsing difficult.
>
> To address these issues:
>
> 1. Deprecate existing REST API configuration for frequency, and make it a
>    no-op now for security reasons. We can remove it completely in 2.0.
>
> 2. Deprecate the statedumpfilepath option on partitions. Ignore it for
>    security reasons now (and warn if found), and remove completely in 2.0.
>
> 3. Disable the feature by default. To enable it, we should require setting
>    a specific environment variable:
>    - YUNIKORN_STATE_DUMP_LOCATION=/path/to/dir : This would be required to
>      enable the feature at all. Making it an env var makes sense as it is
>      not an option that should be reconfigured (or even visible) in
>      configuration.
>
> 4. Via configmap, we should allow the feature to be enabled / disabled and
>    its frequency set. These options would have no effect if
>    YUNIKORN_STATE_DUMP_LOCATION is not defined:
>    - periodicStateDump.enabled: "true" | "false" (default "false")
>    - periodicStateDump.frequency: "15m" (default value, do not allow more
>      frequently than 1m intervals)
>    - periodicStateDump.count: 10 (default value)
>
> 5. Create an empty directory /yunikorn-state in the Docker image to store
>    state dumps.
>
> 6. Add support to Helm for enabling state dump support as well as setting
>    custom mount options (including quota). Enabling support should set the
>    env var YUNIKORN_STATE_DUMP_LOCATION=/yunikorn-state and mount this
>    directory via the options specified.
>
> 7. Output a single json file per dump and remove oldest files until count
>    <= periodicStateDump.count entries:
>    yunikorn-state-dump-YYYYMMDD-HHMM.json
>
> </SNIP>
>
>
> On Dec 19, 2022, at 6:18 PM, Craig Condit <ccon...@apache.org> wrote:
> >
> > All,
> >
> > I’d like to open a discussion about the future of the periodic state
> > dump feature. To jumpstart the discussion, I opened
> > https://issues.apache.org/jira/browse/YUNIKORN-1483, which is copied
> > below for context. In the process of writing this up, it seems to me
> > that we might actually be better off simply removing the feature, and
> > relying solely on the REST API to retrieve state dumps on demand.
> >
> > In the current state, periodic state dumps need to be enabled, at which
> > point they write to a local filesystem within the YuniKorn scheduler.
> > This maps onto ephemeral storage, so to avoid out-of-space scenarios, an
> > administrator needs to customize the YK Helm deployment with additional
> > resource quota. Additionally, to even access the dumps, the filesystem
> > needs to be mounted as a persistent volume and external code written to
> > interact with the saved dumps. Given the mixed text-and-json format of
> > these dumps, this can be rather complicated.
> >
> > Alternatively, users could simply deploy a cron container which pulls
> > the state dump on-demand from the existing REST API. This ends up being
> > considerably simpler.
> >
> > Are there objections to removing the existing periodic state dump
> > functionality? Existing users who would be impacted greatly? To be
> > clear, I’m not proposing removing the state dump itself; the version
> > available via the REST API has proven extremely valuable. All that is on
> > the table is removal of the automatic, periodic state dump which writes
> > to local files.
> >
> > Looking forward to feedback,
> >
> > Craig
> >
> >
> >
> > ------------------------------------
> > YUNIKORN-1483 write-up:
> >
> > The current support for generating periodic state dumps implemented in
> > YUNIKORN-940 <https://issues.apache.org/jira/browse/YUNIKORN-940> has
> > several warts:
> >
> > The configuration in YUNIKORN-949
> > <https://issues.apache.org/jira/browse/YUNIKORN-949> is done via the
> > core scheduler configuration, leading to a random option on partitions
> > which doesn't belong there and has nothing to do with scheduling.
> >
> > Changing the frequency of the state dumps is done via the unsecured REST
> > API. This is a potential denial-of-service vector.
> >
> > Configuration V2 is now complete, which standardizes on using a
> > ConfigMap to configure all YuniKorn options that make sense to be
> > reconfigured. However, allowing the location to be changed at runtime
> > makes no sense in a containerized environment.
> >
> > Retrieving the state dumps requires mounting of external storage. This
> > is necessarily a site-specific configuration and currently requires a
> > custom Helm deployment.
> >
> > The state dumps, though JSON, are emitted as text files with JSON
> > appended to them, making parsing difficult.
> >
> > To address these issues:
> >
> > Deprecate existing REST API configuration for frequency, and make it a
> > no-op now for security reasons. We can remove it completely in 2.0.
> >
> > Deprecate the statedumpfilepath option on partitions. Ignore it for
> > security reasons now (and warn if found), and remove completely in 2.0.
> >
> > Disable the feature by default. To enable it, we should require setting
> > a specific environment variable:
> >
> > YUNIKORN_STATE_DUMP_LOCATION=/path/to/dir : This would be required to
> > enable the feature at all. Making it an env var makes sense as it is not
> > an option that should be reconfigured (or even visible) in
> > configuration.
> >
> > Via configmap, we should allow the feature to be enabled / disabled and
> > its frequency set. These options would have no effect if
> > YUNIKORN_STATE_DUMP_LOCATION is not defined:
> >
> > periodicStateDump.enabled: "true" | "false" (default "false")
> >
> > periodicStateDump.frequency: "15m" (default value, do not allow more
> > frequently than 1m intervals)
> >
> > periodicStateDump.count: 10 (default value)
> >
> > Create an empty directory /yunikorn-state in the Docker image to store
> > state dumps.
> >
> > Add support to Helm for enabling state dump support as well as setting
> > custom mount options (including quota). Enabling support should set the
> > env var YUNIKORN_STATE_DUMP_LOCATION=/yunikorn-state and mount this
> > directory via the options specified.
> >
> > Output a single json file per dump and remove oldest files until count
> > <= periodicStateDump.count entries:
> > yunikorn-state-dump-YYYYMMDD-HHMM.json
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
> For additional commands, e-mail: dev-h...@yunikorn.apache.org
>
>
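
For anyone weighing the cron-container alternative mentioned in the thread, here is a rough sketch of what such a job might do: pull one dump from the REST API, write it under the yunikorn-state-dump-YYYYMMDD-HHMM.json name from the write-up, and prune the oldest files down to a fixed count. This is only an illustration, not YuniKorn code. The endpoint URL, the default of 10 retained files, and all function names (fetch_from_rest, dump_filename, prune_oldest, save_dump) are assumptions made for the example; check the REST API documentation for your version before relying on the URL.

```python
import json
import urllib.request
from datetime import datetime
from pathlib import Path
from typing import Callable, List, Optional

# Assumption: address and path of the scheduler's on-demand state dump
# endpoint. Adjust host, port, and path for your deployment.
STATE_DUMP_URL = "http://localhost:9080/ws/v1/fullstatedump"

# File name prefix taken from the pattern in the write-up.
PREFIX = "yunikorn-state-dump-"


def fetch_from_rest(url: str = STATE_DUMP_URL, timeout: float = 30.0) -> bytes:
    """Pull a single state dump over HTTP and return the raw body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def dump_filename(now: datetime) -> str:
    """Build yunikorn-state-dump-YYYYMMDD-HHMM.json for the given time."""
    return f"{PREFIX}{now:%Y%m%d-%H%M}.json"


def prune_oldest(directory: Path, keep: int) -> List[Path]:
    """Delete the oldest dumps until at most `keep` remain.

    The timestamp in the name sorts lexicographically, so sorting by
    file name is the same as sorting by age.
    """
    dumps = sorted(directory.glob(f"{PREFIX}*.json"))
    excess = dumps[: max(0, len(dumps) - keep)]
    for path in excess:
        path.unlink()
    return excess


def save_dump(
    fetch: Callable[[], bytes],
    directory: Path,
    keep: int = 10,
    now: Optional[datetime] = None,
) -> Path:
    """Fetch one dump, store it as a single JSON file, then prune old ones."""
    now = now or datetime.now()
    payload = fetch()
    json.loads(payload)  # raises ValueError if the body is not valid JSON
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / dump_filename(now)
    target.write_bytes(payload)
    prune_oldest(directory, keep)
    return target
```

A scheduled job would then only need to call save_dump(fetch_from_rest, Path("/some/mounted/dir")) at whatever interval the site prefers. Passing the fetch function in as a parameter is just a convenience so the storage and pruning logic can be exercised without a running scheduler.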