I am in favor of just having the REST option as per YUNIKORN-1500 [1]. Volume mounting and maintaining files is more involved than setting up a simple job that pulls the details on a schedule. The code to support writing to a volume properly could become complex: state dumps might grow, and handling an out-of-space condition while still within the configured parameters becomes a complex case.
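
For illustration, a job that pulls the state dump on a schedule could look roughly like the sketch below. It is not part of YUNIKORN-1500. The service address, port, and the `/ws/v1/fullstatedump` path are assumptions about the deployment, and the function names and the output file name pattern are hypothetical.

```python
import json
import tempfile
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable

# Assumption: the scheduler's REST service is reachable at this address and
# serves the state dump at this path; adjust both for your deployment.
STATE_DUMP_URL = "http://yunikorn-service:9080/ws/v1/fullstatedump"


def fetch_state_dump(url: str = STATE_DUMP_URL, timeout: float = 30.0) -> bytes:
    """Fetch the raw state dump body over HTTP."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_state_dump(out_dir: Path,
                    fetch: Callable[[], bytes] = fetch_state_dump) -> Path:
    """Fetch one dump, check that it parses as JSON, and write it to out_dir."""
    body = fetch()
    json.loads(body)  # raises ValueError if the body is not valid JSON
    # Hypothetical file name pattern; UTC timestamp with second resolution.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"yunikorn-state-dump-{stamp}.json"
    target.write_bytes(body)
    return target


# Offline demonstration with a stand-in fetcher, so no scheduler is needed.
demo_dir = Path(tempfile.mkdtemp())
demo_path = save_state_dump(demo_dir, fetch=lambda: b'{"partitions": []}')
```

Scheduling is left to whatever runs the script (a Kubernetes CronJob or a plain cron entry, for example), which keeps storage and quota decisions outside the scheduler itself.
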
I have approved the change for YUNIKORN-1500 and will commit soon.

Wilfred

[1] https://issues.apache.org/jira/browse/YUNIKORN-1500

On Tue, 20 Dec 2022 at 12:29, Weiwei Yang <w...@apache.org> wrote:

> +1 to remove this, I don't believe users heavily depend on this, but let's
> keep this thread for at least a few days to collect feedback.
>
> Thanks
> Weiwei
>
> On Mon, Dec 19, 2022 at 4:21 PM Craig Condit <ccon...@apache.org> wrote:
>
> > Re-adding existing YUNIKORN-1483 text as formatting broke badly. I’m not
> > proposing this is the way to go, just referencing the JIRA for discussion:
> >
> > <SNIP>
> >
> > The current support for generating periodic state dumps implemented in
> > YUNIKORN-940 has several warts:
> >
> > 1. The configuration in YUNIKORN-949 is done via the core scheduler
> > configuration, leading to a random option on partitions which doesn't
> > belong there and has nothing to do with scheduling.
> >
> > 2. Changing the frequency of the state dumps is done via the unsecured
> > REST API. This is a potential denial-of-service vector.
> >
> > 3. Configuration V2 is now complete, which standardizes on using a
> > ConfigMap to configure all YuniKorn options that make sense to be
> > reconfigured. However, allowing the location to be changed at runtime
> > makes no sense in a containerized environment.
> >
> > 4. Retrieving the state dumps requires mounting of external storage. This
> > is necessarily a site-specific configuration and currently requires a
> > custom Helm deployment.
> >
> > 5. The state dumps, though JSON, are emitted as text files with JSON
> > appended to them, making parsing difficult.
> >
> > To address these issues:
> >
> > 1. Deprecate existing REST API configuration for frequency, and make it a
> > no-op now for security reasons. We can remove it completely in 2.0.
> >
> > 2. Deprecate the statedumpfilepath option on partitions. Ignore it for
> > security reasons now (and warn if found), and remove completely in 2.0.
> >
> > 3. Disable the feature by default. To enable it, we should require
> > setting a specific environment variable:
> > - YUNIKORN_STATE_DUMP_LOCATION=/path/to/dir : This would be required to
> > enable the feature at all. Making it an env var makes sense as it is not
> > an option that should be reconfigured (or even visible) in configuration.
> >
> > 4. Via configmap, we should allow the feature to be enabled / disabled
> > and its frequency set. These options would have no effect if
> > YUNIKORN_STATE_DUMP_LOCATION is not defined:
> > - periodicStateDump.enabled: "true" | "false" (default "false")
> > - periodicStateDump.frequency: "15m" (default value, do not allow more
> > frequently than 1m intervals)
> > - periodicStateDump.count: 10 (default value)
> >
> > 5. Create an empty directory /yunikorn-state in the Docker image to store
> > state dumps.
> >
> > 6. Add support to Helm for enabling state dump support as well as setting
> > custom mount options (including quota). Enabling support should set the
> > env var YUNIKORN_STATE_DUMP_LOCATION=/yunikorn-state and mount this
> > directory via the options specified.
> >
> > 7. Output a single json file per dump and remove oldest files until count
> > <= periodicStateDump.count entries: yunikorn-state-dump-YYYYMMDD-HHMM.json
> >
> > </SNIP>
> >
> > > On Dec 19, 2022, at 6:18 PM, Craig Condit <ccon...@apache.org> wrote:
> > >
> > > All,
> > >
> > > I’d like to open a discussion about the future of the periodic state
> > > dump feature. To jumpstart the discussion, I opened
> > > https://issues.apache.org/jira/browse/YUNIKORN-1483, which is copied
> > > below for context. In the process of writing this up, it seems to me
> > > that we might actually be better off simply removing the feature, and
> > > relying solely on the REST API to retrieve state dumps on demand.
> > >
> > > In the current state, periodic state dumps need to be enabled, at which
> > > point they write to a local filesystem within the YuniKorn scheduler.
> > > This maps onto ephemeral storage, so to avoid out-of-space scenarios, an
> > > administrator needs to customize the YK Helm deployment with additional
> > > resource quota. Additionally, to even access the dumps, the filesystem
> > > needs to be mounted as a persistent volume and external code written to
> > > interact with the saved dumps. Given the mixed text-and-json format of
> > > these dumps, this can be rather complicated.
> > >
> > > Alternatively, users could simply deploy a cron container which pulls
> > > the state dump on-demand from the existing REST API. This ends up being
> > > considerably simpler.
> > >
> > > Are there objections to removing the existing periodic state dump
> > > functionality? Existing users who would be impacted greatly? To be
> > > clear, I’m not proposing removing the state dump itself; the version
> > > available via the REST API has proven extremely valuable. All that is on
> > > the table is removal of the automatic, periodic state dump which writes
> > > to local files.
> > >
> > > Looking forward to feedback,
> > >
> > > Craig
> > >
> > > ------------------------------------
> > > YUNIKORN-1483 write-up:
> > >
> > > The current support for generating periodic state dumps implemented in
> > > YUNIKORN-940 <https://issues.apache.org/jira/browse/YUNIKORN-940> has
> > > several warts:
> > > The configuration in YUNIKORN-949
> > > <https://issues.apache.org/jira/browse/YUNIKORN-949> is done via the
> > > core scheduler configuration, leading to a random option on partitions
> > > which doesn't belong there and has nothing to do with scheduling.
> > > Changing the frequency of the state dumps is done via the unsecured REST
> > > API. This is a potential denial-of-service vector.
> > > Configuration V2 is now complete, which standardizes on using a
> > > ConfigMap to configure all YuniKorn options that make sense to be
> > > reconfigured. However, allowing the location to be changed at runtime
> > > makes no sense in a containerized environment.
> > > Retrieving the state dumps requires mounting of external storage. This
> > > is necessarily a site-specific configuration and currently requires a
> > > custom Helm deployment.
> > > The state dumps, though JSON, are emitted as text files with JSON
> > > appended to them, making parsing difficult.
> > > To address these issues:
> > > Deprecate existing REST API configuration for frequency, and make it a
> > > no-op now for security reasons. We can remove it completely in 2.0.
> > > Deprecate the statedumpfilepath option on partitions. Ignore it for
> > > security reasons now (and warn if found), and remove completely in 2.0.
> > > Disable the feature by default. To enable it, we should require setting
> > > a specific environment variable:
> > > YUNIKORN_STATE_DUMP_LOCATION=/path/to/dir : This would be required to
> > > enable the feature at all. Making it an env var makes sense as it is not
> > > an option that should be reconfigured (or even visible) in
> > > configuration.
> > > Via configmap, we should allow the feature to be enabled / disabled and
> > > its frequency set. These options would have no effect if
> > > YUNIKORN_STATE_DUMP_LOCATION is not defined:
> > > periodicStateDump.enabled: "true" | "false" (default "false")
> > > periodicStateDump.frequency: "15m" (default value, do not allow more
> > > frequently than 1m intervals)
> > > periodicStateDump.count: 10 (default value)
> > > Create an empty directory /yunikorn-state in the Docker image to store
> > > state dumps.
> > > Add support to Helm for enabling state dump support as well as setting
> > > custom mount options (including quota). Enabling support should set the
> > > env var YUNIKORN_STATE_DUMP_LOCATION=/yunikorn-state and mount this
> > > directory via the options specified.
> > > Output a single json file per dump and remove oldest files until count
> > > <= periodicStateDump.count entries:
> > > yunikorn-state-dump-YYYYMMDD-HHMM.json
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@yunikorn.apache.org
> > For additional commands, e-mail: dev-h...@yunikorn.apache.org
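
The retention rule from the YUNIKORN-1483 write-up (one file per dump, removing the oldest files until at most periodicStateDump.count remain) can be sketched as below. This is an illustration only, not YuniKorn code: the function name `prune_state_dumps` is hypothetical, and the sketch assumes file names follow the yunikorn-state-dump-YYYYMMDD-HHMM.json pattern so that sorting by name is also sorting by age.

```python
import tempfile
from pathlib import Path
from typing import List


def prune_state_dumps(directory: Path, keep: int = 10) -> List[Path]:
    """Delete the oldest dump files until at most `keep` remain.

    Returns the paths that were removed, oldest first.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # The YYYYMMDD-HHMM stamp sorts lexicographically in chronological order.
    dumps = sorted(directory.glob("yunikorn-state-dump-*.json"))
    excess = dumps[: max(0, len(dumps) - keep)]
    for path in excess:
        path.unlink()
    return excess


# Demonstration: twelve dumps one minute apart, keep the newest ten.
demo_dir = Path(tempfile.mkdtemp())
for minute in range(12):
    name = f"yunikorn-state-dump-20221219-12{minute:02d}.json"
    (demo_dir / name).write_text("{}")
removed = prune_state_dumps(demo_dir, keep=10)
```

Pruning by count bounds the number of files but not their total size, which is the out-of-space concern raised earlier in the thread.
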