I've been working on this ticket. I have a JUnit test (ConfigYamlCoverageTest) with forward and reverse checks, and an @HiddenInYaml annotation ready to go. I've triaged all 131 Config.java fields that are missing from the yaml files and have decision suggestions for most of them.
Before I share the full spreadsheet, I'd like to gauge whether we can handle some groups at the group level rather than property-by-property: - *Deprecated fields (13 fields):* back_pressure_*, otc_coalescing_*, otc_backlog_*, windows_timer_interval, max_streaming_retries, repair_session_max_tree_depth, scripted_user_defined_functions_enabled, use_deterministic_table_id, cms_default_retry_backoff, cms_default_max_retry_backoff. These are all @Deprecated in Config.java. The test already skips them automatically, but should they also be annotated @HiddenInYaml for clarity? - *Paxos v2 (16 fields):* All the paxos_* and skip_paxos_repair_* fields from CEP-14. Internal LWT protocol tuning. - *TCM/CMS (12 fields):* CMS timeouts/retries (cms_*), metadata snapshotting, discovery timeout, unsafe_tcm_mode, progress barrier fields (progress_barrier_*), and short_rpc_timeout. All internal CEP-21 infrastructure. - *Accord (3 fields):* accord_preaccept_timeout, concurrent_accord_operations, consensus_migration_cache_size. CEP-15 internals. Still maturing. Does anyone see a reason any of these should be exposed in cassandra.yaml rather than marked @HiddenInYaml? If we can agree on these groups, it reduces the remaining discussion to about 13 individual fields which is much more manageable. On Tue, 4 Mar 2025 at 09:31, Dmitry Konstantinov <[email protected]> wrote: > >> > https://docs.google.com/spreadsheets/d/11MOxhNqwE1tWP4ex2gzKG2pmeAWFaHDKo-CRp25h9BU/edit?gid=0#gid=0 > We still have a lot of rows empty. I have added many default values and a > Cassandra version when a parameter was introduced (to differentiate some > recent parameters from old ones) based on source code but it would be nice > to get a description for parameters from the authors as well as > classification exposed/hidden. > Maybe we should not wait for collecting info about all parameters and > update what we have + use a threshold in the Ant validation task to fail > when new missed parameters are added. The logic in the dev branch here > https://issues.apache.org/jira/browse/CASSANDRA-20249 already supports a > threshold. > > > On Tue, 28 Jan 2025 at 00:04, Josh McKenzie <[email protected]> wrote: > >> Good point re: the implications of parsing and durability in the face of >> seeing unknown or missing parameters. I don't think widening the scope on >> that would be ideal, especially considering the entire impetus for this >> conversation is "we've misbehaved with our config and have a bunch of >> undocumented stuff we're not sure is still useful, or what it's for". =/ >> >> On Mon, Jan 27, 2025, at 3:41 PM, Štefan Miklošovič wrote: >> >> "we take "unclaimed" items and move them to their own InternalConfig.java >> or something" >> >> This is interesting. If we are meant to be still able to put these >> properties into cassandra.yaml (even they are "internal ones") and they >> would be just in InternalConfig.java for some basic separation of internal >> / user-facing configuration, then we would need to have two yaml loaders: >> >> 1) the one as we have now which loads cassandra.yaml it into Config.java >> 2) the second one which would load cassandra.yaml into InternalConfig.java >> >> For both cases, we could not fail when there are unrecognized properties >> in cassandra.yaml while parsing it (1), because every loader, for >> Config.java as well as InternalConfig.java, is parsing just some "subset" >> of yaml. >> >> (1) >> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/config/YamlConfigurationLoader.java#L443-L444 >> >> If we just had "public InternalConfig internal = new InternalConfig" as a >> field in Config.java, then this would lead to properties being effectively >> renamed in cassandra.yaml like >> >> internal: >> some_currently_internal_property: false >> >> instead of just >> >> some_currently_internal_property: false >> >> I do not think we want to have them renamed / under different >> configuration sections in yaml. I get that they are "internal" etc but we >> just don't know how / where it is used and deployed and just blindly >> renaming them is not a good idea imho. >> >> On Mon, Jan 27, 2025 at 8:46 PM Josh McKenzie <[email protected]> >> wrote: >> >> >> This may be an off-base comparison, but this reminds me of struggles >> we've had getting to 0 failing unit tests before and the debates on fencing >> off a snapshot of the current "failure set" so you can have a set point >> where no further degradation is allowed in a primary data set. >> >> All of which is to say - maybe at the end of the spreadsheet, we take >> "unclaimed" items and move them to their own InternalConfig.java or >> something and add an ant target that a) disallows further addition to >> InternalConfig.java w/out throwing an error / needing whitelist update, and >> b) disallows further regression in the Config.java <-> cassandra.yaml >> relationship for non-annotated fields. >> >> That way we can at least halt the progression of the disease even if >> we're stymied on cleaning up some of the existing symptoms. >> >> On Mon, Jan 27, 2025, at 1:38 PM, Štefan Miklošovič wrote: >> >> Indeed, we need to balance that and thoughtfully choose what is going to >> be added and what not. However, we should not hide something which is meant >> to be tweaked by a user. The config is intimidating mostly because >> everything is just in one file. I merely remember discussions a few years >> ago which were about splitting cassandra.yaml into multiple files which >> would be focused just on one subsystem / would cover some logically >> isolated domain. >> >> Anyway, I think the main goal of this effort for now would be to at least >> map where we are at. Some of them are genuinely missing. E.g. guardrails, >> how is a user meant to know about that if it is not even documented ... >> >> On Mon, Jan 27, 2025 at 6:16 PM Chris lohfink <[email protected]> wrote: >> >> Might be a bit of a balance between exposing what people actually are >> likely to need to modify vs having a super intimidating config file. It's >> already nearly 2000 lines. Personally I'd rather see some >> auto-documentation or something that's in the docs >> <https://cassandra.apache.org/doc/latest/cassandra/managing/configuration/cass_yaml_file.html> >> than an effort to manually add another 1000 lines. >> >> Chris >> >> On Fri, Jan 24, 2025 at 9:41 AM Dmitry Konstantinov <[email protected]> >> wrote: >> >> Maybe I missed some patterns but it looks like a pretty good estimation, >> I did like 10 random checks manually to verify :-) >> I will try to make an ant target with a similar logic (hopefully, during >> the weekend) >> I will create a ticket to track this activity (to share attachments there >> to not overload the thread with such outputs in future). >> >> On Fri, 24 Jan 2025 at 15:37, Štefan Miklošovič <[email protected]> >> wrote: >> >> Oh my god, 112? :DD I was thinking it would be less than 10. >> >> Anyway, I think we need to integrate this to some ant target. If you >> expanded on this, that would be great. >> >> On Fri, Jan 24, 2025 at 4:31 PM Dmitry Konstantinov <[email protected]> >> wrote: >> >> A very primitive implementation of the 1st idea below: >> >> String configUrl = >> "file:///Users/dmitry/IdeaProjects/cassandra-trunk/conf/cassandra.yaml"; >> Field[] allFields = Config.class.getFields(); >> List<String> topLevelPropertyNames = new ArrayList<>(); >> for(Field field : allFields) >> { >> if (!Modifier.*isStatic*(field.getModifiers())) >> { >> topLevelPropertyNames.add(field.getName()); >> } >> } >> >> URL url = new URL(configUrl); >> List<String> lines = Files.*readAllLines*(Paths.*get*(url.toURI())); >> >> int missedCount = 0; >> for (String propertyName : topLevelPropertyNames) >> { >> boolean found = false; >> for (String line : lines) >> { >> if (line.startsWith(propertyName + ":") >> || line.startsWith("#" + propertyName + ":") >> || line.startsWith("# " + propertyName + ":")) { >> found = true; >> break; >> } >> } >> if (!found) >> { >> missedCount++; >> System.*out*.println(propertyName); >> } >> } >> System.*out*.println("Total missed:" + missedCount); >> >> >> It prints the following config property names which are defined in >> Config.java but not present as "property" or "# property " in a file: >> >> permissions_cache_max_entries >> roles_cache_max_entries >> credentials_cache_max_entries >> auto_bootstrap >> force_new_prepared_statement_behaviour >> use_deterministic_table_id >> repair_request_timeout >> stream_transfer_task_timeout >> cms_await_timeout >> cms_default_max_retries >> cms_default_retry_backoff >> epoch_aware_debounce_inflight_tracker_max_size >> metadata_snapshot_frequency >> available_processors >> repair_session_max_tree_depth >> use_offheap_merkle_trees >> internode_max_message_size >> native_transport_max_message_size >> native_transport_max_request_data_in_flight_per_ip >> native_transport_max_request_data_in_flight >> native_transport_receive_queue_capacity >> min_free_space_per_drive >> max_space_usable_for_compactions_in_percentage >> reject_repair_compaction_threshold >> concurrent_index_builders >> max_streaming_retries >> commitlog_max_compression_buffers_in_pool >> max_mutation_size >> dynamic_snitch >> failure_detector >> use_creation_time_for_hint_ttl >> key_cache_migrate_during_compaction >> key_cache_invalidate_after_sstable_deletion >> paxos_cache_size >> file_cache_round_up >> disk_optimization_estimate_percentile >> disk_optimization_page_cross_chance >> purgeable_tobmstones_metric_granularity >> windows_timer_interval >> otc_coalescing_strategy >> otc_coalescing_window_us >> otc_coalescing_enough_coalesced_messages >> otc_backlog_expiration_interval_ms >> scripted_user_defined_functions_enabled >> user_defined_functions_threads_enabled >> allow_insecure_udfs >> allow_extra_insecure_udfs >> user_defined_functions_warn_timeout >> user_defined_functions_fail_timeout >> user_function_timeout_policy >> back_pressure_enabled >> back_pressure_strategy >> repair_command_pool_full_strategy >> repair_command_pool_size >> block_for_peers_timeout_in_secs >> block_for_peers_in_remote_dcs >> skip_stream_disk_space_check >> snapshot_on_repaired_data_mismatch >> validation_preview_purge_head_start >> initial_range_tombstone_list_allocation_size >> range_tombstone_list_growth_factor >> snapshot_on_duplicate_row_detection >> check_for_duplicate_rows_during_reads >> check_for_duplicate_rows_during_compaction >> autocompaction_on_startup_enabled >> auto_optimise_inc_repair_streams >> auto_optimise_full_repair_streams >> auto_optimise_preview_repair_streams >> consecutive_message_errors_threshold >> internode_error_reporting_exclusions >> compact_tables_enabled >> vector_type_enabled >> intersect_filtering_query_warned >> intersect_filtering_query_enabled >> streaming_slow_events_log_timeout >> repair_state_expires >> repair_state_size >> paxos_variant >> skip_paxos_repair_on_topology_change >> paxos_purge_grace_period >> paxos_on_linearizability_violations >> paxos_state_purging >> paxos_repair_enabled >> paxos_topology_repair_no_dc_checks >> paxos_topology_repair_strict_each_quorum >> skip_paxos_repair_on_topology_change_keyspaces >> paxos_contention_wait_randomizer >> paxos_contention_min_wait >> paxos_contention_max_wait >> paxos_contention_min_delta >> paxos_repair_parallelism >> sstable_read_rate_persistence_enabled >> client_request_size_metrics_enabled >> max_top_size_partition_count >> max_top_tombstone_partition_count >> min_tracked_partition_size >> min_tracked_partition_tombstone_count >> top_partitions_enabled >> severity_during_decommission >> progress_barrier_min_consistency_level >> progress_barrier_default_consistency_level >> progress_barrier_timeout >> progress_barrier_backoff >> discovery_timeout >> unsafe_tcm_mode >> cql_start_time >> native_transport_throw_on_overload >> native_transport_queue_max_item_age_threshold >> native_transport_min_backoff_on_queue_overload >> native_transport_max_backoff_on_queue_overload >> native_transport_timeout >> enforce_native_deadline_for_hints >> Total missed:112 >> >> >> >> On Fri, 24 Jan 2025 at 15:10, Štefan Miklošovič <[email protected]> >> wrote: >> >> It should also work the other way around. If there is a property which is >> commented out in yaml and it is not in Config.java, that should fail as >> well. If it is not commented out and it is not in Config.java, that will >> fail in runtime as it fails on unrecognized property. >> >> This will be used in practice very rarely as we seldom remove the >> properties in Config but if we do and a property is commented out, we >> should not ship a dead property name, even commented out. >> >> On Fri, Jan 24, 2025 at 3:51 PM Paulo Motta <[email protected]> wrote: >> >> > > If "# my_cool_property: true" is NOT in cassandra.yaml, we might >> indeed add it, also commented out. I think it would be quite easy to check >> against yaml if there is a line starting on "# my_cool_property" or just on >> "my_cool_property". Both cases would satisfy the check. >> >> Makes sense, I think this would be good to have as a lint or test to >> easily catch overlooks during review. >> >> On Fri, Jan 24, 2025 at 9:44 AM Štefan Miklošovič <[email protected]> >> wrote: >> >> >> >> On Fri, Jan 24, 2025 at 3:27 PM Paulo Motta <[email protected]> wrote: >> >> > from time to time I see configuration properties in Config.java and >> they are clearly not in cassandra.yaml. Not every property in Config is in >> cassandra.yaml. I would like to know if there is some specific reason >> behind that. >> >> I think one of the original reasons was to "hide" advanced configs that >> are not meant to be updated, unless in very niche circumstances. However I >> think this has been extrapolated to non-advanced settings. >> >> > Question related to that is if we could not have a build-time check >> that all properties in Config have to be in cassandra.yaml and fail the >> build if a property in Config does not have its counterpart in yaml. >> >> Are you saying every configuration property should be commented-out, or >> do you think that every Config property should be specified in >> cassandra.yaml with their default uncomented ? One issue with that is that >> you could cause user confusion if you "reveal" a niche/advanced config that >> is not meant to be updated. I think this would be addressed by >> the @HiddenInYaml flag you are proposing in a later post. >> >> >> Yes, then can stay hidden, but we should annotate it with @Hidden or >> similar. As of now, if that property is not in yaml, we just don't know if >> it was forgotten to be added or if we have not added it on purpose. >> >> They can keep being commented out if they currently are. Imagine a >> property in Config.java >> >> public boolean my_cool_property = true; >> >> and then this in cassandra.yaml >> >> # my_cool_property: true >> >> It is completely ok. >> >> If "# my_cool_property: true" is NOT in cassandra.yaml, we might indeed >> add it, also commented out. I think it would be quite easy to check against >> yaml if there is a line starting on "# my_cool_property" or just on >> "my_cool_property". Both cases would satisfy the check. >> >> >> >> > There are dozens of properties in Config and I have a strong suspicion >> that we missed to publish some to yaml so users do not even know such a >> property exists and as of now we do not even know which they are. >> >> I believe this is a problem. I think most properties should be in >> cassandra.yaml, unless they are very advanced or not meant to be updated. >> >> Another tangential issue is that there are features/settings that don't >> even have a Config entry, but are just controlled by JVM properties. >> >> I think that we should attempt to unify Config and jvm properties under a >> predictable structure. For example, if there is a YAML config >> enable_user_defined_functions, then there should be a respective JVM flag >> -Dcassandra.enable_user_defined_functions, and vice versa. >> >> >> Yeah, good idea. >> >> >> >> On Fri, Jan 24, 2025 at 9:16 AM Štefan Miklošovič <[email protected]> >> wrote: >> >> Hello, >> >> from time to time I see configuration properties in Config.java and they >> are clearly not in cassandra.yaml. Not every property in Config is in >> cassandra.yaml. I would like to know if there is some specific reason >> behind that. >> >> Question related to that is if we could not have a build-time check that >> all properties in Config have to be in cassandra.yaml and fail the build if >> a property in Config does not have its counterpart in yaml. >> >> There are dozens of properties in Config and I have a strong suspicion >> that we missed to publish some to yaml so users do not even know such a >> property exists and as of now we do not even know which they are. >> >> >> >> -- >> Dmitry Konstantinov >> >> >> >> -- >> Dmitry Konstantinov >> >> >> >> > > -- > Dmitry Konstantinov >
