Hi all,

I'd like to discuss the naming policy for Spark configs. Right now it depends on personal preference, which leads to inconsistent naming.
In general, the config name should be a noun that clearly describes its meaning. Good examples:
  spark.sql.session.timeZone
  spark.sql.streaming.continuous.executorQueueSize
  spark.sql.statistics.histogram.numBins
Bad example:
  spark.sql.defaultSizeInBytes (default size for what?)

Also note that a config name has many parts, joined by dots. Each part is a namespace. Don't create namespaces unnecessarily. Good examples:
  spark.sql.execution.rangeExchange.sampleSizePerPartition
  spark.sql.execution.arrow.maxRecordsPerBatch
Bad example:
  spark.sql.windowExec.buffer.in.memory.threshold ("in" is not a useful namespace; .buffer.inMemoryThreshold would be better)

For a big feature, we usually need an umbrella config to turn it on/off, plus other configs for fine-grained control. These configs should share the same namespace, and the umbrella config should be named like featureName.enabled. For example:
  spark.sql.cbo.enabled
  spark.sql.cbo.starSchemaDetection
  spark.sql.cbo.starJoinFTRatio
  spark.sql.cbo.joinReorder.enabled
  spark.sql.cbo.joinReorder.dp.threshold (BTW "dp" is not a good namespace)
  spark.sql.cbo.joinReorder.card.weight (BTW "card" is not a good namespace)

For boolean configs, the name should generally end with a verb, e.g. spark.sql.join.preferSortMergeJoin. If the config is for a feature and you can't find a good verb for it, featureName.enabled is also fine.

I'll update https://spark.apache.org/contributing.html after we reach a consensus here. Any comments are welcome!

Thanks,
Wenchen