+1.

On Thu, Feb 13, 2020 at 9:30 AM, Gengliang Wang <gengliang.w...@databricks.com> wrote:
> +1, this is really helpful. We should make the SQL configurations
> consistent and more readable.
>
> On Wed, Feb 12, 2020 at 3:33 PM Rubén Berenguel <rbereng...@gmail.com>
> wrote:
>
>> I love it, it will make configs easier to read and write. Thanks Wenchen.
>>
>> R
>>
>> On 13 Feb 2020, at 00:15, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>
>> Thank you, Wenchen.
>>
>> The new policy looks clear to me. +1 for the explicit policy.
>>
>> So, are we going to revise the existing conf names before 3.0.0 release?
>>
>> Or, is it applied to new up-coming configurations from now?
>>
>> Bests,
>> Dongjoon.
>>
>> On Wed, Feb 12, 2020 at 7:43 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'd like to discuss the naming policy of Spark configs, as for now it
>>> depends on personal preference which leads to inconsistent namings.
>>>
>>> In general, the config name should be a noun that describes its meaning
>>> clearly.
>>> Good examples:
>>>     spark.sql.session.timeZone
>>>     spark.sql.streaming.continuous.executorQueueSize
>>>     spark.sql.statistics.histogram.numBins
>>> Bad examples:
>>>     spark.sql.defaultSizeInBytes (default size for what?)
>>>
>>> Also note that, config name has many parts, joined by dots. Each part is
>>> a namespace. Don't create namespace unnecessarily.
>>> Good example:
>>>     spark.sql.execution.rangeExchange.sampleSizePerPartition
>>>     spark.sql.execution.arrow.maxRecordsPerBatch
>>> Bad examples:
>>>     spark.sql.windowExec.buffer.in.memory.threshold ("in" is not a useful
>>>     namespace, better to be .buffer.inMemoryThreshold)
>>>
>>> For a big feature, usually we need to create an umbrella config to turn
>>> it on/off, and other configs for fine-grained controls. These configs
>>> should share the same namespace, and the umbrella config should be named
>>> like featureName.enabled. For example:
>>>     spark.sql.cbo.enabled
>>>     spark.sql.cbo.starSchemaDetection
>>>     spark.sql.cbo.starJoinFTRatio
>>>     spark.sql.cbo.joinReorder.enabled
>>>     spark.sql.cbo.joinReorder.dp.threshold (BTW "dp" is not a good
>>>     namespace)
>>>     spark.sql.cbo.joinReorder.card.weight (BTW "card" is not a good
>>>     namespace)
>>>
>>> For boolean configs, in general it should end with a verb, e.g.
>>> spark.sql.join.preferSortMergeJoin. If the config is for a feature and
>>> you can't find a good verb for the feature, featureName.enabled is also
>>> good.
>>>
>>> I'll update https://spark.apache.org/contributing.html after we reach a
>>> consensus here. Any comments are welcome!
>>>
>>> Thanks,
>>> Wenchen
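
In case it helps the discussion, here is a rough Python sketch of how a couple of the quoted rules might be checked mechanically. It is only an illustration, not anything that exists in Spark: the function names `lint_config_name` and `has_umbrella_config` are hypothetical, the list of weak namespaces contains just the three called out in the proposal ("in", "dp", "card"), and the lowerCamelCase check is my own assumption inferred from the good examples rather than a stated rule. The noun/verb guidance is not checked at all, since that needs human judgment.

```python
import re

# Namespaces the proposal calls out as unhelpful (illustrative, not exhaustive).
WEAK_NAMESPACES = {"in", "dp", "card"}

# Assumption: every dot-separated part is lowerCamelCase, as in the good examples.
PART_PATTERN = re.compile(r"^[a-z][A-Za-z0-9]*$")


def lint_config_name(name):
    """Return a list of warnings for one config name (empty means no findings)."""
    warnings = []
    parts = name.split(".")
    if parts[0] != "spark":
        warnings.append("name should start with 'spark.'")
    for part in parts:
        if not PART_PATTERN.match(part):
            warnings.append(f"part {part!r} is not lowerCamelCase")
    # Every part except the last one acts as a namespace.
    for namespace in parts[:-1]:
        if namespace in WEAK_NAMESPACES:
            warnings.append(f"namespace {namespace!r} is not descriptive")
    return warnings


def has_umbrella_config(names, feature_prefix):
    """Check that a feature namespace also defines '<feature_prefix>.enabled'."""
    return f"{feature_prefix}.enabled" in set(names)


if __name__ == "__main__":
    for conf in (
        "spark.sql.session.timeZone",
        "spark.sql.windowExec.buffer.in.memory.threshold",
    ):
        print(conf, lint_config_name(conf))
```

With this sketch, spark.sql.session.timeZone produces no warnings, while spark.sql.windowExec.buffer.in.memory.threshold is flagged for the "in" namespace. `has_umbrella_config` could be run over each feature prefix (for example spark.sql.cbo) to confirm that the on/off switch exists next to its fine-grained configs.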