I think we should aim to have the same behavior across properties that are
set in SQL conf, table config, and write options. Having SQL conf override
table config for this doesn't make sense to me. If the need is to override
table configuration, then write options are the right way to do it.

On Wed, Jul 26, 2023 at 10:10 AM Wing Yew Poon <wyp...@cloudera.com.invalid>
wrote:

> I was on vacation.
> Currently, write modes (copy-on-write/merge-on-read) can only be set as
> table properties, and default to copy-on-write. We have a customer who
> wants to use copy-on-write for certain Spark jobs that write to some
> Iceberg table and merge-on-read for other Spark jobs writing to the same
> table, because of the write characteristics of those jobs. This seems like
> a use case that should be supported. The only way they can do this
> currently is to toggle the table property as needed before doing the
> writes. This is not a sustainable workaround.
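>
> As a concrete illustration, the toggling looks something like the
> following (a minimal Scala sketch; the table name is hypothetical):
>
>     // write.merge.mode is shared table state, so concurrent jobs
>     // writing to the same table race on this property
>     spark.sql("ALTER TABLE db.tbl SET TBLPROPERTIES " +
>       "('write.merge.mode' = 'merge-on-read')")
>     // ... run the merge-on-read job ...
>     spark.sql("ALTER TABLE db.tbl SET TBLPROPERTIES " +
>       "('write.merge.mode' = 'copy-on-write')")
>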
> Hence, I think it would be useful to be able to configure the write mode
> as a SQLConf. I also disagree that the table property should always win;
> if it did, there would be no way to override it. The existing behavior in
> SparkConfParser is to use the option if set, else the session conf if set,
> else the table property. This applies across the board.
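>
> For illustration, that precedence amounts to something like this
> (a minimal Scala sketch, not the actual SparkConfParser code):
>
>     // resolve a setting: write option > session conf > table property
>     def resolve(option: Option[String], sessionConf: Option[String],
>         tableProp: Option[String], default: String): String =
>       option.orElse(sessionConf).orElse(tableProp).getOrElse(default)
>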
> - Wing Yew
>
> On Sun, Jul 16, 2023 at 4:48 PM Ryan Blue <b...@tabular.io> wrote:
>
>> Yes, I agree that there is value for administrators in having some
>> things exposed as Spark SQL configuration. That gets much harder when you
>> want to use the SQLConf for table-level settings, though. For example, the
>> target split size was an engine setting in the Hadoop world, even though it
>> makes no sense to use the same setting across vastly different tables
>> (think about joining a fact table with a dimension table).
>>
>> Settings like write mode are table-level settings. It matters what is
>> downstream of the table. You may want to set a *default* write mode, but
>> the table-level setting should always win. Currently, there are limits to
>> overriding the write mode in SQL. That's why we should add hints. For
>> anything beyond that, I think we need to discuss what you're trying to do.
>> If it's to override a table-level setting with a SQL global, then we should
>> understand the use case better.
>>
>> On Fri, Jul 14, 2023 at 6:09 PM Wing Yew Poon <wyp...@cloudera.com.invalid>
>> wrote:
>>
>>> Also, the write modes (I mean write.delete.mode, write.update.mode,
>>> and write.merge.mode) cannot currently be set as options; they are only
>>> settable as table properties.
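>>>
>>> For contrast, other write configs can already be passed per job as
>>> DataFrame write options; a sketch (given some DataFrame df, and a
>>> hypothetical table name):
>>>
>>>     // distribution-mode is an existing Iceberg write option; nothing
>>>     // analogous exists for write.delete/update/merge.mode today
>>>     df.writeTo("db.tbl").option("distribution-mode", "hash").append()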
>>>
>>> On Fri, Jul 14, 2023 at 5:58 PM Wing Yew Poon <wyp...@cloudera.com>
>>> wrote:
>>>
>>>> I think that different use cases benefit from, or even require,
>>>> different solutions. Enabling options in Spark SQL is helpful, but
>>>> allowing some configurations to be set in SQLConf is also helpful.
>>>> For Cheng Pan's use case (to disable locality), I think providing a
>>>> conf (which can be added to spark-defaults.conf by a cluster admin) is
>>>> useful.
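>>>>
>>>> For instance (the conf key below is my assumption of what the PR would
>>>> add, not a confirmed name), an admin could put the equivalent line in
>>>> spark-defaults.conf, or a user could set it per session:
>>>>
>>>>     // assumed key, pending the PR; disables locality-aware planning
>>>>     spark.conf.set("spark.sql.iceberg.locality.enabled", "false")
>>>>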
>>>> For my customer's use case (https://github.com/apache/iceberg/pull/7790),
>>>> being able to set the write mode per Spark job (where right now it can only
>>>> be set as a table property) is useful. Allowing this to be done in SQL
>>>> with an option/hint could also work, but as I understand it, Szehon's PR (
>>>> https://github.com/apache/spark/pull/416830) is only applicable to
>>>> reads, not writes.
>>>>
>>>> - Wing Yew
>>>>
>>>>
>>>> On Thu, Jul 13, 2023 at 1:04 AM Cheng Pan <pan3...@gmail.com> wrote:
>>>>
>>>>> Ryan, I understand that options should be job-specific, and that
>>>>> introducing an OPTIONS HINT can give Spark SQL capabilities similar to
>>>>> those of the DataFrame API.
>>>>>
>>>>> My point is, some of the Iceberg options should not be job-specific.
>>>>>
>>>>> For example, Iceberg has an option “locality” that can only be set at
>>>>> the job level, while Spark has a configuration
>>>>> “spark.shuffle.reduceLocality.enabled” that can be set at the cluster
>>>>> level. This gap blocks Spark administrators from migrating to Iceberg,
>>>>> because they cannot disable locality at the cluster level.
>>>>>
>>>>> So, what is Iceberg’s principle for classifying a configuration as a
>>>>> SQLConf or an OPTION?
>>>>>
>>>>> Thanks,
>>>>> Cheng Pan
>>>>>
>>>>> > On Jul 5, 2023, at 16:26, Cheng Pan <pan3...@gmail.com> wrote:
>>>>> >
>>>>> > I would argue that the SQLConf way is more in line with Spark
>>>>> user/administrator habits.
>>>>> >
>>>>> > It’s common practice for Spark administrators to set configurations
>>>>> in spark-defaults.conf at the cluster level, and when users have issues
>>>>> with their Spark SQL jobs, the first question they ask is usually: can it
>>>>> be fixed by adding a Spark configuration?
>>>>> >
>>>>> > The OPTIONS way adds a learning burden for Spark users, and how
>>>>> would Spark administrators set options at the cluster level?
>>>>> >
>>>>> > Thanks,
>>>>> > Cheng Pan
>>>>> >
>>>>> >> On Jun 17, 2023, at 04:01, Wing Yew Poon
>>>>> <wyp...@cloudera.com.INVALID> wrote:
>>>>> >>
>>>>> >> Hi,
>>>>> >> I recently put up a PR, https://github.com/apache/iceberg/pull/7790,
>>>>> to allow the write mode (copy-on-write/merge-on-read) to be specified in
>>>>> SQLConf. The use case is explained in the PR.
>>>>> >> Cheng Pan has an open PR,
>>>>> https://github.com/apache/iceberg/pull/7733, to allow locality to be
>>>>> specified in SQLConf.
>>>>> >> In the recent past, https://github.com/apache/iceberg/pull/6838/
>>>>> was a PR to allow the write distribution mode to be specified in SQLConf;
>>>>> it was merged.
>>>>> >> Cheng Pan asks if there is any guidance on when we should allow
>>>>> configs to be specified in SQLConf.
>>>>> >> Thanks,
>>>>> >> Wing Yew
>>>>> >>
>>>>> >> ps. The above open PRs could use reviews by committers.
>>>>> >>
>>>>> >
>>>>>
>>>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>

-- 
Ryan Blue
Tabular
