Side note about time travel: There is a PR
<https://github.com/apache/spark/pull/34497> to add VERSION/TIMESTAMP AS OF
syntax to Spark SQL.
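
On the syntax itself, the PR's approach would read roughly like the following
(an illustrative sketch; the grammar is still under review there, and the table
name t and the values are placeholders):

-- read a table as of a specific version/snapshot
SELECT * FROM t VERSION AS OF 10963874102873
-- read a table as of a point in time
SELECT * FROM t TIMESTAMP AS OF '2021-11-15 00:00:00'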

On Mon, Nov 15, 2021 at 2:23 PM Ryan Blue <b...@tabular.io> wrote:

> I want to note that I wouldn't recommend time traveling this way, by using
> the hint with `snapshot-id`. Instead, we want to add the standard SQL syntax
> for that in a separate PR. The options hint is still useful for other options
> that help a table scan perform better, like specifying the target split size.
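>
> For instance, a per-query split size override through the same hint might
> look roughly like this (a sketch; 'split-size' is the Iceberg read option
> name, and the 128 MB value and table name are placeholders):
>
> -- request a larger target split size for this scan only
> SELECT * FROM db.tbl /*+ OPTIONS('split-size'='134217728') */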
>
> You're right that this isn't a typical optimizer hint, but I'm not sure
> what other syntax is possible for this use case. How else would we send
> custom properties through to the scan?
>
> On Mon, Nov 15, 2021 at 9:25 AM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> I am looking at the hint and it appears to me (I stand to be corrected)
>> that it is a single-table hint, as below:
>>
>> -- time travel
>> SELECT * FROM t /*+ OPTIONS('snapshot-id'='10963874102873L') */
>>
>> My assumption is that any view on this table will also benefit from this
>> hint. This is not a hint to the optimizer in the classical sense; it is only
>> a snapshot hint. Normally, a hint is an instruction to the optimizer: when
>> writing SQL, one may know information about the data that is unknown to the
>> optimizer, and hints enable one to make decisions normally made by the
>> optimizer, sometimes causing it to select a plan that it sees as higher
>> cost.
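>>
>> For contrast, a classical optimizer hint influences the plan itself, for
>> example Spark's join strategy hints (shown here only to illustrate the
>> difference):
>>
>> -- ask the optimizer to broadcast t2 in the join
>> SELECT /*+ BROADCAST(t2) */ * FROM t1 JOIN t2 ON t1.id = t2.id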
>>
>>
>> So far as this case is concerned, it looks OK, and I concur that it should
>> be extended to writes as well.
>>
>>
>> HTH
>>
>>
>> On Mon, 15 Nov 2021 at 17:02, Russell Spitzer <russell.spit...@gmail.com>
>> wrote:
>>
>>> I think that since we will probably end up using this same syntax on write,
>>> this makes a lot of sense. Unless there is another good way to express a
>>> similar concept during a write operation, I think going forward with this
>>> would be OK.
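>>>
>>> If write-side options were expressed the same way, it might look something
>>> like the following (purely hypothetical; neither write support nor the
>>> 'target-file-size-bytes' option name is part of the current PR):
>>>
>>> -- hypothetical: per-write option on an INSERT
>>> INSERT INTO t /*+ OPTIONS('target-file-size-bytes'='536870912') */
>>> SELECT * FROM staging_t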
>>>
>>> On Mon, Nov 15, 2021 at 10:44 AM Ryan Blue <b...@tabular.io> wrote:
>>>
>>>> The proposed feature is to be able to pass options through SQL like you
>>>> would when using the DataFrameReader API, so it would work for all
>>>> sources that support read options. Read options are part of the DSv2 API;
>>>> there just isn't a way to pass options when using SQL. The PR also has a
>>>> non-Iceberg example, which is being able to customize some JDBC source
>>>> behaviors per query (e.g., fetchSize) rather than globally in the table's
>>>> options.
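>>>>
>>>> For the JDBC case, a per-query fetchSize override through the hint might
>>>> look roughly like this (a sketch; jdbc_table stands in for a table backed
>>>> by the JDBC source):
>>>>
>>>> -- override fetchSize for this query only
>>>> SELECT * FROM jdbc_table /*+ OPTIONS('fetchSize'='10000') */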
>>>>
>>>> The proposed syntax is odd, but I think that's an artifact of Spark
>>>> introducing read options that aren't a normal part of SQL. Seems reasonable
>>>> to me to pass them through a hint.
>>>>
>>>> On Mon, Nov 15, 2021 at 2:18 AM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Interesting.
>>>>>
>>>>> What is this going to add on top of the support for Apache Iceberg
>>>>> <https://www.dremio.com/data-lake/apache-iceberg/>? Will it be in
>>>>> line with the support for Hive ACID tables or Delta Lake?
>>>>>
>>>>> HTH
>>>>>
>>>>>
>>>>> On Mon, 15 Nov 2021 at 01:56, Zhun Wang <wangzhun6...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi dev,
>>>>>>
>>>>>> We are discussing Support Dynamic Table Options for Spark SQL (
>>>>>> https://github.com/apache/spark/pull/34072). We are currently not
>>>>>> sure whether the syntax makes sense, and would like to know if there is
>>>>>> other feedback or opinions on this.
>>>>>>
>>>>>> I would appreciate any feedback on this.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>
>
> --
> Ryan Blue
> Tabular
>
