It's useful to have a SQL API to specify table options, similar to the
DataFrameReader API. However, I share the same concern from @Hyukjin Kwon
<gurwls...@gmail.com> and am not very comfortable with using hints to do it.

In the PR, someone mentioned TVF. I think it's better than hints, but still
has problems. For example, shall we support `FROM read(t AS VERSION OF 1,
options...)`?

We probably should investigate if there are similar SQL syntaxes in other
databases first.

On Wed, Nov 17, 2021 at 2:39 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> This concept is explained here
> <https://databricks.com/blog/2019/02/04/introducing-delta-time-travel-for-large-scale-data-lakes.html>
> somehow. If this is true why cannot we just use
>
> SELECT * FROM <TABBLE_NAME> VERSION AS OF <VERSION_NO>
>
>
>   view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 16 Nov 2021 at 17:49, Ryan Blue <b...@tabular.io> wrote:
>
>> Mich, time travel will use the newly added VERSION AS OF or TIMESTAMP AS
>> OF syntax.
>>
>> On Tue, Nov 16, 2021 at 12:40 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> As I stated before, hints are designed to direct the optimizer to
>>> choose a certain query execution plan based on the specific criteria.
>>>
>>>
>>> -- time travel
>>> SELECT * FROM t /*+ OPTIONS('snapshot-id'='10963874102873L') */
>>>
>>>
>>> The alternative would be to specify time travel by creating a snapshot
>>> based on CURRENT_DATE() range which encapsulates time travel for
>>> 'snapshot-id'='10963874102873L'
>>>
>>>
>>> CREATE SNAPSHOT t_snap
>>>
>>>   START WITH CURRENT_DATE() - 30
>>>
>>>   NEXT CURRENT_DATE()
>>>
>>>   AS SELECT * FROM t
>>>
>>>
>>> SELECT * FROM t_snap
>>>
>>>
>>> HTH
>>>
>>>
>>>    view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Tue, 16 Nov 2021 at 04:26, Hyukjin Kwon <gurwls...@gmail.com> wrote:
>>>
>>>> My biggest concern with the syntax in hints is that Spark SQL's options
>>>> can change results (e.g., CSV's header options) whereas hints are generally
>>>> not designed to affect the external results if I am not mistaken. This is
>>>> counterintuitive.
>>>> I left the comment in the PR but what's the real benefit over
>>>> leveraging: SET conf and RESET conf? we can extract options from runtime
>>>> session configurations e.g., SessionConfigSupport.
>>>>
>>>> On Tue, 16 Nov 2021 at 04:30, Nicholas Chammas <
>>>> nicholas.cham...@gmail.com> wrote:
>>>>
>>>>> Side note about time travel: There is a PR
>>>>> <https://github.com/apache/spark/pull/34497> to add VERSION/TIMESTAMP
>>>>> AS OF syntax to Spark SQL.
>>>>>
>>>>> On Mon, Nov 15, 2021 at 2:23 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>
>>>>>> I want to note that I wouldn't recommend time traveling this way by
>>>>>> using the hint for `snapshot-id`. Instead, we want to add the standard 
>>>>>> SQL
>>>>>> syntax for that in a separate PR. This is useful for other options that
>>>>>> help a table scan perform better, like specifying the target split size.
>>>>>>
>>>>>> You're right that this isn't a typical optimizer hint, but I'm not
>>>>>> sure what other syntax is possible for this use case. How else would we
>>>>>> send custom properties through to the scan?
>>>>>>
>>>>>> On Mon, Nov 15, 2021 at 9:25 AM Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> I am looking at the hint and it appears to me (I stand corrected),
>>>>>>> it is a single table hint as below:
>>>>>>>
>>>>>>> -- time travel
>>>>>>> SELECT * FROM t /*+ OPTIONS('snapshot-id'='10963874102873L') */
>>>>>>>
>>>>>>> My assumption is that any view on this table will also benefit from
>>>>>>> this hint. This is not a hint to optimizer in a classical sense. Only a
>>>>>>> snapshot hint. Normally, a hint is an instruction to the optimizer.
>>>>>>> When writing SQL, one may know information about the data unknown to the
>>>>>>> optimizer. Hints enable one to make decisions normally made by the
>>>>>>> optimizer, sometimes causing the optimizer to select a plan that it 
>>>>>>> sees as
>>>>>>> higher cost.
>>>>>>>
>>>>>>>
>>>>>>> So far as this case is concerned, it looks OK and I concur it should
>>>>>>> be extended to write as well.
>>>>>>>
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>>
>>>>>>>    view my Linkedin profile
>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>>> for any loss, damage or destruction of data or any other property which 
>>>>>>> may
>>>>>>> arise from relying on this email's technical content is explicitly
>>>>>>> disclaimed. The author will in no case be liable for any monetary 
>>>>>>> damages
>>>>>>> arising from such loss, damage or destruction.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 15 Nov 2021 at 17:02, Russell Spitzer <
>>>>>>> russell.spit...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I think since we probably will end up using this same syntax on
>>>>>>>> write, this makes a lot of sense. Unless there is another good way to
>>>>>>>> express a similar concept during a write operation I think going 
>>>>>>>> forward
>>>>>>>> with this would be ok.
>>>>>>>>
>>>>>>>> On Mon, Nov 15, 2021 at 10:44 AM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>
>>>>>>>>> The proposed feature is to be able to pass options through SQL
>>>>>>>>> like you would when using the DataFrameReader API, so it would
>>>>>>>>> work for all sources that support read options. Read options are part 
>>>>>>>>> of
>>>>>>>>> the DSv2 API, there just isn’t a way to pass options when using SQL. 
>>>>>>>>> The PR
>>>>>>>>> also has a non-Iceberg example, which is being able to customize some 
>>>>>>>>> JDBC
>>>>>>>>> source behaviors per query (e.g., fetchSize), rather than globally in 
>>>>>>>>> the
>>>>>>>>> table’s options.
>>>>>>>>>
>>>>>>>>> The proposed syntax is odd, but I think that's an artifact of
>>>>>>>>> Spark introducing read options that aren't a normal part of SQL. Seems
>>>>>>>>> reasonable to me to pass them through a hint.
>>>>>>>>>
>>>>>>>>> On Mon, Nov 15, 2021 at 2:18 AM Mich Talebzadeh <
>>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Interesting.
>>>>>>>>>>
>>>>>>>>>> What is this going to add on top of support for Apache Iceberg
>>>>>>>>>> <https://www.dremio.com/data-lake/apache-iceberg/>. Will it be
>>>>>>>>>> in line with support for Hive ACID tables or Delta Lake?
>>>>>>>>>>
>>>>>>>>>> HTH
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    view my Linkedin profile
>>>>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Disclaimer:* Use it at your own risk. Any and all
>>>>>>>>>> responsibility for any loss, damage or destruction of data or any 
>>>>>>>>>> other
>>>>>>>>>> property which may arise from relying on this email's technical 
>>>>>>>>>> content is
>>>>>>>>>> explicitly disclaimed. The author will in no case be liable for any
>>>>>>>>>> monetary damages arising from such loss, damage or destruction.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, 15 Nov 2021 at 01:56, Zhun Wang <wangzhun6...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi dev,
>>>>>>>>>>>
>>>>>>>>>>> We are discussing Support Dynamic Table Options for Spark SQL (
>>>>>>>>>>> https://github.com/apache/spark/pull/34072). It is currently
>>>>>>>>>>> not sure if the syntax makes sense, and would like to know if there 
>>>>>>>>>>> is
>>>>>>>>>>> other feedback or opinion on this.
>>>>>>>>>>>
>>>>>>>>>>> I would appreciate any feedback on this.
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Ryan Blue
>>>>>>>>> Tabular
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Tabular
>>>>>>
>>>>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>

Reply via email to