It's useful to have a SQL API to specify table options, similar to the DataFrameReader API. However, I share the same concern from @Hyukjin Kwon <gurwls...@gmail.com> and am not very comfortable with using hints to do it.
In the PR, someone mentioned TVF. I think it's better than hints, but still has problems. For example, shall we support `FROM read(t AS VERSION OF 1, options...)`? We probably should investigate if there are similar SQL syntaxes in other databases first. On Wed, Nov 17, 2021 at 2:39 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote: > This concept is explained here > <https://databricks.com/blog/2019/02/04/introducing-delta-time-travel-for-large-scale-data-lakes.html> > somehow. If this is true why cannot we just use > > SELECT * FROM <TABBLE_NAME> VERSION AS OF <VERSION_NO> > > > view my Linkedin profile > <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> > > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Tue, 16 Nov 2021 at 17:49, Ryan Blue <b...@tabular.io> wrote: > >> Mich, time travel will use the newly added VERSION AS OF or TIMESTAMP AS >> OF syntax. >> >> On Tue, Nov 16, 2021 at 12:40 AM Mich Talebzadeh < >> mich.talebza...@gmail.com> wrote: >> >>> As I stated before, hints are designed to direct the optimizer to >>> choose a certain query execution plan based on the specific criteria. >>> >>> >>> -- time travel >>> SELECT * FROM t /*+ OPTIONS('snapshot-id'='10963874102873L') */ >>> >>> >>> The alternative would be to specify time travel by creating a snapshot >>> based on CURRENT_DATE() range which encapsulates time travel for >>> 'snapshot-id'='10963874102873L' >>> >>> >>> CREATE SNAPSHOT t_snap >>> >>> START WITH CURRENT_DATE() - 30 >>> >>> NEXT CURRENT_DATE() >>> >>> AS SELECT * FROM t >>> >>> >>> SELECT * FROM t_snap >>> >>> >>> HTH >>> >>> >>> view my Linkedin profile >>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>> >>> >>> >>> *Disclaimer:* Use it at your own risk. Any and all responsibility for >>> any loss, damage or destruction of data or any other property which may >>> arise from relying on this email's technical content is explicitly >>> disclaimed. The author will in no case be liable for any monetary damages >>> arising from such loss, damage or destruction. >>> >>> >>> >>> >>> On Tue, 16 Nov 2021 at 04:26, Hyukjin Kwon <gurwls...@gmail.com> wrote: >>> >>>> My biggest concern with the syntax in hints is that Spark SQL's options >>>> can change results (e.g., CSV's header options) whereas hints are generally >>>> not designed to affect the external results if I am not mistaken. This is >>>> counterintuitive. >>>> I left the comment in the PR but what's the real benefit over >>>> leveraging: SET conf and RESET conf? we can extract options from runtime >>>> session configurations e.g., SessionConfigSupport. >>>> >>>> On Tue, 16 Nov 2021 at 04:30, Nicholas Chammas < >>>> nicholas.cham...@gmail.com> wrote: >>>> >>>>> Side note about time travel: There is a PR >>>>> <https://github.com/apache/spark/pull/34497> to add VERSION/TIMESTAMP >>>>> AS OF syntax to Spark SQL. >>>>> >>>>> On Mon, Nov 15, 2021 at 2:23 PM Ryan Blue <b...@tabular.io> wrote: >>>>> >>>>>> I want to note that I wouldn't recommend time traveling this way by >>>>>> using the hint for `snapshot-id`. Instead, we want to add the standard >>>>>> SQL >>>>>> syntax for that in a separate PR. This is useful for other options that >>>>>> help a table scan perform better, like specifying the target split size. >>>>>> >>>>>> You're right that this isn't a typical optimizer hint, but I'm not >>>>>> sure what other syntax is possible for this use case. How else would we >>>>>> send custom properties through to the scan? >>>>>> >>>>>> On Mon, Nov 15, 2021 at 9:25 AM Mich Talebzadeh < >>>>>> mich.talebza...@gmail.com> wrote: >>>>>> >>>>>>> I am looking at the hint and it appears to me (I stand corrected), >>>>>>> it is a single table hint as below: >>>>>>> >>>>>>> -- time travel >>>>>>> SELECT * FROM t /*+ OPTIONS('snapshot-id'='10963874102873L') */ >>>>>>> >>>>>>> My assumption is that any view on this table will also benefit from >>>>>>> this hint. This is not a hint to optimizer in a classical sense. Only a >>>>>>> snapshot hint. Normally, a hint is an instruction to the optimizer. >>>>>>> When writing SQL, one may know information about the data unknown to the >>>>>>> optimizer. Hints enable one to make decisions normally made by the >>>>>>> optimizer, sometimes causing the optimizer to select a plan that it >>>>>>> sees as >>>>>>> higher cost. >>>>>>> >>>>>>> >>>>>>> So far as this case is concerned, it looks OK and I concur it should >>>>>>> be extended to write as well. >>>>>>> >>>>>>> >>>>>>> HTH >>>>>>> >>>>>>> >>>>>>> view my Linkedin profile >>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>>>>>> >>>>>>> >>>>>>> >>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility >>>>>>> for any loss, damage or destruction of data or any other property which >>>>>>> may >>>>>>> arise from relying on this email's technical content is explicitly >>>>>>> disclaimed. The author will in no case be liable for any monetary >>>>>>> damages >>>>>>> arising from such loss, damage or destruction. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Mon, 15 Nov 2021 at 17:02, Russell Spitzer < >>>>>>> russell.spit...@gmail.com> wrote: >>>>>>> >>>>>>>> I think since we probably will end up using this same syntax on >>>>>>>> write, this makes a lot of sense. Unless there is another good way to >>>>>>>> express a similar concept during a write operation I think going >>>>>>>> forward >>>>>>>> with this would be ok. >>>>>>>> >>>>>>>> On Mon, Nov 15, 2021 at 10:44 AM Ryan Blue <b...@tabular.io> wrote: >>>>>>>> >>>>>>>>> The proposed feature is to be able to pass options through SQL >>>>>>>>> like you would when using the DataFrameReader API, so it would >>>>>>>>> work for all sources that support read options. Read options are part >>>>>>>>> of >>>>>>>>> the DSv2 API, there just isn’t a way to pass options when using SQL. >>>>>>>>> The PR >>>>>>>>> also has a non-Iceberg example, which is being able to customize some >>>>>>>>> JDBC >>>>>>>>> source behaviors per query (e.g., fetchSize), rather than globally in >>>>>>>>> the >>>>>>>>> table’s options. >>>>>>>>> >>>>>>>>> The proposed syntax is odd, but I think that's an artifact of >>>>>>>>> Spark introducing read options that aren't a normal part of SQL. Seems >>>>>>>>> reasonable to me to pass them through a hint. >>>>>>>>> >>>>>>>>> On Mon, Nov 15, 2021 at 2:18 AM Mich Talebzadeh < >>>>>>>>> mich.talebza...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> Interesting. >>>>>>>>>> >>>>>>>>>> What is this going to add on top of support for Apache Iceberg >>>>>>>>>> <https://www.dremio.com/data-lake/apache-iceberg/>. Will it be >>>>>>>>>> in line with support for Hive ACID tables or Delta Lake? >>>>>>>>>> >>>>>>>>>> HTH >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> view my Linkedin profile >>>>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> *Disclaimer:* Use it at your own risk. Any and all >>>>>>>>>> responsibility for any loss, damage or destruction of data or any >>>>>>>>>> other >>>>>>>>>> property which may arise from relying on this email's technical >>>>>>>>>> content is >>>>>>>>>> explicitly disclaimed. The author will in no case be liable for any >>>>>>>>>> monetary damages arising from such loss, damage or destruction. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, 15 Nov 2021 at 01:56, Zhun Wang <wangzhun6...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi dev, >>>>>>>>>>> >>>>>>>>>>> We are discussing Support Dynamic Table Options for Spark SQL ( >>>>>>>>>>> https://github.com/apache/spark/pull/34072). It is currently >>>>>>>>>>> not sure if the syntax makes sense, and would like to know if there >>>>>>>>>>> is >>>>>>>>>>> other feedback or opinion on this. >>>>>>>>>>> >>>>>>>>>>> I would appreciate any feedback on this. >>>>>>>>>>> >>>>>>>>>>> Thanks. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Ryan Blue >>>>>>>>> Tabular >>>>>>>>> >>>>>>>> >>>>>> >>>>>> -- >>>>>> Ryan Blue >>>>>> Tabular >>>>>> >>>>> >> >> -- >> Ryan Blue >> Tabular >> >