+1 (non-binding)

P.S. How do I become binding?
Thanks,
Nimrod

On Tue, Apr 30, 2024 at 10:53 AM Ye Xianjin <advance...@gmail.com> wrote:

> +1
> Sent from my iPhone
>
> On Apr 30, 2024, at 3:23 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>
> +1
>
> On Apr 29, 2024, at 8:01 PM, Wenchen Fan <cloud0...@gmail.com> wrote:
>
> To add more color:
>
> Spark data source tables and Hive Serde tables are both stored in the
> Hive metastore and keep their data files in the table directory. The only
> difference is that they have different "table providers", which means
> Spark will use different readers/writers. Ideally, the Spark native data
> source reader/writer is faster than the Hive Serde one.
>
> What's more, the default format of Hive Serde is text. I don't think
> people want to use text-format tables in production. Most people will add
> `STORED AS parquet` or `USING parquet` explicitly. By setting this config
> to false, we get a more reasonable default behavior: creating Parquet
> tables (or whatever is specified by `spark.sql.sources.default`).
>
> On Tue, Apr 30, 2024 at 10:45 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> @Mich Talebzadeh <mich.talebza...@gmail.com> there seems to be a
>> misunderstanding here. The Spark native data source table is still
>> stored in the Hive metastore; it's just that Spark will use a different
>> (and faster) reader/writer for it. `hive-site.xml` should work as it
>> does today.
>>
>> On Tue, Apr 30, 2024 at 5:23 AM Hyukjin Kwon <gurwls...@apache.org>
>> wrote:
>>
>>> +1
>>>
>>> It's a legacy conf that we should eventually remove. Spark should
>>> create Spark tables by default, not Hive tables.
>>>
>>> Mich, for your workload, you can simply switch that conf off if it
>>> concerns you. We also enabled ANSI (which you agreed on). It's a bit
>>> awkward to stop in the middle for this compatibility reason while
>>> making Spark sound.
>>> The compatibility has been tested in production for a long time, so I
>>> don't see any particular issue with the compatibility case you
>>> mentioned.
>>>
>>> On Mon, Apr 29, 2024 at 2:08 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi @Wenchen Fan <cloud0...@gmail.com>
>>>>
>>>> Thanks for your response. I believe we have not had enough time to
>>>> "DISCUSS" this matter.
>>>>
>>>> Currently, in order to make Spark take advantage of Hive, I create a
>>>> soft link in $SPARK_HOME/conf. FYI, my Spark version is 3.4.0 and
>>>> Hive is 3.1.1.
>>>>
>>>> /opt/spark/conf/hive-site.xml ->
>>>> /data6/hduser/hive-3.1.1/conf/hive-site.xml
>>>>
>>>> This works fine for me in my lab. So in the future, if we opt to set
>>>> "spark.sql.legacy.createHiveTableByDefault" to false, there will no
>>>> longer be a need for this logical link?
>>>> On the face of it, this looks fine, but in real life it may require a
>>>> number of changes to old scripts. Hence my concern.
>>>> As a matter of interest, has anyone liaised with the Hive team to
>>>> ensure they have introduced the additional changes you outlined?
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh,
>>>> Technologist | Architect | Data Engineer | Generative AI | FinCrime
>>>> London
>>>> United Kingdom
>>>>
>>>> view my LinkedIn profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>
>>>> *Disclaimer:* The information provided is correct to the best of my
>>>> knowledge but of course cannot be guaranteed. It is essential to note
>>>> that, as with any advice, "one test result is worth one thousand
>>>> expert opinions" (Wernher von Braun
>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>>>
>>>> On Sun, 28 Apr 2024 at 09:34, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>
>>>>> @Mich Talebzadeh <mich.talebza...@gmail.com> thanks for sharing your
>>>>> concern!
>>>>>
>>>>> Note: creating Spark native data source tables is usually Hive
>>>>> compatible as well, unless we use features that Hive does not support
>>>>> (TIMESTAMP NTZ, ANSI INTERVAL, etc.). I think it's a better default to
>>>>> create a Spark native table in this case, instead of creating a Hive
>>>>> table and failing.
>>>>>
>>>>> On Sat, Apr 27, 2024 at 12:46 PM Cheng Pan <pan3...@gmail.com> wrote:
>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>> Thanks,
>>>>>> Cheng Pan
>>>>>>
>>>>>> On Sat, Apr 27, 2024 at 9:29 AM Holden Karau
>>>>>> <holden.ka...@gmail.com> wrote:
>>>>>> >
>>>>>> > +1
>>>>>> >
>>>>>> > Twitter: https://twitter.com/holdenkarau
>>>>>> > Books (Learning Spark, High Performance Spark, etc.):
>>>>>> > https://amzn.to/2MaRAG9
>>>>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>> >
>>>>>> > On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh <vii...@gmail.com>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> +1
>>>>>> >>
>>>>>> >> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun
>>>>>> >> <dongj...@apache.org> wrote:
>>>>>> >> >
>>>>>> >> > I'll start with my +1.
>>>>>> >> >
>>>>>> >> > Dongjoon.
>>>>>> >> >
>>>>>> >> > On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
>>>>>> >> > > Please vote on SPARK-46122 to set
>>>>>> >> > > spark.sql.legacy.createHiveTableByDefault to `false` by
>>>>>> >> > > default. The technical scope is defined in the following PR.
>>>>>> >> > >
>>>>>> >> > > - DISCUSSION:
>>>>>> >> > > https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
>>>>>> >> > > - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
>>>>>> >> > > - PR: https://github.com/apache/spark/pull/46207
>>>>>> >> > >
>>>>>> >> > > The vote is open until April 30th 1AM (PST) and passes
>>>>>> >> > > if a majority of +1 PMC votes are cast, with a minimum of
>>>>>> >> > > 3 +1 votes.
>>>>>> >> > >
>>>>>> >> > > [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to
>>>>>> >> > > false by default
>>>>>> >> > > [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault
>>>>>> >> > > because ...
>>>>>> >> > >
>>>>>> >> > > Thank you in advance.
>>>>>> >> > >
>>>>>> >> > > Dongjoon
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
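
For readers skimming the thread, the behavior being voted on boils down to what provider a plain `CREATE TABLE` (no `USING` or `STORED AS` clause) resolves to. The sketch below models that decision as described in Wenchen's messages; the function name and return values are illustrative only, not Spark's actual implementation:

```python
# Hypothetical model of the default-provider choice discussed in this vote.
# Names and structure are illustrative; this is not Spark source code.

def resolve_table_provider(explicit_provider,
                           create_hive_table_by_default,
                           sources_default="parquet"):
    """Return the provider used for a CREATE TABLE statement."""
    if explicit_provider is not None:
        # An explicit `USING parquet` / `STORED AS parquet` always wins,
        # regardless of the legacy conf.
        return explicit_provider
    if create_hive_table_by_default:
        # Legacy behavior: a Hive Serde table, whose default format is text.
        return "hive (text Serde)"
    # Proposed default after SPARK-46122: a Spark native data source table,
    # in whatever format spark.sql.sources.default specifies.
    return sources_default

# Legacy default (conf = true): plain CREATE TABLE gives a Hive text table.
print(resolve_table_provider(None, True))   # hive (text Serde)
# Proposed default (conf = false): plain CREATE TABLE gives a Parquet table.
print(resolve_table_provider(None, False))  # parquet
```

Either way, as noted upthread, the resulting table still lives in the Hive metastore; only the reader/writer Spark picks changes.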