+1

Kent Yao
On 2024/04/30 09:07:21 Yuming Wang wrote:
> +1
>
> On Tue, Apr 30, 2024 at 3:31 PM Ye Xianjin <advance...@gmail.com> wrote:
>
>> +1
>>
>> Sent from my iPhone
>>
>>> On Apr 30, 2024, at 3:23 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>>>
>>> +1
>>>
>>>> On Apr 29, 2024, at 8:01 PM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>
>>>> To add more color:
>>>>
>>>> Spark data source tables and Hive SerDe tables are both stored in the
>>>> Hive metastore and keep their data files in the table directory. The
>>>> only difference is that they have different "table providers", which
>>>> means Spark will use different readers/writers. Ideally the Spark
>>>> native data source reader/writer is faster than the Hive SerDe one.
>>>>
>>>> What's more, the default format for Hive SerDe tables is text. I don't
>>>> think people want to use text-format tables in production. Most people
>>>> will add `STORED AS parquet` or `USING parquet` explicitly. By setting
>>>> this config to false, we get a more reasonable default behavior:
>>>> creating Parquet tables (or whatever is specified by
>>>> `spark.sql.sources.default`).
>>>>
>>>> On Tue, Apr 30, 2024 at 10:45 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>
>>>>> @Mich Talebzadeh <mich.talebza...@gmail.com> there seems to be a
>>>>> misunderstanding here. The Spark native data source table is still
>>>>> stored in the Hive metastore; it's just that Spark will use a
>>>>> different (and faster) reader/writer for it. `hive-site.xml` should
>>>>> work as it does today.
>>>>>
>>>>> On Tue, Apr 30, 2024 at 5:23 AM Hyukjin Kwon <gurwls...@apache.org> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> It's a legacy conf that we should eventually remove. Spark should
>>>>>> create Spark tables by default, not Hive tables.
>>>>>>
>>>>>> Mich, for your workload, you can simply switch that conf off if it
>>>>>> concerns you. We also enabled ANSI as well (which you agreed on).
>>>>>> It's a bit awkward to stop in the middle for this compatibility
>>>>>> reason while making Spark sound.
>>>>>> The compatibility has been tested in production for a long time, so
>>>>>> I don't see any particular issue with the compatibility case you
>>>>>> mentioned.
>>>>>>
>>>>>> On Mon, Apr 29, 2024 at 2:08 AM Mich Talebzadeh
>>>>>> <mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi @Wenchen Fan <cloud0...@gmail.com>,
>>>>>>>
>>>>>>> Thanks for your response. I believe we have not had enough time to
>>>>>>> "DISCUSS" this matter.
>>>>>>>
>>>>>>> Currently, in order to make Spark take advantage of Hive, I create
>>>>>>> a soft link in $SPARK_HOME/conf. FYI, my Spark version is 3.4.0 and
>>>>>>> Hive is 3.1.1:
>>>>>>>
>>>>>>> /opt/spark/conf/hive-site.xml ->
>>>>>>> /data6/hduser/hive-3.1.1/conf/hive-site.xml
>>>>>>>
>>>>>>> This works fine for me in my lab. So in the future, if we opt to
>>>>>>> set "spark.sql.legacy.createHiveTableByDefault" to false, there
>>>>>>> will no longer be a need for this logical link?
>>>>>>>
>>>>>>> On the face of it, this looks fine, but in real life it may require
>>>>>>> a number of changes to old scripts. Hence my concern.
>>>>>>>
>>>>>>> As a matter of interest, has anyone liaised with the Hive team to
>>>>>>> ensure they have introduced the additional changes you outlined?
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>> Technologist | Architect | Data Engineer | Generative AI | FinCrime
>>>>>>> London, United Kingdom
>>>>>>>
>>>>>>> View my LinkedIn profile
>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>
>>>>>>> https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>
>>>>>>> *Disclaimer:* The information provided is correct to the best of my
>>>>>>> knowledge but of course cannot be guaranteed. It is essential to
>>>>>>> note that, as with any advice, "one test result is worth
>>>>>>> one-thousand expert opinions" (Wernher von Braun
>>>>>>> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>>>>>>> On Sun, 28 Apr 2024 at 09:34, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>>>
>>>>>>>> @Mich Talebzadeh <mich.talebza...@gmail.com> thanks for sharing
>>>>>>>> your concern!
>>>>>>>>
>>>>>>>> Note: creating Spark native data source tables is usually Hive
>>>>>>>> compatible as well, unless we use features that Hive does not
>>>>>>>> support (TIMESTAMP NTZ, ANSI INTERVAL, etc.). I think it's a
>>>>>>>> better default to create a Spark native table in this case,
>>>>>>>> instead of creating a Hive table and failing.
>>>>>>>>
>>>>>>>> On Sat, Apr 27, 2024 at 12:46 PM Cheng Pan <pan3...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1 (non-binding)
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Cheng Pan
>>>>>>>>>
>>>>>>>>> On Sat, Apr 27, 2024 at 9:29 AM Holden Karau
>>>>>>>>> <holden.ka...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>>>> https://amzn.to/2MaRAG9
>>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>>>>
>>>>>>>>>> On Fri, Apr 26, 2024 at 12:06 PM L. C. Hsieh <vii...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Apr 26, 2024 at 10:01 AM Dongjoon Hyun
>>>>>>>>>>> <dongj...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I'll start with my +1.
>>>>>>>>>>>>
>>>>>>>>>>>> Dongjoon.
>>>>>>>>>>>>
>>>>>>>>>>>> On 2024/04/26 16:45:51 Dongjoon Hyun wrote:
>>>>>>>>>>>>> Please vote on SPARK-46122 to set
>>>>>>>>>>>>> spark.sql.legacy.createHiveTableByDefault to `false` by
>>>>>>>>>>>>> default. The technical scope is defined in the following PR.
>>>>>>>>>>>>>
>>>>>>>>>>>>> - DISCUSSION:
>>>>>>>>>>>>>   https://lists.apache.org/thread/ylk96fg4lvn6klxhj6t6yh42lyqb8wmd
>>>>>>>>>>>>> - JIRA: https://issues.apache.org/jira/browse/SPARK-46122
>>>>>>>>>>>>> - PR: https://github.com/apache/spark/pull/46207
>>>>>>>>>>>>>
>>>>>>>>>>>>> The vote is open until April 30th 1AM (PST) and passes
>>>>>>>>>>>>> if a majority of +1 PMC votes are cast, with a minimum of
>>>>>>>>>>>>> 3 +1 votes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [ ] +1 Set spark.sql.legacy.createHiveTableByDefault to
>>>>>>>>>>>>>     false by default
>>>>>>>>>>>>> [ ] -1 Do not change spark.sql.legacy.createHiveTableByDefault
>>>>>>>>>>>>>     because ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you in advance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dongjoon

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
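For readers following the thread, a minimal Spark SQL sketch of the behavior under discussion (table and column names here are illustrative, not from the thread). With `spark.sql.legacy.createHiveTableByDefault=false`, a `CREATE TABLE` without a `USING` or `STORED AS` clause creates a Spark native table in the format given by `spark.sql.sources.default` (Parquet by default), rather than a text-format Hive SerDe table:

```sql
-- Proposed default (spark.sql.legacy.createHiveTableByDefault=false):
-- a plain CREATE TABLE now produces a Spark native Parquet table,
-- per spark.sql.sources.default, instead of a text-format Hive SerDe table.
CREATE TABLE events (id BIGINT, name STRING);

-- These explicit forms behave the same under either setting:
CREATE TABLE events_native (id BIGINT, name STRING) USING parquet;     -- Spark native
CREATE TABLE events_serde  (id BIGINT, name STRING) STORED AS parquet; -- Hive SerDe

-- Existing workloads can restore the legacy behavior, as suggested above:
SET spark.sql.legacy.createHiveTableByDefault=true;
```

In both cases the table metadata lives in the Hive metastore; only the table provider, and hence the reader/writer Spark uses, differs.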