Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-20 Thread Maxim Gekk
+1 On Wed, Feb 21, 2024 at 9:50 AM Hyukjin Kwon wrote: > +1 > > On Tue, 20 Feb 2024 at 22:00, Cheng Pan wrote: > >> +1 (non-binding) >> >> - Build successfully from source code. >> - Pass integration tests with Spark ClickHouse Connector[1] >> >> [1]

Re: Apache Spark 3.4.2 (?)

2023-11-09 Thread Maxim Gekk
+1 On Wed, Nov 8, 2023 at 5:29 AM kazuyuki tanimura wrote: > +1 > > Kazu > > On Nov 7, 2023, at 5:23 PM, L. C. Hsieh wrote: > > +1 > > On Tue, Nov 7, 2023 at 4:56 PM Dongjoon Hyun > wrote: > > > Thank you all! > > Dongjoon > > On Mon, Nov 6, 2023 at 6:03 PM Holden Karau wrote: > > > +1 > >

Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-18 Thread Maxim Gekk
Thank you for the work, Yuanjian! On Mon, Sep 18, 2023 at 6:28 AM beliefer wrote: > Congratulations! Apache Spark. > > > > At 2023-09-16 01:01:40, "Yuanjian Li" wrote: > > Hi All, > > We are happy to announce the availability of *Apache Spark 3.5.0*! > > Apache Spark 3.5.0 is the sixth release

Re: Time for Spark v3.5.0 release

2023-07-04 Thread Maxim Gekk
+1 On Tue, Jul 4, 2023 at 11:55 AM Kent Yao wrote: > +1, thank you > > Kent > > On 2023/07/04 05:32:52 Dongjoon Hyun wrote: > > +1 > > > > Thank you, Yuanjian > > > > Dongjoon > > > > On Tue, Jul 4, 2023 at 1:03 AM Hyukjin Kwon > wrote: > > > > > Yeah one day postponed shouldn't be a big deal.

Re: starter tasks for new contributors

2023-04-24 Thread Maxim Gekk
in the related JIRA. Maxim Gekk Software Engineer Databricks, Inc. On Mon, Mar 20, 2023 at 9:17 PM Maxim Gekk wrote: > Hello Jason, Soumya, Rui: > > You can start from https://spark.apache.org/contributing.html, and take > one of those tasks (just leave a comment that you are g

Re: starter tasks for new contributors

2023-03-20 Thread Maxim Gekk
/apache/spark/pull/38656 - https://github.com/apache/spark/pull/38490 Maxim Gekk Software Engineer Databricks, Inc. On Sat, Mar 18, 2023 at 1:44 AM jason carlson wrote: > I’m also interested in helping out. > > -Jason Carlson > > On Mar 17, 2023, at 6:18 PM, Soumya Acharya >

starter tasks for new contributors

2023-03-17 Thread Maxim Gekk
free to ping me (and other contributors) in your PRs. Maxim Gekk Software Engineer Databricks, Inc.

Re: Time for release v3.3.2

2023-01-30 Thread Maxim Gekk
+1 On Tue, Jan 31, 2023 at 10:12 AM John Zhuge wrote: > +1 Thanks Liang-Chi for driving the release! > > On Mon, Jan 30, 2023 at 10:26 PM Yuming Wang wrote: > >> +1 >> >> On Tue, Jan 31, 2023 at 12:18 PM yangjie01 wrote: >> >>> +1 Thanks Liang-Chi! >>> >>> >>> >>> YangJie >>> >>> >>> >>>

Re: Time for Spark 3.4.0 release?

2023-01-04 Thread Maxim Gekk
+1 On Thu, Jan 5, 2023 at 12:25 AM huaxin gao wrote: > +1 Thanks! > > On Wed, Jan 4, 2023 at 10:19 AM L. C. Hsieh wrote: > >> +1 >> >> Thank you! >> >> On Wed, Jan 4, 2023 at 9:13 AM Chao Sun wrote: >> >>> +1, thanks! >>> >>> Chao >>> >>> On Wed, Jan 4, 2023 at 1:56 AM Mridul Muralidharan

Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread Maxim Gekk
Thank you, Chao! On Wed, Nov 30, 2022 at 12:42 PM Jungtaek Lim wrote: > Thanks Chao for driving the release! > > On Wed, Nov 30, 2022 at 6:03 PM Wenchen Fan wrote: > >> Thanks, Chao! >> >> On Wed, Nov 30, 2022 at 1:33 AM Chao Sun wrote: >> >>> We are happy to announce the availability of

Re: [ANNOUNCE] Apache Spark 3.3.1 released

2022-10-26 Thread Maxim Gekk
Congratulations to everyone on the new release, and thanks to Yuming for his efforts. Maxim Gekk Software Engineer Databricks, Inc. On Wed, Oct 26, 2022 at 10:14 AM Hyukjin Kwon wrote: > Thanks, Yuming. > > On Wed, 26 Oct 2022 at 16:01, L. C. Hsieh wrote: > >> Tha

Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread Maxim Gekk
Keep up the great work, Yikun! On Sun, Oct 9, 2022 at 10:52 AM Gengliang Wang wrote: > Congratulations, Yikun! > > On Sun, Oct 9, 2022 at 12:33 AM 416161...@qq.com > wrote: > >> Congrats, Yikun! >> >> -- >> Ruifeng Zheng >> ruife...@foxmail.com >> >>

Re: Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread Maxim Gekk
Congratulations, Xinrong! Maxim Gekk Software Engineer Databricks, Inc. On Tue, Aug 9, 2022 at 3:15 PM Weichen Xu wrote: > Congrats! > > On Tue, Aug 9, 2022 at 5:55 PM Jungtaek Lim > wrote: > >> Congrats Xinrong! Well deserved. >> >> On Tue, Aug 9, 2022, PM 5:

Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-12 Thread Maxim Gekk
+1 On Tue, Jul 12, 2022 at 11:05 AM Yang,Jie(INF) wrote: > +1 (non-binding) > > > > Yang Jie > > > > > > *From**: *Dongjoon Hyun > *Date**: *Tuesday, July 12, 2022 16:03 > *To**: *dev > *Cc**: *Cheng Su , "Yang,Jie(INF)" , > Sean Owen > *Subject**: *Re: [VOTE] Release Spark 3.2.2 (RC1) > > > > +1 > > > >

Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Maxim Gekk
+1 On Thu, Jul 7, 2022 at 12:26 AM John Zhuge wrote: > +1 Thanks for the effort! > > On Wed, Jul 6, 2022 at 2:23 PM Bjørn Jørgensen > wrote: > >> +1 >> >> On Wed, Jul 6, 2022 at 23:05 Hyukjin Kwon wrote: >> >>> Yeah +1 >>> >>> On Thu, Jul 7, 2022 at 5:40 AM Dongjoon Hyun >>> wrote: >>> Hi,

[ANNOUNCE] Apache Spark 3.3.0 released

2022-06-17 Thread Maxim Gekk
for contributing to this release. This release would not have been possible without you. Maxim Gekk Software Engineer Databricks, Inc.

[VOTE][RESULT] Release Spark 3.3.0 (RC6)

2022-06-14 Thread Maxim Gekk
Chao Sun Martin Grigorov Peter Toth Max Gekk 0: None -1: None Maxim Gekk Software Engineer Databricks, Inc.

Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-14 Thread Maxim Gekk
(binding) + 8 (non-binding) voted for Spark 3.3.0 RC6. The vote passed *successfully*. Maxim Gekk Software Engineer Databricks, Inc. On Tue, Jun 14, 2022 at 2:40 PM Peter Toth wrote: > +1 > > > Martin Grigorov wrote (on Jun 14, 2022, > Tue, 13:23): > >>

[VOTE] Release Spark 3.3.0 (RC6)

2022-06-09 Thread Maxim Gekk
er to help target the issue. Maxim Gekk Software Engineer Databricks, Inc.

Re: [VOTE] Release Spark 3.3.0 (RC5)

2022-06-09 Thread Maxim Gekk
the voting as *failed*. I will prepare RC6 as soon as the issues mentioned in the thread are solved. Maxim Gekk Software Engineer Databricks, Inc. On Wed, Jun 8, 2022 at 9:18 PM huaxin gao wrote: > I agree with Prashant, -1 from me too because this may break iceberg > usage. >

[VOTE] Release Spark 3.3.0 (RC5)

2022-06-04 Thread Maxim Gekk
er to help target the issue. Maxim Gekk Software Engineer Databricks, Inc.

Re: [VOTE] Release Spark 3.3.0 (RC4)

2022-06-04 Thread Maxim Gekk
Hi All, The voting for RC4 failed because of the issue w/ Scala 2.13. I will prepare RC5 w/ a fix for the issue soon. Maxim Gekk Software Engineer Databricks, Inc. On Sat, Jun 4, 2022 at 12:22 AM L. C. Hsieh wrote: > It's fixed at https://github.com/apache/spark/pull/36762. > > On

[VOTE] Release Spark 3.3.0 (RC4)

2022-06-03 Thread Maxim Gekk
er to help target the issue. Maxim Gekk Software Engineer Databricks, Inc.

Re: The draft of the Spark 3.3.0 release notes

2022-06-03 Thread Maxim Gekk
Hi Adam, The release notes focus mostly on new features and user-facing improvements. SPARK-37618 <https://issues.apache.org/jira/browse/SPARK-37618> seems like a bug fix, which is why I didn't put it in the doc. Maxim Gekk Software Engineer Databricks, Inc. On Fri, Jun 3, 2022 at 2:20 P

The draft of the Spark 3.3.0 release notes

2022-06-03 Thread Maxim Gekk
Hi All, I am preparing the release notes of Spark 3.3.0. Here is a draft document: https://docs.google.com/document/d/1gGySrLGvIK8bajKdGjTI_mDqk0-YPvHmPN64YjoWfOQ/edit?usp=sharing Please take a look and let me know if I missed any major changes or something. Maxim Gekk Software Engineer

Re: [VOTE] Release Spark 3.3.0 (RC3)

2022-05-27 Thread Maxim Gekk
Hi All, The voting for Spark 3.3.0 RC3 failed due to reported correctness issues (regressions): 1. [Merged] https://issues.apache.org/jira/browse/SPARK-39293 2. [In progress] https://issues.apache.org/jira/browse/SPARK-39313 Maxim Gekk Software Engineer Databricks, Inc. On Fri, May 27, 2022

[VOTE] Release Spark 3.3.0 (RC3)

2022-05-24 Thread Maxim Gekk
er to help target the issue. Maxim Gekk Software Engineer Databricks, Inc.

Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-20 Thread Maxim Gekk
and report any issues. Maxim Gekk Software Engineer Databricks, Inc. On Thu, May 19, 2022 at 1:59 PM Emil Ejbyfeldt wrote: > Hi, > > When testing out Spark 3.3.0 on our production spark workload it was > noticed that https://issues.apache.org/jira/browse/SPARK-38681 is > actually a

Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-18 Thread Maxim Gekk
Hi Kent, > Shall we backport the fix from the master to 3.3 too? Yes, we shall. Maxim Gekk Software Engineer Databricks, Inc. On Thu, May 19, 2022 at 6:44 AM Kent Yao wrote: > Hi, > > I verified the simple case below with the binary release, and it looks > like a bug to me

[VOTE] Release Spark 3.3.0 (RC2)

2022-05-16 Thread Maxim Gekk
er to help target the issue. Maxim Gekk Software Engineer Databricks, Inc.

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-11 Thread Maxim Gekk
Hi All, The vote has failed. I will create RC2 in a couple of days. Maxim Gekk Software Engineer Databricks, Inc. On Wed, May 11, 2022 at 4:23 AM Hyukjin Kwon wrote: > I expect to see RC2 too. I guess he just sticks to the standard, leaving > the vote open till the end. > It h

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-10 Thread Maxim Gekk
Hi All, Today is the last day for voting. Please, *test* the RC1 and vote. Maxim Gekk Software Engineer Databricks, Inc. On Sat, May 7, 2022 at 10:58 AM beliefer wrote: > > @Maxim Gekk Glad to hear that! > But there is a bug https://github.com/apache/spark/pull/36457 > I thi

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-06 Thread Maxim Gekk
Hi Dongjoon, > https://issues.apache.org/jira/projects/SPARK/versions/12350369 > Since RC1 is started, could you move them out from the 3.3.0 milestone? I have removed the 3.3.0 label from Fix version(s). Thank you, Dongjoon. Maxim Gekk Software Engineer Databricks, Inc. On Fri,

[VOTE] Release Spark 3.3.0 (RC1)

2022-05-05 Thread Maxim Gekk
er to help target the issue. Maxim Gekk Software Engineer Databricks, Inc.

Re: Apache Spark 3.3 Release

2022-04-28 Thread Maxim Gekk
Aggregation Function: percentile_disc 16. SPARK-34079: Improvement CTE table scan Max Gekk Software Engineer Databricks, Inc. On Fri, Apr 15, 2022 at 4:28 PM Maxim Gekk wrote: > Hello All, > > Current status of features from the allow list for branch-3.3 is: > > IN PROGRESS: >

Re: Apache Spark 3.3 Release

2022-04-15 Thread Maxim Gekk
DELETE commands for group-based sources 13. SPARK-34659: Web UI does not correctly get appId 14. SPARK-38589: New SQL function: try_avg Max Gekk Software Engineer Databricks, Inc. On Mon, Apr 4, 2022 at 9:27 PM Maxim Gekk wrote: > Hello All, > > Below is current status of

Re: Apache Spark 3.3 Release

2022-04-04 Thread Maxim Gekk
DELTA_BYTE_ARRAY and DELTA_LENGTH_BYTE_ARRAY encodings for Parquet V2 support We need to decide whether we are going to wait a little bit more or close the doors. Maxim Gekk Software Engineer Databricks, Inc. On Fri, Mar 18, 2022 at 9:22 AM Maxim Gekk wrote: > Hi All, > > Here is

Re: Apache Spark 3.3 Release

2022-03-18 Thread Maxim Gekk
zed DELTA_BYTE_ARRAY/DELTA_LENGTH_BYTE_ARRAY encodings for > >> Parquet V2 support (https://github.com/apache/spark/pull/35262) > >> > >> It's already reviewed and approved. > >> > >> On Wed, Mar 16, 2022 at 9:13 AM Tom Graves > wrote: > >> >

Re: Apache Spark 3.3 Release

2022-03-16 Thread Maxim Gekk
Hi All, I have created the branch for Spark 3.3: https://github.com/apache/spark/commits/branch-3.3 Please, backport important fixes to it, and if you have some doubts, ping me in the PR. Regarding new features, we are still building the allow list for branch-3.3. Best regards, Max Gekk On

Re: Apache Spark 3.3 Release

2022-03-14 Thread Maxim Gekk
Hi All, Since there are no actual blockers for Spark 3.3.0 and no significant objections, I am going to cut branch-3.3 after 15th March at 00:00 PST. Please let us know if you have any concerns about that. Best regards, Max Gekk On Thu, Mar 3, 2022 at 9:44 PM Maxim Gekk wrote: > Hello

Apache Spark 3.3 Release

2022-03-03 Thread Maxim Gekk
Hello All, I would like to bring up the topic of the new Spark 3.3 release. According to the public schedule at https://spark.apache.org/versioning-policy.html, we planned to start the code freeze and release branch cut on March 15th, 2022. Since this date is coming soon, I would

Re: Migration onto error classes and their testing

2022-01-17 Thread Maxim Gekk
Hi All, I have created the umbrella JIRA https://issues.apache.org/jira/browse/SPARK-37935, and a few sub-tasks. If you would like to contribute, please leave a comment in the sub-task saying that you are working on it. Yours faithfully, Max Gekk On Wed, Jan 12, 2022 at 9:39 PM Maxim Gekk wrote: >

Migration onto error classes and their testing

2022-01-12 Thread Maxim Gekk
Hi All, As you probably know, we introduced error classes (https://github.com/apache/spark/pull/32850) as a part of the error messages framework (https://issues.apache.org/jira/browse/SPARK-33539). I would like to propose migrating all exceptions from QueryExecutionErrors,

Re: [VOTE] Release Spark 3.2.0 (RC7)

2021-10-08 Thread Maxim Gekk
+1 (non-binding) On Fri, Oct 8, 2021 at 10:44 AM Mich Talebzadeh wrote: > +1 (non-binding) > > > >view my Linkedin profile > > > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or

Re: [VOTE] Release Spark 3.0.3 (RC1)

2021-06-21 Thread Maxim Gekk
+1 (non-binding) On Mon, Jun 21, 2021 at 3:44 PM Takeshi Yamamuro wrote: > Thank you, Yi. > +1 > > On Mon, Jun 21, 2021 at 3:00 PM Jungtaek Lim > wrote: > >> +1 (non-binding) Thanks for your efforts! >> >> On Mon, Jun 21, 2021 at 2:40 PM Kent Yao wrote: >> >>> +1 (non-binding) >>> >>> *Kent

Re: [ANNOUNCE] Apache Spark 3.1.2 released

2021-06-02 Thread Maxim Gekk
Congratulations to everyone on the new release, and thanks to Dongjoon! On Wed, Jun 2, 2021 at 9:08 AM Jungtaek Lim wrote: > Nice! Thanks Dongjoon for your amazing efforts! > > On Wed, Jun 2, 2021 at 2:59 PM Liang-Chi Hsieh wrote: > >> Thank you, Dongjoon! >> >> >> >> Takeshi Yamamuro wrote >>

Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-26 Thread Maxim Gekk
+1 (non-binding) On Mon, May 24, 2021 at 9:14 AM Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.1.2. > > The vote is open until May 27th 1AM (PST) and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1 votes. > > [ ] +1

Re: [ANNOUNCE] Apache Spark 2.4.8 released

2021-05-18 Thread Maxim Gekk
Congratulations to everyone on the new release, and thanks to Liang-Chi. Maxim Gekk Software Engineer Databricks, Inc. On Tue, May 18, 2021 at 11:06 AM Yuming Wang wrote: > Great work, Liang-Chi! > > On Tue, May 18, 2021 at 3:57 PM Jungtaek Lim > wrote: > >> Thanks

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-09 Thread Maxim Gekk
+1 (non-binding) On Mon, May 10, 2021 at 5:23 AM Takeshi Yamamuro wrote: > +1 (non-binding) > > I don't see any critical issue in the SQL part. > Many thanks for working on it, Liang-Chi~ > > On Mon, May 10, 2021 at 6:22 AM Liang-Chi Hsieh wrote: > >> Please vote on releasing the following

Re: [VOTE] Release Spark 2.4.8 (RC3)

2021-04-28 Thread Maxim Gekk
+1 (non-binding) On Wed, Apr 28, 2021 at 8:12 PM Wenchen Fan wrote: > +1 (binding) > > On Thu, Apr 29, 2021 at 1:05 AM DB Tsai wrote: > >> +1 (binding) >> >> > On Apr 28, 2021, at 9:26 AM, Liang-Chi Hsieh wrote: >> > >> > >> > Please vote on releasing the following candidate as Apache Spark

Re: [VOTE] Release Spark 2.4.8 (RC2)

2021-04-14 Thread Maxim Gekk
+1 (non-binding) On Wed, Apr 14, 2021 at 6:39 PM Dongjoon Hyun wrote: > +1 > > Bests, > Dongjoon. > > On Tue, Apr 13, 2021 at 10:38 PM Kent Yao wrote: > >> +1 (non-binding) >> >> *Kent Yao * >> @ Data Science Center, Hangzhou Research Institute, NetEase Corp. >> *a spark enthusiast* >> *kyuubi

Re: Welcoming six new Apache Spark committers

2021-04-01 Thread Maxim Gekk
Thank you all for your congratulations! Max Gekk On Tue, Mar 30, 2021 at 2:29 PM Jacek Laskowski wrote: > Hi, > > Congrats to all of you committers! Wishing you all the best (commits)! > > Regards, > Jacek Laskowski > > https://about.me/JacekLaskowski > "The Internals Of" Online Books

Re: [VOTE] SPIP: Support pandas API layer on PySpark

2021-03-28 Thread Maxim Gekk
+1 (non-binding) On Sun, Mar 28, 2021 at 8:53 PM Matei Zaharia wrote: > +1 > > Matei > > On Mar 28, 2021, at 1:45 AM, Gengliang Wang wrote: > > +1 (non-binding) > > On Sun, Mar 28, 2021 at 11:12 AM Mridul Muralidharan > wrote: > >> +1 >> >> Regards, >> Mridul >> >> On Sat, Mar 27, 2021 at

Re: Property spark.sql.streaming.minBatchesToRetain

2021-03-09 Thread Maxim Gekk
/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L84 Maxim Gekk Software Engineer Databricks, Inc. On Tue, Mar 9, 2021 at 3:27 PM German Schiavon wrote: > Hello all, > > I wanted to ask if this property is still active? I can't find it in the

Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-23 Thread Maxim Gekk
+1 (non-binding) On Wed, Feb 24, 2021 at 2:42 AM Cheng Su wrote: > +1 (non-binding) > > > > *From: *Takeshi Yamamuro > *Date: *Tuesday, February 23, 2021 at 3:30 PM > *To: *Hyukjin Kwon , dev > *Subject: *Re: [VOTE] Release Spark 3.1.1 (RC3) > > > > +1 > > > > On Wed, Feb 24, 2021 at 2:07 AM

Re: [VOTE] Release Spark 3.0.2 (RC1)

2021-02-16 Thread Maxim Gekk
+1 (non-binding) On Wed, Feb 17, 2021 at 9:54 AM Wenchen Fan wrote: > +1 > > On Wed, Feb 17, 2021 at 1:43 PM Dongjoon Hyun > wrote: > >> +1 >> >> Bests, >> Dongjoon. >> >> >> On Tue, Feb 16, 2021 at 2:27 AM Herman van Hovell >> wrote: >> >>> +1 >>> >>> On Tue, Feb 16, 2021 at 11:08 AM Hyukjin

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-02-02 Thread Maxim Gekk
ark/commit/278f6f45f46ccafc7a31007d51ab9cb720c9cb14, we had: io.file.buffer.size = 65536 Maxim Gekk Software Engineer Databricks, Inc. On Wed, Feb 3, 2021 at 2:37 AM Hyukjin Kwon wrote: > Yeah, agree. I changed. Thanks for the heads up. Tom. > > On Wed, Feb 3, 2021 at 8:31 AM, Tom Graves

Re: [VOTE] Release Spark 3.1.1 (RC1)

2021-01-21 Thread Maxim Gekk
Also I am investigating a performance regression in some TPC-DS queries (q88 for instance) that is caused by a recent commit in 3.1, highly likely in the period from 19th November, 2020 to 18th December, 2020. Maxim Gekk Software Engineer Databricks, Inc. On Fri, Jan 22, 2021 at 10:45 AM

Re: The progress of DataSourceV2 based connector for JDBC?

2020-12-25 Thread Maxim Gekk
Hi Chang Chen, You can track progress in the umbrella JIRA: https://issues.apache.org/jira/browse/SPARK-24907 Maxim Gekk Software Engineer Databricks, Inc. On Fri, Dec 25, 2020 at 1:01 PM Chang Chen wrote: > Hi All > > Is there any plan for supporting JDBC DataSourceV2? >

Re: Welcoming some new Apache Spark committers

2020-07-15 Thread Maxim Gekk
Congratulations, Huaxin, Jungtaek, Dilip!!! Maxim Gekk Software Engineer Databricks, Inc. On Wed, Jul 15, 2020 at 8:34 PM Felix Cheung wrote: > Welcome! > > -- > *From:* Nick Pentreath > *Sent:* Tuesday, July 14, 2020 10:21:17 PM > *To:*

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread Maxim Gekk
Hi Dongjoon, I would add: - Filters pushdown to JSON (https://github.com/apache/spark/pull/27366) - Filters pushdown to other datasources like Avro - Support nested attributes of filters pushed down to JSON Maxim Gekk Software Engineer Databricks, Inc. On Mon, Jun 29, 2020 at 7:07 PM

Re: [Spark SQL] Question about support for TimeType columns in Apache Parquet files

2020-06-26 Thread Maxim Gekk
d range checking and makes users' apps error-prone, from my point of view. Maxim Gekk Software Engineer Databricks, Inc. On Fri, Jun 26, 2020 at 3:48 PM Bart Samwel wrote: > On Fri, Jun 26, 2020 at 12:24 PM Maxim Gekk > wrote: > >> Hi Bart, >> >> > But is it useful

Re: [Spark SQL] Question about support for TimeType columns in Apache Parquet files

2020-06-26 Thread Maxim Gekk
:00, Morning TV show) (2020-08-02, 19:00, Soccer game) So, you can join your day schedule with the fact table and find out which TV shows you can watch when you are at home. Maxim Gekk Software Engineer Databricks, Inc. On Fri, Jun 26, 2020 at 12:40 PM Bart Samwel wrote: > I can't c
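The join described in this message can be sketched in plain Python (not Spark). Only the (2020-08-02, 19:00, Soccer game) row comes from the message; the other schedule rows, the at-home intervals, and the helper name `shows_at_home` are hypothetical and are there only to make the sketch runnable.

```python
from datetime import date, time

# TV schedule (the "fact table"): (day, start time, show).
# Only the soccer row is taken from the thread; the rest is made-up data.
tv_schedule = [
    (date(2020, 8, 2), time(7, 0), "News"),
    (date(2020, 8, 2), time(13, 0), "Cooking show"),
    (date(2020, 8, 2), time(19, 0), "Soccer game"),
]

# Day schedule: pure time-of-day intervals (no date part) when you are at home.
at_home = [
    (time(6, 0), time(8, 30)),
    (time(18, 0), time(23, 0)),
]

def shows_at_home(schedule, intervals):
    """Join shows with at-home intervals on begin <= start time < end."""
    return [
        (day, start, show)
        for (day, start, show) in schedule
        for (begin, end) in intervals
        if begin <= start < end
    ]

watchable = shows_at_home(tv_schedule, at_home)
for day, start, show in watchable:
    print(day.isoformat(), start.strftime("%H:%M"), show)
```

The day schedule holds time-of-day values with no date attached, which is the kind of column a time type would represent; the join condition compares only the time part of each schedule row.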

Re: [vote] Apache Spark 3.0 RC3

2020-06-07 Thread Maxim Gekk
+1 (non-binding) On Sun, Jun 7, 2020 at 2:34 PM Takeshi Yamamuro wrote: > +1 (non-binding) > > I don't see any ongoing PR to fix critical bugs in my area. > Bests, > Takeshi > > On Sun, Jun 7, 2020 at 7:24 PM Mridul Muralidharan > wrote: > >> +1 >> >> Regards, >> Mridul >> >> On Sat, Jun 6,

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-02 Thread Maxim Gekk
-1 (non-binding) The problem of compatibility with Spark 2.4 in reading/writing dates/timestamps hasn't been solved completely so far. In particular, the sub-task https://issues.apache.org/jira/browse/SPARK-31328 hasn't been resolved yet. Maxim Gekk Software Engineer Databricks, Inc. On Wed, Apr

Re: [DISCUSSION] Esoteric Spark function `TRIM/LTRIM/RTRIM`

2020-02-15 Thread Maxim Gekk
case. Once we see the second case, we could output a warning with a notice about the order of parameters. Maxim Gekk Software Engineer Databricks, Inc. On Sat, Feb 15, 2020 at 5:04 PM Wenchen Fan wrote: > It's unfortunate that we don't have a clear document to talk about > breaking c

unify benchmarks in 2.4 and regenerate results

2020-02-04 Thread Maxim Gekk
: https://issues.apache.org/jira/browse/SPARK-25475 and regenerate all results in the same environment. This will allow us to compare Spark 3.0 to Spark 2.4.x, and minor releases of 2.4.x. Maxim Gekk Software Engineer Databricks, Inc.

Re: [VOTE] Release Apache Spark 2.4.5 (RC2)

2020-02-04 Thread Maxim Gekk
+1 I re-ran some of the existing benchmarks in branch-2.4 on Linux/MacOS, and haven't found any regressions compared to 2.4.4. Maxim Gekk On Tue, Feb 4, 2020 at 11:07 AM Takeshi Yamamuro wrote: > +1; > I run the tests with > `-Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pk

Re: How to implement a "saveAsBinaryFile" function?

2020-01-16 Thread Maxim Gekk
Hi Bing, You can try Text datasource. It shouldn't modify strings: scala> Seq(""""20192_1",1,24,0,2,”S66.000x001”""").toDS.write.text("tmp/text.txt") $ cat tmp/text.txt/part-0-256d960f-9f85-47fe-8edd-8428276eb3c6-c000.txt "2019

Re: proposal for expanded & consistent timestamp types

2018-12-07 Thread Maxim Gekk
Hello Imran, Thank you for bringing this problem up. I faced the issue of handling timestamps and dates when I implemented date/timestamp parsing in the CSV/JSON datasources: https://github.com/apache/spark/pull/23150 https://github.com/apache/spark/pull/23196 Maxim Gekk Technical Solutions

Re: Random sampling in tests

2018-10-08 Thread Maxim Gekk
timezones. We basically just spend time and resources on testing the external libraries. I mean the PRs: https://github.com/apache/spark/pull/22657 and https://github.com/apache/spark/pull/22379#discussion_r223039662 Maxim Gekk Technical Solutions Lead Databricks B. V. <http://databricks.

Re: from_csv

2018-09-16 Thread Maxim Gekk
k? > > -- > excuse the brevity and lower case due to wrist injury > > > On Sat, Sep 15, 2018 at 2:58 AM Maxim Gekk > wrote: > >> Hi All, >> >> I would like to propose new function from_csv() for parsing columns >> containing strings in CSV format. Here is

from_csv

2018-09-15 Thread Maxim Gekk
ods should make a better user experience in parsing CSV-like columns. Please, share your thoughts. -- Maxim Gekk Technical Solutions Lead Databricks Inc. maxim.g...@databricks.com databricks.com <http://databricks.com/>
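As a rough illustration of the idea behind the proposal (parsing columns that contain strings in CSV format), here is a plain-Python sketch. It is not the proposed Spark function or its signature: the helper name `parse_csv_column`, the schema, and the sample values are hypothetical, and turning malformed rows into None is a choice made for this sketch only.

```python
import csv
from io import StringIO

def parse_csv_column(values, schema, delimiter=","):
    """Parse each CSV-formatted string into a dict keyed by schema field names.

    `schema` is a list of (field name, converter) pairs.
    Rows that do not match the schema become None (a choice made for this sketch).
    """
    parsed = []
    for value in values:
        # Each column value is treated as one CSV record.
        fields = next(csv.reader(StringIO(value), delimiter=delimiter), None)
        if fields is None or len(fields) != len(schema):
            parsed.append(None)
            continue
        try:
            parsed.append(
                {name: convert(raw) for (name, convert), raw in zip(schema, fields)}
            )
        except ValueError:
            # A field could not be converted to the declared type.
            parsed.append(None)
    return parsed

# Hypothetical schema and column values.
schema = [("id", int), ("name", str), ("score", float)]
column = ['1,"Smith, J",3.5', "2,Lee,4.0", "broken"]
rows = parse_csv_column(column, schema)
print(rows)
```

The point of the sketch is that a string column is turned into structured fields according to a user-supplied schema, including quoted fields that contain the delimiter.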