Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-08 Thread Cheng Su
+1 for this release candidate. Thanks, Cheng Su From: 郑瑞峰 Date: Monday, February 8, 2021 at 10:58 PM To: Gengliang Wang , Sean Owen Cc: gurwls223 , Yuming Wang , dev Subject: Re: [VOTE] Release Spark 3.1.1 (RC2) +1 (non-binding) Thank you, Hyukjin -- Original message

Re: [DISCUSS] Add RocksDB StateStore

2021-02-08 Thread DB Tsai
+1 to add it as an external module so people can test it out and give feedback easier. On Mon, Feb 8, 2021 at 10:22 PM Gabor Somogyi wrote: > > +1 adding it any way. > > On Mon, 8 Feb 2021, 21:54 Holden Karau, wrote: >> >> +1 for an external module. >> >> On Mon, Feb 8, 2021 at 11:51 AM Cheng

Re: [DISCUSS] Add RocksDB StateStore

2021-02-08 Thread Jungtaek Lim
+1 to add, regardless of whether it goes under sql-core or into an external module. My rationale: * The discussion thread and voices here show strong demand for adding a RocksDB state store out of the box. * There is no out-of-the-box workaround for the huge state store problem. Direct competitors on streaming

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-08 Thread 郑瑞峰
+1 (non-binding) Thank you, Hyukjin -- Original message -- From: "Gengliang Wang"

Re: [DISCUSS] Add RocksDB StateStore

2021-02-08 Thread Gabor Somogyi
+1 adding it any way. On Mon, 8 Feb 2021, 21:54 Holden Karau, wrote: > +1 for an external module. > > On Mon, Feb 8, 2021 at 11:51 AM Cheng Su wrote: > >> +1 for (2) adding to an external module. >> >> I think this feature is useful and popular in practice, and option 2 does >> not conflict with

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-08 Thread Gengliang Wang
+1 On Tue, Feb 9, 2021 at 1:39 PM Sean Owen wrote: > Same result as last time for me, +1. Tested with Java 11. > I fixed the two issues without assignee; one was WontFix though. > > On Mon, Feb 8, 2021 at 7:43 PM Hyukjin Kwon wrote: > >> Let's set the assignees properly then. Shouldn't be a

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-08 Thread Sean Owen
Same result as last time for me, +1. Tested with Java 11. I fixed the two issues without assignee; one was WontFix though. On Mon, Feb 8, 2021 at 7:43 PM Hyukjin Kwon wrote: > Let's set the assignees properly then. Shouldn't be a problem for the > release. > > On Tue, 9 Feb 2021, 10:40 Yuming

ASF board report for February 2021

2021-02-08 Thread Matei Zaharia
It’s time to prepare our quarterly ASF board report, which we need to submit on Feb 10th. The last one was in November. I’ve written a draft here, but let me know if you want to add any more content that I’ve missed. == Apache Spark is a fast and general engine for large-scale

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-08 Thread Hyukjin Kwon
Let's set the assignees properly then. Shouldn't be a problem for the release. On Tue, 9 Feb 2021, 10:40 Yuming Wang, wrote: > > Many tickets do not have the correct assignee: > >

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-08 Thread Yuming Wang
Many tickets do not have the correct assignee: https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20in%20(3.1.0%2C%203.1.1)%20AND%20(assignee%20is%20EMPTY%20or%20assignee%20%3D%20apachespark) On Tue, Feb 9, 2021 at 9:05 AM

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-08 Thread Hyukjin Kwon
+1 (binding) from myself too. On Tue, Feb 9, 2021 at 9:28 AM, Kent Yao wrote: > > +1 > > Kent Yao > @ Data Science Center, Hangzhou Research Institute, NetEase Corp. > a spark enthusiast > kyuubi is a unified multi-tenant JDBC > interface for large-scale data

Re: [VOTE] Release Spark 3.1.1 (RC2)

2021-02-08 Thread Kent Yao
+1 Kent Yao @ Data Science Center, Hangzhou Research Institute, NetEase Corp. a spark enthusiast kyuubi is a unified multi-tenant JDBC

[VOTE] Release Spark 3.1.1 (RC2)

2021-02-08 Thread Hyukjin Kwon
Please vote on releasing the following candidate as Apache Spark version 3.1.1. The vote is open until February 15th 5PM PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. Note that it is 7 days this time because it is a holiday season in several countries

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-08 Thread Ryan Blue
Wenchen, There are a few issues with the Invoke approach, and I don’t think that it is really much better for the additional complexity of the API. First I think that you’re confusing codegen to call a function with codegen to implement a function. The non-goal refers to supporting codegen to

Re: [DISCUSS] Add RocksDB StateStore

2021-02-08 Thread Holden Karau
+1 for an external module. On Mon, Feb 8, 2021 at 11:51 AM Cheng Su wrote: > +1 for (2) adding to an external module. > > I think this feature is useful and popular in practice, and option 2 does > not conflict with the previous concern about dependencies. > > > > Thanks, > > Cheng Su > > > > From:

[Spark Streaming] [DISCUSS] Clear metadata method and Generate Batches using same Event Loop

2021-02-08 Thread Karthikeyan Ravi
Hello, Our system observed batches being delayed during generation in Spark Streaming, which produced one very large batch followed by a few zero-record batches. I read the code and added logs to confirm the root cause; details below: 1. GenerateBatch (Recurring Timer) and

Re: [DISCUSS] SPIP: FunctionCatalog

2021-02-08 Thread Wenchen Fan
This is a very important feature, thanks for working on it! Spark uses codegen by default, and it's a bit unfortunate to see that codegen support is treated as a non-goal. I think it's better to not ask the UDF implementations to provide two different code paths for interpreted evaluation and

Re: [DISCUSS] Add RocksDB StateStore

2021-02-08 Thread Cheng Su
+1 for (2) adding to an external module. I think this feature is useful and popular in practice, and option 2 does not conflict with the previous concern about dependencies. Thanks, Cheng Su From: Dongjoon Hyun Date: Monday, February 8, 2021 at 10:39 AM To: Jacek Laskowski Cc: Liang-Chi Hsieh , dev

Re: Binary compatibility issues in 3.1.1?

2021-02-08 Thread Wenchen Fan
This is the cost of relying on Spark internal APIs, and the external connectors need to take care of it. BTW, the Alias change is source-compatible, and it shouldn't break the external connectors if they are compiled with Spark 3.1. On Tue, Feb 9, 2021 at 2:26 AM Alex Ott wrote: > although no,

[DISCUSS] SPIP: FunctionCatalog

2021-02-08 Thread Ryan Blue
Hi everyone, I'd like to start a discussion for adding a FunctionCatalog interface to catalog plugins. This will allow catalogs to expose functions to Spark, similar to how the TableCatalog interface allows a catalog to expose tables. The proposal doc is available here:

Re: [DISCUSS] Add RocksDB StateStore

2021-02-08 Thread Dongjoon Hyun
Thank you, Liang-chi and all. +1 for (2) external module design because it can deliver the new feature in a safe way. Bests, Dongjoon On Mon, Feb 8, 2021 at 9:00 AM Jacek Laskowski wrote: > Hi, > > I'm "okay to add RocksDB StateStore as external module". See no reason not > to. > >

Re: Binary compatibility issues in 3.1.1?

2021-02-08 Thread Alex Ott
although no, additional constructor won't work... On Mon, Feb 8, 2021 at 7:01 PM Alex Ott wrote: > Hi all > > I've noticed following SO question about Spark 3.1.1 not working with > Delta 0.7.0: >

Binary compatibility issues in 3.1.1?

2021-02-08 Thread Alex Ott
Hi all, I've noticed the following SO question about Spark 3.1.1 not working with Delta 0.7.0: https://stackoverflow.com/questions/66106096/delta-lake-insert-into-sql-in-pyspark-is-failing-with-java-lang-nosuchmethoder/66106800#66106800 - I checked with Delta 0.8.0 and it has the same problem. It

Re: [DISCUSS] Add RocksDB StateStore

2021-02-08 Thread Jacek Laskowski
Hi, I'm "okay to add RocksDB StateStore as external module". See no reason not to. Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books Follow me on https://twitter.com/jaceklaskowski

Re: Hyperparameter Optimization via Randomization

2021-02-08 Thread Sean Owen
It seems pretty reasonable to me. If it's a pull request, we can code review it. My only questions are whether it would be better to tell people to use hyperopt, and how much better this is than implementing randomization on the grid. But the API change isn't significant, so maybe it's just fine. On Mon,

Re: Hyperparameter Optimization via Randomization

2021-02-08 Thread Phillip Henry
Hi, Sean. I don't think sampling from a grid is a good idea as the min/max may lie between grid points. Unconstrained random sampling avoids this problem. To this end, I have an implementation at: https://github.com/apache/spark/compare/master...PhillHenry:master It is unit tested and does not
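The distinction drawn above, between sampling from a grid and unconstrained random sampling, can be sketched as follows. This is a hypothetical illustration in plain Python, not the code or API of the linked branch: the helper name `sample_params` and the `(low, high, log_scale)` range format are assumptions. Every value is drawn from a continuous interval, so settings that fall between would-be grid points remain reachable.

```python
import math
import random


def sample_params(ranges, n, seed=None):
    """Draw n hyperparameter settings at random from continuous ranges.

    Hypothetical helper for illustration. `ranges` maps a parameter name to a
    (low, high, log_scale) tuple; log_scale=True samples uniformly in log space.
    """
    rng = random.Random(seed)  # private generator so a seed gives reproducible draws
    samples = []
    for _ in range(n):
        params = {}
        for name, (low, high, log_scale) in ranges.items():
            if log_scale:
                # Uniform over the exponent: each order of magnitude is equally likely.
                params[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
            else:
                # Uniform over [low, high]; no fixed grid, so any value in between can appear.
                params[name] = rng.uniform(low, high)
        samples.append(params)
    return samples


# Example search space, using two Spark ML LogisticRegression parameter names.
ranges = {
    "regParam": (1e-4, 1.0, True),
    "elasticNetParam": (0.0, 1.0, False),
}

for p in sample_params(ranges, n=5, seed=42):
    print(p)
```

In PySpark, each resulting dict could be turned into a param map such as `{lr.regParam: p["regParam"], lr.elasticNetParam: p["elasticNetParam"]}` and the list passed to `CrossValidator` as `estimatorParamMaps` in place of the output of `ParamGridBuilder`; that wiring is likewise an assumption, not a description of what the pull request does.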