Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-30 Thread DB Tsai
+1 On Apr 29, 2024, at 8:01 PM, Wenchen Fan wrote: To add more color: Spark data source table and Hive Serde table are both stored in the Hive metastore and keep the data files in the table directory. The only difference is they have different "table provider", which means Spark will use different

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-16 Thread DB Tsai
+1 Sent from my iPhone On Apr 16, 2024, at 3:11 PM, bo yang wrote: +1 On Tue, Apr 16, 2024 at 1:38 PM Hyukjin Kwon wrote: +1 On Wed, Apr 17, 2024 at 3:57 AM L. C. Hsieh wrote: +1 On Tue, Apr 16, 2024 at 4:08 AM Wenchen Fan wrote: > > +1 >

Re: Versioning of Spark Operator

2024-04-09 Thread DB Tsai
Aligning with Spark releases is sensible, as it allows us to guarantee that the Spark operator functions correctly with the new version while also maintaining support for previous versions. DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Apr 9, 2024, at 9:45 AM, Mri

Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-14 Thread DB Tsai
+1 DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Nov 14, 2023, at 10:14 AM, Vakaris Baškirov > wrote: > > +1 (non-binding) > > On Tue, Nov 14, 2023 at 8:03 PM Chao Sun <sunc...@apache.org> wrote: >> +1 >> >>

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-09 Thread DB Tsai
Kubernetes operator is essential for our Spark community as well. DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Nov 9, 2023, at 12:05 PM, Zhou Jiang wrote: > > Hi Spark community, > I'm reaching out to initiate a conversation about the possibility of > de

Re: [VOTE][SPIP] Lazy Materialization for Parquet Read Performance Improvement

2023-02-14 Thread DB Tsai
+1 DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Feb 14, 2023, at 8:29 AM, Guo Weijie wrote: > > +1 > > Yuming Wang <wgy...@gmail.com> wrote on Tue, Feb 14, 2023 at 15:58: >> +1 >> >> On Tue, Feb 14, 2023 at 11:27 AM Prem Sahoo >

Re: [DISCUSS] SPIP: Lazy Materialization for Parquet Read Performance Improvement

2023-01-31 Thread DB Tsai
+1 Sent from my iPhone On Jan 31, 2023, at 4:16 PM, Yuming Wang wrote: +1. On Wed, Feb 1, 2023 at 7:42 AM kazuyuki tanimura wrote: Great! Much appreciated, Mitch! Kazu On Jan 31, 2023, at 3:07 PM, Mich Talebzadeh wrote: Thanks, Kazu. I followed that template link and indeed

Re: [ANNOUNCE] Apache Spark 3.2.1 released

2022-01-28 Thread DB Tsai
Thank you, Huaxin, for the 3.2.1 release! Sent from my iPhone > On Jan 28, 2022, at 5:45 PM, Chao Sun wrote: > > > Thanks Huaxin for driving the release! > >> On Fri, Jan 28, 2022 at 5:37 PM Ruifeng Zheng wrote: >> It's Great! >> Congrats and thanks, huaxin! >> >> >> --

Re: Apache Spark Jenkins Infra 2022

2022-01-09 Thread DB Tsai
Thank you, Dongjoon for driving the build infra. DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Jan 9, 2022, at 6:38 PM, shane knapp ☠ wrote: > > > apache spark jenkins lives on! > > @dongjoon, let me know if there's anything you need. ni

Re: [VOTE] SPIP: Row-level operations in Data Source V2

2021-11-12 Thread DB Tsai
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org -- DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1

Re: [VOTE] SPIP: Storage Partitioned Join for Data Source V2

2021-10-29 Thread DB Tsai
+1 DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 On Fri, Oct 29, 2021 at 11:42 AM Ryan Blue wrote: > +1 > > On Fri, Oct 29, 2021 at 11:06 AM huaxin gao > wrote: > >> +1 >> >> On Fri, Oct 29, 2021 at 10:59 AM Dongjoon Hyun >> wrote:

Re: [DISCUSS] SPIP: Storage Partitioned Join for Data Source V2

2021-10-24 Thread DB Tsai
forward to it as a new feature in Spark 3.3 DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 On Fri, Oct 22, 2021 at 12:18 PM Chao Sun wrote: > > Hi, > > Ryan and I drafted a design doc to support a new type of join: storage > partitioned join which covers buck

Re: [VOTE] Release Spark 3.2.0 (RC7)

2021-10-11 Thread DB Tsai
+1 DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 On Mon, Oct 11, 2021 at 6:01 AM Almeida, (Ricardo) wrote: > > +1 (non-binding) > > > > Ricardo Almeida > > > > From: Xiao Li > Sent: Monday, October 11, 2021 9:09 AM > To: Yi Wu > Cc: Ho

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-31 Thread DB Tsai
Hello Xiao, there are multiple patches in Spark 3.2 that depend on Parquet 1.12, so it might be easier to wait for the fix in the Parquet community instead of reverting all the related changes. The fix on the Parquet side is very trivial, and we hope that it will not take too long. Thanks. DB Tsai

Re: [DISCUSS] Rename hadoop-3.2/hadoop-2.7 profile to hadoop-3/hadoop-2?

2021-06-24 Thread DB Tsai
+1 on renaming. DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1 > On Jun 24, 2021, at 11:41 AM, Chao Sun wrote: > > Hi, > > As Spark master has upgraded to Hadoop-3.3.1, the current Maven profile name > hadoop-3.2 is no longer accurate, and it may confus

Re: [VOTE] Release Spark 2.4.8 (RC3)

2021-04-28 Thread DB Tsai
+1 (binding) > On Apr 28, 2021, at 9:26 AM, Liang-Chi Hsieh wrote: > > > Please vote on releasing the following candidate as Apache Spark version > 2.4.8. > > The vote is open until May 4th at 9AM PST and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1 votes. > > [ ] +1

Re: [VOTE] Release Spark 2.4.8 (RC2)

2021-04-14 Thread DB Tsai
+1 (binding) DB Tsai | ACS Spark Core | Apple, Inc. > On Apr 14, 2021, at 10:42 AM, Wenchen Fan wrote: > > +1 (binding) > > On Thu, Apr 15, 2021 at 12:22 AM Maxim Gekk <maxim.g...@databricks.com> wrote: > +1 (non-binding) > > On Wed, Apr

Re: [DISCUSS] Add RocksDB StateStore

2021-02-08 Thread DB Tsai
econd >>> one, we propose (SPARK-34198) to add it as an external module to relieve the >>> dependency concern. >>> >>> Because it was pushed back previously, I'm going to raise this discussion to >>> know what people think about it now, in advance of submitt

Re: [VOTE][SPARK-30602] SPIP: Support push-based shuffle to improve shuffle efficiency

2020-09-14 Thread DB Tsai
>> >> Active discussions on the jira and SPIP document have settled. >> >> I will leave the vote open until Friday (the 18th September 2020), 5pm >> CST. >> >> [ ] +1: Accept the proposal as an official SPIP >> [ ] +0 >> [ ] -1: I don't think this is a good idea because ... >> >> >> Thanks, >> Mridul >> > -- Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1

Re: [VOTE] Decommissioning SPIP

2020-07-02 Thread DB Tsai
>>> +1 for having this feature in Spark >>> -- >>> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-13 Thread DB Tsai
At the job level sure, but upgrading large jobs, possibly written in Scala >> 2.11, whole-hog as it currently stands is not a small matter. >> >> On Fri, Jun 12, 2020 at 9:40 PM DB Tsai wrote: >> +1 for a 2.x release with DSv2, JDK11, and Scala 2.11 support >>

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread DB Tsai
forward using new features. After all, the reason we are working on OSS is that we like people to use our code, right? Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Fri, Jun 12, 2020 at 8:51 PM Jungtaek

Re: [vote] Apache Spark 3.0 RC3

2020-06-08 Thread DB Tsai
+1 (binding) Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Mon, Jun 8, 2020 at 1:03 PM Dongjoon Hyun wrote: > > +1 > > Thanks, > Dongjoon. > > On Mon, Jun 8, 2020 at 6:37 AM Russ

Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-05-31 Thread DB Tsai
+1 (binding), thanks! On Sun, May 31, 2020 at 9:23 PM Wenchen Fan wrote: > +1 (binding), although I don't know why we jump from RC 3 to RC 8... > > On Mon, Jun 1, 2020 at 7:47 AM Holden Karau wrote: > >> Please vote on releasing the following candidate as Apache Spark >> version 2.4.6. >> >>

Re: [VOTE] Release Spark 2.4.6 (RC3)

2020-05-18 Thread DB Tsai
' code when upgrading from Scala 2.11 to Scala 2.12. Thanks, Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1

Re: [VOTE] Release Spark 2.4.6 (RC3)

2020-05-17 Thread DB Tsai
+1 as well. Thanks. On Sun, May 17, 2020 at 7:39 AM Sean Owen wrote: > +1 , same response as to the last RC. > This looks like it includes the fix discussed last time, as well as a > few more small good fixes. > > On Sat, May 16, 2020 at 12:08 AM Holden Karau > wrote: > > > > Please vote on

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-14 Thread DB Tsai
+1 Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Tue, Jan 14, 2020 at 11:08 AM Sean Owen wrote: > > Yeah it's something about the env I spun up, but I don't know what. It > happens f

Re: [VOTE] Release Apache Spark 2.4.4 (RC3)

2019-08-28 Thread DB Tsai
+1 Thanks! On Wed, Aug 28, 2019 at 7:14 AM Wenchen Fan wrote: > +1, no more blocking issues that I'm aware of. > > On Wed, Aug 28, 2019 at 8:33 PM Sean Owen wrote: > >> +1 from me again. >> >> On Tue, Aug 27, 2019 at 6:06 PM Dongjoon Hyun >> wrote: >> > >> > Please vote on releasing the

[DISCUSSION]JDK11 for Apache 2.x?

2019-08-27 Thread DB Tsai
is not desired in a minor release? Thanks. DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc

Re: [VOTE] Release Apache Spark 2.3.4 (RC1)

2019-08-27 Thread DB Tsai
+1 Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Tue, Aug 27, 2019 at 11:31 AM Dongjoon Hyun wrote: > > +1. > > I also verified SHA/GPG and tested UTs on AdoptOpenJDKu8_222/CentOS6.9 wit

Re: JDK11 Support in Apache Spark

2019-08-24 Thread DB Tsai
Congratulations on the great work! Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Sat, Aug 24, 2019 at 8:11 AM Dongjoon Hyun wrote: > > Hi, All. > > Thanks to your many many contributions,

Re: Release Apache Spark 2.4.4

2019-08-13 Thread DB Tsai
+1 On Tue, Aug 13, 2019 at 4:16 PM Dongjoon Hyun wrote: > > Hi, All. > > Spark 2.4.3 was released three months ago (8th May). > As of today (13th August), there are 112 commits (75 JIRAs) in `branch-2.4` > since 2.4.3. > > It would be great if we can have Spark 2.4.4. > Shall we start `2.4.4

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-24 Thread DB Tsai
processing support, I can imagine that the heavy-lifting parts of ML applications (such as computing the objective functions) can be written as columnar expressions that leverage SIMD architectures to get a good speedup. Sincerely, DB Tsai

[ANNOUNCE] Announcing Apache Spark 2.4.1

2019-04-05 Thread DB Tsai
+user list We are happy to announce the availability of Spark 2.4.1! Apache Spark 2.4.1 is a maintenance release, based on the branch-2.4 maintenance branch of Spark. We strongly recommend that all 2.4.0 users upgrade to this stable release. In Apache Spark 2.4.1, Scala 2.12 support is GA, and

[ANNOUNCE] Announcing Apache Spark 2.4.1

2019-04-04 Thread DB Tsai
. DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc

Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-31 Thread DB Tsai
This vote passes! +1: Wenchen Fan (binding) Sean Owen (binding) Mihaly Toth DB Tsai (binding) Jonatan Jäderberg Xiao Li (binding) Denny Lee Felix Cheung (binding) +0: None -1: None It's the largest RC ever; I will follow up with an official release announcement soon. Thank you all for your

Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-28 Thread DB Tsai
+1 from myself On Thu, Mar 28, 2019 at 3:14 AM Mihaly Toth wrote: > +1 (non-binding) > > Thanks, Misi > > Sean Owen wrote (on Thu, Mar 28, 2019, 0:19): > >> +1 from me - same as last time. >> >> On Wed, Mar 27, 2019 at 1:31 PM DB Tsai wr

[VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-27 Thread DB Tsai
typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue. DB Tsai | Siri Open Source Te

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread DB Tsai
RC9 was just cut. Will send out another thread once the build is finished. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Mon, Mar 25, 2019 at 5:10 PM Sean Owen wrote: > > That's all merged now. I

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread DB Tsai
I am going to cut 2.4.1 rc9 later tonight. Besides SPARK-26961 (https://github.com/apache/spark/pull/24126), is there anything critical that we have to wait for in the 2.4.1 release? Thanks! Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-24 Thread DB Tsai
Hello Sean, Looking at the SPARK-26961 PR, it seems ready to go. Do you think we can merge it into branch-2.4 soon? Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Sat, Mar 23, 2019 at 12:04 PM Sean Owen

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-23 Thread DB Tsai
-1 I will fail RC8 and cut RC9 on Monday to include SPARK-27160, SPARK-27178, and SPARK-27112. Please let me know if there is any critical PR that has to be back-ported into branch-2.4. Thanks. Sincerely, DB Tsai -- Web: https

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-22 Thread DB Tsai
branch-2.4, can you make another PR against branch-2.4 so we can include the ORC fix in 2.4.1? Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 42E5B25A8F7A82C1 On Wed, Mar 20, 2019 at 9:11 PM Felix Cheung wrote

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread DB Tsai
the differences between RC8 and 2.4.0 are big? If an issue is found that justifies failing RC8, we can include SPARK-27112 and SPARK-27160 in the next cut. That way, even if we decide to cut another RC, it will be easier to test. Thanks. Sincerely, DB Tsai -- Web

[VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-19 Thread DB Tsai
typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue. DB Tsai | Siri Open Source Te

Re: [discuss] 2.4.1-rcX release, k8s client PRs, build system infrastructure update

2019-03-14 Thread DB Tsai
Since rc8 was already cut without the k8s client upgrade and the build is ready for a vote, and since including the k8s client upgrade in 2.4.1 implies that we will drop the old-but-not-that-old K8s versions as Sean mentioned, should we include this upgrade in 2.4.2 instead? Thanks. Sincerely, DB Tsai

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-10 Thread DB Tsai
As we have many important fixes in branch-2.4 that we want to release ASAP, and this is not a regression from Spark 2.4, 2.4.1 will not be blocked by this. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread DB Tsai
Since I cannot find the `Preparing development version 2.4.2-SNAPSHOT` commit after the rc6 cut, it's very risky to fix the branch and do a force-push. I'll follow Marcelo's suggestion and cut rc7. Thus, this vote fails. DB Tsai | Siri Open Source Technologies [not a contribution

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread DB Tsai
Okay, I see the problem. The rc6 tag is not in branch-2.4. It's very weird; it must have been overwritten by a force push. DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc > On Mar 8, 2019, at 11:39 AM, DB Tsai wrote: > > I was using `./do-release-docker.sh`

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread DB Tsai
of using the same commit causing this issue. Should we create a new rc7? DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc > On Mar 8, 2019, at 10:54 AM, Marcelo Vanzin > wrote: > > I personally find it a little weird to not have the commit i

[VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-07 Thread DB Tsai
typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue. DB Tsai | Siri Open Source Technologies [not a co

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-07 Thread DB Tsai
-streaming-flume-assembly_2.11-2.4.1-tests.jar', check the logs. I am sure my key is on the key server, and the weird thing is that it fails on a different jar each time I run the publish script. Sincerely, DB Tsai -- Web: https://www.dbtsai.com

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-21 Thread DB Tsai
I am cutting a new rc4 with the fix from Felix. Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0359BC9965359766 On Thu, Feb 21, 2019 at 8:57 AM Felix Cheung wrote: > > I merged the fix

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-20 Thread DB Tsai
Okay. Let's fail rc2, and I'll prepare rc3 with SPARK-26859. DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc > On Feb 20, 2019, at 12:11 PM, Marcelo Vanzin > wrote: > > Just wanted to point out that > https://issues.apache.org/jira/bro

[VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-20 Thread DB Tsai
y not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted please ping me or a committer to help target the issue. DB Tsai | Siri Open Source Technologies [not a co

Re: Time to cut an Apache 2.4.1 release?

2019-02-12 Thread DB Tsai
Great. I'll prepare the release for voting. Thanks! DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc > On Feb 12, 2019, at 4:11 AM, Wenchen Fan wrote: > > +1 for 2.4.1 > > On Tue, Feb 12, 2019 at 7:55 PM Hyukjin Kwon wrote: > +1 for 2.4.1

Time to cut an Apache 2.4.1 release?

2019-02-11 Thread DB Tsai
Hello all, I am preparing to cut a new Apache 2.4.1 release, as there are many bugs and correctness issues fixed in branch-2.4. The list of addressed issues is

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-23 Thread DB Tsai
-1 I agree with Anton that this bug can silently corrupt data. As he is ready to submit a PR, I'd suggest waiting so we can include the fix. Thanks! Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x5CED8B896A6BDFA0

Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-08 Thread DB Tsai
+1 Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x5CED8B896A6BDFA0 On Tue, Jan 8, 2019 at 11:14 AM Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.2.3. >

Re: Automated formatting

2018-11-21 Thread DB Tsai
I like the idea of checking only the diff. Even I am sometimes confused about the right style in Spark since I am working on multiple projects with slightly different coding styles. On Wed, Nov 21, 2018 at 1:36 PM Sean Owen wrote: > I know the PR builder runs SBT, but I presume this would just

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-21 Thread DB Tsai
+1 on removing Scala 2.11 support for 3.0 given Scala 2.11 is already EOL. On Tue, Nov 20, 2018 at 2:53 PM Sean Owen wrote: > PS: pull request at https://github.com/apache/spark/pull/23098 > Not going to merge it until there's clear agreement. > > On Tue, Nov 20, 2018 at 10:16 AM Ryan Blue

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-08 Thread DB Tsai
if we want to change the alternative Scala version to 2.13 and drop 2.11 if we just want to support two Scala versions at one time. Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x5CED8B896A6BDFA0 On Wed, Nov 7, 2018

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-06 Thread DB Tsai
Ideally, Spark 3 would support only Scala 2.12. DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc > On Nov 6, 2018, at 2:55 PM, Felix Cheung wrote: > > So to clarify, only scala 2.12 is supported in Spark 3? > > > From: Ryan Blu

Re: Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-06 Thread DB Tsai
agree with Sean that this can make the dependencies really complicated; hence I support dropping Scala 2.11 in Spark 3.0 directly. DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc > On Nov 6, 2018, at 11:38 AM, Sean Owen wrote: > > I think we should make S

Re: Test and support only LTS JDK release?

2018-11-06 Thread DB Tsai
OpenJDK will follow Oracle's release cycle (https://openjdk.java.net/projects/jdk/), a strict six-month model. I'm not familiar with other non-Oracle VMs and Red Hat support. DB Tsai | Siri Open Source Technologies [not a contribution] | Appl

Test and support only LTS JDK release?

2018-11-06 Thread DB Tsai
Given Oracle's new 6-month release model, I feel the only realistic option is to test and support only LTS JDK releases, such as JDK 11 and future LTS releases. I would like to start a discussion on this in the Spark community. Thanks, DB Tsai | Siri Open Source Technologies [not a contribution

Make Scala 2.12 as default Scala version in Spark 3.0

2018-11-06 Thread DB Tsai
to work on bugs and issues that we may run into. What do you think? Thanks, DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc

Re: Java 11 support

2018-11-06 Thread DB Tsai
Given Oracle's new 6-month release model, I think the only realistic option is to support and test only LTS JDK releases. I'll send out two separate emails to dev to facilitate the discussion. DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc > On Nov 6, 2018, at 9:47 AM, sh

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-30 Thread DB Tsai
are selected simultaneously. https://issues.apache.org/jira/browse/SPARK-25879 If we decide not to fix it in 2.4, we should at least document it in the release notes to let users know. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID

Re: Starting to make changes for Spark 3 -- what can we delete?

2018-10-17 Thread DB Tsai
+1 on removing that legacy MLlib code. Many users are confused about the APIs, and some of them have weird behaviors (for example, in gradient descent, the intercept is regularized, which it is not supposed to be). DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc

Re: Scala 2.12 support

2018-06-07 Thread DB Tsai
4 PM, Holden Karau >> wrote: >> > I agree that's a little odd, could we not add the backspace terminal >> > character? Regardless even if not, I don't think that should be a >> blocker >> > for 2.12 support especially since it doesn't degrade the 2.11

Re: Scala 2.12 support

2018-06-07 Thread DB Tsai
ark context Web UI available at http://192.168.1.169:4040 Spark context available as 'sc' (master = local[*], app id = local-1528180279528). Spark session available as 'spark'. scala> DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc > On Jun 7, 2018, at 5:49 P

Re: Scala 2.12 support

2018-06-07 Thread DB Tsai
a blocker for us to move to newer version of Scala 2.12.x since the newer version of Scala 2.12.x has the same issue. In my opinion, Scala should fix the root cause and provide a stable hook for 3rd party developers to initialize their custom code. DB Tsai | Siri Open Source Technologies

Re: [MLLib] Logistic Regression and standardization

2018-04-24 Thread DB Tsai
, and the result should match R. DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc > On Apr 20, 2018, at 5:56 PM, Weichen Xu <weichen...@databricks.com> wrote: > > Right. If regularization item isn't zero, then enable/disable standardization > will ge
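The quoted point is that toggling standardization changes the result only when the regularization term is nonzero. A small sketch can check this. To keep it closed-form, it uses one-feature ridge regression (L2-penalized least squares, no intercept) instead of logistic regression. The names `ridge_coef` and `fit` are hypothetical, not Spark's API. "Standardize" here is assumed to mean dividing the feature by its sample standard deviation without centering, then mapping the coefficient back to the original scale.

```python
def ridge_coef(xs, ys, reg):
    """Closed-form 1-D ridge coefficient without intercept.

    Minimizes sum((y - w * x) ** 2) + reg * w ** 2 over w.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + reg)


def fit(xs, ys, reg, standardize):
    """Return the coefficient on the original feature scale.

    With standardize=True the feature is divided by its sample standard
    deviation before fitting (no centering), and the coefficient is mapped
    back to the original scale afterwards.
    """
    if not standardize:
        return ridge_coef(xs, ys, reg)
    n = len(xs)
    mean = sum(xs) / n
    std = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5
    scaled = [x / std for x in xs]
    # The penalty now acts on the coefficient of the scaled feature.
    return ridge_coef(scaled, ys, reg) / std


xs = [10.0, 20.0, 30.0, 40.0, 50.0]
ys = [12.0, 19.0, 33.0, 41.0, 48.0]
for reg in (0.0, 100.0):
    print(reg, fit(xs, ys, reg, standardize=False), fit(xs, ys, reg, standardize=True))
```

With `reg = 0.0` the two fits agree up to floating-point rounding. With `reg = 100.0` they differ noticeably, because the L2 penalty is applied to coefficients measured on different scales.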

Re: Will higher order functions in spark SQL be pushed upstream?

2017-10-10 Thread DB Tsai
> datatypes? >> >> For parquet, this effort is primarily tracked via SPARK-4502 (see >> https://github.com/apache/spark/pull/16578) and is currently targeted for >> 2.3. -- Sincerely, DB Tsai -- PGP Key ID: 0x5CED8B896A6BDFA0

Re: Welcoming Tejas Patil as a Spark committer

2017-10-06 Thread DB Tsai
Congratulations! On Wed, Oct 4, 2017 at 6:55 PM, Liwei Lin wrote: > Congratulations! > > Cheers, > Liwei > > On Wed, Oct 4, 2017 at 2:27 PM, Yuval Itzchakov wrote: >> >> Congratulations and Good luck! :) >> >> >> >> -- >> Sent from:

Re: [VOTE] Spark 2.1.2 (RC4)

2017-10-06 Thread DB Tsai
+1 Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x5CED8B896A6BDFA0 On Fri, Oct 6, 2017 at 7:46 AM, Felix Cheung <felixcheun...@hotmail.com> wrote: > Thanks Nick, Hyukjin. Yes this seems to be a longer stand

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-10 Thread DB Tsai
-1 I think that back-porting SPARK-20270 <https://github.com/apache/spark/pull/17577> and SPARK-18555 <https://github.com/apache/spark/pull/15994> is very important, since they fix a critical bug where na.fill messes up Long data even when the data isn't null. Thanks. Sincere

Re: welcoming Xiao Li as a committer

2016-10-05 Thread DB Tsai
Congrats, Xiao! Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0x9DCC1DBD7FC7BBB2 On Wed, Oct 5, 2016 at 2:36 PM, Fred Reiss <freiss@gmail.com> wrote: > Congratulations, Xiao! > > Fred > > > On

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-06 Thread DB Tsai
+1 for renaming the jar file. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Tue, Apr 5, 2016 at 8:02 PM, Chris Fregly <ch...@fregly.com> wrote: > perhaps renaming to Spark ML would actually clea

Re: Export BLAS module on Spark MLlib

2015-11-30 Thread DB Tsai
maintenance cost. Once it's getting mature, and people are asking for them, we will gradually make them public. Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Sat, Nov 28, 2015 at 5:20 AM, Sasaki Kai

Re: Export BLAS module on Spark MLlib

2015-11-30 Thread DB Tsai
I used reflection initially, but I found it very slow, especially in a tight loop. Maybe caching the reflective lookups can help, though I have never tried it. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Mon, Nov 30, 2015
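The caching idea mentioned here means resolving the method by name once and reusing it, instead of resolving it on every iteration of a hot loop. The sketch below is a Python analogue only, with `getattr` standing in for JVM reflection. On the JVM the equivalent would be looking up the `java.lang.reflect.Method` once and reusing it. The `NativeBlas` class and its `dot` method are hypothetical placeholders.

```python
class NativeBlas:
    """Hypothetical stand-in for a BLAS object that is only reachable by name."""

    def dot(self, x, y):
        return sum(a * b for a, b in zip(x, y))


def dots_uncached(blas, pairs):
    # Looks the method up by name on every iteration (the slow pattern).
    return [getattr(blas, "dot")(x, y) for x, y in pairs]


def dots_cached(blas, pairs):
    # Looks the method up once, outside the tight loop, and reuses it.
    dot = getattr(blas, "dot")
    return [dot(x, y) for x, y in pairs]


pairs = [([1.0, 2.0], [3.0, 4.0]), ([0.5], [2.0])]
print(dots_cached(NativeBlas(), pairs))
```

Both functions return the same values; only the place where the lookup happens changes. Whether this recovers enough speed on the JVM is the part the message leaves untested.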

Re: [Spark MLlib] about linear regression issue

2015-11-01 Thread DB Tsai
ear regression, but currently, there is no open source implementation in Spark. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Sun, Nov 1, 2015 at 9:22 AM, Zhiliang Zhu <zchl.j...@yahoo.com> wrote: > Dear All,

Re: Spark Implementation of XGBoost

2015-10-27 Thread DB Tsai
shrinkage). Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Mon, Oct 26, 2015 at 8:37 PM, Meihua Wu <rotationsymmetr...@gmail.com> wrote: > Hi DB Tsai, > > Thank you very much fo

Re: Spark Implementation of XGBoost

2015-10-26 Thread DB Tsai
Also, does it support categorical features? Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Mon, Oct 26, 2015 at 4:06 PM, DB Tsai <dbt...@dbtsai.com> wrote: > Interesting. For feature sub-sampling,

Re: Spark Implementation of XGBoost

2015-10-26 Thread DB Tsai
Interesting. For feature sub-sampling, is it per-node or per-tree? Do you think you can implement generic GBM and have it merged as part of Spark codebase? Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Mon

Re: Ability to offer initial coefficients in ml.LogisticRegression

2015-10-22 Thread DB Tsai
There is a JIRA for this. I know Holden is interested in this. On Thursday, October 22, 2015, YiZhi Liu wrote: > Would someone mind giving some hint? > > 2015-10-20 15:34 GMT+08:00 YiZhi Liu >: > > Hi all, > > > > I noticed that in

Re: What is the difference between ml.classification.LogisticRegression and mllib.classification.LogisticRegressionWithLBFGS

2015-10-12 Thread DB Tsai
those code to share more.) Sincerely, DB Tsai -- Blog: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D <https://pgp.mit.edu/pks/lookup?search=0x59DF55B8AF08DF8D> On Mon, Oct 12, 2015 at 1:24 AM, YiZhi Liu <javeli...@gmail.com>

Re: MLlib: Anybody working on hierarchical topic models like HLDA?

2015-06-03 Thread DB Tsai
Is your HDP implementation based on distributed Gibbs sampling? Thanks. Sincerely, DB Tsai --- Blog: https://www.dbtsai.com On Wed, Jun 3, 2015 at 8:13 PM, Yang, Yuhao yuhao.y...@intel.com wrote: Hi Lorenz, I’m trying to build

Re: spark packages

2015-05-23 Thread DB Tsai
I thought LGPL is okay but GPL is not okay for an Apache project. On Saturday, May 23, 2015, Patrick Wendell pwend...@gmail.com wrote: Yes - spark packages can include non ASF licenses. On Sat, May 23, 2015 at 6:16 PM, Debasish Das debasish.da...@gmail.com wrote: Hi, Is it

Re: Regularization in MLlib

2015-04-14 Thread DB Tsai
Hi Theodore, I'm currently working on elastic-net regression in the ML framework, and I decided not to add any extra layer of abstraction for now but to focus on accuracy and performance. We may come up with a proper solution later. Any ideas are welcome. Sincerely, DB Tsai

Re: Regularization in MLlib

2015-04-07 Thread DB Tsai
. Sincerely, DB Tsai --- Blog: https://www.dbtsai.com On Tue, Apr 7, 2015 at 3:03 PM, Ulanov, Alexander alexander.ula...@hp.com wrote: Hi, Could anyone elaborate on the regularization in Spark? I've found that L1 and L2 are implemented

Re: LogisticGradient Design

2015-03-25 Thread DB Tsai
to avoid the second cache. In this case, the code will be more complicated, so I will split the code into two paths. Will be done in another PR. Sincerely, DB Tsai --- Blog: https://www.dbtsai.com On Wed, Mar 25, 2015 at 11:57 AM, Joseph Bradley

Re: [mllib] Is there any bugs to divide a Breeze sparse vectors at Spark v1.3.0-rc3?

2015-03-15 Thread DB Tsai
It's a bug on Breeze's side. Once David fixes it and publishes it to Maven, we can upgrade to Breeze 0.11.2. Please file a JIRA ticket for this issue. Thanks. Sincerely, DB Tsai --- Blog: https://www.dbtsai.com On Sun, Mar 15, 2015 at 12:45

Re: LinearRegressionWithSGD accuracy

2015-01-28 Thread DB Tsai
Hi Robin, You can try this PR out. It has built-in feature scaling and ElasticNet regularization (an L1/L2 mix), and it converges stably to the same model as R's glmnet package. https://github.com/apache/spark/pull/4259 Sincerely, DB Tsai

Re: Maximum size of vector that reduce can handle

2015-01-23 Thread DB Tsai
are small. By default, depth 2 is used, so if you have a very large number of partitions holding large vectors, this may still cause issues. You can increase the depth so that the number of partitions left for the final reduce in the driver is very small. Sincerely, DB Tsai
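The simulation below shows why a larger depth shrinks the final reduce on the driver. It reproduces multi-level tree aggregation in plain Python. The level-sizing rule is modeled on Spark's `RDD.treeAggregate`, which `treeReduce` builds on: the fan-in is ceil(numPartitions^(1/depth)), and no further level is added once it would not save work. This is a sketch, not Spark code. The function name `tree_reduce_sim`, the vector data, and the 1,000-partition count are assumptions for illustration.

```python
import math


def tree_reduce_sim(partials, combine, depth=2):
    """Simulate a multi-level tree reduction of per-partition partial results.

    Returns (result, number_of_partials_left_for_the_driver).
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    if not partials:
        raise ValueError("need at least one partial result")
    num = len(partials)
    # Fan-in per level, modeled on treeAggregate: ceil(num ** (1 / depth)), at least 2.
    scale = max(math.ceil(num ** (1.0 / depth)), 2)
    # Only add a level while it actually shrinks the work left for the driver.
    while num > scale + math.ceil(num / scale):
        num //= scale
        buckets = [None] * num
        for i, p in enumerate(partials):
            k = i % num  # partial i is combined into bucket i mod num
            buckets[k] = p if buckets[k] is None else combine(buckets[k], p)
        partials = buckets
    # Final reduce on the driver over whatever partials remain.
    result = partials[0]
    for p in partials[1:]:
        result = combine(result, p)
    return result, len(partials)


def add_vectors(a, b):
    return [x + y for x, y in zip(a, b)]


# 1,000 partitions, each holding one partial vector.
vectors = [[float(i), 1.0] for i in range(1000)]
for depth in (1, 2, 3):
    total, at_driver = tree_reduce_sim(vectors, add_vectors, depth)
    print(depth, at_driver, total)
```

Under these assumptions, depth 1 leaves all 1,000 partials for the driver, the default depth 2 leaves 31, and depth 3 leaves fewer still. The summed vector is the same in every case.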

Re: LinearRegressionWithSGD accuracy

2015-01-17 Thread DB Tsai
I'm working on LinearRegressionWithElasticNet using OWLQN now. This will do the data standardization internally, so it's transparent to users. With OWLQN, you don't have to choose stepSize manually. Will send out a PR next week. Sincerely, DB Tsai

CrossValidator API in new spark.ml package

2014-12-12 Thread DB Tsai
Hi Xiangrui, It seems that it's stateless, so it will be hard to implement a regularization path. Any suggestions on how to extend it? Thanks. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai

Re: CrossValidator API in new spark.ml package

2014-12-12 Thread DB Tsai
Okay, I got it. In Estimator, fit(dataset: SchemaRDD, paramMaps: Array[ParamMap]): Seq[M] can be overridden to implement a regularization path. Correct me if I'm wrong. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https

Re: Protobuf version in mvn vs sbt

2014-12-05 Thread DB Tsai
As Marcelo said, CDH5.3 is based on Hadoop 2.3, so please try `./make-distribution.sh -Pyarn -Phive -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.1.3 -DskipTests`. See the details on how to change the profile at https://spark.apache.org/docs/latest/building-with-maven.html Sincerely, DB Tsai
