Re: [VOTE] Release Spark 2.4.6 (RC8)

2020-06-03 Thread Xiao Li
Just downloaded it on my local MacBook. Trying to create a table using the pre-built PySpark. It sounds like the conf "spark.sql.warehouse.dir" does not take effect. It is trying to create a directory in "file:/user/hive/warehouse/t1". I have not done any investigation yet. Have any of you hit

Re: [VOTE] Apache Spark 3.0 RC2

2020-05-22 Thread Xiao Li
Thanks for reporting these issues! Please continue to test RC2 and report more issues. Cheers, Xiao On Fri, May 22, 2020 at 7:40 AM Koert Kuipers wrote: > i would like to point out that SPARK-27194 is a fault tolerance bug that > causes jobs to fail when any single task is retried. for us

Re: [VOTE] Release Spark 2.4.6 (RC3)

2020-05-18 Thread Xiao Li
This RC does not include the correctness bug fix https://github.com/apache/spark/commit/a4885f3654899bcb852183af70cc0a82e7dd81d0 which landed just after RC3 was cut. On Mon, May 18, 2020 at 7:21 AM Tom Graves wrote: > +1. > > Tom > > On Monday, May 18, 2020, 08:05:24 AM CDT, Wenchen Fan > wrote: > > >

Re: [build system] jenkins rebooting now

2020-05-14 Thread Xiao Li
Thank you, Shane! On Thu, May 14, 2020 at 9:50 AM shane knapp ☠ wrote: > we're back. doesn't seem to have fixed the issue of the workers > connecting to repository.apache.org but i'm still investigating. > > On Thu, May 14, 2020 at 9:11 AM shane knapp ☠ wrote: > >> that is all. >> >> -- >>

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-11 Thread Xiao Li
> > 1. Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default, > which effectively reverts SPARK-30098. The CREATE TABLE syntax is still > confusing but it's the same as 2.4 > 2. Do not support the v2 CreateTable command if STORED AS/BY or EXTERNAL is > specified. This gives us more

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-05-07 Thread Xiao Li
Below are the three major blockers. I think we should start discussing how to unblock the release. -

[OSS DIGEST] The major changes of Apache Spark from Mar 25 to Apr 7

2020-04-29 Thread Xiao Li
Hi all, This is the bi-weekly Apache Spark digest from the Databricks OSS team. For each API/configuration/behavior change, an *[API] *tag is added in the title. CORE

Re: Getting the ball started on a 2.4.6 release

2020-04-23 Thread Xiao Li
Thu, Apr 23, 2020 at 11:43 AM Xiao Li wrote: > >> Hi, Holden, >> >> We are trying to avoid backporting the improvement/cleanup PRs to the >> maintenance releases, especially the core modules, like Spark Core and >> SQL. For example, SPARK-26390 is a good example.

Re: Getting the ball started on a 2.4.6 release

2020-04-23 Thread Xiao Li
Hi, Holden, We are trying to avoid backporting the improvement/cleanup PRs to the maintenance releases, especially the core modules, like Spark Core and SQL. For example, SPARK-26390 is a good example. Xiao On Thu, Apr 23, 2020 at 11:17 AM Holden Karau wrote: > Tentatively I'm planning on

Re: Getting the ball started on a 2.4.6 release

2020-04-20 Thread Xiao Li
Yes. This one got merged yesterday. Thanks! Xiao On Mon, Apr 20, 2020 at 10:51 AM Sean Owen wrote: > Looks like we have 1 marked for 2.4.6: > https://issues.apache.org/jira/projects/SPARK/versions/12346781 > > https://issues.apache.org/jira/browse/SPARK-31234 ResetCommand should > not wipe

Re: Automatic PR labeling

2020-04-13 Thread Xiao Li
Looks great! Thanks for making this happen. This is pretty helpful. Xiao On Sun, Apr 12, 2020 at 11:52 PM Hyukjin Kwon wrote: > Okay, now it started to work. Let's see if it works well! > > On Fri, Apr 3, 2020 at 11:41 AM, Hyukjin Kwon wrote: > >> Seems like this email missed to cc the mailing list,

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-09 Thread Xiao Li
Only the low-risk or high-value bug fixes, and the documentation changes are allowed to merge to branch-3.0. I expect all the committers to follow the same rules as in the previous releases. Xiao On Thu, Apr 9, 2020 at 6:13 PM Jungtaek Lim wrote: > Looks like around 80

Re: Is `branch-3.0` frozen for RC1 or not?

2020-03-31 Thread Xiao Li
Hi, Dongjoon, You can backport the commits from master to 3.0, as long as they follow our code freeze policy. Feel free to -1 on RC1 vote if your backported PRs are blocking the release. Cheers, Xiao On Tue, Mar 31, 2020 at 9:28 AM Dongjoon Hyun wrote: > Hi, All. > > RC1 tag was created

Re: Release Manager's official `branch-3.0` Assessment?

2020-03-24 Thread Xiao Li
Let us try to finish the remaining major blockers in the next few days. For example, https://issues.apache.org/jira/browse/SPARK-31085 +1 to cut the RC even if we still have the blockers that will fail the RCs. Cheers, Xiao On Tue, Mar 24, 2020 at 6:56 PM Dongjoon Hyun wrote: > +1 > >

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Xiao Li
+1 (binding) Xiao On Mon, Mar 9, 2020 at 8:33 AM Denny Lee wrote: > +1 (non-binding) > > On Mon, Mar 9, 2020 at 1:59 AM Hyukjin Kwon wrote: > >> The proposal itself seems good as the factors to consider, Thanks Michael. >> >> Several concerns mentioned look good points, in particular: >> >> >

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-03-07 Thread Xiao Li
I want to thank *Ruifeng Zheng* publicly for his work that lists all the signature differences of Core, SQL and Hive we made in this upcoming release. For details, please read the files attached in SPARK-30982. I went over these files and

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-02-25 Thread Xiao Li
+1 Xiao On Mon, Feb 24, 2020 at 3:03 PM Michael Armbrust wrote: > Hello Everyone, > > As more users have started upgrading to Spark 3.0 preview (including > myself), there have been many discussions around APIs that have been broken > compared with Spark 2.x. In many of these discussions, one of the >

Re: Breaking API changes in Spark 3.0

2020-02-19 Thread Xiao Li
Like https://github.com/apache/spark/pull/23131, we added back unionAll. We might need to double check whether we removed some widely used APIs in this release before RC. If the maintenance costs are small, keeping some deprecated APIs looks reasonable to me. This can help the adoption of Spark

Re: [spark-packages.org] Jenkins down

2020-02-05 Thread Xiao Li
@Cheng Lian just recreated the Jenkins service. The service is up now. Thank you for your patience, Xiao On Fri, Jan 24, 2020 at 10:32 AM Dongjoon Hyun wrote: > Thank you for updating! > > On Fri, Jan 24, 2020 at 10:29 AM Xiao Li wrote: > >> It does not block any Spark release. Re

Re: Spark 3.0 branch cut and code freeze on Jan 31?

2020-02-04 Thread Xiao Li
> Thanks, >>>>> Jungtaek Lim (HeartSaVioR) >>>>> >>>>> On Wed, Dec 25, 2019 at 8:36 AM Takeshi Yamamuro < >>>>> linguin@gmail.com> wrote: >>>>> >>>>>> Looks nice, happy holiday, all! >>>

Re: [FYI] `Target Version` on `Improvement`/`New Feature` JIRA issues

2020-02-01 Thread Xiao Li
Thanks! Dongjoon. Xiao On Sat, Feb 1, 2020 at 5:15 PM Hyukjin Kwon wrote: > Thanks Dongjoon. > > On Sun, 2 Feb 2020, 09:08 Dongjoon Hyun, wrote: > >> Hi, All. >> >> From Today, we have `branch-3.0` as a tool of `Feature Freeze`. >> >> https://github.com/apache/spark/tree/branch-3.0 >> >>

Re: new branch-3.0 jenkins job configs are ready to be deployed...

2020-01-31 Thread Xiao Li
Thank you always, Shane! Xiao On Fri, Jan 31, 2020 at 11:19 AM shane knapp ☠ wrote: > ...whenever i get the word. :) > > FWIW they will all be identical to the current group of master > builds/tests. > > shane > -- > Shane Knapp > Computer Guy / Voice of Reason > UC Berkeley EECS Research /

Re: [spark-packages.org] Jenkins down

2020-01-24 Thread Xiao Li
> Bests, > Dongjoon. > > On Fri, Jan 24, 2020 at 10:20 AM Xiao Li wrote: > >> Hi, all, >> >> Because the Jenkins of spark-packages.org is down, new packages or >> releases are unable to be created in spark-packages.org. >> >> Now, we are working on it. For the

[spark-packages.org] Jenkins down

2020-01-24 Thread Xiao Li
Hi, all, Because the Jenkins of spark-packages.org is down, new packages or releases are unable to be created in spark-packages.org. Now, we are working on it. For the latest status, please follow the ticket https://issues.apache.org/jira/browse/SPARK-30636. Happy lunar new year, Xiao

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-16 Thread Xiao Li
-1 Let us include the correctness fix: https://github.com/apache/spark/pull/27229 Thanks, Xiao On Thu, Jan 16, 2020 at 8:46 AM Dongjoon Hyun wrote: > Thank you, Jungtaek! > > Bests, > Dongjoon. > > > On Wed, Jan 15, 2020 at 8:57 PM Jungtaek Lim > wrote: > >> Once we decided to cancel the

Re: Fail to use SparkR of 3.0 preview 2

2020-01-07 Thread Xiao Li
We can use R version 3.6.1, if we have a concern about the quality of 3.6.2? On Thu, Dec 26, 2019 at 8:14 PM Hyukjin Kwon wrote: > I was randomly googling out of curiosity, and seems indeed that's the > problem ( >

Re: Release Apache Spark 2.4.5

2020-01-05 Thread Xiao Li
+1 Xiao On Sun, Jan 5, 2020 at 9:50 PM Holden Karau wrote: > +1 > > On Sun, Jan 5, 2020 at 9:40 PM Wenchen Fan wrote: > >> +1 >> >> On Mon, Jan 6, 2020 at 12:02 PM Jungtaek Lim < >> kabhwan.opensou...@gmail.com> wrote: >> >>> +1 to have another Spark 2.4 release, as Spark 2.4.4 was released

Re: [ANNOUNCE] Announcing Apache Spark 3.0.0-preview2

2019-12-24 Thread Xiao Li
Thank you all. Happy Holidays! Xiao On Tue, Dec 24, 2019 at 12:53 PM Yuming Wang wrote: > Hi all, > > To enable wide-scale community testing of the upcoming Spark 3.0 release, > the Apache Spark community has posted a new preview release of Spark 3.0. > This preview is *not a stable release in

Re: Spark 3.0 branch cut and code freeze on Jan 31?

2019-12-24 Thread Xiao Li
Jan 31 is pretty reasonable. Happy Holidays! Xiao On Tue, Dec 24, 2019 at 5:52 AM Sean Owen wrote: > Yep, always happens. Is earlier realistic, like Jan 15? it's all arbitrary > but indeed this has been in progress for a while, and there's a downside to > not releasing it, to making the gap to

Re: [VOTE][RESULT] SPARK 3.0.0-preview2 (RC2)

2019-12-22 Thread Xiao Li
This is the fastest release! Thank you all for making this happen. Happy Holiday! Xiao On Sun, Dec 22, 2019 at 10:58 AM Dongjoon Hyun wrote: > Thank you all. Especially, Yuming as a release manager! > Happy Holidays! > > Cheers, > Dongjoon. > > > On Sun, Dec 22, 2019 at 12:51 AM Yuming Wang

Re: Spark 3.0 preview release 2?

2019-12-12 Thread Xiao Li
>> +1 for another preview >> >> Tom >> >> On Monday, December 9, 2019, 12:32:29 AM CST, Xiao Li < >> gatorsm...@gmail.com> wrote: >> >> >> I got a lot of great feedback from the community about the recent 3.0 >> preview release. Since

Re: I would like to add JDBCDialect to support Vertica database

2019-12-11 Thread Xiao Li
but not sure where in the repo that would go. > If automated testing is required, I can ask our engineers whether there > exists something like a mockito that could be included. > > > > Thanks, Bryan H > > > > *From:* Xiao Li [mailto:lix...@databricks.com] > *Sent:* Wedne

Re: I would like to add JDBCDialect to support Vertica database

2019-12-11 Thread Xiao Li
How can the dev community test it? Xiao On Wed, Dec 11, 2019 at 6:52 AM Sean Owen wrote: > It's probably OK, IMHO. The overhead of another dialect is small. Are > there differences that require a new dialect? I assume so and might > just be useful to summarize them if you open a PR. > > On

Re: Spark 3.0 preview release 2?

2019-12-09 Thread Xiao Li
e now. > How about simply moving to a release candidate? If not now then at > least move to code freeze from the start of 2020. There is also some > downside in pushing out the 3.0 release further with previews. > > On Mon, Dec 9, 2019 at 12:32 AM Xiao Li wrote: > > >

Spark 3.0 preview release 2?

2019-12-08 Thread Xiao Li
I got a lot of great feedback from the community about the recent 3.0 preview release. Since the last 3.0 preview release, we already have 353 commits [https://github.com/apache/spark/compare/v3.0.0-preview...master]. There are various important features and behavior changes we want the community to

Re: [DISCUSS] PostgreSQL dialect

2019-11-26 Thread Xiao Li
+1 > One particular negative effect has been that new postgresql tests add well > over an hour to tests, Adding postgresql tests is for improving the test coverage of Spark SQL. We should continue to do this by importing more test cases. The quality of Spark highly depends on the test

Re: [build system] Upgrading pyarrow, builds might be temporarily broken

2019-11-14 Thread Xiao Li
Hi, Bryan, Thank you for your update! Xiao On Thu, Nov 14, 2019 at 8:48 PM Bryan Cutler wrote: > Update: #26133 has been > merged and builds should be passing now, thanks all! > > On Thu, Nov 14, 2019 at 4:12 PM Bryan Cutler wrote: > >> We are in

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-02 Thread Xiao Li
Bests, > Dongjoon. > > > > On Fri, Nov 1, 2019 at 5:37 PM Jiaxin Shan wrote: > >> +1 for Hadoop 3.2. Seems lots of cloud integration efforts Steve made is >> only available in 3.2. We see lots of users asking for better S3A support >> in Spark. >> >>

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-01 Thread Xiao Li
from HEAD requests before an object was actually created. > > It would be really good if the spark distributions shipped with later > versions of the hadoop artifacts. > > On Mon, Oct 28, 2019 at 7:53 PM Xiao Li wrote: > >> The stability and quality of Hadoop 3.2 prof

Re: [VOTE] SPARK 3.0.0-preview (RC2)

2019-10-31 Thread Xiao Li
Spark 3.0 will still use the Hadoop 2.7 profile by default, I think. Hadoop 2.7 profile is much more stable than Hadoop 3.2 profile. On Thu, Oct 31, 2019 at 3:54 PM Sean Owen wrote: > This isn't a big thing, but I see that the pyspark build includes > Hadoop 2.7 rather than 3.2. Maybe later we

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-10-28 Thread Xiao Li
The stability and quality of Hadoop 3.2 profile are unknown. The changes are massive, including Hive execution and a new version of Hive thriftserver. To reduce the risk, I would like to keep the current default version unchanged. When it becomes stable, we can change the default profile to

Happy Diwali everyone!!!

2019-10-27 Thread Xiao Li
Happy Diwali everyone!!! Xiao

Re: Unable to resolve dependency of sbt-mima-plugin since yesterday

2019-10-22 Thread Xiao Li
Thank you, Dongjoon! Xiao On Tue, Oct 22, 2019 at 5:08 PM Dongjoon Hyun wrote: > Hi, All. > > This is fixed in master/branch-2.4. > > Bests, > Dongjoon. > > On Tue, Oct 22, 2019 at 12:19 Sean Owen wrote: > >> Weird. Let's discuss at https://issues.apache.org/jira/browse/SPARK-29560 >> >> On

Add the Google's Code Review Developer Guide as a reference in our code review guide?

2019-10-21 Thread Xiao Li
Hi, all, Here, I am proposing to add Google's Code Review Developer Guide as a reference in our code review guide. The guide looks very reasonable for our Spark development too. We do not need to completely follow each rule but it is a good

Re: SparkGraph review process

2019-10-14 Thread Xiao Li
> > 1. On the technical side, my main concern is the runtime dependency on > org.opencypher:okapi-shade. okapi depends on several Scala libraries. We > came out with the solution to shade a few Scala libraries to avoid > pollution. However, I'm not super confident that the approach is >

Re: [build system] IMPORTANT! northern california fire danger, potential power outage(s)

2019-10-11 Thread Xiao Li
That is great news!!! Shane, have a good trip! Xiao On Fri, Oct 11, 2019 at 1:58 PM Shane Knapp wrote: > finally, some good news! power was just restored to campus. > > i'm about to leave town, but jon (CCed) will be heading down to power > things up soon and we should hopefully be building

Re: Committing while Jenkins down?

2019-10-10 Thread Xiao Li
Since the outage could be as long as five days I’d rather not just have PRs >> pile up for that entire period. >> > >> > On Thu, Oct 10, 2019 at 8:38 AM Xiao Li wrote: >> >> >> >> I think we are unable to merge any major PR if we do not know wheth

Re: Committing while Jenkins down?

2019-10-10 Thread Xiao Li
Please check the note from Shane. [build system] IMPORTANT! northern california fire danger, potential power outage(s) On Thu, Oct 10, 2019 at 8:35 AM Thomas graves wrote: > This is directed towards committers/PMC members. > > It looks like Jenkins will be down for a while, what is everyone's > thoughts on

Re: Committing while Jenkins down?

2019-10-10 Thread Xiao Li
I think we are unable to merge any major PR if we do not know whether the tests can pass. Xiao On Thu, Oct 10, 2019 at 8:36 AM Xiao Li wrote: > Please check the note from Shane. > > [build system] IMPORTANT! northern california fire danger, potential power > outage(s) > > Thomas graves

Re: [VOTE][SPARK-28885] Follow ANSI store assignment rules in table insertion by default

2019-10-10 Thread Xiao Li
+1 On Thu, Oct 10, 2019 at 2:13 AM Hyukjin Kwon wrote: > +1 (binding) > > On Thu, Oct 10, 2019 at 5:11 PM, Takeshi Yamamuro wrote: > >> Thanks for the great work, Gengliang! >> >> +1 for that. >> As I said before, the behaviour is pretty common in DBMSs, so the change >> helps for DBMS users. >> >>

Re: Spark 3.0 preview release feature list and major changes

2019-10-09 Thread Xiao Li
SPARK-29345 Add an API that allows a user to define and observe arbitrary metrics on streaming queries Let us add this too. Cheers, Xiao On Tue, Oct 8, 2019 at 10:31 PM Wenchen Fan wrote: > Regarding DS v2, I'd like to remove > SPARK-26785

Re: [build system] IMPORTANT! northern california fire danger, potential power outage(s)

2019-10-08 Thread Xiao Li
Hi, Shane, Thank you for letting us know in advance! Xiao On Tue, Oct 8, 2019 at 12:50 PM Shane Knapp wrote: > here in the lovely bay area, we are currently experiencing some > absolutely lovely weather: temps around 20C, light winds, and not a > drop of moisture anywhere. > > this means

Re: [DISCUSS] Spark 2.5 release

2019-09-20 Thread Xiao Li
+1 on Jungtaek's point. We can revisit this when we release Spark 3.1? After the release of 3.0, I believe we will get more feedback about DSv2 from the community. The current design is just made by a small group of contributors. DSv2 + catalog APIs are still evolving. It is very likely we will

Re: Thoughts on Spark 3 release, or a preview release

2019-09-17 Thread Xiao Li
https://issues.apache.org/jira/browse/SPARK-28264 SPARK-28264 Revisiting Python / pandas UDF sounds critical for 3.0 preview Xiao On Mon, Sep 16, 2019 at 12:22 PM Erik Erlandson wrote: > > I'm in favor of adding SPARK-25299 > - Use remote

Re: Welcoming some new committers and PMC members

2019-09-09 Thread Xiao Li
Congratulations to all of you! Xiao On Mon, Sep 9, 2019 at 5:32 PM Matei Zaharia wrote: > Hi all, > > The Spark PMC recently voted to add several new committers and one PMC > member. Join me in welcoming them to their new roles! > > New PMC member: Dongjoon Hyun > > New committers: Ryan Blue,

Re: maven 3.6.1 removed from apache maven repo

2019-09-03 Thread Xiao Li
Hi, Tom, To unblock the build, I merged the upgrade to master. https://github.com/apache/spark/pull/25665 Thanks! Xiao On Tue, Sep 3, 2019 at 10:58 AM Tom Graves wrote: > It looks like maven 3.6.1 was removed from the repo - see SPARK-28960. It > looks like they pushed 3.6.2, but I don't

Re: [VOTE] Release Apache Spark 2.4.4 (RC3)

2019-08-30 Thread Xiao Li
+1 Xiao On Fri, Aug 30, 2019 at 2:03 AM Felix Cheung wrote: > +1 > > Run tests, R tests, r-hub Debian, Ubuntu, mac, Windows > > -- > *From:* Hyukjin Kwon > *Sent:* Wednesday, August 28, 2019 9:14 PM > *To:* Takeshi Yamamuro > *Cc:* dev; Dongjoon Hyun > *Subject:* Re: [VOTE]

Re: JDK11 Support in Apache Spark

2019-08-24 Thread Xiao Li
Thank you for your contributions! This is a great feature for Spark 3.0! We finally achieved it! Xiao On Sat, Aug 24, 2019 at 12:18 PM Felix Cheung wrote: > That’s great! > > -- > *From:* ☼ R Nair > *Sent:* Saturday, August 24, 2019 10:57:31 AM > *To:* Dongjoon Hyun

Re: Release Spark 2.3.4

2019-08-16 Thread Xiao Li
+1 On Fri, Aug 16, 2019 at 4:11 PM Takeshi Yamamuro wrote: > +1, too > > Bests, > Takeshi > > On Sat, Aug 17, 2019 at 7:25 AM Dongjoon Hyun > wrote: > >> +1 for 2.3.4 release as the last release for `branch-2.3` EOL. >> >> Also, +1 for next week release. >> >> Bests, >> Dongjoon. >> >> >> On

Re: [SPARK-23207] Repro

2019-08-10 Thread Xiao Li
Hi, Tyson, Could you open a new JIRA with the correctness label? SPARK-23207 might not cover all the scenarios, especially when you are using cache. Cheers, Xiao On Fri, Aug 9, 2019 at 9:26 AM wrote: > Hi Sean, > > To finish the job, I did need to set spark.stage.maxConsecutiveAttempts to > a large

Re: Spark SQL upgrade / migration guide: discoverability and content organization

2019-07-14 Thread Xiao Li
Yeah, Josh! All these ideas sound good to me. All the top commercial database products have very detailed guides/documents about version upgrades. You can easily find them. Currently, only SQL and ML modules have the migration or upgrade guides. Since Spark 2.3 release, we strictly require the

Re: Jenkins Jobs for Hadoop-3.2 profile

2019-06-19 Thread Xiao Li
Thank you, Shane!!! Will do it next time. : ) On Wed, Jun 19, 2019 at 3:15 PM shane knapp wrote: > i will do it later this week. also, in the future, please file jiras for > stuff like this rather than pinging me on the list. ;) > > On Wed, Jun 19, 2019 at 1:39 PM Xi

Re: Jenkins Jobs for Hadoop-3.2 profile

2019-06-19 Thread Xiao Li
That sounds good to me! @shane knapp Could you help with this? Or Dongjoon can do it by himself since he has the access? Cheers, Xiao On Wed, Jun 19, 2019 at 10:56 AM Dongjoon Hyun wrote: > Hi, All. > > So far, we have only `hadoop-2.7` profile jobs. > > - SBT with hadoop-2.7 > - Maven with

Re: Filter cannot be pushed via a Join

2019-06-18 Thread Xiao Li
Hi, William, Thanks for reporting it. Could you open a JIRA? Cheers, Xiao On Tue, Jun 18, 2019 at 8:57 AM William Wong wrote: > BTW, I noticed a workaround is creating a custom rule to remove 'empty > local relation' from a union table. However, I am not 100% sure if it is > the right approach. > > On Tue,

Re: Master maven build failing for 6 days -- may need some more eyes

2019-05-30 Thread Xiao Li
Thanks! Yuming and Gengliang are working on this. On Thu, May 30, 2019 at 8:21 AM Sean Owen wrote: > I might need some help figuring this out. The master Maven build has > been failing for almost a week, and I'm having trouble diagnosing why. > Of course, the PR builder has been fine. > > >

[ANNOUNCE] Announcing Apache Spark 2.4.3

2019-05-09 Thread Xiao Li
been possible without you. Xiao Li

Re: [VOTE] Release Apache Spark 2.4.3

2019-05-06 Thread Xiao Li
This vote passes! I'll follow up with a formal release announcement soon. +1: Michael Heuer (non-binding) Gengliang Wang (non-binding) Sean Owen (binding) Felix Cheung (binding) Wenchen Fan (binding) Herman van Hovell (binding) Xiao Li (binding) Cheers, Xiao On Mon, May 6, 2019 at 2:36 PM antonkulaga wrote

[VOTE] Release Apache Spark 2.4.3

2019-05-01 Thread Xiao Li
Please vote on releasing the following candidate as Apache Spark version 2.4.3. The vote is open until May 5th PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.4.3 [ ] -1 Do not release this package because ... To

Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-25 Thread Xiao Li
Thanks, DB! The Hive UDAF fix https://github.com/apache/spark/commit/0cfefa7e864f443cfd76cff8c50617a8afd080fb was merged this weekend. Xiao On Mon, Mar 25, 2019 at 9:46 PM DB Tsai wrote: > RC9 was just cut. Will send out another thread once the build is finished. > > Sincerely, > > DB Tsai >

Re: [build system] VERY IMPORTANT: please file JIRAs for issues w/jenkins

2019-03-02 Thread Xiao Li
Thank you, Shane! Xiao On Sat, Mar 2, 2019 at 4:28 PM shane knapp wrote: > adding new k8s functionality? > > something need upgrading in jenkins? > > are logs not being archived? > > odd build failure (and i mean *odd*)? > > PLEASE FILE A JIRA! :) > > adding a @shaneknapp to a PR github is no longer working

Re: PR tests not running?

2019-02-26 Thread Xiao Li
Thanks for reporting it! It sounds like Shane is working on it. I manually triggered the test for the PR https://github.com/apache/spark/pull/23894 . Cheers, Xiao On Tue, Feb 26, 2019 at 11:33 AM Bruce Robbins wrote: > Sorry for stating what is likely obvious, but PR tests don't appear to be > running.

Re: [VOTE] SPIP: Identifiers for multi-catalog Spark

2019-02-21 Thread Xiao Li
+1 This is in the right direction. The resolution rules and catalog APIs need more discussion when we implement it. In the current stage, we can disallow the runtime creation of the catalog. This will complicate the name resolution in a multi-session environment. For example, when one user

Re: Compatibility on build-in DateTime functions with Hive/Presto

2019-02-17 Thread Xiao Li
> date_part > --- > 2017 > (1 row) > > > We'd better follow the Hive semantics. And removing support for and > -d[d] will simplify the routine. > > I'll create a Pull Request later. > > > On Sat, 16 Feb 2019 00:51:43 +0800

Re: Compatibility on build-in DateTime functions with Hive/Presto

2019-02-15 Thread Xiao Li
We normally do not follow MySQL. Check the commercial database [like Oracle]? or the open source PostgreSQL? On Fri, Feb 15, 2019 at 5:34 AM Sean Owen wrote: > year("1912") == 1912 makes sense; month("1912") == 1 is odd but not > wrong. On the one hand, some answer might be better than none. But > then, we

Re: Apache Spark git repo moved to gitbox.apache.org

2019-02-12 Thread Xiao Li
The above instruction is different from what the website documents: https://github.com/apache/spark-website/commit/92606b2e7849b9d743ef2a8176438142420a83e5#diff-17faa4bab13b7530a3e1b627bb798ad0 Some committers are using gitbox, but the others are following the website instruction and using github.

Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-08 Thread Xiao Li
Hi, Takeshi, Many PMC members are on vacation or offsite during this week. If possible, could you extend it to next Wed? Happy Lunar New Year! Xiao On Fri, Feb 8, 2019 at 5:03 PM Marcelo Vanzin wrote: > Hi Takeshi, > > Since we only really have one +1 binding vote, do you want to extend > this vote a bit? > >

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-02-04 Thread Xiao Li
. > > > > On Tue, Feb 5, 2019 at 1:16 AM, Xiao Li wrote: > >> To reduce the impact and risk of upgrading Hive execution JARs, we can >> just upgrade the built-in Hive to 2.x when using the profile of Hadoop 3.x. >> The support of Hadoop 3 will be still experimental in our

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-02-04 Thread Xiao Li
> > > We have real users getting blocked by this issue. > > > > > > > > From: Xiao Li > > Sent: Wednesday, January 16, 2019 9:37 AM > > To: Ryan Blue > > Cc: Marcelo Vanzin; Hyukjin Kwon; Sean Owen; Felix Che

Re: [VOTE] [SPARK-25994] SPIP: DataFrame-based Property Graphs, Cypher Queries, and Algorithms

2019-01-30 Thread Xiao Li
>> Q7. How long will it take? >> >>- >> >>If accepted by the community by the end of December 2018, we predict >>to be feature complete by mid-end March, allowing for QA during April >> 2019, >>making the SPIP part of the next major Spark

Re: Welcome Jose Torres as a Spark committer

2019-01-29 Thread Xiao Li
Congratulations! Xiao On Tue, Jan 29, 2019 at 10:48 AM Shixiong Zhu wrote: > Hi all, > > The Apache Spark PMC recently added Jose Torres as a committer on the > project. Jose has been a major contributor to Structured Streaming. Please > join me in welcoming him! > > Best Regards, > > Shixiong Zhu > >

Re: [VOTE] [SPARK-25994] SPIP: DataFrame-based Property Graphs, Cypher Queries, and Algorithms

2019-01-29 Thread Xiao Li
+1 On Tue, Jan 29, 2019 at 8:14 AM Jules Damji wrote: > +1 (non-binding) > (Heard their proposed tech-talk at Spark + A.I summit in London. Well > attended & well received.) > > — > Sent from my iPhone > Pardon the dumb thumb typos :) > > On Jan 29, 2019, at 7:30 AM, Denny Lee wrote: > > +1 > > yay - let's

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-23 Thread Xiao Li
-1 https://issues.apache.org/jira/browse/SPARK-26709 is another blocker ticket that returns incorrect results. On Wed, Jan 23, 2019 at 12:01 PM Marcelo Vanzin wrote: > -1 too. > > I just upgraded https://issues.apache.org/jira/browse/SPARK-26682 to > blocker. It's a small fix and we should make it in 2.3.3.

Re: Removing old HiveMetastore(0.12~0.14) from Spark 3.0.0?

2019-01-22 Thread Xiao Li
Based on my experience in development of Spark SQL, the maintenance cost is very small for supporting different versions of Hive metastore. Feel free to ping me if we hit any issue about it. Cheers, Xiao On Tue, Jan 22, 2019 at 11:18 PM Reynold Xin wrote: > Actually a non trivial fraction of users /

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-16 Thread Xiao Li
Thanks for your feedback! Working with Yuming to reduce the risk of stability and quality. Will keep you posted when the proposal is ready. Cheers, Xiao On Wed, Jan 16, 2019 at 9:27 AM Ryan Blue wrote: > +1 for what Marcelo and Hyukjin said. > > In particular, I agree that we can't expect Hive to release

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Xiao Li
> need to happen at Spark 3. > Separately, its usage could be reduced or removed -- this I don't know > much about. But it doesn't really make it harder or easier. > > On Tue, Jan 15, 2019 at 12:40 PM Xiao Li wrote: > > > > Since Spark 2.0, we have been trying to mo

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Xiao Li
ible we should >> still upgrade or replace the hive jar from a fork, as Sean says, from a ASF >> release process standpoint. Unless there is a plan for removing hive >> integration (all of it) from the spark core project.. >> >> >>

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Xiao Li
ive... > > > -- > *From:* Ryan Blue > *Sent:* Tuesday, January 15, 2019 9:53 AM > *To:* Xiao Li > *Cc:* Yuming Wang; dev > *Subject:* Re: [DISCUSS] Upgrade built-in Hive to 2.3.4 > > How do we know that most Spark users are not using Hive? I w

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Xiao Li
Hi, Yuming, Thank you for your contributions! The community aims at reducing the dependence on Hive. Currently, most Spark users are not using Hive. The changes look risky to me. To support Hadoop 3.x, we just need to resolve this JIRA: https://issues.apache.org/jira/browse/HIVE-16391

Re: Apache Spark 2.2.3 ?

2019-01-08 Thread Xiao Li
Thank you, Takeshi! On Tue, Jan 8, 2019 at 10:13 PM Dongjoon Hyun wrote: > Great! Thank you, Takeshi! :D > > Bests, > Dongjoon. > > On Tue, Jan 8, 2019 at 8:47 PM Takeshi Yamamuro > wrote: > >> If there is no other volunteer for the release of 2.3.3, I'd like to. >> >> best, >> takeshi >> >> On Fri, Jan 4,

Re: [DISCUSS] Handling correctness/data loss jiras

2019-01-05 Thread Xiao Li
+1 On Fri, Jan 4, 2019 at 9:28 AM Reynold Xin wrote: > Committers, > > When you merge tickets fixing correctness bugs, please make sure you tag > the tickets with "correctness" label. I've found multiple tickets today that > didn't do that. > > > On Fri, Aug 17, 2018 at 7:11 AM, Tom Graves > wrote: > >>

Re: Run a specific PySpark test or group of tests

2018-12-06 Thread Xiao Li
Yes! This is very helpful! On Wed, Dec 5, 2018 at 9:21 PM Wenchen Fan wrote: > great job! thanks a lot! > > On Thu, Dec 6, 2018 at 9:39 AM Hyukjin Kwon wrote: > >> It's merged now and in developer tools page - >> http://spark.apache.org/developer-tools.html#individual-tests >> Have some func

Re: DataSourceV2 community sync #3

2018-12-03 Thread Xiao Li
d the Spark catalog be the common denominator of the other > catalogs (least featured) or a super-feature catalog? > > > > *From: *Xiao Li > *Date: *Saturday, December 1, 2018 at 10:49 PM > *To: *Ryan Blue > *Cc: *"u...@spark.apache.org" > *Subject

Re: DataSourceV2 community sync #3

2018-12-01 Thread Xiao Li
code paths that > don’t support them. The use of table identifiers with a catalog part was > discussed in the “Multiple catalog support” thread. I’ve also brought it up > and pointed out how I think it should be used in syncs a couple of times. > > Sorry if this discussion isn’t how you wou

Re: DataSourceV2 community sync #3

2018-12-01 Thread Xiao Li
have yet to hear your argument for why that is not the > case. > > rb > > On Sat, Dec 1, 2018 at 12:36 PM Xiao Li wrote: > >> Hi, Ryan, >> >> Catalog is a really important component for Spark SQL or any analytics >> platform, I have to emphasize. Thus, a

Re: DataSourceV2 community sync #3

2018-12-01 Thread Xiao Li
em directly >or return other existing implementations. Here’s how it worked in the >old API > > <https://github.com/apache/spark/pull/21306/files#diff-db51e7934b9ee539ad599197a935cb86R35> >. > > I hope that you don’t think I expect you to go “without seeing th

Re: DataSourceV2 community sync #3

2018-11-29 Thread Xiao Li
. > > If you still have questions about how you might plug in Glue, let me know > and I can clarify. > > rb > > On Thu, Nov 29, 2018 at 2:56 PM Xiao Li wrote: > >> Ryan, >> >> Thanks for leading the discussion and sending out the memo! >> >

Re: DataSourceV2 community sync #3

2018-11-29 Thread Xiao Li
ext sync, please start sending them > to me. Thank you! > > *Attendees:* > > Ryan Blue > John Zhuge > Jamison Bennett > Yuanjian Li > Xiao Li > stczwd > Matt Cheah > Wenchen Fan > Genglian Wang > Kevin Yu > Maryann Xue > Cody Koeninger > Bru

Re: DataSourceV2 community sync #3

2018-11-28 Thread Xiao Li
Based on my understanding, we are not inventing anything new here. Basically, we are building a federated database system, especially once we support multiple catalogs. There are many mature commercial products in the market. For example,

Re: How to manually kick off an ASF -> github git sync

2018-11-19 Thread Xiao Li
That is how I did it in the past. It should work. On Mon, Nov 19, 2018 at 3:08 PM Sean Owen wrote: > I noticed the sync hasn't happened for about 2 days, and noticed > https://issues.apache.org/jira/browse/INFRA-17269 and also noticed > from there that we can trigger them manually, at >

Re: Jenkins down?

2018-11-19 Thread Xiao Li
Thanks, Shane! On Mon, Nov 19, 2018 at 12:15 PM Marco Gaido wrote: > Thanks Shane! > > On Mon, Nov 19, 2018 at 19:14 shane knapp > wrote: > >> alright, we're back and building. >> >> On Mon, Nov 19, 2018 at 10:11 AM shane knapp wrote: >> >>> thanks for the heads up... looks
