Re: [ANNOUNCE] Announcing Apache Spark 2.4.0

2018-11-08 Thread Xiao Li
Try to clear your browsing data or use a different web browser. Enjoy it, Xiao On Thu, Nov 8, 2018 at 4:15 PM Reynold Xin wrote: > Do you have a cached copy? I see it here > > http://spark.apache.org/downloads.html > > > > On Thu, Nov 8, 2018 at 4:12 PM Li Gao wrote: > >> this is wonderful !

Re: Did the 2.4 release email go out?

2018-11-08 Thread Xiao Li
Me too. Reynold Xin wrote on Thu, Nov 8, 2018 at 9:56 AM: > The website is already up but I didn’t see any email announcement yet. >

Happy Diwali everyone!!!

2018-11-07 Thread Xiao Li
Happy Diwali everyone!!! Xiao Li

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-30 Thread Xiao Li
Yes, this is not a blocker. "spark.sql.optimizer.nestedSchemaPruning.enabled" is intentionally off by default. As DB Tsai said, column pruning of nested schema for Parquet tables is experimental. In this release, we encourage the whole community to try this new feature but it might have bugs like
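Nested schema pruning means a query that touches only `name.first` reads just that leaf instead of the whole `name` struct. Users who want to try it can enable the flag per session with `spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")`. The sketch below is a toy, pure-Python model of the pruning step only, assuming a schema represented as nested dicts; `prune_schema` is a hypothetical helper, not Spark's implementation.

```python
import copy


def prune_schema(schema, paths):
    """Return the part of `schema` needed to read the dotted field `paths`.

    Toy model: `schema` maps field names to a type name (leaf) or a nested
    dict (struct). This is not Spark's actual pruning code.
    """
    pruned = {}
    for path in paths:
        source, target = schema, pruned
        parts = path.split(".")
        for i, name in enumerate(parts):
            field = source[name]
            if i == len(parts) - 1 or not isinstance(field, dict):
                # The requested path ends here (or hits a leaf): keep the
                # whole subtree, copied so the input schema is never mutated.
                target[name] = copy.deepcopy(field)
                break
            # Descend into the struct, creating only the branches we need.
            target = target.setdefault(name, {})
            source = field
    return pruned
```

For example, with a schema containing `id`, a `name` struct, and an `address` struct, `prune_schema(schema, ["id", "name.first"])` keeps `id` and only the `first` field of `name`, and drops `address` entirely.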

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-29 Thread Xiao Li
+1 On Mon, Oct 29, 2018 at 3:22 AM Wenchen Fan wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.4.0. > > The vote is open until November 1 PST and passes if a majority +1 PMC > votes are cast, with > a minimum of 3 +1 votes. > > [ ] +1 Release this package

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Xiao Li
+1 Reynold Xin wrote on Thu, Oct 25, 2018 at 4:16 PM: > +1 > > > > On Thu, Oct 25, 2018 at 4:12 PM Li Jin wrote: > >> Although I am not specifically involved in DSv2, I think having this kind >> of meeting is definitely helpful to discuss, move certain effort forward >> and keep people on the same page.

Re: What if anything to fix about k8s for the 2.4.0 RC5?

2018-10-25 Thread Xiao Li
Hopefully, this will not delay RC5. Since this is not a blocker ticket, RC5 will start once all the blocker tickets are resolved. Thanks, Xiao Sean Owen wrote on Thu, Oct 25, 2018 at 8:44 AM: > Yes, I agree, and perhaps you are best placed to do that for 2.4.0 RC5 :) > > On Thu, Oct 25, 2018 at 10:41 AM

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-24 Thread Xiao Li
@Dongjoon Hyun Thanks! This is a blocking ticket. It returns a wrong result due to our undefined behavior. I agree we should revert the newly added map-oriented functions. In 3.0 release, we need to define the behavior of duplicate keys in the data type MAP and fix all the related issues that
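One way to define the behavior of duplicate keys is to make it an explicit, configurable policy. The sketch below is a hypothetical illustration of two such policies (reject duplicates, or let the last value win); `build_map` and the policy names are assumptions for illustration, not Spark's API.

```python
def build_map(pairs, policy="exception"):
    """Build a map from (key, value) pairs under an explicit duplicate-key policy.

    Hypothetical policies:
      "exception": reject any duplicate key.
      "last_win":  a later value overwrites an earlier one.
    """
    if policy not in ("exception", "last_win"):
        raise ValueError(f"unknown policy: {policy!r}")
    result = {}
    for key, value in pairs:
        if key in result and policy == "exception":
            raise ValueError(f"duplicate map key: {key!r}")
        result[key] = value
    return result
```

Either choice gives a well-defined answer; the point of fixing a policy is that functions such as map lookups can no longer return different results depending on which duplicate happens to be seen first.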

Re: Documentation of boolean column operators missing?

2018-10-23 Thread Xiao Li
They are documented at the link below https://spark.apache.org/docs/2.3.0/api/sql/index.html On Tue, Oct 23, 2018 at 10:27 AM Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > I can’t seem to find any documentation of the &, |, and ~ operators for > PySpark DataFrame columns. I assume
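In PySpark these operators work through Python operator overloading on the column type: `&`, `|`, and `~` map to `__and__`, `__or__`, and `__invert__`, each returning a new expression. The toy `Col` class below is hypothetical (it is not PySpark's `Column`) and only illustrates that mechanism. Because Python's `&` and `|` bind more tightly than comparison operators, each comparison needs its own parentheses.

```python
class Col:
    """Hypothetical toy column that builds a SQL-like expression string."""

    def __init__(self, expr):
        self.expr = expr

    def _binary(self, other, op):
        # Literals are rendered with repr(); other columns by their expression.
        other_expr = other.expr if isinstance(other, Col) else repr(other)
        return Col(f"({self.expr} {op} {other_expr})")

    def __gt__(self, other):
        return self._binary(other, ">")

    def __lt__(self, other):
        return self._binary(other, "<")

    def __and__(self, other):  # a & b
        return self._binary(other, "AND")

    def __or__(self, other):  # a | b
        return self._binary(other, "OR")

    def __invert__(self):  # ~a
        return Col(f"(NOT {self.expr})")
```

With this class, `(Col("a") > 1) & ~(Col("b") < 2)` builds the expression `((a > 1) AND (NOT (b < 2)))`.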

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Xiao Li
https://github.com/apache/spark/pull/22144 is also not a blocker of the Spark 2.4 release, as discussed in the PR. Thanks, Xiao Xiao Li wrote on Tue, Oct 23, 2018 at 9:20 AM: > Thanks for reporting this. https://github.com/apache/spark/pull/22514 is > not a blocker. We can fix it in the next minor r

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Xiao Li
Thanks for reporting this. https://github.com/apache/spark/pull/22514 is not a blocker. We can fix it in the next minor release, if we are unable to make it in this release. Thanks, Xiao Sean Owen wrote on Tue, Oct 23, 2018 at 9:14 AM: > (I should add, I only observed this with the Scala 2.12 build. It all

Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code

2018-10-22 Thread Xiao Li
Hi, Kazuaki, Thanks for your great SPIP! I am willing to be the shepherd of this SPIP. Cheers, Xiao On Mon, Oct 22, 2018 at 12:05 AM Kazuaki Ishizaki wrote: > Hi Yamamuro-san, > Thank you for your comments. This SPIP gets several valuable comments and > feedback on Google Doc: >

Re: [DISCUSS][K8S][TESTS] Include Kerberos integration tests for Spark 2.4

2018-10-16 Thread Xiao Li
We need to strictly follow the backport and release policy. We can't merge such a new feature into an RC branch or a minor release (e.g., 2.4.1). Cheers, Xiao Bolke de Bruin wrote on Tue, Oct 16, 2018 at 12:48 PM: > Chiming in here. We are in the same boat as Bloomberg. > > (But being a release manager

Re: [VOTE] SPARK 2.4.0 (RC3)

2018-10-11 Thread Xiao Li
-1. We have two correctness bugs: https://issues.apache.org/jira/browse/SPARK-25714 and https://issues.apache.org/jira/browse/SPARK-25708. Let us fix all three issues in ScalaUDF, as mentioned by Sean. Xiao Sean Owen wrote on Thu, Oct 11, 2018 at 9:04 AM: > This is a legitimate question about the

Re: Random sampling in tests

2018-10-08 Thread Xiao Li
particular reason. You can vary the seed but >> as a rule the same random subset of tests is always chosen. Could be >> fine if there's no reason at all to prefer some cases over others. But >> I am guessing any wild guess at the most important subset of cases to >> test is better t

Re: Random sampling in tests

2018-10-08 Thread Xiao Li
For this specific case, I do not think we should test all the time zones. If this were fast, I would be fine leaving it unchanged. However, it is very slow, so I would even prefer reducing the tested time zones to a smaller number or just hardcoding some specific time zones. In general, I like Reynold’s
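A middle ground between testing every time zone and a purely random subset is to pin a few zones that exercise known edge cases and add a small seeded sample, so every run tests the same, small set. The helper `zones_to_test` and the pinned zone names below are hypothetical choices for illustration, not what the Spark test suite does.

```python
import random

# Hypothetical hand-picked zones: UTC as the baseline, a zone that observes
# daylight saving time, and a zone with a half-hour UTC offset.
PINNED_ZONES = ["UTC", "America/Los_Angeles", "Asia/Kolkata"]


def zones_to_test(all_zones, sample_size=3, seed=42):
    """Pinned zones plus a reproducible random sample of the remaining ones."""
    rest = sorted(set(all_zones) - set(PINNED_ZONES))
    rng = random.Random(seed)  # fixed seed: same subset on every run
    sampled = rng.sample(rest, min(sample_size, len(rest)))
    return PINNED_ZONES + sorted(sampled)
```

Changing `seed` rotates the sampled zones deliberately, while the pinned ones are always covered.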

Re: welcome a new batch of committers

2018-10-05 Thread Xiao Li
Congratulations all! Weiqing Yang wrote on Wed, Oct 3, 2018 at 11:20 PM: > Congratulations everyone! > > On Wed, Oct 3, 2018 at 11:14 PM, Driesprong, Fokko > wrote: > >> Congratulations all! >> >> On Wed, Oct 3, 2018 at 23:03, Bryan Cutler wrote: >> >>> Congratulations everyone! Very well deserved!! >>> >>>

Spark github sync works now

2018-10-05 Thread Xiao Li
FYI, the Spark GitHub sync was 7 hours behind this morning, so you might have gotten failed merges because of this. I just triggered a re-sync, and it should work now. Thanks, Xiao

Re: time for Apache Spark 3.0?

2018-09-29 Thread Xiao Li
to wrist injury > > > On Fri, Sep 28, 2018 at 11:01 PM Xiao Li wrote: > >> Based on the above discussions, we have a "rough consensus" that the next >> release will be 3.0. Now, we can start working on the API breaking changes >> (e.g., the ones mentioned in the

Re: [DISCUSS] Syntax for table DDL

2018-09-29 Thread Xiao Li
Are they consistent with the current syntax defined in SqlBase.g4? I think we are following the Hive DDL syntax: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/Partition/Column Ryan Blue wrote on Fri, Sep 28, 2018 at 3:47 PM: > Hi everyone, > > I’m currently

Re: time for Apache Spark 3.0?

2018-09-29 Thread Xiao Li
l evolving and not very > close to stable. I had hoped to have stabilized the API and behaviors for a > 3.0 release. But we could also wait on that for a 4.0 release, depending on > when we think that will be. > > > > Unless there is a pressing need to move to 3.0 for some

Re: [DISCUSS] Cascades style CBO for Spark SQL

2018-09-25 Thread Xiao Li
Hi, Xiaoju, Thanks for sending this to the dev list. The current join reordering rule is just a stats based optimizer rule. Either top-down or bottom-up optimization can achieve the same-level optimized plans. DB2 is using bottom up. In the future, we plan to move the stats based join reordering
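For context, "bottom-up" here refers to dynamic-programming join enumeration in the System R tradition: find the cheapest plan for every pair of relations, then every triple, and so on, reusing the best sub-plans. The sketch below is a generic textbook version over left-deep plans, not Spark's join reordering rule; the cost model (sum of estimated intermediate row counts) and the per-predicate selectivity inputs are simplifying assumptions.

```python
from itertools import combinations


def estimate_rows(relations, card, selectivity):
    """Estimated rows of joining `relations`: the product of their
    cardinalities times the selectivity of each predicate inside the set."""
    rows = 1.0
    for r in relations:
        rows *= card[r]
    for (a, b), sel in selectivity.items():
        if a in relations and b in relations:
            rows *= sel
    return rows


def best_join_order(card, selectivity):
    """Bottom-up DP over left-deep plans; returns (cost, join order)."""
    # best[subset] = (cost, order) of the cheapest plan found for that subset
    best = {frozenset([r]): (0.0, (r,)) for r in card}
    names = sorted(card)
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            subset = frozenset(combo)
            rows = estimate_rows(subset, card, selectivity)
            for last in combo:
                # Reuse the cheapest plan for the other relations, then join `last`.
                left_cost, left_order = best[subset - {last}]
                cost = left_cost + rows  # pay for this join's output size
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, left_order + (last,))
    return best[frozenset(names)]
```

With 1000 orders, 100 customers, and 25 nations, plus selectivities of 1/100 (orders-customers) and 1/25 (customers-nation), this joins `nation` and `customers` first and `orders` last, avoiding the 25,000-row orders-nation cross product. A top-down search with memoization would explore the same subsets and reach the same plan, which matches the point that either direction can produce equally good plans.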

Re: [VOTE] SPARK 2.3.2 (RC6)

2018-09-20 Thread Xiao Li
+1 John Zhuge wrote on Wed, Sep 19, 2018 at 1:17 PM: > +1 (non-binding) > > Built on Ubuntu 16.04 with Maven flags: -Phadoop-2.7 -Pmesos -Pyarn > -Phive-thriftserver -Psparkr -Pkinesis-asl -Phadoop-provided > > java version "1.8.0_181" > Java(TM) SE Runtime Environment (build 1.8.0_181-b13) > Java

Re: [VOTE] SPARK 2.4.0 (RC1)

2018-09-17 Thread Xiao Li
Hi, Erik and Stavros, This bug fix SPARK-23200 is not a blocker of the 2.4 release. It sounds important for Streaming on K8S. Could the K8S-oriented committers speed up the reviews? Thanks, Xiao Erik Erlandson wrote on Mon, Sep 17, 2018 at 11:04 AM: > > I have no binding vote but I second Stavros’

Re: time for Apache Spark 3.0?

2018-09-06 Thread Xiao Li
Yesterday, the 2.4 branch was created. Based on the above discussion, I think we can bump the master branch to 3.0.0-SNAPSHOT. Any concerns? Thanks, Xiao vaquar khan wrote on Sat, Jun 16, 2018 at 10:21 AM: > +1 for 2.4 next, followed by 3.0. > > Where we can get Apache Spark road map for 2.4 and 2.5

Spark github sync works now

2018-08-22 Thread Xiao Li
FYI, the Spark GitHub sync was 10 hours behind this morning, so you might have gotten failed merges because of this. I just triggered a re-sync, and it should work now. Thanks, Xiao

Re: [DISCUSS][SQL] Control the number of output files

2018-08-05 Thread Xiao Li
FYI, the new hints have been merged. They will be available in the upcoming release (Spark 2.4). *John Zhuge*, thanks for your work! Really appreciate it! Please submit more PRs and help the community improve Spark. : ) Xiao 2018-08-05 21:06 GMT-04:00 Koert Kuipers : > lukas, > what is the
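Assuming the hints referred to are the COALESCE and REPARTITION SQL hints (for example `SELECT /*+ COALESCE(3) */ * FROM t`), the difference between them can be modeled with plain lists: each non-empty partition becomes one output file, coalescing merges existing partitions without moving rows between groups, and repartitioning shuffles every row. This is a simplified toy model; `coalesce`, `repartition`, and `output_files` are hypothetical helpers, and Spark's actual partition grouping is not necessarily this even split.

```python
def coalesce(partitions, n):
    """Merge adjacent partitions into at most n groups (no shuffle)."""
    n = max(1, min(n, len(partitions)))
    size, extra = divmod(len(partitions), n)
    merged, start = [], 0
    for i in range(n):
        # The first `extra` groups take one additional input partition.
        end = start + size + (1 if i < extra else 0)
        merged.append([row for part in partitions[start:end] for row in part])
        start = end
    return merged


def repartition(partitions, n):
    """Redistribute every row by hash into exactly n partitions (a shuffle)."""
    buckets = [[] for _ in range(n)]
    for part in partitions:
        for row in part:
            buckets[hash(row) % n].append(row)
    return buckets


def output_files(partitions):
    """One file per non-empty partition in this toy model."""
    return sum(1 for part in partitions if part)
```

Coalescing five single-row partitions to two yields `[[1, 2, 3], [4, 5]]`, i.e. two files, while repartitioning to three moves rows into new hash buckets. Coalescing is cheaper because it avoids the shuffle, but it can only reduce the number of partitions.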

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-29 Thread Xiao Li
at one. > > Xiao Li wrote on Sat, Jul 28, 2018 at 12:05 AM: > >> The following blocker/important fixes have been merged to Spark 2.3 >> branch: >> >> https://issues.apache.org/jira/browse/SPARK-24927 >> https://issues.apache.org/jira/browse/SPARK-24867 >> https://issues.apach

Re: [Spark SQL] Future of CalendarInterval

2018-07-27 Thread Xiao Li
The code freeze of the upcoming release Spark 2.4 is very close. How about revisiting this and explicitly defining the support scope of CalendarIntervalType in the next release (Spark 3.0)? Thanks, Xiao 2018-07-27 10:45 GMT-07:00 Reynold Xin : > CalendarInterval is definitely externally

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-27 Thread Xiao Li
2018, 8:13:23 PM CDT, Saisai Shao < > sai.sai.s...@gmail.com> wrote: > > > Sure, I can wait for this and create another RC then. > > Thanks, > Saisai > > Xiao Li wrote on Fri, Jul 20, 2018 at 9:11 AM: > > Yes. https://issues.apache.org/jira/browse/SPARK-24867 is the one I > created

Re: [build system] upped build retention for GHPRB builds

2018-07-27 Thread Xiao Li
Hi, Shane, Thank you for your help! Xiao 2018-07-24 11:03 GMT-07:00 shane knapp : > the PRB was set to rotate build logs out after two weeks, but due to the > sheer number of builds (yay! a great thing!), i just went in and upped it > to 30 days. > > we've got plenty of disk space on our

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-19 Thread Xiao Li
t; >>> On Thu, Jul 19, 2018 at 5:24 PM Saisai Shao >>> wrote: >>> >>>> Hi Xiao, >>>> >>>> Are you referring to this JIRA (https://issues.apache.org/ >>>> jira/browse/SPARK-24865)? >>>> >>>> Xiao Li 于

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-19 Thread Xiao Li
> rationale. > > On Thu, Jul 19, 2018 at 1:27 PM Xiao Li wrote: > >> I would first vote -1. >> >> I might find another regression caused by the analysis barrier. Will keep >> you posted. >> >>

Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-19 Thread Xiao Li
I would first vote -1. I might find another regression caused by the analysis barrier. Will keep you posted. Xiao 2018-07-18 18:05 GMT-07:00 Takeshi Yamamuro : > +1 (non-binding) > > I run tests on a EC2 m4.2xlarge instance; > [ec2-user]$ java -version > openjdk version "1.8.0_171" > OpenJDK

Re: [VOTE] SPIP: Standardize SQL logical plans

2018-07-18 Thread Xiao Li
+1 (binding) As Ryan and I discussed offline, the contents of the implementation sketch are not part of this vote. Cheers, Xiao 2018-07-18 8:00 GMT-07:00 Russell Spitzer : > +1 (non-binding) > > On Wed, Jul 18, 2018 at 1:32 AM Marco Gaido > wrote: > >> +1 (non-binding) >> >> >> On Wed, 18

Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Xiao Li
Three business days might be too short. Let us open the vote until the end of this Friday (July 13th)? Cheers, Xiao 2018-07-08 10:15 GMT-07:00 Sean Owen : > Just checking that the doc issue in https://issues.apache.org/ > jira/browse/SPARK-24530 is worked around in this release? > > This was

Re: Time for 2.3.2?

2018-06-28 Thread Xiao Li
+1. Thanks, Saisai! The impact of SPARK-24495 is large. We should release Spark 2.3.2 ASAP. Thanks, Xiao 2018-06-27 23:28 GMT-07:00 Takeshi Yamamuro : > +1, I heard some Spark users have skipped v2.3.1 because of these bugs. > > On Thu, Jun 28, 2018 at 3:09 PM Xingbo Jiang > wrote: > >> +1

Re: time for Apache Spark 3.0?

2018-06-16 Thread Xiao Li
+1 2018-06-15 14:55 GMT-07:00 Reynold Xin : > Yes. At this rate I think it's better to do 2.4 next, followed by 3.0. > > > On Fri, Jun 15, 2018 at 10:52 AM Mridul Muralidharan > wrote: > >> I agree, I dont see pressing need for major version bump as well. >> >> >> Regards, >> Mridul >> On Fri,

Re: Missing HiveConf when starting PySpark from head

2018-06-14 Thread Xiao Li
Thanks for catching this. Please feel free to submit a PR. I do not think Vanzin wants to introduce the behavior changes in that PR. We should do the code review more carefully. Xiao 2018-06-14 9:18 GMT-07:00 Li Jin : > Are there objection to restore the behavior for PySpark users? I am happy >

Re: Optimizer rule ConvertToLocalRelation causes expressions to be eager-evaluated in Planning phase

2018-06-10 Thread Xiao Li
For stateful/non-deterministic UDFs, we do not evaluate them in the optimizer stage. For deterministic UDFs, each invocation should return the same result. Before Spark 2.3 release, we assume all the UDFs are deterministic and stateless. In the recent release Spark 2.3, we allow users to mark the
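The distinction can be modeled as an optimizer step that pre-computes a function call only when the function is marked deterministic. The classes below are a hypothetical toy, not Spark's `ConvertToLocalRelation` rule; `as_nondeterministic` only mirrors, in spirit, the `asNondeterministic()` method that Spark 2.3 added to user-defined functions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Udf:
    func: Callable[..., Any]
    deterministic: bool = True

    def as_nondeterministic(self):
        # Return a copy that the optimizer must not pre-evaluate.
        return Udf(self.func, deterministic=False)


@dataclass(frozen=True)
class Call:
    udf: Udf
    args: Tuple[Any, ...]  # literal arguments only, for simplicity


def fold(expr):
    """Evaluate a call at planning time if (and only if) it is deterministic."""
    if isinstance(expr, Call) and expr.udf.deterministic:
        return expr.udf.func(*expr.args)
    return expr  # left in place, to be evaluated at execution time
```

A deterministic `add` called on `(1, 2)` folds to `3` during planning, while the same function marked non-deterministic is left untouched, so any side effects or randomness happen only when the query actually runs.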

Re: Time for 2.2.2 release

2018-06-10 Thread Xiao Li
+1 Tom, thanks for helping with this! Xiao 2018-06-07 9:40 GMT-07:00 Marcelo Vanzin : > Took a look at our branch and most of the stuff that is not already in > 2.2 are flaky test fixes, so +1. > > On Wed, Jun 6, 2018 at 7:54 AM, Tom Graves > wrote: > > Hello all, > > > > I think its time for

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-04 Thread Xiao Li
+1 On Mon, Jun 4, 2018 at 12:44 PM Henry Robinson wrote: > +1 (non-binding) > > On 4 June 2018 at 11:15, Bryan Cutler wrote: > >> +1 >> >> On Mon, Jun 4, 2018 at 10:18 AM, Joseph Bradley >> wrote: >> >>> +1 >>> >>> On Mon, Jun 4, 2018 at 10:16 AM, Mark Hamstra >>> wrote: >>> +1

Re: [VOTE] [SPARK-24374] SPIP: Support Barrier Scheduling in Apache Spark

2018-06-01 Thread Xiao Li
+1 2018-06-01 15:41 GMT-07:00 Xingbo Jiang : > +1 > > 2018-06-01 9:21 GMT-07:00 Xiangrui Meng : > >> Hi all, >> >> I want to call for a vote of SPARK-24374 >> . It introduces a new >> execution mode to Spark, which would help both integration

Re: [VOTE] Spark 2.3.1 (RC3)

2018-06-01 Thread Xiao Li
ted that and skipped this whole thread. >> >> This vote is canceled. I'll prepare a new RC right away. I hope this >> does not happen again. >> >> >> On Fri, Jun 1, 2018 at 1:20 PM, Xiao Li wrote: >> > Sorry, I need to say -1 >> > >> > Thi

Re: [VOTE] Spark 2.3.1 (RC3)

2018-06-01 Thread Xiao Li
fo, but > this does seem like too much rapid change at this stage of an RC. > > On Fri, Jun 1, 2018 at 3:20 PM Xiao Li wrote: > >> Sorry, I need to say -1 >> >> This morning, just found a regression in 2.3.1 and reverted >> https://github.com/apache/spark/pull/2144

Re: [VOTE] Spark 2.3.1 (RC3)

2018-06-01 Thread Xiao Li
Sorry, I need to say -1 This morning, just found a regression in 2.3.1 and reverted https://github.com/apache/spark/pull/21443 Xiao 2018-06-01 13:09 GMT-07:00 Marcelo Vanzin : > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > Given that I expect at least a

Re: [VOTE] Spark 2.3.1 (RC2)

2018-05-23 Thread Xiao Li
-1 Yeah, we should fix it in Spark 2.3.1. https://issues.apache.org/jira/browse/SPARK-24257 is a correctness bug. The PR can be merged soon. Thus, let us have another RC? Thanks, Xiao 2018-05-23 8:04 GMT-07:00 chenliang613 : > Hi > > Agree with Wenchen, it is better

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Xiao Li
-1 We have a correctness bug fix that was merged after 2.3 RC1. It would be nice to have that in Spark 2.3.1 release. https://issues.apache.org/jira/browse/SPARK-24259 Xiao 2018-05-15 14:00 GMT-07:00 Marcelo Vanzin : > Please vote on releasing the following candidate as

Re: SparkR test failures in PR builder

2018-05-03 Thread Xiao Li
Thank you for working on this. It helps a lot! Xiao 2018-05-03 8:42 GMT-07:00 Felix Cheung : > This is resolved. > > Please see https://issues.apache.org/jira/browse/SPARK-24152 > > -- > *From:* Kazuaki Ishizaki >

Re: [build system] jenkins master unreachable, build system currently down

2018-05-01 Thread Xiao Li
Thank you very much, Shane! Yeah, it works now! Xiao 2018-05-01 8:40 GMT-07:00 shane knapp : > and we're back! there was apparently a firewall migration yesterday that > went sideways. > > shane > > On Mon, Apr 30, 2018 at 8:27 PM, shane knapp wrote:

Re: [build system] jenkins master unreachable, build system currently down

2018-04-30 Thread Xiao Li
Hi, Shane, Thank you! Xiao 2018-04-30 20:27 GMT-07:00 shane knapp : > we just noticed that we're unable to connect to jenkins, and have reached > out to our NOC support staff at our colo. until we hear back, there's > nothing we can do. > > i'll update the list as soon as

Re: Maintenance releases for SPARK-23852?

2018-04-16 Thread Xiao Li
Yes, it sounds good to me. We can upgrade both Parquet 1.8.2 to 1.8.3 and ORC 1.4.1 to 1.4.3 in our upcoming Spark 2.3.1 release. Thanks for your efforts! @Henry and @Dongjoon Xiao 2018-04-16 14:41 GMT-07:00 Henry Robinson : > Seems like there aren't any objections. I'll pick

Re: Re: Welcome Zhenhua Wang as a Spark committer

2018-04-02 Thread Xiao Li
Congratulations! Xiao On Mon, Apr 2, 2018 at 4:57 AM Hadrien Chicault wrote: > Congrats > > On Mon, Apr 2, 2018 at 12:06, Weichen Xu > wrote: > >> Congrats Zhenhua! >> >> On Mon, Apr 2, 2018 at 5:32 PM, Gengliang

Re: Welcoming some new committers

2018-03-02 Thread Xiao Li
Congrats and welcome! 2018-03-02 14:49 GMT-08:00 Holden Karau : > Congratulations and welcome everyone! So excited to see the project grow > our committer base. > > On Mar 2, 2018 2:42 PM, "Reynold Xin" wrote: > >> Congrats and welcome! >> >> >> On

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-24 Thread Xiao Li
+1 (binding) in Spark SQL, Core and PySpark. Xiao 2018-02-24 14:49 GMT-08:00 Ricardo Almeida : > +1 (non-binding) > > same as previous RC > > On 24 February 2018 at 11:10, Hyukjin Kwon wrote: > >> +1 >> >> 2018-02-24 16:57 GMT+09:00 Bryan

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Xiao Li
Hi, Ryan, In this release, Data Source V2 is experimental. We are still collecting feedback from the community and will improve the related APIs and implementation in the next release, 2.4. Thanks, Xiao 2018-02-21 9:43 GMT-08:00 Xiao Li <gatorsm...@gmail.com>: > Hi, Justin,

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Xiao Li
Hi, Justin, Based on my understanding, SPARK-17147 is also not a regression. Thus, it cannot be included in Spark 2.3.0. We have to wait for the committers who are familiar with Spark Streaming to decide whether we can fix the issue in Spark 2.3.1. Since this is open source, feel free to

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-21 Thread Xiao Li
Hi, Ryan, Thank you for bringing it up. Since we are already at RC4, we can only accept regression fixes in the 2.3 branch. This was also the strategy in previous Spark releases. Data Source API V2 is newly introduced in this release. At this stage, we are unable to accept any change

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-19 Thread Xiao Li
+1. So far, no function/performance regression in Spark SQL, Core and PySpark. Thanks! Xiao 2018-02-19 19:47 GMT-08:00 Hyukjin Kwon : > Ah, I see. For 1), I overlooked Felix's input here. I couldn't foresee > this when I added this documentation because it worked in my

Re: data source v2 online meetup

2018-01-31 Thread Xiao Li
Hi, Ryan, wow, your Iceberg already used data source V2 API! That is pretty cool! I am just afraid these new APIs are not stable. We might deprecate or change some data source v2 APIs in the next version (2.4). Sorry for the inconvenience it might introduce. Thanks for your feedback always,

Re: no-reopen-closed?

2018-01-31 Thread Xiao Li
>> >>> Yeah you'd have to create a new one. You could link the two. >>> >>> >>> On Sat, Jan 27, 2018, 7:07 PM Xiao Li <gatorsm...@gmail.com> wrote: >>> >>>> Hi, Sean, >>>> >>>> Thanks for your quick reply. F

Re: PSA: Release and commit quality

2018-01-30 Thread Xiao Li
Hi, Ryan, Thanks for your inputs. These comments are pretty helpful! Please continue to help us improve Spark and Spark community. Thanks again, Xiao 2018-01-30 12:58 GMT-08:00 Ryan Blue : > Hi everyone, > > I’ve noticed some questionable practices around commits

Re: no-reopen-closed?

2018-01-27 Thread Xiao Li
on reopen a JIRA > over and over despite being told not to. We changed the workflow such that > Closed can't become Reopened. > > I would not move anything to Closed unless you need it to be permanent for > reasons like that. Resolved is the normal end state of JIRAs. > > On Sat

no-reopen-closed?

2018-01-27 Thread Xiao Li
Unable to reopen the closed JIRA? I am wondering if anybody changed the workflow? Thanks, Xiao

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-23 Thread Xiao Li
+1 Xiao Li 2018-01-23 9:44 GMT-08:00 Marcelo Vanzin <van...@cloudera.com>: > On Tue, Jan 23, 2018 at 7:01 AM, Sean Owen <so...@cloudera.com> wrote: > > I'm not seeing that same problem on OS X and /usr/bin/tar. I tried > unpacking > > it with 'xvzf' and also unzip

Re: Decimals

2017-12-21 Thread Xiao Li
Losing precision is not acceptable to financial customers. Thus, instead of returning NULL, I saw DB2 issues the following error message: SQL0802N Arithmetic overflow or other arithmetic exception occurred. SQLSTATE=22003 DB2 on z/OS is being used by most of biggest banks and financial
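The two behaviors being compared can be sketched with Python's `decimal` module: fitting a value into `DECIMAL(p, s)` either yields NULL on overflow (the behavior discussed in the thread) or raises an error, as DB2 does with SQLSTATE 22003. `to_decimal`, its `on_overflow` switch, and the half-up rounding mode are assumptions for illustration, not Spark's or DB2's code.

```python
from decimal import ROUND_HALF_UP, Decimal, InvalidOperation


class ArithmeticOverflow(Exception):
    """Raised in "error" mode, analogous to DB2's SQLSTATE 22003."""


def to_decimal(value, precision, scale, on_overflow="null"):
    """Fit `value` (a string or int) into DECIMAL(precision, scale).

    on_overflow="null":  return None when the value does not fit.
    on_overflow="error": raise ArithmeticOverflow instead.
    """
    if on_overflow not in ("null", "error"):
        raise ValueError(f"unknown mode: {on_overflow!r}")
    try:
        # Round to `scale` fractional digits (rounding mode chosen for illustration).
        q = Decimal(value).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
        # DECIMAL(p, s) leaves room for at most p - s integer digits.
        overflow = abs(q) >= Decimal(10) ** (precision - scale)
    except InvalidOperation:  # value too large for the decimal context itself
        overflow = True
    if not overflow:
        return q
    if on_overflow == "error":
        raise ArithmeticOverflow(f"{value} does not fit DECIMAL({precision}, {scale})")
    return None
```

For instance, `to_decimal("1234.5", 5, 2)` overflows because `DECIMAL(5, 2)` allows only three integer digits: the NULL mode silently returns `None`, while the error mode makes the failure visible, which is what financial users expect.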

Re: [VOTE] Spark 2.2.1 (RC1)

2017-11-15 Thread Xiao Li
Another issue, https://issues.apache.org/jira/browse/SPARK-22479, is also critical for security. Should we also merge it into 2.2.1? 2017-11-15 9:12 GMT-08:00 Xiao Li <gatorsm...@gmail.com>: > Hi, Felix, > > https://issues.apache.org/jira/browse/SPARK-22469 > > Maybe also in

Re: [VOTE] Spark 2.2.1 (RC1)

2017-11-15 Thread Xiao Li
Hi, Felix, https://issues.apache.org/jira/browse/SPARK-22469 Maybe also include this regression of 2.2? It works in 2.1 Thanks, Xiao 2017-11-14 22:25 GMT-08:00 Felix Cheung : > Please vote on releasing the following candidate as Apache Spark version > 2.2.1. The

Re: [VOTE][SPIP] SPARK-22026 data source v2 write path

2017-10-11 Thread Xiao Li
+1 Xiao On Mon, 9 Oct 2017 at 7:31 PM Reynold Xin wrote: > +1 > > One thing with MetadataSupport - It's a bad idea to call it that unless > adding new functions in that trait wouldn't break source/binary > compatibility in the future. > > > On Mon, Oct 9, 2017 at 6:07 PM,

Re: Welcoming Tejas Patil as a Spark committer

2017-10-02 Thread Xiao Li
Congratulations! Xiao 2017-10-02 10:47 GMT-07:00 Tejas Patil : > Thanks everyone !!! It's a great privilege to be part of the Spark > community. > > ~tejasp > > On Sat, Sep 30, 2017 at 2:27 PM, Jacek Laskowski wrote: > >> Hi, >> >> Oh, yeah. Seen

Re: [VOTE] Spark 2.1.2 (RC1)

2017-09-16 Thread Xiao Li
This is a bug introduced in 2.1. It works fine in 2.0 2017-09-16 16:15 GMT-07:00 Holden Karau <hol...@pigscanfly.ca>: > Ok :) Was this working in 2.1.1? > > On Sat, Sep 16, 2017 at 3:59 PM Xiao Li <gatorsm...@gmail.com> wrote: > >> Still -1 >> >> Unable

Re: [VOTE] Spark 2.1.2 (RC1)

2017-09-16 Thread Xiao Li
Still -1. Unable to pass the tests in my local environment. Opened a JIRA: https://issues.apache.org/jira/browse/SPARK-22041 - SPARK-16625: General data types to be mapped to Oracle *** FAILED *** types.apply(9).equals(org.apache.spark.sql.types.DateType) was false

Re: [VOTE] Spark 2.1.2 (RC1)

2017-09-15 Thread Xiao Li
Sorry, this release candidate is 2.1.2. The issue is in 2.2.1. 2017-09-15 14:21 GMT-07:00 Xiao Li <gatorsm...@gmail.com>: > -1 > > See the discussion in https://github.com/apache/spark/pull/19074 > > Xiao > > > > 2017-09-15 12:28 GMT-07:00 Holden Karau <hol..

Re: [VOTE] Spark 2.1.2 (RC1)

2017-09-15 Thread Xiao Li
-1 See the discussion in https://github.com/apache/spark/pull/19074 Xiao 2017-09-15 12:28 GMT-07:00 Holden Karau : > That's a good question, I built the release candidate however the Jenkins > scripts don't take a parameter for configuring who signs them rather it >

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Xiao Li
+1 Xiao On Mon, 11 Sep 2017 at 6:44 PM Matei Zaharia wrote: > +1 (binding) > > > On Sep 11, 2017, at 5:54 PM, Hyukjin Kwon wrote: > > > > +1 (non-binding) > > > > > > 2017-09-12 9:52 GMT+09:00 Yin Huai : > > +1 > > > > On Mon,

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2 read path

2017-09-06 Thread Xiao Li
+1 Xiao 2017-09-06 19:37 GMT-07:00 Wenchen Fan : > adding my own +1 (binding) > > On Thu, Sep 7, 2017 at 10:29 AM, Wenchen Fan wrote: > >> Hi all, >> >> In the previous discussion, we decided to split the read and write path >> of data source v2 into 2

Re: Welcoming Saisai (Jerry) Shao as a committer

2017-08-31 Thread Xiao Li
Congratulations! Xiao 2017-08-31 9:38 GMT-07:00 Imran Rashid : > Congrats Jerry! > > On Mon, Aug 28, 2017 at 8:28 PM, Matei Zaharia > wrote: > >> Hi everyone, >> >> The PMC recently voted to add Saisai (Jerry) Shao as a committer. Saisai >> has

Updates on migration guides

2017-08-30 Thread Xiao Li
a dedicated page for migration guides of all the components. Hopefully, this can assist the migration efforts. Thanks, Xiao Li

Re: [VOTE] [SPIP] SPARK-15689: Data Source API V2

2017-08-28 Thread Xiao Li
+1 2017-08-28 12:45 GMT-07:00 Cody Koeninger : > Just wanted to point out that because the jira isn't labeled SPIP, it > won't have shown up linked from > > http://spark.apache.org/improvement-proposals.html > > On Mon, Aug 28, 2017 at 2:20 PM, Wenchen Fan

Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-08 Thread Xiao Li
Congrats! On Mon, 7 Aug 2017 at 10:21 PM Takuya UESHIN wrote: > Congrats! > > On Tue, Aug 8, 2017 at 11:38 AM, Felix Cheung > wrote: > >> Congrats!! >> >> -- >> *From:* Kevin Kim (Sangwoo) >>

Re: [SQL] Syntax "case when" doesn't be supported in JOIN

2017-07-17 Thread Xiao Li
same thing for GroupBy non-deterministic. From Map-Reduce >> point >> >>> of >> >>> view, Join is also GroupBy in essence . >> >>> >> >>> @Liang Chi Hsieh >> >>> https://plus.google.com/u/0/10317936259208565073

Re: [SQL] Syntax "case when" doesn't be supported in JOIN

2017-07-16 Thread Xiao Li
If the join condition is non-deterministic, pushing it down to the underlying project will change the semantics. Thus, we are unable to do it in PullOutNondeterministic. Users can do it manually if they do not care about the semantic difference. Thanks, Xiao 2017-07-16 20:07 GMT-07:00 Chang Chen

Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-07 Thread Xiao Li
+1 Xiao Li 2017-07-06 22:18 GMT-07:00 Yin Huai <yh...@databricks.com>: > +1 > > On Thu, Jul 6, 2017 at 8:40 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote: > >> +1 >> >> 2017-07-07 6:41 GMT+09:00 Reynold Xin <r...@databricks.com>: >> >

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-20 Thread Xiao Li
Found another bug about the case preserving of column names of persistent views. This regression was introduced in 2.2. https://issues.apache.org/jira/browse/SPARK-21150 Thanks, Xiao 2017-06-19 8:03 GMT-07:00 Liang-Chi Hsieh : > > I mean it is not a bug has been fixed before

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-13 Thread Xiao Li
-1 Spark 2.2 is unable to read the partitioned table created by Spark 2.1 or earlier. Opened a JIRA https://issues.apache.org/jira/browse/SPARK-21085 Will fix it soon. Thanks, Xiao Li 2017-06-13 9:39 GMT-07:00 Joseph Bradley <jos...@databricks.com>: > Re: the QA JIRAs:

Re: Uploading PySpark 2.1.1 to PyPi

2017-05-26 Thread Xiao Li
version string in 2.2.1, but rest assured this isn't something > I've lost track of. > > On Wed, May 24, 2017 at 12:11 AM Xiao Li <gatorsm...@gmail.com> wrote: > >> Hi, Holden, >> >> Based on the PR, https://github.com/pypa/packaging-problems/issues/90 , >> th

Re: Uploading PySpark 2.1.1 to PyPi

2017-05-23 Thread Xiao Li
Hi, Holden, Based on the PR, https://github.com/pypa/packaging-problems/issues/90 , the limit has been increased to 250MB. Just wondering if we can publish PySpark to PyPI now? Have you created the account? Thanks, Xiao Li 2017-05-12 11:35 GMT-07:00 Sameer Agarwal <sam...@databricks.

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-03-31 Thread Xiao Li
+1 Xiao 2017-03-30 16:09 GMT-07:00 Michael Armbrust : > Please vote on releasing the following candidate as Apache Spark version > 2.1.0. The vote is open until Sun, April 2nd, 2018 at 16:30 PST and > passes if a majority of at least 3 +1 PMC votes are cast. > > [ ] +1

Re: Strange behavior with 'not' and filter pushdown

2017-02-13 Thread Xiao Li
https://github.com/apache/spark/pull/16894 Already backported to Spark 2.0 Thanks! Xiao 2017-02-13 17:41 GMT-08:00 Takeshi Yamamuro : > cc: xiao > > IIUC a xiao's commit below fixed this issue in master. >

Re: Spark Improvement Proposals

2017-02-11 Thread Xiao Li
need to do it phase by phase or sometimes they have to accept the workarounds. That is the reality everyone has to face, I think. Thanks, Xiao Li 2017-02-11 7:57 GMT-08:00 Cody Koeninger <c...@koeninger.org>: > At the spark summit this week, everyone from PMC members to users I had

Re: A question about creating persistent table when in-memory catalog is used

2017-01-23 Thread Xiao Li
such a capability. Thus, we need to see how to resolve this. Hopefully, it answers your question. BTW, the issue you mentioned at the beginning has been resolved. Please fetch the latest master. You are unable to create such a hive serde table without Hive support. Thanks, Xiao Li 2017-01-23 0

Re: A question about creating persistent table when in-memory catalog is used

2017-01-22 Thread Xiao Li
f catalog does not impact the functionality > in Spark other than where the catalog is stored. > > > On Sun, Jan 22, 2017 at 11:18 AM Xiao Li <gatorsm...@gmail.com> wrote: > >> We have a pending PR to block users from creating Hive serde tables when >> using InMemoryCatalog. Se

Re: A question about creating persistent table when in-memory catalog is used

2017-01-22 Thread Xiao Li
is whether the metadata is persistently stored or not. Thanks, Xiao Li 2017-01-22 11:14 GMT-08:00 Reynold Xin <r...@databricks.com>: > I think this is something we are going to change to completely decouple > the Hive support and catalog. > > > On Sun, Jan 22, 2017 at 4:51 AM

Re: Parquet patch release

2017-01-06 Thread Xiao Li
Hi, Ryan, Thank you so much for your help! Happy New Year! Xiao Li 2017-01-06 15:46 GMT-08:00 Ryan Blue <rb...@netflix.com.invalid>: > Last month, there was interest in a Parquet patch release on PR #16281 > <https://github.com/apache/spark/pull/16281>. I went ahead and

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-16 Thread Xiao Li
+1 Xiao Li 2016-12-16 12:19 GMT-08:00 Felix Cheung <felixcheun...@hotmail.com>: > For R we have a license field in the DESCRIPTION, and this is standard > practice (and requirement) for R packages. > > https://cran.r-project.org/doc/manuals/R-e

Re: issues with github pull request notification emails missing

2016-11-17 Thread Xiao Li
Just FYI: normally, when we ping someone, GitHub shows their full name after we type their GitHub ID. Below is an example: [image: inline image 2] Starting last week, Reynold's full name is no longer shown. Did GitHub update their hash functions? [image: inline image 1] Thanks, Xiao Li 2016-11-16 23

Re: LIMIT issue of SparkSQL

2016-10-23 Thread Xiao Li
2.0/sql/ > core/src/main/scala/org/apache/spark/sql/execution/limit.scala > But during query plan generation, GlobalLimit / LocalLimit is not applied > to the query plan. > > Could you please help us to inspect LIMIT problem? > Thanks. > > Best, > Liz > > On 23 Oct 2016, at

Re: LIMIT issue of SparkSQL

2016-10-23 Thread Xiao Li
Hi, Liz, CollectLimit means "Take the first `limit` elements and collect them to a single partition." Thanks, Xiao 2016-10-23 5:21 GMT-07:00 Ran Bai : > Hi all, > > I found the runtime for query with or without “LIMIT” keyword is the same. > We looked into it and found
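A toy model of that definition, treating partitions as Python lists: rows are taken in partition order until `limit` is reached and gathered into one list. The second function models a per-partition local limit followed by a single global limit, matching the LocalLimit/GlobalLimit split mentioned in the question. Both functions are hypothetical sketches, not Spark's physical operators.

```python
from itertools import islice


def collect_limit(partitions, limit):
    """Take the first `limit` rows, scanning partitions in order, into one list."""
    rows = (row for part in partitions for row in part)  # lazy scan
    return list(islice(rows, limit))


def local_then_global_limit(partitions, limit):
    # Local limit: each partition independently keeps at most `limit` rows...
    local = [part[:limit] for part in partitions]
    # ...global limit: a single task then keeps the first `limit` of those.
    return collect_limit(local, limit)
```

Both return the same rows here; the local step only bounds how much each partition can contribute before the final single-partition step.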

Re: Quotes within a table name (phoenix table) getting failure: identifier expected at Spark level parsing

2016-10-10 Thread Xiao Li
> > at org.apache.phoenix.jdbc.PhoenixPreparedStatement.( > PhoenixPreparedStatement.java:94) > > at org.apache.phoenix.jdbc.PhoenixConnection.prepareStatement( > PhoenixConnection.java:714) > > > It appears that Phoenix and Spark's query parsers are in disagreement. > > Any ideas?

Re: Quotes within a table name (phoenix table) getting failure: identifier expected at Spark level parsing

2016-10-10 Thread Xiao Li
Hi, Nico, We use backticks to quote it, for example: CUSTOM_ENTITY.`z02` Thanks, Xiao Li 2016-10-10 12:49 GMT-07:00 Nico Pappagianis <nico.pappagia...@salesforce.com >: > Hello, > > *Some context:* > I have a Phoenix tenant-specific view named CUSTOM_ENTITY."z02"
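The same quoting rule generalizes to a small helper: wrap each name part in backticks and double any backtick that appears inside the name, which is how Spark SQL's backquoted-identifier syntax escapes it. `quote_identifier` and `qualified_name` are hypothetical helpers for illustration.

```python
def quote_identifier(name):
    """Quote one identifier part with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(*parts):
    """Join quoted parts into a dotted name such as `db`.`table`."""
    return ".".join(quote_identifier(p) for p in parts)
```

For example, qualified_name("CUSTOM_ENTITY", "z02") returns `CUSTOM_ENTITY`.`z02`, which names the same table as CUSTOM_ENTITY.`z02`; quoting every part simply avoids having to decide which parts need it.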
