Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread Hyukjin Kwon
A couple of flaky tests can happen; it's usual. Seems it got better now at least. I will keep monitoring the builds. On Fri, Jul 10, 2020 at 4:33 PM, ukby1234 wrote: > Looks like Jenkins isn't stable still. My PR fails two times in a row: > >

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread Hyukjin Kwon
Thank you Shane. On Fri, Jul 10, 2020 at 2:35 AM, shane knapp ☠ wrote: > and -06 is back! i'll keep an eye on things today, but suffice to say > on each worker i: > > 1) rebooted > 2) cleaned ~/.ivy2, ~/.m2, and other associated caches > > we should be g2g! please reply here if you continue to

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-08 Thread Hyukjin Kwon
Thanks Shane! BTW, it's getting serious, e.g. https://github.com/apache/spark/pull/28969 . The tests could not pass in 7 days. Hopefully restarting the machines will make the current situation better :-) Separately, I am working on a PR to run the Spark tests in GitHub Actions. We could

Re: m2 cache issues in Jenkins?

2020-07-05 Thread Hyukjin Kwon
s, please) in Github comment. > > On Thu, Jul 2, 2020 at 2:12 PM Hyukjin Kwon wrote: > >> Ah, okay. Actually there already is - >> https://issues.apache.org/jira/browse/SPARK-31693. I am reopening. >> >> On Thu, Jul 2, 2020 at 2:06 PM, Holden Karau wrote: >> >

Re: Jenkins is down

2020-07-05 Thread Hyukjin Kwon
0 minutes ago. >> >> Bests, >> Dongjoon. >> >> >> On Fri, Jul 3, 2020 at 4:43 AM Hyukjin Kwon wrote: >> >>> Hi all and Shane, >>> >>> Is there something wrong with the Jenkins machines? Seems they are down. >>> >> > > -- > Shane Knapp > Computer Guy / Voice of Reason > UC Berkeley EECS Research / RISELab Staff Technical Lead > https://rise.cs.berkeley.edu >

Jenkins is down

2020-07-03 Thread Hyukjin Kwon
Hi all and Shane, Is there something wrong with the Jenkins machines? Seems they are down.

Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-02 Thread Hyukjin Kwon
>>> Ubuntu LTS ships with 3.7, and while the previous LTS ships with 3.5, if >>> folks really can’t upgrade there’s conda. >>> >>> Is there anyone with a large Python 3.5 fleet who can’t use conda? >>> >>> On Wed, Jul 1, 2020 at 7:15 PM Hyukjin Kwon wrote

Re: m2 cache issues in Jenkins?

2020-07-01 Thread Hyukjin Kwon
Ah, okay. Actually there already is - https://issues.apache.org/jira/browse/SPARK-31693. I am reopening. On Thu, Jul 2, 2020 at 2:06 PM, Holden Karau wrote: > We don't. I didn't file one originally, but Shane reminded me to in the > future. > > On Wed, Jul 1, 2020 at 9:44 PM Hyukjin

Re: m2 cache issues in Jenkins?

2020-07-01 Thread Hyukjin Kwon
Nope, do we have an existing ticket? I think we can reopen if there is. On Thu, Jul 2, 2020 at 1:43 PM, Holden Karau wrote: > Huh interesting that it’s the same worker. Have you filed a ticket to > Shane? > > On Wed, Jul 1, 2020 at 8:50 PM Hyukjin Kwon wrote: > >> Hm .. seems th

Re: m2 cache issues in Jenkins?

2020-07-01 Thread Hyukjin Kwon
Hm .. seems this is happening again in amp-jenkins-worker-04 ;(. On Thu, Jun 25, 2020 at 3:15 AM, shane knapp ☠ wrote: > done: > -bash-4.1$ cd .m2 > -bash-4.1$ ls > repository > -bash-4.1$ time rm -rf * > > real 17m4.607s > user 0m0.950s > sys 0m18.816s > -bash-4.1$ > > On Wed, Jun 24, 2020

Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-01 Thread Hyukjin Kwon
Yeah, sure. It will be dropped from Spark 3.1 onwards. I don't think we should make such changes in maintenance releases. On Thu, Jul 2, 2020 at 11:13 AM, Holden Karau wrote: > To be clear the plan is to drop them in Spark 3.1 onwards, yes? > > On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon wrote

[DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-01 Thread Hyukjin Kwon
Hi all, I would like to discuss dropping the deprecated Python versions 2, 3.4 and 3.5 at https://github.com/apache/spark/pull/28957. I assume people support it in general but I am writing this to make sure everybody is happy. Fokko did a very good investigation on it, see

Re: [VOTE] Decommissioning SPIP

2020-07-01 Thread Hyukjin Kwon
+1 On Thu, Jul 2, 2020 at 10:08 AM, Marcelo Vanzin wrote: > I reviewed the docs and PRs from way before an SPIP was explicitly > asked, so I'm comfortable with giving a +1 even if I haven't really > fully read the new document, > > On Wed, Jul 1, 2020 at 6:05 PM Holden Karau wrote: > > > > Hi Spark

Re: [DISCUSS][SPIP] Graceful Decommissioning

2020-06-25 Thread Hyukjin Kwon
Thank you so much, Holden. PS: I cc'ed some people who might be interested in this too FYI. On Fri, Jun 26, 2020 at 11:26 AM, Holden Karau wrote: > At the recommendation of Hyukjin, I'm converting the graceful > decommissioning work to an SPIP. The SPIP document is at >

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2020-06-25 Thread Hyukjin Kwon
I don't have a strong opinion on changing the default either, but I slightly prefer having the option to switch the Hadoop version first, just to stay safer. To be clear, we're now discussing the timing of when to set Hadoop 3.0.0 by default, and which change has to come first,

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-23 Thread Hyukjin Kwon
+1. Just as a note, - SPARK-31918 is fixed now, and there's no blocker. - When we build SparkR, we should use the latest R version, at least 4.0.0+. On Wed, Jun 24, 2020 at 11:20 AM, Dongjoon Hyun wrote: > +1 > > Bests, > Dongjoon. > > On Tue, Jun 23,

Re: Initial Decom PR for Spark 3?

2020-06-22 Thread Hyukjin Kwon
On Sun, 21 Jun 2020 at 19:05, Hyukjin Kwon wrote: > >> Yeah, I believe the community decided to do a SPIP for such significant >> changes. It would be best if we stick to the standard approaches. >> >> On Sun, Jun 21, 2020 at 8:52 AM, Holden Karau wrote: >> >>>

Re: Initial Decom PR for Spark 3?

2020-06-21 Thread Hyukjin Kwon
:23 PM Stephen Boesch wrote: > >> Hi given there is a design doc (contrary to that common) - is this going >> to move forward? >> >> On Thu, 18 Jun 2020 at 18:05, Hyukjin Kwon wrote: >> >>> Looks it had to be with SPIP and a proper design doc to discuss. >

Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Hyukjin Kwon
Yay! On Fri, Jun 19, 2020 at 4:46 AM, Mridul Muralidharan wrote: > Great job everyone ! Congratulations :-) > > Regards, > Mridul > > On Thu, Jun 18, 2020 at 10:21 AM Reynold Xin wrote: > >> Hi all, >> >> Apache Spark 3.0.0 is the first release of the 3.x line. It builds on >> many of the innovations

Re: Initial Decom PR for Spark 3?

2020-06-18 Thread Hyukjin Kwon
Looks like it had to go through an SPIP and a proper design doc to be discussed. On Sun, Feb 9, 2020 at 1:23 AM, Erik Erlandson wrote: > I'd be willing to pull this in, unless others have concerns post > branch-cut. > > On Tue, Feb 4, 2020 at 2:51 PM Holden Karau wrote: > >> Hi Y’all, >> >> I’ve got a K8s graceful

Re: [ANNOUNCE] Apache Spark 2.4.6 released

2020-06-10 Thread Hyukjin Kwon
Yay! On Thu, Jun 11, 2020 at 10:38 AM, Holden Karau wrote: > We are happy to announce the availability of Spark 2.4.6! > > Spark 2.4.6 is a maintenance release containing stability, correctness, > and security fixes. > This release is based on the branch-2.4 maintenance branch of Spark. We > strongly

Re: Quick sync: what goes in migration guide vs release notes?

2020-06-10 Thread Hyukjin Kwon
I don't think the proposal means we should stop adding the JIRAs labeled release-notes to the release notes (?). People will still label the JIRAs when the change is significant or breaking, whether it's a bug or not, and they will be in the release notes. I guess the proposal TL;DR is: - If that's a

Re: [vote] Apache Spark 3.0 RC3

2020-06-09 Thread Hyukjin Kwon
+1 On Tue, Jun 9, 2020 at 3:16 PM, Xiao Li wrote: > +1 (binding) > > Xiao > > On Mon, Jun 8, 2020 at 10:13 PM Xingbo Jiang > wrote: > >> +1(non-binding) >> >> On Mon, Jun 8, 2020 at 9:50 PM, Jiaxin Shan wrote: >> >>> +1 >>> I build binary using the following command, test spark workloads on >>> Kubernetes (AWS EKS)

Re: Build time limit in PR builder

2020-05-28 Thread Hyukjin Kwon
I remember we were able to cut the time down pretty considerably in the past. For example, I investigated ( https://github.com/apache/spark/pull/21822#issuecomment-407295739) and fixed some of it before, e.g. https://github.com/apache/spark/pull/23111. Maybe we could skim again to reduce the build/testing

Build time limit in PR builder

2020-05-28 Thread Hyukjin Kwon
Hi all, Seems we're hitting the time limit in PR builders (see https://github.com/apache/spark/pull/28627), in particular when it's a Maven build, which takes more time compared to SBT in general. Should we maybe increase the PR builder time limit a bit (10 ~ 20 mins?) to unblock these PRs and focus on

Re: [VOTE] Apache Spark 3.0 RC2

2020-05-22 Thread Hyukjin Kwon
Ryan, > I'm fine with the commit, other than the fact that it violated ASF norms to commit without waiting for a review. It looks like it became a different proposal as you and other people discussed and suggested there, which you didn't technically vote

Re: [build system] jenkins rebooting now

2020-05-14 Thread Hyukjin Kwon
Thanks Shane. On Fri, 15 May 2020, 02:29 Dongjoon Hyun, wrote: > Thank you so much, Shane! > > > On Thu, May 14, 2020 at 9:51 AM Xiao Li wrote: > >> Thank you, Shane! >> >> On Thu, May 14, 2020 at 9:50 AM shane knapp ☠ >> wrote: >> >>> we're back. doesn't seem to have fixed the issue of the

Re: [DISCUSS] Java specific APIs design concern and choice

2020-05-11 Thread Hyukjin Kwon
we should be putting API >> policy on that page, it should live on an Apache Spark page. >> >> I think if you want to implement an API policy like this it should go >> through an official vote thread, not just a discuss thread where we have >> not had a lot of f

Re: [DISCUSS] Java specific APIs design concern and choice

2020-05-11 Thread Hyukjin Kwon
ent an API policy like this it should go > through an official vote thread, not just a discuss thread where we have > not had a lot of feedback on it. > > Tom > > > > On Monday, May 11, 2020, 06:44:31 AM CDT, Hyukjin Kwon < > gurwls...@gmail.com> wrote: >

Re: [DISCUSS] Java specific APIs design concern and choice

2020-05-11 Thread Hyukjin Kwon
I will wait a couple more days and, if I hear no objection, I will document this at https://github.com/databricks/scala-style-guide#java-interoperability. On Thu, May 7, 2020 at 9:18 PM, Hyukjin Kwon wrote: > Hi all, I would like to proceed with this. Are there more thoughts on this? If >

Re: [DISCUSS] Java specific APIs design concern and choice

2020-05-07 Thread Hyukjin Kwon
Hi all, I would like to proceed with this. Are there more thoughts on this? If not, I would like to go ahead with the proposal here. On Thu, Apr 30, 2020 at 10:54 PM, Hyukjin Kwon wrote: > Nothing is urgent. I just don't want to leave it undecided and just keep > adding Java APIs inconsistently a

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-30 Thread Hyukjin Kwon
> Scala types conversions by self when Java programmers prepare to > invoke Scala libraries. I'm not sure which one is the Java programmers' > root complaint, Scala type instance or Scala Jar file. > > My 2 cents. > > -- > Cheers, > -z > > On Thu, 30 Apr 20

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2020-04-29 Thread Hyukjin Kwon
Actually, let me just take a look by myself and bring some updates soon. On Thu, Apr 30, 2020 at 9:13 AM, Hyukjin Kwon wrote: > WDYT @Josh Rosen ? > Seems > https://github.com/databricks/spark-pr-dashboard/blob/1e799c9e510fa8cdc9a6c084a777436bebeabe10/sparkprs/controllers/tasks.py#

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-29 Thread Hyukjin Kwon
buy the argument about Scala/Java friendliness because using Java instances is already documented in the official Scala documentation. Users still need to search whether we have Java-specific methods for *some* APIs. On Thu, Apr 30, 2020 at 8:58 AM, Hyukjin Kwon wrote: > Hm, I thought you meant you prefer 3. o

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2020-04-29 Thread Hyukjin Kwon
link from a Jira > ticket to the PRs that mention that ticket. I don't think it will update > the ticket's status, though. > > Would you like me to file a ticket with Infra and see what they say? > > On Tue, Apr 28, 2020 at 12:21 AM Hyukjin Kwon wrote: > >> Maybe it's t

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-29 Thread Hyukjin Kwon
a vote or just waiting to get more feedback? I disagree > with saying option 4 is the rule but agree having a general rule makes > sense. I think we need a lot more input to make the rule as it affects the > api's. > > Tom > > On Wednesday, April 29, 2020, 09:53:22 AM C

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-29 Thread Hyukjin Kwon
at 5:03 PM, Hyukjin Kwon wrote: > Spark has targeted to have a unified API set rather than having separate > Java classes to reduce the maintenance cost, > e.g. JavaRDD <> RDD vs DataFrame. These JavaXXX are more about the legacy. > > I think it's best to stick to the approach 4

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-28 Thread Hyukjin Kwon
cala` or `.asJava`'s help if Java API > is not ready. Then switch to Java API when it's well cooked. > > The cons is more efforts to maintain. > > My 2 cents. > > -- > Cheers, > -z > > On Tue, 28 Apr 2020 12:07:36 +0900 > Hyukjin Kwon wrote: > > > The problem is

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2020-04-27 Thread Hyukjin Kwon
ation instead. We use it > at my day job, for example. > > On Fri, Apr 24, 2020 at 12:39 AM Hyukjin Kwon wrote: > >> Hi all, >> >> Seems like this github_jira_sync.py >> <https://github.com/apache/spark/blob/master/dev/github_jira_sync.py> script >> seems

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
ections.html > [2] > https://www.scala-lang.org/api/2.13.0/scala/jdk/javaapi/CollectionConverters$.html > [3] > https://www.scala-lang.org/api/2.13.0/scala/jdk/CollectionConverters$.html > [4] > https://www.scala-lang.org/api/2.12.11/scala/collection/convert/ImplicitConversionsToJava

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
is closer to what Spark has targeted so far. On Tue, Apr 28, 2020 at 8:34 AM, Hyukjin Kwon wrote: > > One thing we could do here is use Java collections internally and make > the Scala API a thin wrapper around Java -- like how Python works. > > Then adding a method to the Scala

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
> into internals. > > On Mon, Apr 27, 2020 at 8:49 AM Hyukjin Kwon wrote: > >> Let's stick to the option with less maintenance effort then, rather than leaving it >> undecided and delaying with this inconsistency. >> >> I don't think we can have some very meaningful dat

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
ption 3 or 4. We may need to > collect more data points from actual users. > > On Mon, Apr 27, 2020 at 9:50 PM Hyukjin Kwon wrote: > >> Scala users are arguably more prevailing compared to Java users, yes. >> Using the Java instances in Scala side is legitimate, and they a

Re: [DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
> On Monday, April 27, 2020, 04:04:28 AM CDT, Hyukjin Kwon < > gurwls...@gmail.com> wrote: > > > Hi all, > > I would like to discuss Java specific APIs and which design we will choose. This has been discussed in multiple places so far, for example, at > https://

[DISCUSS] Java specific APIs design concern and choice

2020-04-27 Thread Hyukjin Kwon
Hi all, I would like to discuss Java specific APIs and which design we will choose. This has been discussed in multiple places so far, for example, at https://github.com/apache/spark/pull/28085#discussion_r407334754 *The problem:* In short, I would like us to have clear guidance on how we
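The snippet above is truncated, but the design question running through this thread is whether to keep separate Java-specific variants of an API (the JavaRDD-style legacy) or to expose a single Java-friendly signature that Scala callers adapt with the standard collection converters. A minimal sketch of the two shapes, using hypothetical names (ColumnNames, columnNamesAsJava) rather than any actual Spark API:

    import scala.collection.JavaConverters._  // scala.jdk.CollectionConverters in Scala 2.13+

    // Hypothetical API surface, for illustration only; not actual Spark code.
    object ColumnNames {
      private val names = Seq("id", "value")

      // Shape 1: a Scala signature plus a separate Java-specific variant
      // (more API surface to keep in sync, the JavaRDD-style duplication).
      def columnNames: Seq[String] = names
      def javaColumnNames: java.util.List[String] = names.asJava

      // Shape 2: one Java-friendly signature shared by both languages;
      // Scala callers convert back when they need Scala collections,
      // e.g. ColumnNames.columnNamesAsJava().asScala
      def columnNamesAsJava(): java.util.List[String] = names.asJava
    }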

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2020-04-23 Thread Hyukjin Kwon
hich JIRA is in progress with a PR or not. On Fri, Jul 26, 2019 at 1:20 PM, Hyukjin Kwon wrote: > Just FYI, I had to come up with a better JQL to filter out the JIRAs that > already have linked PRs. > In case it helps someone, I use this JQL now to look through the open > JIRAs: >

Re: Automatic PR labeling

2020-04-13 Thread Hyukjin Kwon
Thanks! On Tue, Apr 14, 2020 at 7:42 AM, Jungtaek Lim wrote: > Nice addition, looks pretty good! > > On Tue, Apr 14, 2020 at 1:17 AM Xiao Li wrote: > >> Looks great! >> >> Thanks for making this happen. This is pretty helpful. >> >> Xiao >> >> O

Re: Automatic PR labeling

2020-04-13 Thread Hyukjin Kwon
Okay, now it started to work. Let's see if it works well! On Fri, Apr 3, 2020 at 11:41 AM, Hyukjin Kwon wrote: > Seems like this email missed cc'ing the mailing list, forwarding it for > trackability. > > -- Forwarded message - > From: Ismaël Mejía > Date: 2020

Fwd: Automatic PR labeling

2020-04-02 Thread Hyukjin Kwon
Seems like this email missed cc'ing the mailing list, forwarding it for trackability. -- Forwarded message - From: Ismaël Mejía Date: Thu, Apr 2, 2020 at 4:46 PM Subject: Re: Automatic PR labeling To: Hyukjin Kwon +1 Just for ref there is a really simple Github App for this: https

Re: Automatic PR labeling

2020-04-02 Thread Hyukjin Kwon
Awesome! On Fri, Apr 3, 2020 at 7:13 AM, Nicholas Chammas wrote: > SPARK-31330 <https://issues.apache.org/jira/browse/SPARK-31330>: > Automatically label PRs based on the paths they touch > > On Wed, Apr 1, 2020 at 11:34 PM Hyukjin Kwon wrote: > >> @Nicholas Chamm

Re: Automatic PR labeling

2020-04-01 Thread Hyukjin Kwon
@Nicholas Chammas Would you be interested in taking a look? I would love this to be done. On Wed, Mar 25, 2020 at 10:30 AM, Hyukjin Kwon wrote: > That should be cool. There was a bit of discussion about which account > should label. If we can replace it, I think it sounds great! > > 202

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Hyukjin Kwon
> 2) check with older versions to fill up affects version for bug I don't agree with this in general. To me usually it's "For the type of bug, assign one valid version" instead. > The only place where I can see some amount of investigation being required would be for security issues or

Re: Automatic PR labeling

2020-03-24 Thread Hyukjin Kwon
That should be cool. There was a bit of discussion about which account should label. If we can replace it, I think it sounds great! On Wed, Mar 25, 2020 at 5:08 AM, Nicholas Chammas wrote: > Public Service Announcement: There is a GitHub action that lets you > automatically label PRs based on what

Re: [DISCUSS] Null-handling of primitive-type of untyped Scala UDF in Scala 2.12

2020-03-17 Thread Hyukjin Kwon
Option 2 seems fine to me. On Tue, Mar 17, 2020 at 3:41 PM, Wenchen Fan wrote: > I don't think option 1 is possible. > > For option 2: I think we need to do it anyway. It's kind of a bug that the > typed Scala UDF doesn't support case classes and thus can't support > struct-type input columns. > > For

Re: Auto-linking from PRs to Jira tickets

2020-03-11 Thread Hyukjin Kwon
Cool, nice! On Thu, Mar 12, 2020 at 8:54 AM, Takeshi Yamamuro wrote: > Cool! Thanks, Dongjoon! > > Bests, > Takeshi > > On Thu, Mar 12, 2020 at 8:27 AM Dongjoon Hyun > wrote: > >> Hi, All. >> >> Autolinking from PR to JIRA started. >> >> *Inside PR* >> https://github.com/apache/spark/pull/27881 >> >>

Re: [VOTE] Amend Spark's Semantic Versioning Policy

2020-03-09 Thread Hyukjin Kwon
The proposal itself seems good as a set of factors to consider. Thanks, Michael. Several of the concerns mentioned look like good points, in particular: > ... assuming that this is for public stable APIs, not APIs that are marked as unstable, evolving, etc. ... I would like to confirm this. We already have API

Re: 'spark-master-docs' job missing in Jenkins

2020-02-26 Thread Hyukjin Kwon
> > > On Tue, Feb 25, 2020 at 9:10 PM Hyukjin Kwon wrote: > >> Hm, we should still run this I believe. PR builders do not run doc build >> (more specifically `cd docs && jekyll build`) >> >> Fortunately, Javadoc, Scaladoc, SparkR documentation and PySpar

Re: 'spark-master-docs' job missing in Jenkins

2020-02-25 Thread Hyukjin Kwon
dbricks-spark-tp25325p26222.html > > shane > > On Tue, Feb 25, 2020 at 6:18 PM Hyukjin Kwon wrote: > >> Hi all, >> >> I just noticed we apparently don't build the documentation in the Jenkins >> anymore. >> I remember we have the job: >> https://ampl

'spark-master-docs' job missing in Jenkins

2020-02-25 Thread Hyukjin Kwon
Hi all, I just noticed we apparently don't build the documentation in Jenkins anymore. I remember we had this job: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-docs Does anybody know what happened to this job? Thanks.

Re: [DOCS] Spark SQL Upgrading Guide

2020-02-16 Thread Hyukjin Kwon
Thanks for checking it, Jacek. On Sun, Feb 16, 2020 at 7:23 PM, Jacek Laskowski wrote: > Hi, > > Never mind. Found this [1]: > > > This config is deprecated and it will be removed in 3.0.0. > > And so it has :) Thanks and sorry for the trouble. > > [1] >

Re: Request to document the direct relationship between other configurations

2020-02-14 Thread Hyukjin Kwon
;. > > Thanks again to initiate the discussion thread - this thread led the > following thread for the final goal. > > On Fri, Feb 14, 2020 at 1:43 PM Hyukjin Kwon wrote: > >> It's okay to just follow one prevailing style. The main point I would >> like to say

Re: Request to document the direct relationship between other configurations

2020-02-13 Thread Hyukjin Kwon
on education. > > The codebase is the reference of implicit rules/policies which would apply > to all contributors including newcomers. Let's just put our best efforts on > being consistent on codebase. (We should have consensus to do this anyway.) > > > On Thu, Feb 13, 2020 at 12:44

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Hyukjin Kwon
dd to the contribution > doc, as that is the thing we agree about.) > > Without the details it's going to be a some sort of "preference" which is > natural to have disagreement, hence it doesn't make sense someone is forced > to do something if something turns o

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Hyukjin Kwon
if this still confuses or disagree. On Thu, Feb 13, 2020 at 9:47 AM, Hyukjin Kwon wrote: > Yes, that's probably our final goal to revisit the configurations to make > it structured and deduplicated documentation cleanly. +1. > > One point I would like to add is though to add such information to the >

Re: [DISCUSS] naming policy of Spark configs

2020-02-12 Thread Hyukjin Kwon
+1. On Thu, Feb 13, 2020 at 9:30 AM, Gengliang Wang wrote: > +1, this is really helpful. We should make the SQL configurations > consistent and more readable. > > On Wed, Feb 12, 2020 at 3:33 PM Rubén Berenguel > wrote: > >> I love it, it will make configs easier to read and write. Thanks Wenchen. >>

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Hyukjin Kwon
fetch them programmatically, one still has to know what >> specific config one is looking for. >> >> Cheers >> Jules >> >> Sent from my iPhone >> Pardon the dumb thumb typos :) >> >> On Feb 12, 2020, at 5:19 AM, Hyukjin Kwon wrote: >> >>

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Hyukjin Kwon
; names, e.g. spark.shuffle.service.enabled > and spark.dynamicAllocation.enabled. > > On Wed, Feb 12, 2020 at 7:54 PM Hyukjin Kwon wrote: > >> Also, I would like to hear other people's thoughts here. Could I ask >> what you guys think about this in general? >> >> On Wed, Feb 12, 2020

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Hyukjin Kwon
Also, I would like to hear other people's thoughts here. Could I ask what you guys think about this in general? On Wed, Feb 12, 2020 at 12:02 PM, Hyukjin Kwon wrote: > To do that, we should explicitly document such structured configuration > and implicit effect, which is currently missing. >

Re: Request to document the direct relationship between other configurations

2020-02-11 Thread Hyukjin Kwon
topic, `spark.dynamicAllocation` is having another issue on > practice - whether to duplicate description between configuration code and > doc. I have been asked to add description on configuration code > regardlessly, and existing codebase doesn't. This configuration is > widely-used one. >

Re: Request to document the direct relationship between other configurations

2020-02-11 Thread Hyukjin Kwon
. I > agree this is the good step of "be kind" but less pragmatic. > > I'd be happy to follow the consensus we would make in this thread. > Appreciate more voices. > > Thanks, > Jungtaek Lim (HeartSaVioR) > > > On Wed, Feb 12, 2020 at 10:36 AM Hyukjin Kwon w

Re: Request to document the direct relationship between other configurations

2020-02-11 Thread Hyukjin Kwon
> I don't plan to document this officially yet Just to prevent confusion, I meant I don't yet plan to document the fact that we should write the relationships in configurations as a code/review guideline in https://spark.apache.org/contributing.html On Wed, Feb 12, 2020 at 9:57 AM, Hyukjin Kwon

Request to document the direct relationship between other configurations

2020-02-11 Thread Hyukjin Kwon
Hi all, I happened to review some PRs and I noticed that some configurations are missing some necessary information. To be explicit, I would like to make sure we document the direct relationship between other configurations in the documentation. For example,

Re: Apache Spark Docker image repository

2020-02-10 Thread Hyukjin Kwon
Quick question. Roughly how much overhead is required to maintain a minimal version? If that doesn't look like too much, I think it's fine to give it a shot. On Sat, Feb 8, 2020 at 6:51 AM, Dongjoon Hyun wrote: > Thank you, Sean, Jiaxin, Shane, and Tom, for feedbacks. > > 1. For legal questions, please see the

Re: [ANNOUNCE] Announcing Apache Spark 2.4.5

2020-02-10 Thread Hyukjin Kwon
Thanks Dongjoon! On Sun, Feb 9, 2020 at 10:49 AM, Takeshi Yamamuro wrote: > Happy to hear the release news! > > Bests, > Takeshi > > On Sun, Feb 9, 2020 at 10:28 AM Dongjoon Hyun > wrote: > >> There was a typo in one URL. The correct release note URL is here. >> >>

Re: new branch-3.0 jenkins job configs are ready to be deployed...

2020-02-10 Thread Hyukjin Kwon
FWIW, I believe all tests are fixed in PySpark and SparkR with JDK 11. Let me know if you guys hit any test failures. On Sat, Feb 1, 2020 at 10:45 AM, Dongjoon Hyun wrote: > Oops. I found this flaky test fails even in `Hadoop 2.7 with Hive 1.2`. > >

Re: More publicly documenting the options under spark.sql.*

2020-02-09 Thread Hyukjin Kwon
The PR was merged. Now all external SQL configurations will be automatically documented. On Wed, Feb 5, 2020 at 9:46 AM, Hyukjin Kwon wrote: > FYI, a PR was opened at https://github.com/apache/spark/pull/27459. Thanks > Nicholas. > Hope you guys find some time to take a look. > > Jan 2020

Re: Spark 3.0 branch cut and code freeze on Jan 31?

2020-02-05 Thread Hyukjin Kwon
Awesome Shane. On Wed, Feb 5, 2020 at 7:29 AM, Xiao Li wrote: > Thank you, Shane! > > Xiao > > On Tue, Feb 4, 2020 at 2:16 PM Dongjoon Hyun > wrote: > >> Thank you, Shane! :D >> >> Bests, >> Dongjoon >> >> On Tue, Feb 4, 2020 at 13:28 shane knapp ☠ wrote: >> >>> all the 3.0 builds have been created

Re: More publicly documenting the options under spark.sql.*

2020-02-04 Thread Hyukjin Kwon
FYI, a PR was opened at https://github.com/apache/spark/pull/27459. Thanks Nicholas. Hope you guys find some time to take a look. On Tue, Jan 28, 2020 at 8:15 AM, Nicholas Chammas wrote: > I am! Thanks for the reference. > > On Thu, Jan 16, 2020 at 9:53 PM Hyukjin Kwon wrote: > >> Nicholas,

Re: [VOTE] Release Apache Spark 2.4.5 (RC2)

2020-02-03 Thread Hyukjin Kwon
+1 from me too. On Tue, Feb 4, 2020 at 12:26 PM, Wenchen Fan wrote: > AFAIK there are no ongoing critical bug fixes, +1 > > On Mon, Feb 3, 2020 at 11:46 PM Dongjoon Hyun > wrote: > >> Yes, it does officially since 2.4.0. >> >> 2.4.5 is a maintenance release of 2.4.x line and the community didn't >>

Re: [FYI] `Target Version` on `Improvement`/`New Feature` JIRA issues

2020-02-01 Thread Hyukjin Kwon
Thanks Dongjoon. On Sun, 2 Feb 2020, 09:08 Dongjoon Hyun, wrote: > Hi, All. > > From Today, we have `branch-3.0` as a tool of `Feature Freeze`. > > https://github.com/apache/spark/tree/branch-3.0 > > All open JIRA issues whose type is `Improvement` or `New Feature` and had > `3.0.0` as a

Re: Closing stale PRs with a GitHub Action

2020-01-27 Thread Hyukjin Kwon
5, 2019 at 11:16 AM Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> Just an FYI to everyone, we’ve merged in an Action to close stale PRs: >> https://github.com/apache/spark/pull/26877 >> >> On Sun, Dec 8, 2019 at 9:49 AM, Hyukjin Kwon wrote: >> >>> It do

Block a user from spark-website who repeatedly open the invalid same PR

2020-01-25 Thread Hyukjin Kwon
Hi all, I am thinking about opening an infra ticket to block the @DataWanderer user from the spark-website repository, who repeatedly opens the same invalid PR. The PR is about fixing documentation in the released version 2.4.4, and it should be fixed in the spark repository instead. It

Re: [DISCUSS] Revert and revisit the public custom expression API for partition (a.k.a. Transform API)

2020-01-22 Thread Hyukjin Kwon
you mind if I ask for answers to these questions? On Fri, Jan 17, 2020 at 10:25 AM, Hyukjin Kwon wrote: > Thanks for giving me some context and clarification, Ryan. > > I think I was rather trying to propose to revert because I don't see the > explicit plan here and it was just left half-don

Re: Adding Maven Central mirror from Google to the build?

2020-01-21 Thread Hyukjin Kwon
+1. If it becomes a problem for any reason, we can consider another option ( https://github.com/apache/spark/pull/27307#issuecomment-576951473) later. On Wed, Jan 22, 2020 at 8:23 AM, Dongjoon Hyun wrote: > +1, I'm supporting the following proposal. > > > this mirror as the primary repo in the build,

Re: More publicly documenting the options under spark.sql.*

2020-01-16 Thread Hyukjin Kwon
"spark.sql("set -v")" returns a Dataset that has all non-internal SQL > configurations. Should be pretty easy to automatically generate a SQL > configuration page. > > Best Regards, > Ryan > > > On Wed, Jan 15, 2020 at 5:47 AM Hyukjin Kwon wrote:
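As a rough illustration of the idea quoted above (a sketch only, not the generator that was eventually merged via https://github.com/apache/spark/pull/27459), the SET -v output could be folded into a Markdown table along these lines; the exact column layout of SET -v may differ by Spark version:

    import org.apache.spark.sql.SparkSession

    object SqlConfDocs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("sql-conf-docs").getOrCreate()
        // SET -v lists non-internal SQL configurations as (key, value, meaning) rows.
        val rows = spark.sql("SET -v").collect()
        val header = "| Property | Default | Meaning |\n|---|---|---|"
        val body = rows.map(r => s"| ${r.getString(0)} | ${r.getString(1)} | ${r.getString(2)} |").mkString("\n")
        println(header + "\n" + body)
        spark.stop()
      }
    }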

Re: More publicly documenting the options under spark.sql.*

2020-01-16 Thread Hyukjin Kwon
> configuration page. >> >> Best Regards, >> Ryan >> >> >> On Wed, Jan 15, 2020 at 5:47 AM Hyukjin Kwon wrote: >> >>> I think automatically creating a configuration page isn't a bad idea >>> because I think we deprecate and remove co

Re: [DISCUSS] Revert and revisit the public custom expression API for partition (a.k.a. Transform API)

2020-01-16 Thread Hyukjin Kwon
that's not to say that we need to revert it. > > None of this has been confusing or misleading for our users, who caught on > quickly. > > On Thu, Jan 16, 2020 at 5:14 AM Hyukjin Kwon wrote: > >> I think the problem here is if there is an explicit plan or not. >> The PR

Re: [DISCUSS] Revert and revisit the public custom expression API for partition (a.k.a. Transform API)

2020-01-16 Thread Hyukjin Kwon
ssible to have well-defined semantic, and also different > sources may have different semantic for the same Transform name. > > I'd suggest we forbid arbitrary string as Transform (the ApplyTransform > class). We can even follow DS V1 Filter and expose the classes directly. > > On

[DISCUSS] Revert and revisit the public custom expression API for partition (a.k.a. Transform API)

2020-01-16 Thread Hyukjin Kwon
Hi all, I would like to suggest taking one step back at https://github.com/apache/spark/pull/24117 and rethinking it. I am writing this email as I raised the issue a few times but could not get enough responses promptly, and the code freeze is close. In particular, please refer to the

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-15 Thread Hyukjin Kwon
Shall we include them? > > > On Wed, Jan 15, 2020 at 9:51 PM Hyukjin Kwon wrote: > >> +1 >> >> On Wed, 15 Jan 2020, 08:24 Takeshi Yamamuro, >> wrote: >> >>> +1; >>> >>> I checked the links and materials, then I run the tests

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-15 Thread Hyukjin Kwon
+1 On Wed, 15 Jan 2020, 08:24 Takeshi Yamamuro, wrote: > +1; > > I checked the links and materials, then I run the tests with > `-Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pkubernetes > -Psparkr` > on macOS (Java 8). > All the things look fine and I didn't see the error on my env >

Re: More publicly documenting the options under spark.sql.*

2020-01-15 Thread Hyukjin Kwon
built-in functions and I'm pretty sure we can do a similar thing for configurations as well. We could perhaps mimic what Hadoop does: https://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/core-default.xml On Wed, 15 Jan 2020, 22:46 Hyukjin Kwon, wrote: > I think automatica

Re: Revisiting Python / pandas UDF (new proposal)

2020-01-10 Thread Hyukjin Kwon
t together with the proposal. On Mon, Jan 6, 2020 at 10:52 PM, Hyukjin Kwon wrote: > I happened to propose a somewhat big refactoring PR as a preparation for > this. > Basically, grouping all related codes into one sub-package since currently > all pandas and PyArrow related codes are here and there. >

Re: Revisiting Python / pandas UDF (new proposal)

2020-01-06 Thread Hyukjin Kwon
for cordiality. I have > commented on more details in the doc. > > Li > > On Thu, Jan 2, 2020 at 9:42 AM Li Jin wrote: > >> I am going to review this carefully today. Thanks for the work! >> >> Li >> >> On Wed, Jan 1, 2020 at 10:34 PM Hyukjin Kwon

Re: Release Apache Spark 2.4.5

2020-01-05 Thread Hyukjin Kwon
Yeah, I think it's nice to have another maintenance release given the Spark 3.0 timeline. On Mon, Jan 6, 2020 at 7:58 AM, Dongjoon Hyun wrote: > Hi, All. > > Happy New Year (2020)! > > Although we slightly missed the timeline for 3.0 branch cut last month, > it seems that we keep 2.4.x timeline on track. >

Re: Revisiting Python / pandas UDF (new proposal)

2020-01-01 Thread Hyukjin Kwon
Thanks for the comments Maciej - I am addressing them. Adding Li Jin too. I plan to proceed with this late this week or early next week to make it in time before the code freeze. I am going to respond pretty actively, so please give feedback if there's any :-). On Mon, Dec 30, 2019 at 6:45 PM, Hyukjin Kwon wrote

Revisiting Python / pandas UDF (new proposal)

2019-12-30 Thread Hyukjin Kwon
Hi all, I happened to come up with another idea about the pandas UDF redesign. Thanks Reynold, Bryan, Xiangrui, Takuya and Tim for the offline discussions and for helping me write this proposal. Please take a look and let me know what you guys think. -

Re: Fail to use SparkR of 3.0 preview 2

2019-12-26 Thread Hyukjin Kwon
I was randomly googling out of curiosity, and it seems that's indeed the problem ( https://r.789695.n4.nabble.com/Error-in-rbind-info-getNamespaceInfo-env-quot-S3methods-quot-td4755490.html ). Yes, it seems we should make sure we build SparkR with an old version. Since that support for R prior to version

Re: Spark 3.0 branch cut and code freeze on Jan 31?

2019-12-23 Thread Hyukjin Kwon
Sounds fine. I am trying to get the pandas UDF redesign done (SPARK-28264) on time. Hope I can make it. On Tue, Dec 24, 2019 at 4:17 PM, Wenchen Fan wrote: > Sounds good! > > On Tue, Dec 24, 2019 at 7:48 AM Reynold Xin wrote: > >> We've pushed out 3.0

Re: I would like to add JDBCDialect to support Vertica database

2019-12-11 Thread Hyukjin Kwon
I am not so sure about it either. I think it is enough to expose JDBCDialect as an API (which it seems already is). It brings some overhead to dev (e.g., to test and review PRs related to another third party). Such third-party integration might better exist as a third-party library without a strong
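For context on the extension point mentioned above, a third-party library can already plug in its own dialect through the public JdbcDialect/JdbcDialects API; the Vertica-specific details below are illustrative assumptions, not a tested implementation:

    import java.sql.Types
    import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
    import org.apache.spark.sql.types.{DataType, StringType}

    object VerticaDialect extends JdbcDialect {
      // Claim JDBC URLs that point at Vertica.
      override def canHandle(url: String): Boolean = url.startsWith("jdbc:vertica")

      // Example override: write Catalyst StringType as an assumed Vertica-friendly column type.
      override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
        case StringType => Some(JdbcType("VARCHAR(1024)", Types.VARCHAR))
        case _          => None
      }
    }

    // In the application's (or the library's) setup code, before JDBC reads/writes:
    //   JdbcDialects.registerDialect(VerticaDialect)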
