Re: Closing stale PRs with a GitHub Action

2019-12-08 Thread Hyukjin Kwon
It doesn't need to exactly follow the conditions I used before as long as GitHub Actions can provide other good options or conditions. I just wanted to make sure the condition is reasonable. On Sat, Dec 7, 2019 at 11:23 AM Hyukjin Kwon wrote: > lol how did you know I'm going to read this email S

Re: Closing stale PRs with a GitHub Action

2019-12-06 Thread Hyukjin Kwon
lol how did you know I'm going to read this email Sean? When I manually identified the stale PRs, I used the conditions below: 1. Author's inactivity over a year. If the PRs were simply waiting for a review, I excluded them from the stale PR list. 2. Ping one time and see if there are any updates
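The two conditions visible in the snippet above could be sketched roughly as follows. This is only an illustration: the `PullRequest` record, its field names, and the `ping_grace` waiting period are hypothetical and do not come from the thread, and the snippet is truncated, so any further conditions in the original message are not represented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PullRequest:
    """Hypothetical PR record; the field names are assumptions for illustration."""
    last_author_activity: datetime         # last commit or comment by the PR author
    waiting_for_review: bool               # True if the next move belongs to reviewers
    pinged_at: Optional[datetime] = None   # when the author was pinged, if ever


ONE_YEAR = timedelta(days=365)


def is_stale(pr: PullRequest, now: datetime,
             ping_grace: timedelta = timedelta(days=7)) -> bool:
    """Return True if the PR matches the two conditions from the snippet.

    ping_grace (how long to wait after the ping) is an assumed parameter;
    the thread does not state a waiting period.
    """
    # Condition 1: PRs that are simply waiting for review are never stale.
    if pr.waiting_for_review:
        return False
    # Condition 1: the author must have been inactive for over a year.
    if now - pr.last_author_activity <= ONE_YEAR:
        return False
    # Condition 2: ping once first, then check for updates.
    if pr.pinged_at is None:
        return False
    answered = pr.last_author_activity > pr.pinged_at
    return (not answered) and (now - pr.pinged_at >= ping_grace)
```

A PR whose author has been silent since early 2018 and who did not react to a ping a month ago would be reported as stale, while the same PR marked as waiting for review would not.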

Revisiting Python / pandas UDF (continues)

2019-12-04 Thread Hyukjin Kwon
Hi all, I would like to finish redesigning the Pandas UDFs in Spark 3.0. If you don't have concerns in general about the design (see https://issues.apache.org/jira/browse/SPARK-28264), I would like to start soon after addressing the existing comments. Please take a look and comment on the design

Re: Slower than usual on PRs

2019-12-03 Thread Hyukjin Kwon
Yeah, please take care of your health first! On Tue, Dec 3, 2019 at 1:32 PM Wenchen Fan wrote: > Sorry to hear that. Hope you get better soon! > > On Tue, Dec 3, 2019 at 1:28 AM Holden Karau wrote: > >> Hi Spark dev folks, >> >> Just an FYI I'm out dealing with recovering from a motorcycle accident

Re: Auto-linking Jira tickets to their PRs

2019-12-03 Thread Hyukjin Kwon
I think it's broken .. cc Josh Rosen On Wed, Dec 4, 2019 at 10:25 AM Nicholas Chammas wrote: > We used to have a bot or something that automatically linked Jira tickets > to PRs that mentioned them in their title. I don't see that happening > anymore.

Re: Adding JIRA ID as the prefix for the test case name

2019-11-21 Thread Hyukjin Kwon
I opened a PR - https://github.com/apache/spark-website/pull/232 On Tue, Nov 19, 2019 at 9:22 AM Hyukjin Kwon wrote: > Let me document as below in a few days: > > 1. For Python and Java, write a single comment that starts with JIRA ID > and short description, e.g. (SPARK-X: test bl

Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-11-19 Thread Hyukjin Kwon
We don't have an official Spark with Hadoop 3 yet (except the preview) if I am not mistaken. I think it's more natural to wait one minor release term before switching this ... How about we target Hadoop 3 as default in Spark 3.1? On Wed, Nov 20, 2019 at 7:40 AM Cheng Lian wrote: > Hey Steve, > > In terms

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Hyukjin Kwon
e to declare a binary release a default. > The published POM will be agnostic to Hadoop / Hive; well, it will > link against a particular version but can be overridden. That's what > you're getting at? > > > On Tue, Nov 19, 2019 at 7:11 PM Hyukjin Kwon wrote: > > > > So, ar

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-19 Thread Hyukjin Kwon
So, are we able to conclude our plans as below? 1. In Spark 3, we release as below: - Hadoop 3.2 + Hive 2.3 + JDK8 build that also works with JDK 11 - Hadoop 2.7 + Hive 2.3 + JDK8 build that also works with JDK 11 - Hadoop 2.7 + Hive 1.2.1 (fork) + JDK8 (default) 2. In Spark 3.1, we target: -

Re: Removing the usage of forked `hive` in Apache Spark 3.0 (SPARK-20202)

2019-11-18 Thread Hyukjin Kwon
I struggled hard to deal with this issue multiple times over a year and thankfully we finally decided to use the official version of Hive 2.3.x too (thank you, Yuming, Alan, and guys). I think it is already huge progress that we started to use the official version of Hive. I think we should at

Re: Adding JIRA ID as the prefix for the test case name

2019-11-18 Thread Hyukjin Kwon
Let me document as below in a few days: 1. For Python and Java, write a single comment that starts with JIRA ID and short description, e.g. (SPARK-X: test blah blah) 2. For R, use JIRA ID as a prefix for its test name. assuming everybody is happy. On Mon, Nov 18, 2019 at 11:36 AM Hyukjin Kwon
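For the Python half of item 1 above, the convention could look like the sketch below. The test class, the test body, and the JIRA ID `SPARK-12345` are placeholders invented for illustration; only the "comment that starts with JIRA ID and short description" pattern comes from the message.

```python
import unittest


class ExampleTests(unittest.TestCase):
    """Hypothetical test class used only to illustrate the comment convention."""

    def test_addition(self):
        # SPARK-12345: test that integer addition works (placeholder JIRA ID)
        self.assertEqual(1 + 1, 2)


def run_example():
    """Run the example suite and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Python test method names must be valid identifiers, so the JIRA reference goes in a comment instead of the name, which is why the message treats Python and Java differently from R.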

Re: Adding JIRA ID as the prefix for the test case name

2019-11-17 Thread Hyukjin Kwon
upported custom reporters, to let you experiment with it. > > Maybe this is an opportunity to change things. > > On Sun, Nov 17, 2019 at 1:42 AM Hyukjin Kwon wrote: > >> DisplayName looks good in general but actually here I would like first to >> find an existing pattern to document

Re: Adding JIRA ID as the prefix for the test case name

2019-11-16 Thread Hyukjin Kwon
non Scala tests? Other languages >> (Java, Python, R) don't support using string as a test name. >> >> Best Regards, >> Ryan >> >> >> On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon wrote: >> >>> I opened a PR - https://github.com/apache/s

Re: Adding JIRA ID as the prefix for the test case name

2019-11-14 Thread Hyukjin Kwon
support using string as a test name. > > Best Regards, > Ryan > > > On Thu, Nov 14, 2019 at 4:04 AM Hyukjin Kwon wrote: > >> I opened a PR - https://github.com/apache/spark-website/pull/231 >> >> On Wed, Nov 13, 2019 at 10:43 AM Hyukjin Kwon wrote: >> >>>

Re: Adding JIRA ID as the prefix for the test case name

2019-11-14 Thread Hyukjin Kwon
I opened a PR - https://github.com/apache/spark-website/pull/231 On Wed, Nov 13, 2019 at 10:43 AM Hyukjin Kwon wrote: > > In general a test should be self descriptive and I don't think we should > be adding JIRA ticket references wholesale. Any action that the reader has > to take to un

Re: Adding JIRA ID as the prefix for the test case name

2019-11-12 Thread Hyukjin Kwon
> case you need to understand why a test asserts something, you can go > back and find what added it in the git history without much trouble. > > On Mon, Nov 11, 2019 at 10:46 AM Hyukjin Kwon wrote: > > > > Hi all, > > > > Maybe it's not a big deal but it bro

Re: Adding JIRA ID as the prefix for the test case name

2019-11-11 Thread Hyukjin Kwon
23 PM Takeshi Yamamuro > wrote: > >> +1 for having that consistent rule in test names. >> This is a trivial problem though, I think documenting this rule in the >> contribution guide >> might be able to make reviewer overhead a little smaller. >> >> Best

Adding JIRA ID as the prefix for the test case name

2019-11-11 Thread Hyukjin Kwon
Hi all, Maybe it's not a big deal but it has caused some confusion from time to time in the Spark dev community. I think it's time to discuss when and in which format to add a JIRA ID as a prefix for the test case name in Scala test cases. Currently we have many test case names with prefixes as below:

Re: dev/merge_spark_pr.py broken on python 2

2019-11-10 Thread Hyukjin Kwon
Yeah.. let's stick to Python 3 in general .. I plan to drop Python 2 completely right after the Spark 3.0 release. The exception you face .. it seems run_cmd now produces unicode instead of bytes in Python 2 with the merge script. Later, it seems this unicode is attempted to be cast to bytes
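The bytes-versus-text mismatch described above can be avoided in Python 3 by decoding command output once, at the boundary. The helper below is a minimal sketch under that assumption; it borrows the name `run_cmd` from the message but is a stand-in written for illustration, not the code from `dev/merge_spark_pr.py`.

```python
import subprocess
import sys


def run_cmd(cmd):
    """Run a command (given as a list of arguments) and return its stdout as text.

    Illustrative stand-in, not the actual merge script helper.
    subprocess.check_output returns bytes by default in Python 3,
    so the result is decoded exactly once here and callers only see str.
    """
    output = subprocess.check_output(cmd)
    if isinstance(output, bytes):
        output = output.decode("utf-8")
    return output


if __name__ == "__main__":
    # Use the current interpreter so the example does not depend on other tools.
    print(run_cmd([sys.executable, "-c", "print('ok')"]).strip())
```

Keeping the decode in one place means the rest of a script can treat command output uniformly as text and never has to cast between the two types.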

Re: [DISCUSS] Remove sorting of fields in PySpark SQL Row construction

2019-11-07 Thread Hyukjin Kwon
+1 On Wed, Nov 6, 2019 at 11:38 PM Wenchen Fan wrote: > Sounds reasonable to me. We should make the behavior consistent within > Spark. > > On Tue, Nov 5, 2019 at 6:29 AM Bryan Cutler wrote: > >> Currently, when a PySpark Row is created with keyword arguments, the >> fields are sorted

Re: [VOTE] SPARK 3.0.0-preview (RC2)

2019-11-01 Thread Hyukjin Kwon
+1 On Fri, 1 Nov 2019, 15:36 Wenchen Fan, wrote: > The PR builder uses Hadoop 2.7 profile, which makes me think that 2.7 is > more stable and we should make releases using 2.7 by default. > > +1 > > On Fri, Nov 1, 2019 at 7:16 AM Xiao Li wrote: > >> Spark 3.0 will still use the Hadoop 2.7

Re: [DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-10-28 Thread Hyukjin Kwon
+1 from me as well. On Tue, Oct 29, 2019 at 5:34 AM Xiangrui Meng wrote: > +1. And we should start testing 3.7 and maybe 3.8 in Jenkins. > > On Thu, Oct 24, 2019 at 9:34 AM Dongjoon Hyun > wrote: > >> Thank you for starting the thread. >> >> In addition to that, we currently are testing Python 3.6

Re: [VOTE][SPARK-28885] Follow ANSI store assignment rules in table insertion by default

2019-10-10 Thread Hyukjin Kwon
+1 (binding) On Thu, Oct 10, 2019 at 5:11 PM Takeshi Yamamuro wrote: > Thanks for the great work, Gengliang! > > +1 for that. > As I said before, the behaviour is pretty common in DBMSs, so the change > helps for DBMS users. > > Bests, > Takeshi > > > On Mon, Oct 7, 2019 at 5:24 PM Gengliang Wang < >

Re: Auto-closing PRs when there are no feedback or response from its author

2019-10-09 Thread Hyukjin Kwon
g. I'd prefer any process or tool that > implements the above. > > > On Tue, Oct 8, 2019 at 8:19 PM Hyukjin Kwon wrote: > > > > Hi all, > > > > I think we talked about this before. Roughly speaking, there are two > cases of PRs: > > 1. PRs waiting for

Re: Auto-closing PRs when there are no feedback or response from its author

2019-10-09 Thread Hyukjin Kwon
> If there's little overhead to adoption, cool, though I doubt people >> will consistently use a new tag. I'd prefer any process or tool that >> implements the above. >> >> >> On Tue, Oct 8, 2019 at 8:19 PM Hyukjin Kwon wrote: >> > >> > Hi all, >

Auto-closing PRs when there are no feedback or response from its author

2019-10-08 Thread Hyukjin Kwon
Hi all, I think we talked about this before. Roughly speaking, there are two cases of PRs: 1. PRs waiting for review and 2. PRs waiting for author's reaction We might not have to take an action but wait for reviewing for the first case. However, we can ping and/or take an action for the second

Re: Resolving all JIRAs affecting EOL releases

2019-10-07 Thread Hyukjin Kwon
I am going to resolve those JIRAs now. On Mon, Sep 9, 2019 at 9:46 AM Hyukjin Kwon wrote: > Yup, no worries. I roughly set the one week delay considering the official > release date :D > > On Mon, 9 Sep 2019, 09:45 Dongjoon Hyun, wrote: > >> Thank you, Hyukjin. >> >>

Re: Spark 3.0 preview release feature list and major changes

2019-10-07 Thread Hyukjin Kwon
Cogroup Pandas UDF missing: SPARK-27463 Support Dataframe Cogroup via Pandas UDFs Vectorized R execution: SPARK-26759 Arrow optimization in SparkR's interoperability On Tue, Oct 8, 2019 (AM)

Re: [DISCUSS] Spark 2.5 release

2019-09-22 Thread Hyukjin Kwon
+1 for Matei's as well. On Sun, 22 Sep 2019, 14:59 Marco Gaido, wrote: > I agree with Matei too. > > Thanks, > Marco > > On Sun, Sep 22, 2019 at 03:44 Dongjoon Hyun < > dongjoon.h...@gmail.com> wrote: > >> +1 for Matei's suggestion! >> >> Bests, >> Dongjoon. >> >> On Sat, Sep

Re: Resolving all JIRAs affecting EOL releases

2019-09-08 Thread Hyukjin Kwon
query, but yeah I see some value in trying to limit >> the scope this way. >> >> On Sat, Sep 7, 2019 at 10:15 PM Hyukjin Kwon wrote: >> > >> > HI all, >> > >> > We have resolved JIRAs that targets EOL releases (up to Spark 2.2.x) in >> o

Re: Resolving all JIRAs affecting EOL releases

2019-09-07 Thread Hyukjin Kwon
epn prs (e.g., SPARK-25211 > <https://issues.apache.org/jira/browse/SPARK-25211>). > We can also close these PRs according to the bulk close? > (But, we might need to check the corresponding PRs manually?) > > Bests, > Takeshi > > > On Sun, Sep 8, 2019 at 12:15 PM Hy

Resolving all JIRAs affecting EOL releases

2019-09-07 Thread Hyukjin Kwon
Hi all, We have previously resolved JIRAs that target EOL releases (up to Spark 2.2.x) in order to keep them at a manageable size. Since Spark 2.3.4 will be an EOL release, I plan to do this again roughly in a week. The JIRAs that have not been updated for the last year, and have an affected version of EOL

Re: DataSourceV2: pushFilters() is not invoked for each read call - spark 2.3.2

2019-09-06 Thread Hyukjin Kwon
I believe this issue was fixed in Spark 2.4. Spark DataSource V2 is still under heavy development and is not complete yet. So, I think the feasible options to get through at the current moment are: 1. upgrade to higher Spark versions 2. disable filter push down at your

Re: [ANNOUNCE] Announcing Apache Spark 2.4.4

2019-09-01 Thread Hyukjin Kwon
YaY! On Mon, Sep 2, 2019 at 1:27 PM Wenchen Fan wrote: > Great! Thanks! > > On Mon, Sep 2, 2019 at 5:55 AM Dongjoon Hyun > wrote: > >> We are happy to announce the availability of Spark 2.4.4! >> >> Spark 2.4.4 is a maintenance release containing stability fixes. This >> release is based on the

Re: [VOTE] Release Apache Spark 2.4.4 (RC3)

2019-08-28 Thread Hyukjin Kwon
+1 (from the last blocker PR) On Thu, Aug 29, 2019 at 8:20 AM Takeshi Yamamuro wrote: > I checked the tests passed again on the same env. > It looks ok. > > > On Thu, Aug 29, 2019 at 6:15 AM Marcelo Vanzin > wrote: > >> +1 >> >> On Tue, Aug 27, 2019 at 4:06 PM Dongjoon Hyun >> wrote: >> > >> >

Re: JDK11 Support in Apache Spark

2019-08-27 Thread Hyukjin Kwon
YaY! On Tue, Aug 27, 2019 at 3:36 PM Dongjoon Hyun wrote: > Hi, All. > > Thank you for your attention! > > UPDATE: We succeeded to build with JDK8 and test with JDK11. > > - https://github.com/apache/spark/pull/25587 > - >

Re: [VOTE] Release Apache Spark 2.4.4 (RC2)

2019-08-26 Thread Hyukjin Kwon
-1 Seems there's one critical correctness issue specifically in branch-2.4 ... Please take a look at https://github.com/apache/spark/pull/25593 On Tue, Aug 27, 2019 at 2:38 PM Takeshi Yamamuro wrote: > Hi, Dongjoon > > I checked that all the test passed on my Mac/x86_64 env with: > -Pyarn

Re: Release Spark 2.3.4

2019-08-17 Thread Hyukjin Kwon
+1 too On Sat, Aug 17, 2019 at 3:06 PM Dilip Biswal wrote: > +1 > > Regards, > Dilip Biswal > Tel: 408-463-4980 > dbis...@us.ibm.com > > > > - Original message - > From: John Zhuge > To: Xiao Li > Cc: Takeshi Yamamuro , Spark dev list < > dev@spark.apache.org>, Kazuaki Ishizaki > Subject:

Re: [DISCUSS] Migrate development scripts under dev/ from Python2 to Python 3

2019-08-15 Thread Hyukjin Kwon
- Python 2.x will be EOL end this year > > I have a strong preference to migrate everything to Python 3. > > Cheers, Fokko > > > On Wed, Aug 7, 2019 at 12:14 Weichen Xu wrote: > >> All right we could support both Python 2 and Python 3 for spark 3.0. >> >> On Wed

Re: [DISCUSS] Migrate development scripts under dev/ from Python2 to Python 3

2019-08-15 Thread Hyukjin Kwon
I mean python 2 _will be_ deprecated in Spark 3. On Thu, 15 Aug 2019, 18:37 Hyukjin Kwon, wrote: > Yeah, we will probably drop Python 2 entirely after 3.0.0. Python 2 is > already deprecated. > > On Thu, 15 Aug 2019, 18:25 Driesprong, Fokko, > wrote: > >> Sorry for t

Re: Release Apache Spark 2.4.4

2019-08-14 Thread Hyukjin Kwon
Adding Shixiong WDYT? On Wed, Aug 14, 2019 at 2:30 PM Terry Kim wrote: > Can the following be included? > > [SPARK-27234][SS][PYTHON] Use InheritableThreadLocal for current epoch in > EpochTracker (to support Python UDFs) > > > Thanks, > Terry > > On Tue,

Re: Release Apache Spark 2.4.4

2019-08-13 Thread Hyukjin Kwon
+1 On Wed, Aug 14, 2019 at 9:13 AM Takeshi Yamamuro wrote: > Hi, > > Thanks for your notification, Dongjoon! > I put some links for the other committers/PMCs to access the info easily: > > A commit list in github from the last release: >

Re: Recognizing non-code contributions

2019-08-07 Thread Hyukjin Kwon
> Currently, I have heard some ideas or attitudes that I consider to be overly motivated by fear of unlikely occurrences. > And I've heard some statements disregard widely accepted principles of inclusiveness at the Apache Software Foundation. > But I suspect that there's more to the attitude of

Re: [DISCUSS] Migrate development scripts under dev/ from Python2 to Python 3

2019-08-07 Thread Hyukjin Kwon
We didn't drop Python 2 yet although it's deprecated. So I think it should support both Python 2 and Python 3 at the current status. On Wed, Aug 7, 2019 at 6:54 PM Weichen Xu wrote: > Hi all, > > I would like to discuss the compatibility for dev scripts. Because we > already decided to deprecate

Re: Recognizing non-code contributions

2019-08-06 Thread Hyukjin Kwon
> I wonder which project nominates non-coding only committers but I at least know multiple projects. They all have that serious problem then. I mean I know multiple projects don't do that and according to what you said, they all have that serious problem. On Wed, Aug 7, 2019 at 1:05 AM Hyukjin Kwon

Re: Recognizing non-code contributions

2019-08-06 Thread Hyukjin Kwon
Well, actually I am rather less conservative on adding committers. There are multiple people who are active in both non-coding and coding activities. I, for example, am one of the Korean meetup admins, and my main focus was to manage JIRA and, in addition, to review the PRs that are not being reviewed. As

Re: Recognizing non-code contributions

2019-08-06 Thread Hyukjin Kwon
ility to make such mistakes. On Tue, Aug 6, 2019 at 7:26 PM Myrle Krantz wrote: > Hey Hyukjin, > > Apologies for sending this to you twice. : o) > > On Tue, Aug 6, 2019 at 9:55 AM Hyukjin Kwon wrote: > >> Myrle, >> >> > We need to balance two sets of risks here. Bu

Re: Recognizing non-code contributions

2019-08-06 Thread Hyukjin Kwon
the in-between status officially (e.g. Apache email or something), it should be asked and discussed in ASF, not in a single project here. On Tue, Aug 6, 2019 at 4:55 PM Hyukjin Kwon wrote: > Myrle, > > > We need to balance two sets of risks here. But in the case of access to > our sof

Re: Recognizing non-code contributions

2019-08-06 Thread Hyukjin Kwon
Myrle, > We need to balance two sets of risks here. But in the case of access to our software artifacts, the risk is very small, and already has *multiple* mitigating factors, from the fact that all changes are tracked to an individual, to the fact that there are notifications sent when changes

Re: [DISCUSS] New sections in Github Pull Request description template

2019-07-31 Thread Hyukjin Kwon
I opened a PR https://github.com/apache/spark/pull/25310. Please take a look On Mon, Jul 29, 2019 at 4:35 PM Hyukjin Kwon wrote: > Thanks, guys. Let me probably mimic the template and open a PR soon - > currently I am stuck in some works. I will take a look in few days later. > > Jul 2019

Re: [Discuss] Follow ANSI SQL on table insertion

2019-07-30 Thread Hyukjin Kwon
From my look, +1 on the proposal, considering ANSI and other DBMSes in general. On Tue, Jul 30, 2019 at 3:21 PM Wenchen Fan wrote: > We can add a config for a certain behavior if it makes sense, but the most > important thing we want to reach an agreement here is: what should be the > default

Re: [DISCUSS] New sections in Github Pull Request description template

2019-07-29 Thread Hyukjin Kwon
kubernetes/kubernetes/master/.github/PULL_REQUEST_TEMPLATE.md >> >> >> >> On Tue, Jul 23, 2019 at 8:27 PM, Hyukjin Kwon >> wrote: >> >>> (Plus, it helps to track history too. Spark's commit logs are growing >>> and now it's pretty difficult to track the histo

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-07-25 Thread Hyukjin Kwon
BY created DESC, priority DESC, updated DESC On Fri, Jul 19, 2019 at 4:54 PM Hyukjin Kwon wrote: > That's a great explanation. Thanks I didn't know that. > > Josh, do you know who I should ping on this? > > On Fri, 19 Jul 2019, 16:52 Dongjoon Hyun, wrote: > >> Hi, Hyukjin.

Re: [DISCUSS] New sections in Github Pull Request description template

2019-07-23 Thread Hyukjin Kwon
(Plus, it helps to track history too. Spark's commit logs are growing and now it's pretty difficult to track the history and see what change introduced a specific behaviour) On Wed, Jul 24, 2019 at 12:20 PM Hyukjin Kwon wrote: > Hi all, > > I would like to discuss about some new secti

[DISCUSS] New sections in Github Pull Request description template

2019-07-23 Thread Hyukjin Kwon
Hi all, I would like to discuss some new sections under "## What changes were proposed in this pull request?": ### Do the changes affect _any_ user/dev-facing input or output? (Please answer yes or no. If yes, answer the questions below) ### What was the previous behavior? (Please

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-07-19 Thread Hyukjin Kwon
PI key used by the bot is rejected > by Apache JIRA and forwarded to CAPTCHA. > > Bests, > Dongjoon. > > On Thu, Jul 18, 2019 at 8:24 PM Hyukjin Kwon wrote: > >> Hi all, >> >> Seems this issue is re-happening again. Seems the PR link is properly >> creat

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-07-18 Thread Hyukjin Kwon
-28440 https://issues.apache.org/jira/browse/SPARK-28436 https://issues.apache.org/jira/browse/SPARK-28434 https://issues.apache.org/jira/browse/SPARK-28433 https://issues.apache.org/jira/browse/SPARK-28431 Josh and Dongjoon, do you guys maybe have any idea? On Thu, Apr 25, 2019 at 3:09 PM Hyukjin Kwon wrote

Re: Contribution help needed for sub-tasks of an umbrella JIRA - port *.sql tests to improve coverage of Python, Pandas, Scala UDF cases

2019-07-09 Thread Hyukjin Kwon
, Jul 9, 2019 at 6:17 AM Hyukjin Kwon wrote: > >> Hi all, >> >> I am currently targeting to improve Python, Pandas UDFs Scala UDF test >> cases by integrating our existing *.sql files at >> https://issues.apache.org/jira/browse/SPARK-27921 >> >> I w

Contribution help needed for sub-tasks of an umbrella JIRA - port *.sql tests to improve coverage of Python, Pandas, Scala UDF cases

2019-07-08 Thread Hyukjin Kwon
Hi all, I am currently aiming to improve the Python, Pandas, and Scala UDF test cases by integrating our existing *.sql files at https://issues.apache.org/jira/browse/SPARK-27921 I would appreciate it if anyone who's interested in contributing to Spark took some sub-tasks. It's too many for me to do

Re: Disabling `Merge Commits` from GitHub Merge Button

2019-07-01 Thread Hyukjin Kwon
+1 On Tue, Jul 2, 2019 at 9:39 AM Takeshi Yamamuro wrote: > I'm also using the script in both cases, anyway +1. > > On Tue, Jul 2, 2019 at 5:58 AM Sean Owen wrote: > >> I'm using the merge script in both repos. I think that was the best >> practice? >> So, sure, I'm fine with disabling it. >> >> On

Re: Exposing JIRA issue types at GitHub PRs

2019-06-16 Thread Hyukjin Kwon
Labels look good and useful. On Sat, 15 Jun 2019, 02:36 Dongjoon Hyun, wrote: > Now, you can see the exposed component labels (ordered by the number of > PRs) here and click the component to search. > > https://github.com/apache/spark/labels?sort=count-desc > > Dongjoon. > > > On Fri, Jun

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-15 Thread Hyukjin Kwon
On Fri, Jun 14, 2019 at 11:36 AM Felix Cheung > wrote: > >> How about pyArrow? >> >> -- >> *From:* Holden Karau >> *Sent:* Friday, June 14, 2019 11:06:15 AM >> *To:* Felix Cheung >> *Cc:* Bryan Cutler; Dongjoon

Re: [DISCUSS] Increasing minimum supported version of Pandas

2019-06-13 Thread Hyukjin Kwon
I am +1 to go for 0.23.2 - it brings some overhead to test PyArrow and pandas combinations. Spark 3 should be a good time to increase. On Fri, Jun 14, 2019 at 9:46 AM Bryan Cutler wrote: > Hi All, > > We would like to discuss increasing the minimum supported version of > Pandas in Spark, which is

Re: Exposing JIRA issue types at GitHub PRs

2019-06-13 Thread Hyukjin Kwon
Yea, I think we can automate this process via, for instance, https://github.com/apache/spark/blob/master/dev/github_jira_sync.py +1 for such sort of automatic categorizing and matching metadata between JIRA and github Adding Josh and Sean as well. On Thu, 13 Jun 2019, 13:17 Dongjoon Hyun,

Re: Resolving all JIRAs affecting EOL releases

2019-05-20 Thread Hyukjin Kwon
sues.apache.org/jira/browse/SPARK-22766> > 2. > 3. > > > On Sun, May 19, 2019 at 6:43 PM Hyukjin Kwon wrote: > >> Thanks Shane .. the URL I linked somehow didn't work in other people >> browser. Hope this link works: >> >> >> https://i

Re: Resolving all JIRAs affecting EOL releases

2019-05-19 Thread Hyukjin Kwon
%3D%20-52w I will take an action around this time tomorrow considering there were some more changes to make at the last minute. On Sun, May 19, 2019 at 6:39 PM Hyukjin Kwon wrote: > I will add one more condition for "updated". So, it will additionally > avoid things updated within o

Re: Resolving all JIRAs affecting EOL releases

2019-05-19 Thread Hyukjin Kwon
ported against > 2.1.0. > > On the other hand, I'd go further and close _anything_ not updated in a > long time, like a year (or 2 if feeling conservative). That is there's > probably a lot of old cruft out there that wasn't marked with an Affected > Version, before that was required. >

Re: Resolving all JIRAs affecting EOL releases

2019-05-18 Thread Hyukjin Kwon
, 2019 at 9:07 AM Imran Rashid > wrote: > >> +1, thanks for taking this on >> >> On Wed, May 15, 2019 at 7:26 PM Hyukjin Kwon wrote: >> >>> oh, wait. 'Incomplete' can still make sense in this way then. >>> Yes, I am good with 'Incomplete' too. >>

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
sed that. Maybe that's simpler than a label. But, anything like > that sounds good. > > On Wed, May 15, 2019 at 8:40 PM Hyukjin Kwon wrote: > >> BTW, affected version became a required field (I don't remember when >> exactly was .. I believe it's around when we work on Sp

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
oh, wait. 'Incomplete' can still make sense in this way then. Yes, I am good with 'Incomplete' too. On Thu, May 16, 2019 at 11:24 AM Hyukjin Kwon wrote: > I actually recently used 'Incomplete' a bit when the JIRA is basically > too poorly formed (like just copying and pasting an error) ...

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
release, >> but, many items aren't marked, so may need to cast the net wider. >> >> I think only then does it make sense to look at bothering to reproduce >> or evaluate the 1000s that will still remain. >> >> On Wed, May 15, 2019 at 4:25 AM Hyukjin Kwon wrote:

Re: Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
ectedVersion in > versionMatch("^2.4.*") OR affectedVersion in versionMatch("^2.3.*") OR > affectedVersion in versionMatch("^2.2.*")) > AND priority NOT IN (Urgent, Blocker, Critical, High) > > > On Wed, May 15, 2019, 14:55 Hyukjin Kwon wrote: > >

Resolving all JIRAs affecting EOL releases

2019-05-15 Thread Hyukjin Kwon
Hi all, I would like to propose resolving all JIRAs that affect EOL releases (2.2 and below), as well as those with no affected version specified. I was rather against this approach and considered it a last resort when we discussed it roughly 3 years ago. Now I think we should go ahead with this. See below. I

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-04-25 Thread Hyukjin Kwon
>> error, r.url, request=request, response=r, **kwargs) >> JIRAError: JiraError HTTP 403 url: >> https://issues.apache.org/jira/rest/api/2/serverInfo >> text: CAPTCHA_CHALLENGE; login-url= >> https://issues.apache.org/jira/login.jsp > > > It looks like ASF JIRA was

Re: In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-04-24 Thread Hyukjin Kwon
Can anyone take a look at this one? The number of JIRAs in OPEN status is increasing rapidly (from around 2400 to 2600) On Fri, Apr 19, 2019 at 8:05 PM Hyukjin Kwon wrote: > Hi all, > > Looks 'spark/dev/github_jira_sync.py' is not running correctly somewhere. > Usually the JIRA's status shoul

Re: pyspark.sql.functions ide friendly

2019-04-19 Thread Hyukjin Kwon
+1 I'm good with changing too. On Thu, 18 Apr 2019, 01:18 Reynold Xin, wrote: > Are you talking about the ones that are defined in a dictionary? If yes, > that was actually not that great in hindsight (makes it harder to read & > change), so I'm OK changing it. > > E.g. > > _functions = { >

In Apache Spark JIRA, spark/dev/github_jira_sync.py not running properly

2019-04-19 Thread Hyukjin Kwon
Hi all, Looks 'spark/dev/github_jira_sync.py' is not running correctly somewhere. Usually the JIRA's status should be updated to "IN PROGRESS" when somebody opens a PR against a JIRA. Looks now it only leaves a link and does not change JIRA's status. Can someone else who knows where it's running

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread Hyukjin Kwon
st 2.7 >> and 3.5. >> >> On Mon, Mar 25, 2019 at 9:44 PM Reynold Xin wrote: >> >>> +1 on doing this in 3.0. >>> >>> >>> On Mon, Mar 25, 2019 at 9:31 PM, Felix Cheung >> > wrote: >>> >>>> I’m +1 if 3.0 >>

Re: PySpark syntax vs Pandas syntax

2019-03-26 Thread Hyukjin Kwon
BTW, I am working on the documentation related to this subject at https://issues.apache.org/jira/browse/SPARK-26022 to describe the difference On Tue, Mar 26, 2019 at 3:34 PM Reynold Xin wrote: > We have some early stuff there but not quite ready to talk about it in > public yet (I hope soon

Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread Hyukjin Kwon
Hi all, We really need to upgrade the minimal version soon. It's actually slowing down PySpark development, for instance, through the overhead of sometimes having to test the whole matrix of Arrow and Pandas versions. Also, it currently requires adding some weird hacks or ugly code. Some bugs

Re: Request to disable a bot account, 'Thincrs' in JIRA of Apache Spark

2019-03-13 Thread Hyukjin Kwon
Thanks, I opened https://issues.apache.org/jira/browse/INFRA-18004 On Thu, Mar 14, 2019 at 8:35 AM Marcelo Vanzin wrote: > Go for it. I would do it now, instead of waiting, since there's been > enough time for them to take action. > > On Wed, Mar 13, 2019 at 4:32 PM Hyukjin Kwon wrote: >

Re: Request to disable a bot account, 'Thincrs' in JIRA of Apache Spark

2019-03-13 Thread Hyukjin Kwon
It looks like this bot keeps working. I am going to open an INFRA JIRA to block this bot in a few days. Please let me know if you guys have a different idea to prevent this. On Wed, Mar 13, 2019 at 8:16 AM Hyukjin Kwon wrote: > Hi whom it may concern in Thincrs > > > > I am still observing t

Re: [pyspark] dataframe map_partition

2019-03-10 Thread Hyukjin Kwon
Because both dapply in R and Scalar Pandas UDF in Python are similar, and cover each other. FWIW, it somewhat sounds like SPARK-26413 and SPARK-26412 On Sat, Mar 9, 2019 at 12:32 PM peng yu wrote: > Cool, thanks for letting me know, but why not support dapply >

Re: [build system] Jenkins stopped working

2019-02-19 Thread Hyukjin Kwon
>> wrote: >>>> >>>>> yep, it got wedged. issued a restart and it should be back up in a >>>>> few minutes. >>>>> >>>>> On Tue, Feb 19, 2019 at 7:32 AM Parth Gandhi >>>>> wrote: >>>>> >>>>

[build system] Jenkins stopped working

2019-02-19 Thread Hyukjin Kwon
Hi all, It looks like Jenkins stopped working. Did I maybe miss a thread, or has nobody reported this yet? Thanks!

Re: Vectorized R gapply[Collect]() implementation

2019-02-14 Thread Hyukjin Kwon
wesome! > > > -- > *From:* Shivaram Venkataraman > *Sent:* Saturday, February 9, 2019 8:33 AM > *To:* Hyukjin Kwon > *Cc:* dev; Felix Cheung; Bryan Cutler; Liang-Chi Hsieh; Shivaram > Venkataraman > *Subject:* Re: Vectorized R gapply[Collect]()

Re: Time to cut an Apache 2.4.1 release?

2019-02-12 Thread Hyukjin Kwon
+1 for 2.4.1 On Tue, Feb 12, 2019 at 4:56 PM Dongjin Lee wrote: > > SPARK-23539 is a non-trivial improvement, so probably would not be > back-ported to 2.4.x. > > Got it. It seems reasonable. > > Committers: > > Please don't omit SPARK-23539 from 2.5.0. Kafka community needs this > feature. > >

Vectorized R gapply[Collect]() implementation

2019-02-09 Thread Hyukjin Kwon
Guys, as a continuation of the Arrow optimization for R DataFrame to Spark DataFrame, I am trying to make a vectorized gapply[Collect] implementation as an experiment, like vectorized Pandas UDFs. It brought 820%+ performance improvement. See https://github.com/apache/spark/pull/23746 Please come and

Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-08 Thread Hyukjin Kwon
Sorry for the last minute vote. +1 On Fri, Feb 8, 2019 at 10:15 AM Takeshi Yamamuro wrote: > Thanks, all. > > Yea, I think we don't need to block the release, too. > > > Jungtaek > Thanks! That is very helpful! > If you find something, please let me know. > > Best, > Takeshi > > On Fri, Feb 8, 2019

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-02-04 Thread Hyukjin Kwon
sal is to minimize > the risk and ensure the release stability and quality. > > On Mon, Feb 4, 2019 at 12:01 PM Hyukjin Kwon wrote: > >> Xiao, to check if I understood correctly, do you mean the below? >> >> 1. Use our fork with Hadoop 2.x profile for now, and use Hive 2.x with >> Hadoo

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-02-04 Thread Hyukjin Kwon
n't clear whether those concerns specifically argue against >> these PRs. >> >> >> On Fri, Feb 1, 2019 at 2:03 PM Felix Cheung >> wrote: >> > >> > What’s the update and next step on this? >> > >> > We have real users getting blocked

Missing SparkR in CRAN

2019-01-24 Thread Hyukjin Kwon
Hi all, I happened to find that SparkR is missing from CRAN. See https://cran.r-project.org/web/packages/SparkR/index.html IIRC, I saw some threads about this on the spark-dev mailing list a long time ago. Is a fix in progress somewhere, or did I misunderstand something?

Re: Removing old HiveMetastore(0.12~0.14) from Spark 3.0.0?

2019-01-22 Thread Hyukjin Kwon
Yea, I was thinking about that too. They are too old to keep. +1 for removing them. On Wed, Jan 23, 2019 at 11:30 AM Dongjoon Hyun wrote: > Hi, All. > > Currently, Apache Spark supports Hive Metastore(HMS) 0.12 ~ 2.3. > Among them, HMS 0.x releases look very old since we are in 2019. > If these

Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Hyukjin Kwon
Resolving HIVE-16391 means Hive would release a 1.2.x that contains the fixes from our Hive fork (correct me if I am mistaken). To be honest, and as a personal opinion, that basically asks Hive to take care of Spark's dependency. Hive looks to be moving ahead to 3.1.x and no one would use the

Re: Ask for reviewing on Structured Streaming PRs

2019-01-13 Thread Hyukjin Kwon
But it's true that, IMHO, there's less activity in SS in general. That should be noted. Maybe it's also because committers are busy with other stuff. Yea, I agree that one actionable strategy for now might be to make the PR description as clear as possible to make the review easier, and then ping them

Re: [VOTE] SPARK 2.2.3 (RC1)

2019-01-10 Thread Hyukjin Kwon
+1 Thanks. On Fri, Jan 11, 2019 at 7:01 AM Takeshi Yamamuro wrote: > ok, thanks for the check. > > best, > takeshi > > On Fri, Jan 11, 2019 at 1:37 AM Dongjoon Hyun > wrote: > >> Hi, Takeshi. >> >> Yep. It's not a release blocker. We don't need that as Sean mentioned >> already. >> Since you are the

Re: Noisy spark-website notifications

2018-12-19 Thread Hyukjin Kwon
Yea, that's a bit noisy .. I would just completely disable it, to be honest. I filed https://issues.apache.org/jira/browse/INFRA-17469 before. I would appreciate more input there :-) On Thu, Dec 20, 2018 at 11:22 AM Nicholas Chammas wrote: > I'd prefer it if we disabled all git

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-18 Thread Hyukjin Kwon
Similar issues are going on in spark-website as well. I also filed a ticket at https://issues.apache.org/jira/browse/INFRA-17469. On Wed, Dec 12, 2018 at 9:02 AM Reynold Xin wrote: > I filed a ticket: https://issues.apache.org/jira/browse/INFRA-17403 > > Please add your support there. > > > On Tue,

Re: How can I help?

2018-12-17 Thread Hyukjin Kwon
Please take a look at https://spark.apache.org/contributing.html . It contains virtually all the information needed for contributions. On Tue, Dec 18, 2018 at 3:54 AM Raghunadh Madamanchi wrote: > Hi, > > I am Raghu, I live in Dallas,TX. > Having 15+ years of Experience in Software Development and

Re: [discuss] SparkR CRAN feasibility check server problem

2018-12-12 Thread Hyukjin Kwon
on will be in > https://issues.apache.org/jira/browse/SPARK-24152. I will post here if I > get > reply from CRAN admin. > > Thanks. > > > Liang-Chi Hsieh wrote > > Thanks for letting me know! I will look into it and ask CRAN admin for > > help. > > > > > > Hyuk
