Re: [discuss] SparkR CRAN feasibility check server problem

2018-12-12 Thread Hyukjin Kwon
of this problem..! On Mon, Nov 12, 2018 at 1:47 PM, Hyukjin Kwon wrote: > I made a PR to officially drop R prior to version 3.4 ( > https://github.com/apache/spark/pull/23012). > The tests will probably fail for now since it produces warnings for using > R 3.1.x. > > On Sun, Nov 11, 2018 at 3:0

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-11 Thread Hyukjin Kwon
Me too. I want to put some input as well if that can be helpful. On Wed, 12 Dec 2018, 8:20 am Reynold Xin Thanks, Sean. Which INFRA ticket is it? It's creating a lot of noise so I > want to put some pressure myself there too. > > > On Mon, Dec 10, 2018 at 9:51 AM, Sean Owen wrote: > >> Agree,

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-10 Thread Hyukjin Kwon
Ah, sorry. I missed it. It works correctly. Thanks. On Tue, Dec 11, 2018 at 10:47 AM, Sean Owen wrote: > Did you do the step where you sync your GitHub and ASF account? After an > hour you should get an email and then you can. > > On Mon, Dec 10, 2018, 8:01 PM Hyukjin Kwon >> BTW

Re: Apache Spark git repo moved to gitbox.apache.org

2018-12-10 Thread Hyukjin Kwon
BTW, should I be able to close PRs via GitHub UI right now or is there another way to do it? Looks like I'm not seeing the close button. On Tue, Dec 11, 2018 at 1:51 AM, Sean Owen wrote: > Agree, I'll ask on the INFRA ticket and follow up. That's a lot of extra > noise. > > On Mon, Dec 10, 2018 at 11:37 AM

Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Hyukjin Kwon
It's merged now and in developer tools page - http://spark.apache.org/developer-tools.html#individual-tests Have some fun with PySpark testing! On Wed, Dec 5, 2018 at 4:30 PM, Hyukjin Kwon wrote: > Hey all, I kind of met the goal with a minimised fix with keeping > available framework and o

Re: Run a specific PySpark test or group of tests

2018-12-05 Thread Hyukjin Kwon
oes support unittest-based tests >>> <https://docs.pytest.org/en/latest/unittest.html>, allowing for >>> incremental adoption. I'll see how convenient it is to use with our current >>> test layout. >>> >>> On Tue, Aug 15, 2017 at 1:03 AM Hyukjin Kwon

A user of thincrs has selected this issue. Deadline: Xxx, Xxx X, XXXX XX:XX

2018-12-01 Thread Hyukjin Kwon
Just out of curiosity, does any one know what kind of account it is? https://issues.apache.org/jira/secure/ViewProfile.jspa?name=Thincrs Was wondering if it's a bot for some purposes

New PySpark test style

2018-11-13 Thread Hyukjin Kwon
Hi all, Lately, https://github.com/apache/spark/pull/23021 was merged, which tries to split a big single file that contains all the tests into smaller files. I picked one example to follow, NumPy, because the current style looks closer to the NumPy structure and looks easier to follow. Please see

Re: [discuss] SparkR CRAN feasibility check server problem

2018-11-11 Thread Hyukjin Kwon
--- > *From:* Liang-Chi Hsieh > *Sent:* Saturday, November 10, 2018 2:32 AM > *To:* dev@spark.apache.org > *Subject:* Re: [discuss] SparkR CRAN feasibility check server problem > > > Yeah, thanks Hyukjin Kwon for bringing this up for discussion. > > I don't know how h

Re: [discuss] SparkR CRAN feasibility check server problem

2018-11-10 Thread Hyukjin Kwon
ematic. > > > On Thu, Nov 1, 2018 at 7:35 PM Hyukjin Kwon wrote: > >> Hi all, >> >> I want to raise the CRAN failure issue because it started to block Spark >> PRs time to time. Since the number >> of PRs grows hugely in Spark community, this is critical to n

Re: Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-10 Thread Hyukjin Kwon
> Thanks Hyukjin! Very cool results >> >> Shivaram >> On Fri, Nov 9, 2018 at 10:58 AM Felix Cheung >> wrote: >> > >> > Very cool! >> > >> > >> > >> > From: Hyukjin Kwon >> > Sent

Arrow optimization in conversion from R DataFrame to Spark DataFrame

2018-11-08 Thread Hyukjin Kwon
Hi all, I am trying to introduce R Arrow optimization by reusing PySpark Arrow optimization. It speeds up the R DataFrame -> Spark DataFrame conversion by roughly 900% ~ 1200%. It looks to be working fine so far; however, I would appreciate it if you have some time to take a look

[discuss] SparkR CRAN feasibility check server problem

2018-11-01 Thread Hyukjin Kwon
Hi all, I want to raise the CRAN failure issue because it started to block Spark PRs from time to time. Since the number of PRs grows hugely in the Spark community, it is critical not to block other PRs. There has been a problem at CRAN (See https://github.com/apache/spark/pull/20005 for analysis). To

Re: Some PRs not automatically linked to JIRAs

2018-10-30 Thread Hyukjin Kwon
here. Thanks. On Mon, Oct 1, 2018 at 7:15 PM, Hyukjin Kwon wrote: > Seems fixed but looks it starts to leave duplicated PR links for some > recent JIRAs. Not a big deal but are they being run in multiple places > maybe? > > For instance, > > https://issues.apache.org/jira/brows

Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-29 Thread Hyukjin Kwon
+1 On Tue, Oct 30, 2018 at 11:03 AM, Gengliang Wang wrote: > +1 > > > On Oct 30, 2018, at 10:41 AM, Sean Owen wrote: > > > > +1 > > > > Same result as in RC4 from me, and the issues I know of that were > > raised with RC4 are resolved. I tested vs Scala 2.12 and 2.11. > > > > These items are still targeted to

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Hyukjin Kwon
I didn't know I live in the same timezone with you Wenchen :D. Monday or Wednesday at 5PM PDT sounds good to me too FWIW. On Fri, Oct 26, 2018 at 8:29 AM, Ryan Blue wrote: > Good point. How about Monday or Wednesday at 5PM PDT then? > > Everyone, please reply to me (no need to spam the list) with

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Hyukjin Kwon
+1 ! On Fri, Oct 26, 2018 at 7:21 AM, Dongjoon Hyun wrote: > +1. Thank you for volunteering, Ryan! > > Bests, > Dongjoon. > > > On Thu, Oct 25, 2018 at 4:19 PM Xiao Li wrote: > >> +1 >> >> On Thu, Oct 25, 2018 at 4:16 PM, Reynold Xin wrote: >> >>> +1 >>> >>> >>> >>> On Thu, Oct 25, 2018 at 4:12 PM Li Jin wrote:

Re: What's a blocker?

2018-10-24 Thread Hyukjin Kwon
> Let's understand statements like "X is not a blocker" to mean "I don't think that X is a blocker". Interpretations not proclamations, backed up by reasons, not all of which are appeals to policy and precedent. Might not be a big deal and out of the topic but I rather hope people explicitly avoid

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Hyukjin Kwon
https://github.com/apache/spark/pull/22514 sounds like a regression that affects Hive CTAS in write path (by not replacing them into Spark internal datasources; therefore performance regression). but yea I suspect if we should block the release by this. https://github.com/apache/spark/pull/22144

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Hyukjin Kwon
I am searching and checking some PRs or JIRAs that state regression. Let me leave a link - it might be good to double check https://github.com/apache/spark/pull/22514 as well. On Tue, Oct 23, 2018 at 11:58 PM, Stavros Kontopoulos < stavros.kontopou...@lightbend.com> wrote: > Sean, > > I will try it

Re: [VOTE] SPARK 2.4.0 (RC4)

2018-10-23 Thread Hyukjin Kwon
I am sorry for raising this late. Out of curiosity, does anyone know why we don't treat SPARK-24935 (https://github.com/apache/spark/pull/22144) as a blocker? It looks like it broke API compatibility, and an actual use case of an external library (https://github.com/DataSketches/sketches-hive) Also,

Re: GitHub is out of order

2018-10-22 Thread Hyukjin Kwon
It's chaotic now.. can we turn off Jenkins for a while if GitHub is going to be out of order for a while? My notifications are full of AmplabJenkins bot messages ... On Mon, 22 Oct 2018, 1:13 pm Hyukjin Kwon, wrote: > Yea.. please ignore my duplicated comments if they exist. I didn't k

Re: GitHub is out of order

2018-10-21 Thread Hyukjin Kwon
Yea.. please ignore my duplicated comments if they exist. I didn't know it's globally happening but I thought a problem specific to me so I left duplicated comments multiple times. On Mon, Oct 22, 2018 at 12:40 PM, Dongjoon Hyun wrote: > Hi, All. > > Currently, GitHub is out of order. Apache Spark

Re: Hadoop 3 support

2018-10-17 Thread Hyukjin Kwon
See the discussion at https://github.com/apache/spark/pull/21588 On Wed, Oct 17, 2018 at 5:06 AM, t4 wrote: > has anyone got spark jars working with hadoop3.1 that they can share? i am > looking to be able to use the latest hadoop-aws fixes from v3.1 > > > > -- > Sent from:

Re: Remove Flume support in 3.0.0?

2018-10-12 Thread Hyukjin Kwon
Yea, I thought we are already going to remove this out. +1 for removing it anyway. On Fri, Oct 12, 2018 at 1:44 AM, Wenchen Fan wrote: > Note that, it was deprecated in 2.3.0 already: > https://spark.apache.org/docs/2.3.0/streaming-flume-integration.html > > On Fri, Oct 12, 2018 at 12:46 AM Reynold

Re: Possible bug in DatasourceV2

2018-10-11 Thread Hyukjin Kwon
oAttributes, options, ident, userSpecifiedSchema) > > } > > > > Correct this? > > > > Or even creating a new create which simply gets the schema as non optional? > > > > Thanks, > > Assaf > > > > *From:* Hyukjin Kwon [mailto:gurwl

Re: Possible bug in DatasourceV2

2018-10-11 Thread Hyukjin Kwon
See https://github.com/apache/spark/pull/22688 +Wenchen, here looks the problem raised. This might have to be considered as a blocker ... On Thu, 11 Oct 2018, 2:48 pm assaf.mendelson, wrote: > Hi, > > I created a datasource writer WITHOUT a reader. When I do, I get an > exception:

Re: [VOTE] SPARK 2.4.0 (RC3)

2018-10-11 Thread Hyukjin Kwon
So, which date is it? On Thu, Oct 11, 2018 at 1:48 AM, Garlapati, Suryanarayana (Nokia - IN/Bangalore) < suryanarayana.garlap...@nokia.com> wrote: > Might be you need to change the date(Oct 1 has already passed). > > > > >> The vote is open until October 1 PST and passes if a majority +1 PMC > votes

Re: DataSourceV2 APIs creating multiple instances of DataSourceReader and hence not preserving the state

2018-10-09 Thread Hyukjin Kwon
I took a look for the codes. val source = classOf[MyDataSource].getCanonicalName spark.read.format(source).load().collect() Looks indeed it calls twice. First all: Looks it creates it first to read the schema for a logical plan

Re: welcome a new batch of committers

2018-10-03 Thread Hyukjin Kwon
Yay! you guys all individuals do deserve it. Congratulations! On Wed, Oct 3, 2018 at 4:59 PM, Reynold Xin wrote: > Hi all, > > The Apache Spark PMC has recently voted to add several new committers to > the project, for their contributions: > > - Shane Knapp (contributor to infra) > - Dongjoon Hyun

Re: Some PRs not automatically linked to JIRAs

2018-10-01 Thread Hyukjin Kwon
/browse/SPARK-25564 On Mon, Sep 17, 2018 at 10:09 PM, Ilan Filonenko wrote: > Same over here: > > https://issues.apache.org/jira/browse/SPARK-25291 / > https://github.com/apache/spark/pull/22415 > > On Sun, Sep 16, 2018 at 10:09 PM Hyukjin Kwon wrote: > >> Seems sa

Re: Some PRs not automatically linked to JIRAs

2018-09-16 Thread Hyukjin Kwon
Seems same thing is happening again. For instance, - https://issues.apache.org/jira/browse/SPARK-25440 / https://github.com/apache/spark/pull/22429 - https://issues.apache.org/jira/browse/SPARK-25429 / https://github.com/apache/spark/pull/22420 On Thu, Aug 3, 2017 at 9:06 AM, Hyukjin Kwon wrote: >

Re: from_csv

2018-09-16 Thread Hyukjin Kwon
+1 for this idea since text parsing in CSV/JSON is quite common. One thing is about schema inference likewise with JSON functionality. In case of JSON, we added schema_of_json for it and same thing should be able to apply to CSV too. If we see some more needs for it, we can consider a function

Re: Should python-2 be supported in Spark 3.0?

2018-09-16 Thread Hyukjin Kwon
I think we can deprecate it in 3.x.0 and remove it in Spark 4.0.0. Many people still use Python 2. Also, technically 2.7 support is not officially dropped yet - https://pythonclock.org/ On Mon, Sep 17, 2018 at 9:31 AM, Aakash Basu wrote: > Removing support for an API in a major release makes poor

Re: data source api v2 refactoring

2018-09-07 Thread Hyukjin Kwon
BTW, do we hold Datasource V2 related PRs for now until we finish this refactoring just for clarification? On Fri, Sep 7, 2018 at 12:52 AM, Ryan Blue wrote: > Wenchen, > > I'm not really sure what you're proposing here. What is a `LogicalWrite`? > Is it something that mirrors the read side in your PR?

Re: Spark JIRA tags clarification and management

2018-09-06 Thread Hyukjin Kwon
Does anyone know if we still use starter or newbie tags as well? On Tue, Sep 4, 2018 at 10:00 PM, Kazuaki Ishizaki wrote: > Of course, we would like to eliminate all of the following tags > > "flanky" or "flankytest" > > Kazuaki Ishizaki > > > > Fr

Re: Branch 2.4 is cut

2018-09-06 Thread Hyukjin Kwon
Thanks, Wenchen. On Thu, Sep 6, 2018 at 3:32 PM, Wenchen Fan wrote: > Hi all, > > I've cut the branch-2.4 since all the major blockers are resolved. If no > objections I'll shortly followup with an RC to get the QA started in > parallel. > > Committers, please only merge PRs to branch-2.4 that are bug

Re: no logging in pyspark code?

2018-09-05 Thread Hyukjin Kwon
FYI, we do have a basic logging by warnings module. On Tue, Aug 28, 2018 at 2:05 AM, Imran Rashid wrote: > ah, great, thanks! sorry I missed that, I'll watch that jira. > > On Mon, Aug 27, 2018 at 12:41 PM Ilan Filonenko wrote: > >> A JIRA has been opened up on this exact topic: SPARK-25236 >>
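For readers unfamiliar with the idea, here is a minimal sketch of what "basic logging by the warnings module" can look like in plain Python. It is not taken from the PySpark code base: the helper names `load_table` and `collect_warnings` and the message text are hypothetical, and only the standard-library `warnings` calls are real.

```python
import warnings


def load_table(name, legacy_mode=False):
    """Hypothetical helper that reports a problem via warnings, not a logger."""
    if legacy_mode:
        # warnings.warn is the stdlib call; category and text are illustrative.
        warnings.warn(
            "legacy_mode is deprecated (illustrative message)",
            DeprecationWarning,
            stacklevel=2,
        )
    return "table:" + name


def collect_warnings(func, *args, **kwargs):
    """Run func and return (result, list of warning messages it emitted)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        result = func(*args, **kwargs)
    return result, [str(w.message) for w in caught]
```

Because warnings go through Python's filter machinery, a caller can silence, record, or escalate them without the library needing its own logging configuration, which may be one reason this style is considered "basic".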

Re: code freeze and branch cut for Apache Spark 2.4

2018-09-05 Thread Hyukjin Kwon
Oops, one more - https://github.com/apache/spark/pull/6. I just read this thread. On Thu, Sep 6, 2018 at 12:12 PM, Sean Owen wrote: > (I slipped https://github.com/apache/spark/pull/22340 in for Scala 2.12. > Maybe it really is the last one. In any event, yes go ahead with a 2.4 RC) > > On Wed, Sep

Re: python test infrastructure

2018-09-05 Thread Hyukjin Kwon
> 1. all of the output in target/test-reports & python/unit-tests.log should be included in the jenkins archived artifacts. Hmmm, I thought they are already archived ( https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95734/artifact/target/unit-tests.log ). FWIW, unit-tests.log

Re: Spark JIRA tags clarification and management

2018-09-03 Thread Hyukjin Kwon
Thanks, Reynold. +Adding Xiao and Wenchen who I saw often used tags. Would you have some tags you think we should document more? On Tue, Sep 4, 2018 at 9:27 AM, Reynold Xin wrote: > The most common ones we do are: > > releasenotes > > correctness > > > > On Mon, Sep 3, 2

Re: Spark JIRA tags clarification and management

2018-09-03 Thread Hyukjin Kwon
like rel note. Would be good to clarify. > > -- > *From:* Reynold Xin > *Sent:* Sunday, September 2, 2018 11:50 PM > *To:* Hyukjin Kwon > *Cc:* dev > *Subject:* Re: Spark JIRA tags clarification and management > > It would be great to documen

Re: Jenkins automatic disabling service - who and why?

2018-09-03 Thread Hyukjin Kwon
own because when we type "ok to test", the Jenkins asking is gone away. On Mon, Sep 3, 2018 at 8:54 PM, Hyukjin Kwon wrote: > Not a big deal but it has been few months since I saw this, and wondering > why it suddenly asks Jenkins admin verification from at certain point. > > I had a sma

Re: Jenkins automatic disabling service - who and why?

2018-09-03 Thread Hyukjin Kwon
e web app UI? > > On Mon, Sep 3, 2018, 1:54 AM Hyukjin Kwon wrote: > >> Hi all, >> >> I lately noticed we started to block Jenkins tests in old PRs. For >> instance, see https://github.com/apache/spark/pull/18447 >> I don't explicitly object this idea but

Jenkins automatic disabling service - who and why?

2018-09-03 Thread Hyukjin Kwon
Hi all, I lately noticed we started to block Jenkins tests in old PRs. For instance, see https://github.com/apache/spark/pull/18447 I don't explicitly object this idea but at least can I ask who and why this was started? Is it for notification purpose or to save resource? Did I miss some

Spark JIRA tags clarification and management

2018-09-03 Thread Hyukjin Kwon
Hi all, I lately noticed tags are often used to classify JIRAs. I was thinking we better explicitly document what tags are used and explain which tag means what. For instance, we documented "Contributing to JIRA Maintenance" at https://spark.apache.org/contributing.html before (thanks, Sean Owen)

Re: [DISCUSS] move away from python doctests

2018-08-31 Thread Hyukjin Kwon
IMHO, one thing we should consider before this is, refactoring the PySpark tests all to make them separate pairs for main codes. Now, we put all those unit tests into few several files, which makes hard to follow the tests. On Fri, Aug 31, 2018 at 2:05 PM, Felix Cheung wrote: > +1 on what Li said. > >

Re: Spark Streaming : Multiple sources found for csv : Error

2018-08-30 Thread Hyukjin Kwon
Yea, this is exactly what I have been worried about with the recent changes (discussed in https://issues.apache.org/jira/browse/SPARK-24924) See https://github.com/apache/spark/pull/17916. This should be fine in upper Spark versions. FYI, +Wenchen and Dongjoon I want to add Thomas Graves and Gengliang

Re: Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Hyukjin Kwon
possible. > > > > On Thu, Aug 23, 2018 at 6:38 PM Hyukjin Kwon wrote: > >> If you meant "Code Style Guide", many of them are missing and it refers >> https://docs.scala-lang.org/style/ not >> https://github.com/databricks/scala-style-guide (please correct me if I

Re: Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Hyukjin Kwon
o follow the code around the code you're changing. > > > > On Thu, Aug 23, 2018 at 8:14 PM Hyukjin Kwon > wrote: > > Hi all, > > > > I usually follow https://github.com/databricks/scala-style-guide for > Apache Spark's style, which is usually generally the same with the S

Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Hyukjin Kwon
Hi all, I usually follow https://github.com/databricks/scala-style-guide for Apache Spark's style, which is usually generally the same with the Spark's code base in practice. Thing is, we don't explicitly mention this within Apache Spark as far as I can tell. Can we explicitly mention this or

Re: best way to run one python test?

2018-08-20 Thread Hyukjin Kwon
s.py#L74-L97 > > those don't matter in most cases I guess? > > On Sun, Aug 19, 2018 at 11:54 PM Hyukjin Kwon wrote: > >> There's informal way to test specific tests. For instance: >> >> SPARK_TESTING=1 ../bin/pyspark pyspark.sql.tests VectorizedUDFTests >> >>

[DISCUSS] USING syntax for Datasource V2

2018-08-20 Thread Hyukjin Kwon
Hi all, I have been trying to follow `USING` syntax support since that looks currently not supported whereas `format` API supports this. I have been trying to understand why and talked with Ryan. Ryan knows all the details and, He and I thought it's good to post here - I just started to look

Re: best way to run one python test?

2018-08-19 Thread Hyukjin Kwon
There's an informal way to test specific tests. For instance: SPARK_TESTING=1 ../bin/pyspark pyspark.sql.tests VectorizedUDFTests I have a partial fix for our testing script to support this way in my local but couldn't have enough time to make a PR for it yet. On Mon, Aug 20, 2018 at 11:08 AM, Imran
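The command above names a test module and then a single test class. As a rough illustration of that idea in plain `unittest` terms (this is not how Spark's test scripts are implemented), the sketch below loads and runs only one `TestCase` class. The `VectorizedUDFTests` class here is a stand-in with a trivial test, `OtherTests` exists only to show that it is skipped, and `run_only` is a hypothetical helper name.

```python
import unittest


class VectorizedUDFTests(unittest.TestCase):
    """Stand-in class; the real one lives in pyspark.sql.tests."""

    def test_identity(self):
        self.assertEqual([x for x in [1, 2, 3]], [1, 2, 3])


class OtherTests(unittest.TestCase):
    """A second class that should NOT run when only one class is selected."""

    def test_other(self):
        self.assertTrue(True)


def run_only(test_case_class):
    """Load and run just one TestCase class; return the TestResult."""
    suite = unittest.TestLoader().loadTestsFromTestCase(test_case_class)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Calling `run_only(VectorizedUDFTests)` executes one test and leaves `OtherTests` untouched, which is the effect the informal command is after.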

Re: [R] discuss: removing lint-r checks for old branches

2018-08-19 Thread Hyukjin Kwon
SGTM too On Sun, Aug 12, 2018 at 7:41 AM, shane knapp wrote: > they do seem like real failures on branches 2.0 and 2.1. > > regarding infrastructure, centos and ubuntu have lintr pinned to > 1.0.1.9000, and installed via: > devtools::install_github('jimhester/lintr@5431140') > > builds on branches 2.2+

Re: [discuss][minor] impending python 3.x jenkins upgrade... 3.5.x? 3.6.x?

2018-08-19 Thread Hyukjin Kwon
Actually Python 3.7 is released ( https://www.python.org/downloads/release/python-370/) too and I fixed the compatibility issues accordingly - https://github.com/apache/spark/pull/21714 There has been an issue for 3.6 (comparing to lower versions of Python including 3.5) -

Re: [build system] bumped pull request builder job timeout to 400mins

2018-08-07 Thread Hyukjin Kwon
Thanks, Shane. On Wed, Aug 8, 2018 at 1:05 AM, shane knapp wrote: > i hate doing this, because our tests and builds take WY too long, > but this should help get PRs through before the code freeze. > > -- > Shane Knapp > UC Berkeley EECS Research / RISELab Staff Technical Lead >

Re: Review notification bot

2018-07-31 Thread Hyukjin Kwon
nt-409035244 > > Is the issue that @-mentions cause emails too? > > Is there any option to maybe only consider pinging someone if they've > touched the code within the last N months? > > On Tue, Jul 31, 2018 at 2:31 AM Hyukjin Kwon wrote: > >> > I originally did t

Re: Review notification bot

2018-07-31 Thread Hyukjin Kwon
>> >>>> Also if we are going to use this, can we rename the bot to something >>>> like spark-bot, rather than holden's personal bot? >>>> >>> I originally did that, but GitHub told me I could only have one personal >>> and one bot account.

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
he folks being pinged are not just committers. The hope > is to get more code authors who aren't committers involved in the reviews > and then eventually become committers. > > On Mon, Jul 30, 2018, 9:09 PM Hyukjin Kwon wrote: > >> *reviewers: I mean people who committed the PR given

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
*reviewers: I mean people who committed the PR given my observation. On Tue, Jul 31, 2018 at 11:50 AM, Hyukjin Kwon wrote: > I was wondering if we can leave the configuration open and accept some > custom configurations, IMHO, because I saw some people less related or less > active are con

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
nal, is there something you want to try and > change? > > On Mon, Jul 30, 2018 at 7:30 PM Hyukjin Kwon wrote: > >> I see. Thanks. I was wondering if I can see the configuration file since >> that looks needed (https://github.com/holdenk/mention-bot#configuration) >> but I

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
ing is the form in my own repo (set up for K8s > deployment) - http://github.com/holdenk/mention-bot > > On Mon, Jul 30, 2018 at 3:15 AM Hyukjin Kwon wrote: > >> Holden, so, is it a fork in >> https://github.com/facebookarchive/mention-bot? Would you mind if I ask >> w

Re: [Spark SQL] Future of CalendarInterval

2018-07-30 Thread Hyukjin Kwon
FYI, org.apache.spark.unsafe.types.CalendarInterval is undocumented in both scaladoc/javadoc (entire unsafe module) but org.apache.spark.sql.types.CalendarIntervalType is exposed ( https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.types.CalendarIntervalType ) +1 for

Re: Review notification bot

2018-07-30 Thread Hyukjin Kwon
Holden, so, is it a fork in https://github.com/facebookarchive/mention-bot? Would you mind if I ask where I can see the configurations for it? On Mon, Jul 23, 2018 at 10:16 AM, Holden Karau wrote: > Yeah so the issue with codeowners is it will only assign to committers on > the repo (the Beam project

Re: Build timeout -- continuous-integration/appveyor/pr — AppVeyor build failed

2018-07-24 Thread Hyukjin Kwon
not mistaken. On Wed, Jul 25, 2018 at 9:44 AM, shane knapp wrote: > out of curiosity: why are we using appveyor again? > > closing and reopening PRs solely to retrigger builds seems... cumbersome. > > shane > > On Tue, Jul 24, 2018 at 6:09 PM, Hyukjin Kwon wrote: > >> loo

Re: Build timeout -- continuous-integration/appveyor/pr — AppVeyor build failed

2018-07-24 Thread Hyukjin Kwon
ve a lot of time to debug *why* this happened, or > how to go about triggering another build, but at the very least we should > up the timeout. > > On Sun, May 13, 2018 at 7:38 PM, Hyukjin Kwon wrote: > >> Yup, I am not saying it's required but might be better since that's >> written

Re: Cleaning Spark releases from mirrors, and the flakiness of HiveExternalCatalogVersionsSuite

2018-07-16 Thread Hyukjin Kwon
+1 On Tue, Jul 17, 2018 at 7:34 AM, Sean Owen wrote: > Fix is committed to branches back through 2.2.x, where this test was added. > > There is still some issue; I'm seeing that archive.apache.org is > rate-limiting downloads and frequently returning 503 errors. > > We can help, I guess, by avoiding

Stale PR update and review request

2018-07-15 Thread Hyukjin Kwon
Hi all, I was checking https://spark-prs.appspot.com/users for users who have more than 10 PRs. viirya 13 mgaido91 12 wangyum 12 maropu

Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Hyukjin Kwon
think we will live with this bug for long time anyway. On Mon, Jul 9, 2018 at 9:28 AM, Saisai Shao wrote: > Thanks @Hyukjin Kwon . Yes I'm using python2 to > build docs, looks like Python2 with Sphinx has issues. > > What is the pending thing for this PR ( > https://github.com/apache/s

Re: [VOTE] Spark 2.2.2 (RC2)

2018-07-01 Thread Hyukjin Kwon
enchen Fan wrote: >> >>> +1 >>> >>> On Thu, Jun 28, 2018 at 10:19 AM zhenya Sun wrote: >>> >>>> +1 >>>> >>>> On Jun 28, 2018, at 10:15 AM, Hyukjin Kwon wrote: >>>> >>>> +1 >>>> >>>> 201

Re: [VOTE] Spark 2.2.2 (RC2)

2018-06-27 Thread Hyukjin Kwon
+1 On Thu, Jun 28, 2018 at 8:42 AM, Sean Owen wrote: > +1 from me too. > > On Wed, Jun 27, 2018 at 3:31 PM Tom Graves > wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 2.2.2. >> >> The vote is open until Mon, July 2nd @ 9PM UTC (2PM PDT) and passes if a >>

Re: Jenkins availability question

2018-06-16 Thread Hyukjin Kwon
Ooops, I just noticed Shane's email. Please ignore this email. On Sat, Jun 16, 2018 at 7:43 PM, Hyukjin Kwon wrote: > Is Jenkins down now? I was about to investigate some issues that happened > specifically within Jenkins. > > I would appreciate if anyone could roughly confirm when i

Jenkins availability question

2018-06-16 Thread Hyukjin Kwon
Is Jenkins down now? I was about to investigate some issues that happened specifically within Jenkins. I would appreciate if anyone could roughly confirm when it comes back, or if it's actually now working fine but there's an issue specific to me to access to the Jenkins.

Re: [build system] meet your build engineer @ spark ai summit SF 2018

2018-06-07 Thread Hyukjin Kwon
I regret that I couldn't make it to Spark Summit :(. On Wed, Jun 6, 2018 at 3:25 AM, Holden Karau wrote: > That's awesome! > > On Tue, Jun 5, 2018 at 12:23 PM, shane knapp wrote: > >> just a reminder to come meet your build engineer! >> >> we'll also be having a couple of demos of current projects in

Re: [VOTE] Spark 2.3.1 (RC4)

2018-06-03 Thread Hyukjin Kwon
+1 On Sun, Jun 3, 2018 at 9:25 PM, Ricardo Almeida wrote: > +1 (non-binding) > > On 3 June 2018 at 09:23, Dongjoon Hyun wrote: > >> +1 >> >> Bests, >> Dongjoon. >> >> On Sat, Jun 2, 2018 at 8:09 PM, Denny Lee wrote: >> >>> +1 >>> >>> On Sat, Jun 2, 2018 at 4:53 PM Nicholas Chammas < >>>

Re: Running lint-java during PR builds?

2018-05-28 Thread Hyukjin Kwon
> backed up since - if I recall - all of ASF shares one queue. > >> > >> At the number of PRs Spark has this could be a big issue. > >> > >> > >> > >> From: Marcelo Vanzin <van...@cloudera.com> > >> Sent: Monday, May 21, 2018 9:08:28 AM > >

Re: Running lint-java during PR builds?

2018-05-22 Thread Hyukjin Kwon
lume of test runs on Travis. >>> > >>> > In ASF projects Travis could get significantly >>> > backed up since - if I recall - all of ASF shares one queue. >>> > >>> > At the number of PRs Spark has this could be a big issue. >>> > >>

Re: Running lint-java during PR builds?

2018-05-21 Thread Hyukjin Kwon
I am going to open an INFRA JIRA if there's no explicit objection in few days. 2018-05-21 13:09 GMT+08:00 Hyukjin Kwon <gurwls...@gmail.com>: > I would like to revive this proposal. Travis CI. Shall we give this try? I > think it's worth trying it. > > 2016-11-17 3:50 GMT+0

Re: Running lint-java during PR builds?

2018-05-20 Thread Hyukjin Kwon
I would like to revive this proposal. Travis CI. Shall we give this try? I think it's worth trying it. 2016-11-17 3:50 GMT+08:00 Dongjoon Hyun : > Hi, Marcelo and Ryan. > > That was the main purpose of my proposal about Travis.CI. > IMO, that is the only way to achieve that

Re: Build timeout -- continuous-integration/appveyor/pr — AppVeyor build failed

2018-05-13 Thread Hyukjin Kwon
From a very quick look, I believe that's just occasional network issue in AppVeyor. For example, in this case: Downloading: https://repo.maven.apache.org/maven2/org/scala-lang/scala-compiler/2.11.8/scala-compiler-2.11.8.jar This took 26ish mins and seems further downloading jars look mins much

Re: V2.3 Scala API to Github Links Incorrect

2018-04-15 Thread Hyukjin Kwon
> *To: *"Thakrar, Jayesh" <jthak...@conversantmedia.com> > *Cc: *"dev@spark.apache.org" <dev@spark.apache.org>, Hyukjin Kwon < > gurwls...@gmail.com> > *Subject: *Re: V2.3 Scala API to Github Links Incorrect > > > > [+Hyukjin] > > > > Thank

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-01 Thread Hyukjin Kwon
Congratulations, Zhenhua Wang! Very well deserved. 2018-04-02 13:28 GMT+08:00 Wenchen Fan : > Hi all, > > The Spark PMC recently added Zhenhua Wang as a committer on the project. > Zhenhua is the major contributor of the CBO project, and has been > contributing across several

Re: Welcoming some new committers

2018-03-03 Thread Hyukjin Kwon
Congratulations !! On 3 Mar 2018 4:43 pm, "Saisai Shao" wrote: > Congrats to everyone! > > Thanks > Jerry > > 2018-03-03 15:30 GMT+08:00 Liang-Chi Hsieh : > >> >> Congrats to everyone! >> >> >> Kazuaki Ishizaki wrote >> > Congratulations to everyone! >>

Re: [VOTE] Spark 2.3.0 (RC5)

2018-02-24 Thread Hyukjin Kwon
+1 2018-02-24 16:57 GMT+09:00 Bryan Cutler : > +1 > Tests passed and additionally ran Arrow related tests and did some perf > checks with python 2.7.14 > > On Fri, Feb 23, 2018 at 6:18 PM, Holden Karau > wrote: > >> Note: given the state of Jenkins I'd

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-20 Thread Hyukjin Kwon
+1 too 2018-02-20 14:41 GMT+09:00 Takuya UESHIN : > +1 > > > On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang > wrote: > >> +1 >> >> >> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan wrote: >> >>> +1 >>> >>> On Tue, Feb 20, 2018 at 12:53 PM,

Re: [VOTE] Spark 2.3.0 (RC4)

2018-02-19 Thread Hyukjin Kwon
Ah, I see. For 1), I overlooked Felix's input here. I couldn't foresee this when I added this documentation because it worked in my simple demo: https://spark-test.github.io/sparksqldoc/search.html?q=approx https://spark-test.github.io/sparksqldoc/#approx_percentile Will try to investigate this

Re: Thoughts on Cloudpickle Update

2018-01-19 Thread Hyukjin Kwon
l at that point for if we want it > to go to master or master & branch-2.3? > > On Fri, Jan 19, 2018 at 12:30 AM, Hyukjin Kwon <gurwls...@gmail.com> > wrote: > >> > So given that it fixes some real world bugs, any particular reason >> why? Would you be com

Re: Thoughts on Cloudpickle Update

2018-01-19 Thread Hyukjin Kwon
n 19, 2018 7:28 PM, "Hyukjin Kwon" <gurwls...@gmail.com> wrote: > > > Is it an option to match the latest version of cloudpickle and still > set protocol level 2? > > IMHO, I think this can be an option but I am not fully sure yet if we > should/could go ahead for it w

Re: Thoughts on Cloudpickle Update

2018-01-18 Thread Hyukjin Kwon
protocol level 2? >> >> I agree that upgrading to try and match version 0.4.2 would be a good >> starting point. Unless no one objects, I will open up a JIRA and try to do >> this. >> >> Thanks, >> Bryan >> >> On Mon, Jan 15, 2018 at 7:5

Re: Thoughts on Cloudpickle Update

2018-01-15 Thread Hyukjin Kwon
Hi Bryan, Yup, I support to match the version. I pushed it forward before to match it with https://github.com/cloudpipe/cloudpickle before few times in Spark's copy and also cloudpickle itself with few fixes. I believe our copy is closest to 0.4.1. I have been trying to follow up the changes in

Re: Assign SPARK JIRA 18844 to me

2018-01-03 Thread Hyukjin Kwon
No need to assign. Just leave a comment saying you are working on it and open a PR. 2018-01-04 14:09 GMT+09:00 Sandeep Kr. Choudhary < tssandeepkumarchoudh...@gmail.com>: > Hi All, > > This is Sandeep from India. I was trying to solve the SPARK- >

Re: [VOTE] Spark 2.2.1 (RC2)

2017-11-28 Thread Hyukjin Kwon
+1 2017-11-29 8:18 GMT+09:00 Henry Robinson : > (My vote is non-binding, of course). > > On 28 November 2017 at 14:53, Henry Robinson wrote: > >> +1, tests all pass for me on Ubuntu 16.04. >> >> On 28 November 2017 at 10:36, Herman van Hövell tot Westerflier

Re: [discuss][SQL] Partitioned column type inference proposal

2017-11-14 Thread Hyukjin Kwon
es, the priority from high to low: >> DecimalType, LongType, IntegerType. This is because DecimalType is used as >> big integer when parsing partition column values. >> 4. DoubleType can't be merged with other types, except DoubleType itself. >> 5. when merging TimestampType wit
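To make the two merge rules that survive in the quoted snippet concrete, here is a toy sketch. It is not Spark's implementation: type names are plain strings, `merge_types` is a hypothetical helper, and the function returns `None` for any pair the visible rules do not cover, because the rest of the proposal (including what happens when no common type exists) is cut off in this listing.

```python
# Integral-like family, listed from highest to lowest priority as quoted.
INTEGRAL_PRIORITY = ["DecimalType", "LongType", "IntegerType"]


def merge_types(left, right):
    """Return the common type under the two quoted rules, or None if not covered."""
    if left == right:
        # Covers "DoubleType ... except DoubleType itself" and identical types.
        return left
    if left in INTEGRAL_PRIORITY and right in INTEGRAL_PRIORITY:
        # A lower index means higher priority, so pick the smaller index.
        return min(left, right, key=INTEGRAL_PRIORITY.index)
    # DoubleType with any other type, and every pair whose rule is
    # truncated in the snippet, is deliberately left unresolved here.
    return None
```

For example, `merge_types("IntegerType", "LongType")` picks `LongType`, while `merge_types("DoubleType", "IntegerType")` yields `None` in this sketch.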

[discuss][SQL] Partitioned column type inference proposal

2017-11-14 Thread Hyukjin Kwon
Hi dev, I would like to post a proposal about partitioned column type inference (related with 'spark.sql.sources.partitionColumnTypeInference.enabled' configuration). This thread focuses on the type coercion (finding the common type) in partitioned columns, in particular, when the different form

Re: [discuss][PySpark] Can we drop support old Pandas (<0.19.2) or what version should we support?

2017-11-14 Thread Hyukjin Kwon
+0 to drop it as I said in the PR. I am seeing that it makes it hard to get the cool changes through, and slows down getting them pushed. My only worry is users who depend on lower pandas versions (Pandas 0.19.2 seems to have been released less than a year ago. Around the same time, Spark 2.1.0

Re: Spark build is failing in amplab Jenkins

2017-11-04 Thread Hyukjin Kwon
I assume it is as it says: Python versions prior to 2.7 are not supported. Looks this happens in worker 2, 6 and 7 given my observation. On 4 Nov 2017 5:15 pm, "Sean Owen" wrote: Agree, seeing this somewhat regularly on the pull request builder. Do some machines

Re: Spark-XML maintenance

2017-10-28 Thread Hyukjin Kwon
I am sorry about the delay. I was super busy ... Will try to take a look for all of it within the next week, 2017-10-27 2:23 GMT+09:00 Reynold Xin : > Adding Hyukjin who has been maintaining it. > > The easiest is probably to leave comments in the repo. > > On Thu, Oct 26,

Re: [VOTE] Spark 2.1.2 (RC4)

2017-10-06 Thread Hyukjin Kwon
1 the release > > Perhaps someone can take a look at the R failures on RHEL just in case > though. > > > On Fri, 6 Oct 2017 at 05:58 vaquar khan <vaquar.k...@gmail.com> wrote: > >> +1 (non binding ) tested on Ubuntu ,all test case are passed. >> >> Re

Re: [VOTE] Spark 2.1.2 (RC4)

2017-10-05 Thread Hyukjin Kwon
+1 too. On 6 Oct 2017 10:49 am, "Reynold Xin" wrote: +1 On Mon, Oct 2, 2017 at 11:24 PM, Holden Karau wrote: > Please vote on releasing the following candidate as Apache Spark version 2 > .1.2. The vote is open until Saturday October 7th at 9:00
