Re: m2 cache issues in Jenkins?

2020-07-06 Thread Jungtaek Lim
> Shane, can we remove .m2 in worker machine 4? >>> >>> On Fri, Jul 3, 2020 at 8:18 AM, Jungtaek Lim wrote: >>> >>>> Looks like Jenkins service itself becomes unstable. It took >>>> considerable time to just open the test report for a speci

Re: m2 cache issues in Jenkins?

2020-07-02 Thread Jungtaek Lim
Looks like Jenkins service itself becomes unstable. It took considerable time to just open the test report for a specific build, and Jenkins doesn't pick the request on rebuild (retest this, please) in Github comment. On Thu, Jul 2, 2020 at 2:12 PM Hyukjin Kwon wrote: > Ah, okay. Actually there

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-07-01 Thread Jungtaek Lim
> >> > >> > >> > >> https://issues.apache.org/jira/browse/SPARK-32136 > >> > >> > >> > >> Thanks, > >> > >> Jason. > >> > >> > >> > >> From: Jungtaek Lim > >>

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-30 Thread Jungtaek Lim
>> On Thu, Jun 25, 2020 at 4:58 AM 郑瑞峰 wrote: >> >>> I volunteer to be a release manager of 3.0.1, if nobody is working on >>> this. >>> >>> >>> -- Original Message -- >>> *From:* "Gengliang Wang";

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread Jungtaek Lim
Does this count only "new features" (probably major), or also count "improvements"? I'm aware of a couple of improvements which should be ideally included in the next release, but if this counts only major new features then don't feel they should be listed. On Tue, Jun 30, 2020 at 1:32 AM Holden

Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-06-26 Thread Jungtaek Lim
in Spark 3.0.0 pulls the schema from serializer, which removes the problem. The remaining question is, would we like to fix it in 2.4.x? On Tue, May 26, 2020 at 2:54 PM Jungtaek Lim wrote: > I meant how to interpret Java Beans in Spark are not consistently defined. > > Unlike you'v

Re: Handling user-facing metadata issues on file stream source & sink

2020-06-25 Thread Jungtaek Lim
was throwing OOME. 1. https://github.com/apache/spark/pull/28904 On Sun, Jun 14, 2020 at 4:14 PM Jungtaek Lim wrote: > Bump again - hope to get some traction because these issues are either > long-standing problems or noticeable improvements (each PR has numbers/UI > graph to show the im

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-06-23 Thread Jungtaek Lim
+1 on a 3.0.1 soon. Probably it would be nice if some Scala experts can take a look at https://issues.apache.org/jira/browse/SPARK-32051 and include the fix into 3.0.1 if possible. Looks like APIs designed to work with Scala 2.11 & Java bring ambiguity in Scala 2.12 & Java. On Wed, Jun 24, 2020

Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread Jungtaek Lim
Great, thanks all for your efforts on the huge step forward! On Fri, Jun 19, 2020 at 12:13 PM Hyukjin Kwon wrote: > Yay! > > On Fri, Jun 19, 2020 at 4:46 AM, Mridul Muralidharan wrote: > >> Great job everyone! Congratulations :-) >> >> Regards, >> Mridul >> >> On Thu, Jun 18, 2020 at 10:21 AM Reynold

Re: [DISCUSS] remove the incomplete code path on aggregation for continuous mode

2020-06-16 Thread Jungtaek Lim
Bump this again. I filed SPARK-31985 [1] and plan to submit a PR in a couple of days if there's no voice on the reason we should keep it. 1. https://issues.apache.org/jira/browse/SPARK-31985 On Thu, May 21, 2020 at 8:54 AM Jungtaek Lim wrote: > Let me share the effect on remov

Re: Handling user-facing metadata issues on file stream source & sink

2020-06-14 Thread Jungtaek Lim
pull/28422 2. https://github.com/apache/spark/pull/28363 3. https://github.com/apache/spark/pull/27620 4. https://github.com/apache/spark/pull/27649 5. https://github.com/apache/spark/pull/27694 On Fri, May 22, 2020 at 12:50 PM Jungtaek Lim wrote: > Worth noting that I got similar question

Re: Revisiting the idea of a Spark 2.5 transitional release

2020-06-12 Thread Jungtaek Lim
or the feature has been provided for a long time in competitive products. Thanks, Jungtaek Lim (HeartSaVioR) 1. http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Spark-2-5-release-td27963.html#a27979 On Sat, Jun 13, 2020 at 10:13 AM Ryan Blue wrote: > +1 for a 2.x release with a DSv2

Re: [vote] Apache Spark 3.0 RC3

2020-06-07 Thread Jungtaek Lim
I'm seeing the effort of including the correctness issue SPARK-28067 [1] to 3.0.0 via SPARK-31894 [2]. That doesn't seem to be a regression so technically doesn't block the release, so while it'd be good to weigh its worth (it requires some SS users to discard the state so might bring less

Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-25 Thread Jungtaek Lim
not checking the property names, just using > ordering, in your reproducer. That seems different? > > On Sun, May 24, 2020 at 3:00 AM Jungtaek Lim > wrote: > > > > OK I just went through the change, and the change breaks bunch of > existing UTs. > > > > ht

Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-24 Thread Jungtaek Lim
nd the difference would be confusing if we don't explain it enough. Any thoughts? On Mon, May 11, 2020 at 1:36 PM Jungtaek Lim wrote: > First case is not tied to the batch / streaming as Encoders.bean simply > fails when inferring schema. > > Second case is tied to the streaming,

Re: Handling user-facing metadata issues on file stream source & sink

2020-05-21 Thread Jungtaek Lim
Worth noting that I got a similar question from the local community as well. These reporters didn't encounter an edge case; they encountered the critical issue in the normal running of a streaming query. On Fri, May 8, 2020 at 4:49 PM Jungtaek Lim wrote: > (bump to expose the discussion to m

Re: [VOTE] Apache Spark 3.0 RC2

2020-05-21 Thread Jungtaek Lim
Looks like there are newly discovered blocker issues. * https://issues.apache.org/jira/browse/SPARK-31786 * https://issues.apache.org/jira/browse/SPARK-31761 (not yet marked as blocker but according to JIRA comment it's a regression issue as well as correctness issue IMHO) Let's collect the

Re: [DISCUSS] "complete" streaming output mode

2020-05-21 Thread Jungtaek Lim
that I build to work magically without > me having to put any thought into it, but then I feel most people in this > email list would be out of jobs. These are typical considerations that you > need to put into how you architect data pipelines. If someone doesn't put > thought i

Re: [DISCUSS] "complete" streaming output mode

2020-05-20 Thread Jungtaek Lim
efore then it's more important to build a consensus that complete mode is only used for few use case (we need to collect these use cases of course) and the cost of maintenance exceeds the benefit. For sure I'm open for disagreement. Thanks, Jungtaek Lim (HeartSaVioR) On Thu, May 21, 2020 at 9:45 A

Re: [DISCUSS] remove the incomplete code path on aggregation for continuous mode

2020-05-20 Thread Jungtaek Lim
t compatibility, etc. while it never be used in production. On Tue, May 19, 2020 at 1:14 PM Jungtaek Lim wrote: > Hi devs, > > during experiment on complete mode I realized we left some incomplete code > parts on supporting aggregation for continuous mode. (shuffle & coalesce) > >

[DISCUSS] remove the incomplete code path on aggregation for continuous mode

2020-05-18 Thread Jungtaek Lim
yone is working on this). The functionality is undocumented (as the work was only done partially) and continuous mode is experimental so I don't feel risks to get rid of the part. What do you think? If it makes sense then I'll raise a PR to get rid of the incomplete codes. Thanks, Jungtaek

[DISCUSS] "complete" streaming output mode

2020-05-18 Thread Jungtaek Lim
consensus on the viewpoint of complete mode and drop supporting it if we agree with. Would like to hear everyone's opinions. It would be great if someone brings the valid cases where complete mode is being used in production. Thanks, Jungtaek Lim (HeartSaVioR) 1. https://issues.apache.org/jira/browse/SPA

Re: [VOTE] Apache Spark 3.0 RC2

2020-05-18 Thread Jungtaek Lim
Looks like the priority of SPARK-31706 [1] is incorrectly marked - it sounds like a blocker, as SPARK-26785 [2] / SPARK-26956 [3] dropped the feature of "update" on streaming output mode (as a result) and SPARK-31706 restores it. SPARK-31706 is not yet resolved, which may be valid reason to roll a

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-12 Thread Jungtaek Lim
arameter (even if it > is hidden) > > On Tue, May 12, 2020 at 12:46 PM Ryan Blue wrote: > >> +1 for the approach Jungtaek suggests. That will avoid needing to support >> behavior that is not well understood with minimal changes. >> >> On Tue, May 12, 2

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-12 Thread Jungtaek Lim
Before I forget, we'd better not forget to change the doc, as create table doc looks to represent current syntax which will be incorrect later. On Tue, May 12, 2020 at 5:32 PM Jungtaek Lim wrote: > It's not only for end users, but also for us. Spark itself uses the config > "true"

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-12 Thread Jungtaek Lim
ing the unified syntax into master. The only issue >> appears to be whether or not to pass the presence of the EXTERNAL keyword >> through to a catalog in v2. Maybe it's time to start a discuss thread for >> that issue so we're not stuck for another 6 weeks on it. >> >> On Mon, Ma

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-11 Thread Jungtaek Lim
Btw another wondering here is, is it good to retain the flag on master as an intermediate step? Wouldn't it be better for us to start "unified create table syntax" from scratch? On Tue, May 12, 2020 at 6:50 AM Jungtaek Lim wrote: > I'm sorry, but I have to agree with Ryan and Rus

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-11 Thread Jungtaek Lim
T support the behavior >> when spark.sql.legacy.createHiveTableByDefault.enabled is disabled, we >> should not ship Spark 3.0 with SPARK-30098. Otherwise, we will have to deal >> with this problem for years to come. >> >> On Mon, May 11, 2020 at 1:06 AM JackyLee wr

Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-10 Thread Jungtaek Lim
on the sequence of the columns while matching row with schema, then it could be affected.) On Mon, May 11, 2020 at 1:24 PM Wenchen Fan wrote: > is it a problem only for streaming or it affects batch queries as well? > > On Fri, May 8, 2020 at 11:42 PM Jungtaek Lim > wrote: > >> T

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-05-10 Thread Jungtaek Lim
s us more time to think about how to do it in 3.1. > > If you have other ideas, please reply to this thread. > > Thanks, > Wenchen > > On Thu, Mar 26, 2020 at 7:28 AM Jungtaek Lim > wrote: > >> Thanks, filed SPARK-31257 >> <https://issues.apache.org/jira/browse

Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-08 Thread Jungtaek Lim
Fan wrote: > Can you give some simple examples to demonstrate the problem? I think the > inconsistency would bring problems but don't know how. > > On Fri, May 8, 2020 at 3:49 PM Jungtaek Lim > wrote: > >> (bump to expose the discussion to more readers) >> >

Re: Handling user-facing metadata issues on file stream source & sink

2020-05-08 Thread Jungtaek Lim
(bump to expose the discussion to more readers) On Mon, May 4, 2020 at 5:45 PM Jungtaek Lim wrote: > Hi devs, > > I'm seeing more and more structured streaming end users encountered the > metadata issues on file stream source and sink. They have been known-issues > and the

Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-08 Thread Jungtaek Lim
(bump to expose the discussion to more readers) On Mon, May 4, 2020 at 4:57 PM Jungtaek Lim wrote: > Hi devs, > > There're couple of issues being reported on the user@ mailing list which > results in being affected by inconsistent schema on Encoders.bean. > > 1. Typed d

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-05-07 Thread Jungtaek Lim
I don't see any new features/functions for these blockers. For SPARK-31257 (which is filed and marked as a blocker from me), I agree unifying create table syntax shouldn't be a blocker for Spark 3.0.0, as that is a new feature, but even we put the proposal aside, the problem remains the same and

Handling user-facing metadata issues on file stream source & sink

2020-05-04 Thread Jungtaek Lim
s, but I don't think starter would start from there. End users may just try to find alternatives - not alternative of data source, but alternative of streaming processing framework. Thanks, Jungtaek Lim (HeartSaVioR) 1. https://lists.apache.org/thread.html/r0916e2fe8181a58c20ee8a76341aae243c76bbf

Inconsistent schema on Encoders.bean (reported issues from user@)

2020-05-04 Thread Jungtaek Lim
the ideal form of the bean Spark expects. Would like to hear opinions on this. Thanks, Jungtaek Lim (HeartSaVioR) 1. https://lists.apache.org/thread.html/r8f8e680e02955cdf05b4dd34c60a9868288fd10a03f1b1b8627f3d84%40%3Cuser.spark.apache.org%3E 2. http://mail-archives.apache.org/mod_mbox/spark-user

Re: InferFiltersFromConstraints logical optimization rule and Optimizer.defaultBatches?

2020-04-14 Thread Jungtaek Lim
Please correct me if I'm missing something. At a glance, your statements look correct if I understand correctly. I guess it might be simply missed, but it sounds as pretty trivial one as only a line can be removed safely which won't affect anything. (filterNot should be retained even we remove the

Re: Automatic PR labeling

2020-04-13 Thread Jungtaek Lim
Nice addition, looks pretty good! On Tue, Apr 14, 2020 at 1:17 AM Xiao Li wrote: > Looks great! > > Thanks for making this happen. This is pretty helpful. > > Xiao > > On Sun, Apr 12, 2020 at 11:52 PM Hyukjin Kwon wrote: > >> Okay, now it started to work. Let's see if it works well! >> >>

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-09 Thread Jungtaek Lim
10:01 AM Xiao Li wrote: > >> Only the low-risk or high-value bug fixes, and the documentation changes >> are allowed to merge to branch-3.0. I expect all the committers are >> following the same rules like what we did in the previous releases. >> >> Xiao >> >

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-09 Thread Jungtaek Lim
hesitate to test the RC1 (see how many people have tested RC1 in this thread), as they probably need to test the same with RC2. On Thu, Apr 9, 2020 at 5:50 PM Jungtaek Lim wrote: > I went through some manual tests for the new features of Structured > Streaming in Spark 3.0.0. (Please let m

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-09 Thread Jungtaek Lim
I went through some manual tests for the new features of Structured Streaming in Spark 3.0.0. (Please let me know if there're more features we'd like to test manually.) * file source cleanup - both "archive" and "delete" work. Query fails as expected when the input directory is the output

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-02 Thread Jungtaek Lim
On Fri, Apr 3, 2020 at 12:31 AM Sean Owen wrote: > On Wed, Apr 1, 2020 at 10:28 PM Jungtaek Lim > wrote: > > The definition of "latest version" would matter, especially there's a > time we prepare minor+ version release. > > > > For example, lots of peo

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Jungtaek Lim
> >>>> Even for bugs, we don't really need to know that a bug in master >>>> affects 2.4.5, 2.4.4, 2.4.3, ... 2.3.6, 2.3.5, etc. It doesn't hurt to >>>> at least say it affects the latest 2.4.x, 2.3.x releases, if known, >>>> because it's possible it

[DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Jungtaek Lim
do E2E manual verification which I would give up. There should have some balance/threshold, and the balance should be the thing the community has a consensus. Would like to hear everyone's voice on this. Thanks, Jungtaek Lim (HeartSaVioR)

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-03-31 Thread Jungtaek Lim
-1 (non-binding) I filed SPARK-31257 as a blocker, and now others start to agree that it's a critical issue which should be dealt before releasing Spark 3.0. Please refer recent comments in https://github.com/apache/spark/pull/28026 It won't delay the release pretty much, as we can either revert

Re: Release Manager's official `branch-3.0` Assessment?

2020-03-30 Thread Jungtaek Lim
it to "false" and deal with it. WDYT? On Tue, Mar 31, 2020 at 7:48 AM Jungtaek Lim wrote: > I'm not sure I understand the direction of resolution. I'm not saying it's > just a confusion - it's "ambiguous" and "indeterministic". > > Two syntaxes were at leas

Re: Release Manager's official `branch-3.0` Assessment?

2020-03-30 Thread Jungtaek Lim
his race, but: Would it be OK to ship 3.0 with >> some release notes and/or prominent documentation calling out this issue, >> and then fixing it in 3.0.1? >> >> On Sat, Mar 28, 2020 at 8:45 PM Jungtaek Lim < >> kabhwan.opensou...@gmail.com> wrote: >> >>

Re: Release Manager's official `branch-3.0` Assessment?

2020-03-28 Thread Jungtaek Lim
020 at 11:51 AM, Sean Owen wrote: > >> I'm also curious - there no open blockers for 3.0 but I know a few are >> still floating around open to revert changes. What is the status there? >> From my field of view I'm not aware of other blocking issues. >> >>

Re: Release Manager's official `branch-3.0` Assessment?

2020-03-27 Thread Jungtaek Lim
. Thanks, Jungtaek Lim (HeartSaVioR) On Wed, Mar 25, 2020 at 1:52 PM Xiao Li wrote: > Let us try to finish the remaining major blockers in the next few days. > For example, https://issues.apache.org/jira/browse/SPARK-31085 > > +1 to cut the RC even if we still have the blockers that will fa

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-25 Thread Jungtaek Lim
describes > this and doesn't appear to be done. > > On Wed, Mar 25, 2020 at 4:03 PM Jungtaek Lim > wrote: > >> UPDATE: Sorry I just missed the PR ( >> https://github.com/apache/spark/pull/28026). I still think it'd be nice >> to avoid recycling the JIRA issue which was re

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-25 Thread Jungtaek Lim
UPDATE: Sorry I just missed the PR ( https://github.com/apache/spark/pull/28026). I still think it'd be nice to avoid recycling the JIRA issue which was resolved before. Shall we have a new JIRA issue with linking to SPARK-30098, and set proper priority? On Thu, Mar 26, 2020 at 7:59 AM Jungtaek

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-25 Thread Jungtaek Lim
Would it be better to prioritize this to make sure the change is included in Spark 3.0? (Maybe filing an issue and set as a blocker) Looks like there's consensus that SPARK-30098 brought ambiguous issue which should be fixed (though the consideration of severity seems to be different), and once

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-19 Thread Jungtaek Lim
Anything would be OK if the create table DDL provides a "clear way" to expect the table provider "before" they run the query. Great news that it doesn't require major rework - looking forward to the PR. Thanks again to jump in and sort this out. - Jungtaek Lim (HeartSaVioR)

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Jungtaek Lim
lauses are being used. Yes as I said earlier it may make end users' query to be changed, but better than uncertain. Btw, if the main purpose to add native syntax and change it by default is to discontinue supporting Hive create table rule sooner, simply dropping rule 2 with providing legacy config i

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Jungtaek Lim
don't need to spend a lot > of time understanding the subtle difference between these 2 syntaxes. > > On Wed, Mar 18, 2020 at 7:01 PM Jungtaek Lim > wrote: > >> A bit correction: the example I provided for vice versa is not really a >> correct case for vice versa. It's actua

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Jungtaek Lim
A bit correction: the example I provided for vice versa is not really a correct case for vice versa. It's actually same case (intended to use rule 2 which is not default) but different result. On Wed, Mar 18, 2020 at 7:22 PM Jungtaek Lim wrote: > My concern is that although we simply th

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Jungtaek Lim
ABLE >>> syntax instead of the Hive one. Previously these two rules are mutually >>> exclusive because the native syntax requires the USING clause while the >>> Hive syntax makes ROW FORMAT or STORED AS clause optional. >>> >>> It's a good move

Re: [DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-18 Thread Jungtaek Lim
e create? Internally the > parser rules conflict and we pick the native syntax depending on the rule > order. But the user-facing behavior looks fine. > > CREATE EXTERNAL TABLE is a problem as it works in 2.4 but not in 3.0. > Shall we simply remove EXTERNAL from the native CREATE T

[DISCUSS] Resolve ambiguous parser rule between two "create table"s

2020-03-15 Thread Jungtaek Lim
create Hive table. (Given we will also provide legacy option I'm feeling this is acceptable.) 2. Define "ROW FORMAT" or "STORED AS" as mandatory one. pros. Less invasive for existing queries. cons. Less intuitive, because they have been optional and now become mandatory to fall in

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-03-12 Thread Jungtaek Lim
>> >>> If you think these APIs should not be added back, let me know and we can >>> discuss the items further. In general, I think we should provide more >>> evidences and discuss them publicly when we dropping these APIs at the >>> beginning. >>> >>

Re: [Proposal] Modification to Spark's Semantic Versioning Policy

2020-03-07 Thread Jungtaek Lim
+1 for Sean as well. Moreover, as I added a voice on previous thread, if we want to be strict with retaining public API, what we really need to do along with this is having similar level or stricter of policy for adding public API. If we don't apply the policy symmetrically, problems would go

Re: Breaking API changes in Spark 3.0

2020-02-19 Thread Jungtaek Lim
ve balance on this to avoid restricting ourselves too much, but I feel there's no balance now - most things are just going through PRs without discussion. It would be ideal we have time to consider on this. On Thu, Feb 20, 2020 at 8:50 AM Jungtaek Lim wrote: > Apache Spark 2.0 was released in July 2

Re: Breaking API changes in Spark 3.0

2020-02-19 Thread Jungtaek Lim
Apache Spark 2.0 was released in July 2016. Assuming the project has been trying the best to follow the semantic versioning, it is "more than three years" to wait for the breaking changes. What the community misses to address necessary breaking changes would be going to be technical debts for

Re: Request to document the direct relationship between other configurations

2020-02-13 Thread Jungtaek Lim
we conclude this thread by deciding to document the direct > relationship between configurations preferably in one prevailing style? > > > On Fri, Feb 14, 2020 at 11:36 AM, Jungtaek Lim > wrote: > >> Even spark.dynamicAllocation.* doesn't follow 2-2, right? It follows the >> mix

Re: Request to document the direct relationship between other configurations

2020-02-13 Thread Jungtaek Lim
therwise, let's > simplify it to reduce the overhead rather than having a policy for the > mid-term specifically. > > > On Thu, Feb 13, 2020 at 12:24 PM, Jungtaek Lim > wrote: > >> I tend to agree that there should be a time to make things consistent >> (and I'm very happy

Re: [DISCUSS] naming policy of Spark configs

2020-02-12 Thread Jungtaek Lim
+1 Thanks for the proposal. Looks very reasonable to me. On Thu, Feb 13, 2020 at 10:53 AM Hyukjin Kwon wrote: > +1. > > On Thu, Feb 13, 2020 at 9:30 AM, Gengliang Wang > wrote: > >> +1, this is really helpful. We should make the SQL configurations >> consistent and more readable. >> >> On Wed, Feb 12,

Re: Request to document the direct relationship between other configurations

2020-02-12 Thread Jungtaek Lim
>>>>>> >>>>>> On Wed, Feb 12, 2020 at 12:02 PM, Hyukjin Kwon wrote: >>>>>> >>>>>>> To do that, we should explicitly document such structured >>>>>>> configuration and implicit effect, which is currently missi

Re: Request to document the direct relationship between other configurations

2020-02-11 Thread Jungtaek Lim
les rolling. Then, they realise the log is not rolling > later after the file > size becomes bigger. > > > On Wed, Feb 12, 2020 at 10:47 AM, Jungtaek Lim > wrote: > >> I'm sorry if I miss something, but this is ideally better to be started >> as [DISCUSS] as I haven't seen

Re: Request to document the direct relationship between other configurations

2020-02-11 Thread Jungtaek Lim
doc. More redundant if the condition is nested. I agree this is the good step of "be kind" but less pragmatic. I'd be happy to follow the consensus we would make in this thread. Appreciate more voices. Thanks, Jungtaek Lim (HeartSaVioR) On Wed, Feb 12, 2020 at 10:36 AM Hyukjin Kwon wrot

Re: [ANNOUNCE] Announcing Apache Spark 2.4.5

2020-02-10 Thread Jungtaek Lim
Nice work, Dongjoon! Thanks for the huge efforts on sorting out the correctness issues as well. On Tue, Feb 11, 2020 at 12:40 PM Wenchen Fan wrote: > Great Job, Dongjoon! > > On Mon, Feb 10, 2020 at 4:18 PM Hyukjin Kwon wrote: > >> Thanks Dongjoon! >> >> On Sun, Feb 9, 2020 at 10:49 AM, Takeshi

Re: [VOTE] Release Apache Spark 2.4.5 (RC1)

2020-01-15 Thread Jungtaek Lim
Once we decided to cancel the RC1, what about including SPARK-29450 ( https://github.com/apache/spark/pull/27209) into RC2? SPARK-29450 was merged into master, and Xiao figured out it fixed a regression, long lasting one (broken at 2.3.0). The link refers the PR for 2.4 branch. Thanks, Jungtaek

Re: Release Apache Spark 2.4.5

2020-01-05 Thread Jungtaek Lim
+1 to have another Spark 2.4 release, as Spark 2.4.4 was released 4 months ago and there's a release window for this. On Mon, Jan 6, 2020 at 12:38 PM Hyukjin Kwon wrote: > Yeah, I think it's nice to have another maintenance release given Spark > 3.0 timeline. > > On Mon, Jan 6, 2020 at 7:58 AM,

Re: Patch to produce messages with null body using console producer

2019-12-27 Thread Jungtaek Lim
You seem to hit wrong mailing list - please send to Kafka dev. mailing list. On Fri, Dec 27, 2019 at 8:10 PM jelmer wrote: > Hi folks, > > A while back I opened a pull request ( > https://github.com/apache/kafka/pull/7567 ) that makes it possible to > produce messages with a null body using

Re: Spark 3.0 branch cut and code freeze on Jan 31?

2019-12-24 Thread Jungtaek Lim
to get reviewed and merged later? Happy Holiday! Thanks, Jungtaek Lim (HeartSaVioR) On Wed, Dec 25, 2019 at 8:36 AM Takeshi Yamamuro wrote: > Looks nice, happy holiday, all! > > Bests, > Takeshi > > On Wed, Dec 25, 2019 at 3:56 AM Dongjoon Hyun > wrote: > >> +1

Re: [ANNOUNCE] Announcing Apache Spark 3.0.0-preview2

2019-12-24 Thread Jungtaek Lim
Great work, Yuming! Happy Holidays. On Wed, Dec 25, 2019 at 9:08 AM Dongjoon Hyun wrote: > Indeed! Thank you again, Yuming and all. > > Bests, > Dongjoon. > > > On Tue, Dec 24, 2019 at 13:38 Takeshi Yamamuro > wrote: > >> Great work, Yuming! >> >> Bests, >> Takeshi >> >> On Wed, Dec 25, 2019

Re: I would like to add JDBCDialect to support Vertica database

2019-12-11 Thread Jungtaek Lim
If I understand correctly, you'll just want to package your implementation with your preference of project manager (maven, sbt, etc.) which registers your dialect implementation into JdbcDialects, and pass the jar and let end users load the jar. That will automatically do everything and they can

Re: [DISCUSS] Add close() on DataWriter interface

2019-12-11 Thread Jungtaek Lim
845) On Thu, Dec 12, 2019 at 3:53 AM Nicholas Chammas wrote: > Is this something that would be exposed/relevant to the Python API? Or is > this just for people implementing their own Spark data source? > > On Wed, Dec 11, 2019 at 12:35 AM Jungtaek Lim < > kabhwan.opensou...@gmai

Re: [DISCUSS] Add close() on DataWriter interface

2019-12-11 Thread Jungtaek Lim
Nice, thanks for the answer! I'll craft a PR soon. Thanks again. On Thu, Dec 12, 2019 at 3:32 AM Ryan Blue wrote: > Sounds good to me, too. > > On Wed, Dec 11, 2019 at 1:18 AM Jungtaek Lim > wrote: > >> Thanks for the quick response, Wenchen! >> >> I'll leave

Re: [DISCUSS] Add close() on DataWriter interface

2019-12-11 Thread Jungtaek Lim
ame > for DataWriter. > > On Wed, Dec 11, 2019 at 1:35 PM Jungtaek Lim > wrote: > >> Hi devs, >> >> I'd like to propose to add close() on DataWriter explicitly, which is the >> place for resource cleanup. >> >> The rationalization of the proposal is

[DISCUSS] Add close() on DataWriter interface

2019-12-10 Thread Jungtaek Lim
rk 3.0, so I feel it may not matter. Would love to hear your thoughts. Thanks in advance, Jungtaek Lim (HeartSaVioR)

Re: DataSourceWriter V2 Api questions

2019-12-06 Thread Jungtaek Lim
pen questions we need to answer: > 1. How to make sure all tasks are launched at the same time to implement > 2PC? barrier execution? > 2. To reach "eventually consistent", we must retry the job until success. > How shall we guarantee the job retry? > > On Fri, Oct 19, 2018 a

Re: Query regarding stateless aggregations

2019-11-28 Thread Jungtaek Lim
the input is broken down to multiple batches. By the definition of ground rule, streaming aggregation is required to be stateful. Thanks, Jungtaek Lim (HeartSaVioR) On Thu, Nov 28, 2019 at 9:17 PM Chitral Verma wrote: > Hi Devs, > I have a query regarding stateless aggregations.

Re: Loose the requirement of "median" of the SQL metrics

2019-11-27 Thread Jungtaek Lim
> rough idea just for possible optimization.) > > > > But again that's very rough idea, and it won't make sense if the > expected output is not acceptable as representation. > > > > -Jungtaek Lim (HeartSaVioR) > > > > > > On Wed, Nov 27, 2019 at 11:25 PM Sean Ow

Re: Loose the requirement of "median" of the SQL metrics

2019-11-27 Thread Jungtaek Lim
ed the another point of optimization here which might mitigate the issue heavily... so please treat my idea as rough idea just for possible optimization.) But again that's very rough idea, and it won't make sense if the expected output is not acceptable as representation. -Jungtaek Lim (HeartSaVioR) On Wed, N

Loose the requirement of "median" of the SQL metrics

2019-11-27 Thread Jungtaek Lim
place sorting 100 elements with sorting 10 elements 11 times. The difference would be bigger if the number of tasks is bigger. Just a rough idea so any feedbacks are appreciated. Thanks, Jungtaek Lim (HeartSaVioR)
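
The snippet above is truncated, so the following is only a rough sketch of how the "sort 10 elements 11 times" idea could be read: split the per-task values into groups of 10, take the median of each group, then take the median of those group medians. The function name approx_median and the group_size parameter are made up for illustration and are not part of Spark; the result is an approximation and is not guaranteed to equal the exact median.

```python
import statistics


def approx_median(values, group_size=10):
    """Return the median of per-group medians (an approximation).

    Hypothetical helper for illustration only; not Spark code.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if group_size < 1:
        raise ValueError("group_size must be at least 1")

    # Split the input into consecutive groups of at most group_size items.
    groups = [values[i:i + group_size]
              for i in range(0, len(values), group_size)]

    # One small sort per group (statistics.median sorts its input).
    group_medians = [statistics.median(group) for group in groups]

    # One more small sort over the group medians.
    return statistics.median(group_medians)
```

With 100 values and group_size=10 this performs 10 small sorts for the groups plus 1 for the group medians, which matches the count of 11 mentioned in the message.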

Re: Does StreamingSymmetricHashJoinExec work with watermark? I don't think so

2019-11-14 Thread Jungtaek Lim
Jacek, would you mind if I ask for the query to reproduce? Not sure I get you without having the example of "not working". Thanks, Jungtaek Lim (HeartSaVioR) On Tue, Nov 12, 2019 at 12:04 AM Jacek Laskowski wrote: > Hi, > > I think watermark does not work for StreamingSy

Re: [Structured Streaming] Robust watermarking calculation with future timestamps

2019-11-14 Thread Jungtaek Lim
requires aggregated calculation. I guess Spark community may consider adding the feature if the community sees more requests on this. Thanks, Jungtaek Lim (HeartSaVioR) On Wed, Nov 13, 2019 at 6:58 PM Anastasios Zouzias wrote: > Hi all, > > We currently have the following issue wit

Re: ASF board report for November 2019

2019-11-11 Thread Jungtaek Lim
nit: - The latest committer was added on Sept 4th, 2019 (Dongjoon Hyun). <= s/committer/PMC member Thanks, Jungtaek Lim (HeartSaVioR) On Tue, Nov 12, 2019 at 11:38 AM Matei Zaharia wrote: > Hi all, > > It’s time to send our quarterly report to the ASF board. Here is my draft >

Re: [SS][2.4.4] Confused with "WatermarkTracker: Event time watermark didn't move"?

2019-10-13 Thread Jungtaek Lim
batch " so if I'm not missing, it should be logged between updating event-time watermark and watermark didn't move. You can attach a streaming query listener and get more information about batches. Thanks, Jungtaek Lim (HeartSaVioR) On Tue, Oct 8, 2019 at 6:12 PM Jacek Laskowski wrote:

Re: [SS] number of output rows metric for streaming aggregation (StateStoreSaveExec) in Append output mode not measured?

2019-10-13 Thread Jungtaek Lim
with it. I'll file and submit a patch. Btw, there's a metric bug with empty batch as well - see SPARK-29314 [1], for which I've recently submitted a patch. Thanks, Jungtaek Lim (HeartSaVioR) 1. http://issues.apache.org/jira/browse/SPARK-29314 On Sun, Oct 13, 2019 at 1:12 AM Jacek Laskowski wrote: >

Re: [SS] number of output rows metric for streaming aggregation (StateStoreSaveExec) in Append output mode not measured?

2019-10-13 Thread Jungtaek Lim
Filed SPARK-29450 [1] and raised a patch [2]. Please let me know if you would like to be assigned as a reporter of SPARK-29450. 1. https://issues.apache.org/jira/browse/SPARK-29450 2. https://github.com/apache/spark/pull/26104 On Sun, Oct 13, 2019 at 4:06 PM Jungtaek Lim wrote: > Tha

Re: Spark 3.0 preview release feature list and major changes

2019-10-07 Thread Jungtaek Lim
n progress, there's ongoing umbrella issue regarding rolling event log & snapshot (SPARK-28594 <https://issues.apache.org/jira/browse/SPARK-28594>) which we struggle to get things done in Spark 3.0. Thanks, Jungtaek Lim (HeartSaVioR) On Tue, Oct 8, 2019 at 7:02 AM Xingbo Jiang wrote:

Re: [SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-07 Thread Jungtaek Lim
anges anymore. > > > On Sat, Oct 5, 2019 at 12:24 PM Jungtaek Lim > wrote: > >> I remembered the actual case from developer who implements custom data >> source. >> >> >> https://lists.apache.org/thread.html/c1a210510b48bb1fea89828c8e2f5db8c27eba635e007

Re: [SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-04 Thread Jungtaek Lim
Laskowski > The Internals of Spark SQL https://bit.ly/spark-sql-internals > The Internals of Spark Structured Streaming > https://bit.ly/spark-structured-streaming > The Internals of Apache Kafka https://bit.ly/apache-kafka-internals > Follow me at https://twitter.com/jaceklaskowski

Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

2019-10-02 Thread Jungtaek Lim
according to the affected version of SPARK-26283 (2.4.0 is also there). On Wed, Oct 2, 2019 at 11:47 PM Dongjoon Hyun wrote: > Thank you for the investigation and making a fix. > > So, both issues are on only master (3.0.0) branch? > > Bests, > Dongjoon. > > > On Wed, Oct 2

Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

2019-10-02 Thread Jungtaek Lim
FYI: patch submitted - https://github.com/apache/spark/pull/25996 On Wed, Oct 2, 2019 at 3:25 PM Jungtaek Lim wrote: > I need to do full manual test to make sure, but according to experiment > (small UT) "closeFrameOnFlush" seems to work. > > There was relevant change

Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

2019-10-02 Thread Jungtaek Lim
s to read open frame. With "closeFrameOnFlush" being false for ZstdOutputStream, frame is never closed (even flushing output stream) unless output stream is closed. I'll raise a patch once manual test is passed. Sorry for the false alarm. Thanks, Jungtaek Lim (HeartSaVioR) 1. https://issues.apache.org/j

Re: [DISCUSS] Preferred approach on dealing with SPARK-29322

2019-10-01 Thread Jungtaek Lim
rote: > >> Makes more sense to drop support for zstd assuming the fix is not >> something at spark end (configuration, etc). >> Does not make sense to try to detect deadlock in codec. >> >> Regards, >> Mridul >> >> On Tue, Oct 1, 2019 at 8:39 PM Jungtaek Lim >>

[SS] Possible inconsistent semantics on metric "updated" between stateful operators

2019-10-01 Thread Jungtaek Lim
his to all stateful operators (both removal and eviction) Would like to hear voices on this. Thanks in advance, Jungtaek Lim (HeartSaVioR) * JIRA issue: https://issues.apache.org/jira/browse/SPARK-29312

Re: [SS] How to create a streaming DataFrame (for a custom Source in Spark 2.4.4 / MicroBatch / DSv1)?

2019-10-01 Thread Jungtaek Lim
while transition from old DSv2 to new DSv2 happens and new DSv2 gets stabilized. Would we like to provide necessary changes on DSv1? Thanks, Jungtaek Lim (HeartSaVioR) On Wed, Oct 2, 2019 at 4:27 AM Jacek Laskowski wrote: > Hi, > > I think I've got stuck and without your help I won't
