Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-28 Thread Sean Owen
+1, with comments:

There are 5 critical issues for 2.4, and no blockers:
SPARK-25378 ArrayData.toArray(StringType) assume UTF8String in 2.4
SPARK-25325 ML, Graph 2.4 QA: Update user guide for new features & APIs
SPARK-25319 Spark MLlib, GraphX 2.4 QA umbrella
SPARK-25326 ML, Graph 2.4 QA: Programming guide update and migration guide
SPARK-25323 ML 2.4 QA: API: Python API coverage

Xiangrui, is SPARK-25378 important enough we need to get it into 2.4?

I found two issues resolved for 2.4.1 that got into this RC, so I marked
them as resolved in 2.4.0.

I checked the licenses and NOTICE, and they look correct now in the
source and binary builds.

The 2.12 artifacts are as I'd expect.

I ran all tests for 2.11 and 2.12 and they pass with -Pyarn
-Pkubernetes -Pmesos -Phive -Phadoop-2.7 -Pscala-2.12.




On Thu, Sep 27, 2018 at 10:00 PM Wenchen Fan  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.4.0.
>
> The vote is open until October 1 PST and passes if a majority of +1 PMC
> votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.0-rc2 (commit 
> 42f25f309e91c8cde1814e3720099ac1e64783da):
> https://github.com/apache/spark/tree/v2.4.0-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1287
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc2-docs/
>
> The list of bug fixes going into 2.4.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/2.4.0
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env, install
> the current RC, and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.4.0?
> ===
>
> The current list of open tickets targeted at 2.4.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.4.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[DISCUSS] Syntax for table DDL

2018-09-28 Thread Ryan Blue
Hi everyone,

I’m currently working on new table DDL statements for v2 tables. For
context, the new logical plans for DataSourceV2 require a catalog interface
so that Spark can create tables for operations like CTAS. The proposed
TableCatalog API also includes an API for altering those tables so we can
make ALTER TABLE statements work. I’m implementing those DDL statements,
which will make it into upstream Spark when the TableCatalog PR is merged.

Since I’m adding new SQL statements that don’t yet exist in Spark, I want
to make sure that the syntax I’m using in our branch will match the syntax
we add to Spark later. I’m basing this proposed syntax on PostgreSQL.

   - *Update data type*: ALTER TABLE tableIdentifier ALTER COLUMN
   qualifiedName TYPE dataType
   - *Rename column*: ALTER TABLE tableIdentifier RENAME COLUMN
   qualifiedName TO qualifiedName
   - *Drop column*: ALTER TABLE tableIdentifier DROP (COLUMN | COLUMNS)
   qualifiedNameList

A few notes:

   - Using qualifiedName in these rules allows updating nested types, like
   point.x.
   - Updates and renames can only alter one column, but drop can drop a
   list.
   - Rename can’t move a column and will validate that, if the TO name is
   qualified, the prefix matches the original field.
   - I’m also changing ADD COLUMN to support adding fields to nested
   columns by using qualifiedName instead of identifier.
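The prefix check in the rename rule can be made concrete with a small sketch (plain Python with hypothetical names, not the actual parser code), assuming dot-separated qualified names like point.x:

```python
def validate_rename(from_name: str, to_name: str) -> str:
    """Check that RENAME COLUMN does not move a field to another struct.

    Names are dot-separated qualified names, e.g. "point.x".
    Returns the full new qualified name if the rename is legal.
    """
    from_parts = from_name.split(".")
    to_parts = to_name.split(".")
    if len(to_parts) == 1:
        # Unqualified TO name: rename within the original parent struct.
        return ".".join(from_parts[:-1] + to_parts)
    # Qualified TO name: its prefix must match the original field's prefix.
    if to_parts[:-1] != from_parts[:-1]:
        raise ValueError(
            f"RENAME cannot move {from_name!r} to {to_name!r}: prefix differs")
    return to_name

# validate_rename("point.x", "x2")        -> "point.x2"
# validate_rename("point.x", "point.x2")  -> "point.x2"
# validate_rename("point.x", "other.x2")  -> raises ValueError
```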

Please reply to this thread if you have suggestions based on a different
SQL engine or want this syntax to be different for another reason. Thanks!

rb
-- 
Ryan Blue
Software Engineer
Netflix


On Scala 2.12.7

2018-09-28 Thread Sean Owen
I'm forking the discussion about Scala 2.12.7 from the 2.4.0 RC vote thread.

2.12.7 was released yesterday and is even labeled as fixing Spark
2.4 compatibility! https://www.scala-lang.org/news/2.12.7 We should
look into it, yes.

Darcy identified, and the Scala team fixed, this issue while finishing
the work on Scala 2.12 support:
https://github.com/scala/scala/pull/7156

However, we already worked around this in Spark, no? See
https://github.com/apache/spark/commit/f29c2b5287563c0d6f55f936bd5a75707d7b2b1f

So we should go ahead and update to use 2.12.7, yes, and undo this workaround?
But this doesn't necessarily block a 2.4.0 release, if it's already
worked around.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] SPIP: Native support of session window

2018-09-28 Thread Jungtaek Lim
Btw, just wrote up detailed design doc on existing patch:
https://docs.google.com/document/d/1tUO29BDXb9127RiivUS7Hv324dC0YHuokYvyQRpurDY/edit?usp=sharing

This doc is a wall of text; since I guess we can already imagine how a
session window works (and I showed a simple example in the SPIP doc), I
tried to avoid drawing diagrams, which would take non-trivial effort. New
classes are linked to the actual source code so that we can read the code
directly whenever we are curious about something.

Please let me know anytime if something is unclear and needs elaboration.
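For readers new to the thread, the behavior the SPIP targets, grouping events into sessions separated by a gap timeout, can be sketched in a few lines (plain Python, illustrative only, not the proposed Spark implementation):

```python
from typing import List, Tuple

def sessionize(event_times: List[int], gap: int) -> List[Tuple[int, int]]:
    """Group event timestamps into session windows.

    A new session starts whenever an event arrives more than `gap`
    time units after the previous session's end; each window is
    reported as (start, end) with end = last event time + gap, the
    usual session-window convention.
    """
    sessions = []
    for t in sorted(event_times):
        if sessions and t <= sessions[-1][1]:
            # Event falls inside the current session: extend it.
            start, _ = sessions[-1]
            sessions[-1] = (start, t + gap)
        else:
            sessions.append((t, t + gap))
    return sessions

# sessionize([1, 2, 10, 11], gap=5) -> [(1, 7), (10, 16)]
```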

-Jungtaek Lim (HeartSaVioR)

On Fri, Sep 28, 2018 at 10:18 PM, Jungtaek Lim wrote:

> Thanks for sharing your proposal as well as the implementation. Looks like
> your proposal is more focused on design details; it may be better for me to
> write one more doc for design details and share it as well. Stay tuned!
>
> Btw, I'm trying out your patch to see whether it passes the tests I've
> added, and it looks like it fails on the UT below:
>
> https://github.com/apache/spark/blob/ad0b7466ef3f79354a99bd1b95c23e4c308502d5/sql/core/src/test/scala/org/apache/spark/sql/streaming/EventTimeWatermarkSuite.scala#L475-L573
> Could you take a look at the UT and see whether I'm missing something
> here or whether the UT is correct?
>
> (Actually, most of the UTs I've added fail, but some UTs are for update
> mode, and the patch doesn't provide the same experience with select-only
> session windows, so I'm pointing to only one UT, which tests a basic
> session window.)
>
> -Jungtaek Lim (HeartSaVioR)
>
> On Fri, Sep 28, 2018 at 9:22 PM, Yuanjian Li wrote:
>
>> Hi Jungtaek:
>>
>>    We also met this problem during the migration of streaming applications
>> to Structured Streaming at Baidu; we solved it in our fork, and it has
>> been running steadily in production.
>>    As the initial plan, we are doing the code cleanup work and preparing
>> to give a SPIP in October; happy to see your proposal. Hope we can share
>> some spots together.
>>Here’s the PR and doc:
>> https://github.com/apache/spark/pull/22583
>>
>> https://docs.google.com/document/d/1zeAc7QKSO7J4-Yk06kc76kvldl-QHLCDJuu04d7k2bg/edit?usp=sharing
>>
>> Thanks,
>> Yuanjian Li
>>
>>
>> On Sep 28, 2018, at 06:22, Jungtaek Lim wrote:
>>
>> Hi all,
>>
>> I would like to initiate a discussion thread to discuss "Native support
>> of session window".
>> The original issue is filed as SPARK-10816 [1], but I can file another one
>> for representing the SPIP if necessary. A WIP but working PR is available
>> as well, so we can even test it directly or see the difference, if some of
>> us feel it more convenient to go through the source code instead of the doc.
>>
>> I've attached a PDF version of the SPIP to SPARK-10816, but I'm adding a
>> Google Docs link [2] for those who find it convenient to comment in the doc.
>>
>> Please let me know if we would like to see a technical design doc for
>> this as well. I avoided going too deep in the SPIP doc so anyone could
>> review it and see the benefit of adopting this.
>>
>> Looking forward to hearing your feedback.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> 1. https://issues.apache.org/jira/browse/SPARK-10816
>> 2.
>> https://docs.google.com/document/d/1_rMLmUSyGzb62RnP2A3WX6D6uRxox8Q_7WcoI_HrTw4/edit?usp=sharing
>> 3. https://github.com/apache/spark/pull/22482
>>
>>
>>
>>


Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-28 Thread Sean Owen
Go ahead and file a JIRA to update to 2.12.7 with these details. We'll
assess whether it is a blocker.

On Fri, Sep 28, 2018 at 12:09 PM Darcy Shen  wrote:
>
> I agree it is an unimportant Spark bug; I mean the Option-to-String
> comparison. The bug is easy to fix and obvious to confirm. If the
> description of the PR is not accurate, feel free to edit the title or
> content. I am on vacation from 9.29 :)
>
> But the Scala bug in WrappedArray is severe. We should not provide
> pre-built Spark packages with Scala 2.12.6. The bug is not in the
> compiler but in scala-library.
>
> If the pre-built packages of Spark use scala-library 2.12.6, the bug
> exists regardless of which Scala version our application developers use.
>
> For Spark we should be serious about the minor Scala version. A
> preferred Scala minor version should be officially stated.
>
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-28 Thread Darcy Shen




I agree it is an unimportant Spark bug; I mean the Option-to-String
comparison. The bug is easy to fix and obvious to confirm. If the
description of the PR is not accurate, feel free to edit the title or
content. I am on vacation from 9.29 :)

But the Scala bug in WrappedArray is severe. We should not provide
pre-built Spark packages with Scala 2.12.6. The bug is not in the
compiler but in scala-library.

If the pre-built packages of Spark use scala-library 2.12.6, the bug
exists regardless of which Scala version our application developers use.

For Spark we should be serious about the minor Scala version. A
preferred Scala minor version should be officially stated.

(Hi Wenchen, sorry for the duplicate email; I just forgot to cc the list.)

On Fri, 28 Sep 2018 22:38:05 +0800, Wenchen Fan wrote:
> I don't think this bug is serious enough to fail an RC; it's only about
> metrics IIUC, and it's not a regression in 2.4.
>
> I agree we should backport this fix to 2.3 and 2.4, and we should update
> our scala 2.12 jenkins build to use scala 2.12.7. cc Shane do you know
> how to change it?
>
> BTW end users can still use scala 2.12.7 with the Spark package built
> with scala 2.12.6, right?








Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-28 Thread Dongjoon Hyun
Hi, Wenchen.

The current issue link doesn't seem to work for me.

The list of bug fixes going into 2.4.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/2.4.0

Could you send out the following issue link instead for the next RCs?

https://issues.apache.org/jira/projects/SPARK/versions/12342385

Bests,
Dongjoon.



Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-28 Thread Sean Owen
I don't even know how or if this manifests as a bug. The code is
indeed incorrect and the 2.12 compiler flags it. We fixed a number of
these in SPARK-25398. While I want to get this into 2.4 if we have
another RC, I don't see evidence this is a blocker. It is not specific
to Scala 2.12.
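As a rough illustration of the bug class being discussed (an Option compared directly against a String, which can never be equal), here is a Python analogue using a toy wrapper; the names are invented for illustration, and this is not Spark code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Some:
    """Toy stand-in for Scala's Some[String], for illustration only."""
    value: str

# Comparing the wrapper itself to a raw string is always False -- the
# analogue of writing `option == "driver"` against an Option[String]
# in Scala, which the 2.12 compiler flags as a comparison that can
# never be true.
wrong = (Some("driver") == "driver")        # always False
right = (Some("driver").value == "driver")  # compare the contents: True
```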

Using Scala 2.12.7 is not an infra change but a change to the build;
again, it's not even specific to 2.12.7. We should use the latest
if we can, though.

On Fri, Sep 28, 2018 at 9:38 AM Wenchen Fan  wrote:
>
> I don't think this bug is serious enough to fail an RC; it's only about
> metrics IIUC, and it's not a regression in 2.4.
>
> I agree we should backport this fix to 2.3 and 2.4, and we should update our 
> scala 2.12 jenkins build to use scala 2.12.7. cc Shane do you know how to 
> change it?
>
> BTW end users can still use scala 2.12.7 with the Spark package built with 
> scala 2.12.6, right?
>
> On Fri, Sep 28, 2018 at 4:22 PM Darcy Shen  wrote:
>>
>> -1
>>
>> see:
>>
>> https://github.com/apache/spark/pull/22577
>>
>> We should make sure that Spark works with Scala 2.12.7 .
>>
>> https://github.com/scala/bug/issues/11123
>>
>> This now-resolved bug in Scala 2.12.6 is severe and related to correctness.
>>
>> We should warn our aggressive users about the Scala version.
>> Latest Scala (2.12.7) is preferred and should pass the unit tests at least.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-28 Thread Wenchen Fan
I don't think this bug is serious enough to fail an RC; it's only about
metrics IIUC, and it's not a regression in 2.4.

I agree we should backport this fix to 2.3 and 2.4, and we should update
our scala 2.12 jenkins build to use scala 2.12.7. cc Shane do you know how
to change it?

BTW end users can still use scala 2.12.7 with the Spark package built with
scala 2.12.6, right?

On Fri, Sep 28, 2018 at 4:22 PM Darcy Shen  wrote:

> -1
>
> see:
>
> https://github.com/apache/spark/pull/22577
>
> We should make sure that Spark works with Scala 2.12.7 .
>
> https://github.com/scala/bug/issues/11123
>
> This now-resolved bug in Scala 2.12.6 is severe and related to correctness.
>
> We should warn our aggressive users about the Scala version.
> Latest Scala (2.12.7) is preferred and should pass the unit tests at least.


I want to read a text column from a Kafka topic in PySpark

2018-09-28 Thread hagersaleh


I wrote code to read data from Twitter and send it to a Kafka topic,
and I wrote another piece of code to read data from the Kafka topic.
I want to return just the text column from the data.
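As a Spark-independent sketch of the missing step, assuming the tweets are JSON-serialized with a "text" key (an assumption, since the question doesn't state the serialization format), extracting just the text field from a Kafka message value looks like:

```python
import json

def extract_text(value: bytes) -> str:
    """Decode a Kafka message value (assumed to be a JSON-encoded tweet)
    and return only its "text" field; this mirrors selecting the text
    column after parsing the value in a streaming job."""
    record = json.loads(value.decode("utf-8"))
    return record["text"]

# extract_text(b'{"id": 1, "text": "hello"}') -> "hello"
```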



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] SPIP: Native support of session window

2018-09-28 Thread Jungtaek Lim
Thanks for sharing your proposal as well as the implementation. Looks like
your proposal is more focused on design details; it may be better for me to
write one more doc for design details and share it as well. Stay tuned!

Btw, I'm trying out your patch to see whether it passes the tests I've
added, and it looks like it fails on the UT below:
https://github.com/apache/spark/blob/ad0b7466ef3f79354a99bd1b95c23e4c308502d5/sql/core/src/test/scala/org/apache/spark/sql/streaming/EventTimeWatermarkSuite.scala#L475-L573
Could you take a look at the UT and see whether I'm missing something here
or whether the UT is correct?

(Actually, most of the UTs I've added fail, but some UTs are for update
mode, and the patch doesn't provide the same experience with select-only
session windows, so I'm pointing to only one UT, which tests a basic
session window.)

-Jungtaek Lim (HeartSaVioR)



Re: [DISCUSS] SPIP: Native support of session window

2018-09-28 Thread Yuanjian Li
Hi Jungtaek:

   We also met this problem during the migration of streaming applications
to Structured Streaming at Baidu; we solved it in our fork, and it has been
running steadily in production.
   As the initial plan, we are doing the code cleanup work and preparing to
give a SPIP in October; happy to see your proposal. Hope we can share some
spots together.
   Here’s the PR and doc:
https://github.com/apache/spark/pull/22583
https://docs.google.com/document/d/1zeAc7QKSO7J4-Yk06kc76kvldl-QHLCDJuu04d7k2bg/edit?usp=sharing

Thanks,
Yuanjian Li

