Re: SPIP: Support Kafka delegation token in Structured Streaming

2018-09-29 Thread Saisai Shao
I like this proposal. Since Kafka already provides a delegation token
mechanism, we can leverage Spark's delegation token framework to add
built-in support for Kafka.

BTW, I think there's not much difference between supporting Structured
Streaming and DStream, so maybe we can set both as goals.

Thanks
Saisai

Gabor Somogyi wrote on Thu, Sep 27, 2018 at 7:58 PM:

> Hi all,
>
> I am writing this e-mail to discuss delegation token support for Kafka,
> which is reported in SPARK-25501. I've prepared a SPIP for it. A PR is on
> the way...
>
> Looking forward to hearing your feedback.
>
> BR,
> G
>
>
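
For context on the Kafka side of this discussion: Kafka's delegation token
mechanism (KIP-48) dates to Kafka 1.1, and newer AdminClient releases expose
operations for issuing tokens, which is what a Spark-side provider would
ultimately call into. Below is a minimal sketch, assuming a driver that is
already authenticated to Kafka via SASL; the helper name and config values are
illustrative assumptions, not part of the SPIP:

  import java.util.Properties
  import org.apache.kafka.clients.admin.AdminClient

  // Hypothetical helper: obtain a delegation token on the driver; Spark would then
  // distribute it to executors through its existing delegation token machinery.
  def obtainKafkaDelegationToken(bootstrapServers: String): Array[Byte] = {
    val props = new Properties()
    props.put("bootstrap.servers", bootstrapServers)
    props.put("security.protocol", "SASL_SSL") // illustrative; depends on the cluster
    val admin = AdminClient.create(props)
    try {
      // Kafka AdminClient delegation token API (KIP-48 / KIP-249).
      val token = admin.createDelegationToken().delegationToken().get()
      token.hmac() // the secret that executors would present
    } finally {
      admin.close()
    }
  }

Executors would then authenticate with the token ID and HMAC over SASL/SCRAM
rather than with a keytab, which is the usual way Kafka delegation tokens are
consumed.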


Re: Python friendly API for Spark 3.0

2018-09-29 Thread Stavros Kontopoulos
Regarding the Python 3.x upgrade referenced earlier: some people have already
gone down that path:

https://blogs.dropbox.com/tech/2018/09/how-we-rolled-out-one-of-the-largest-python-3-migrations-ever

They describe some good reasons.

Stavros

On Tue, Sep 18, 2018 at 6:35 PM, Erik Erlandson  wrote:

> I like the notion of empowering cross platform bindings.
>
> The trend in computing frameworks seems to be that all APIs gradually
> converge on a stable attractor that could be described as "data frames and
> SQL." Spark's early API design was RDD-focused, but these days the center
> of gravity is all about DataFrame (Python's prevalence, combined with its
> lack of a static type system, substantially dilutes the benefits of Dataset
> for any library development that aspires to both JVM and Python support).
>
> I can imagine optimizing the developer layers of Spark's APIs so that
> cross-platform support, as well as third-party support for new and existing
> Spark bindings, is maximized for "parallelizable dataframe+SQL." Another of
> Spark's strengths is its ability to federate heterogeneous data sources,
> and making cross-platform bindings easy for that is desirable.
>
>
> On Sun, Sep 16, 2018 at 1:02 PM, Mark Hamstra 
> wrote:
>
>> It's not splitting hairs, Erik. It's actually very close to something
>> that I think deserves some discussion (perhaps on a separate thread). What
>> I've been thinking about also concerns API "friendliness" or style. The
>> original RDD API was very intentionally modeled on the Scala parallel
>> collections API. That made it quite friendly for some Scala programmers,
>> but not as much so for users of the other language APIs when they
>> eventually came about. Similarly, the Dataframe API drew a lot from pandas
>> and R, so it is relatively friendly for those used to those abstractions.
>> Of course, the Spark SQL API is modeled closely on HiveQL and standard SQL.
>> The new barrier scheduling draws inspiration from MPI. With all of these
>> models and sources of inspiration, as well as multiple language targets,
>> there isn't really a strong sense of coherence across Spark -- I mean, even
>> though one of the key advantages of Spark is the ability to do within a
>> single framework things that would otherwise require multiple frameworks,
>> actually doing that requires more programming styles and design abstractions
>> than are strictly necessary, even when writing Spark code in just a single
>> language.
>>
>> For me, that raises questions over whether we want to start designing,
>> implementing and supporting APIs that are designed to be more consistent,
>> friendly and idiomatic to particular languages and abstractions -- e.g. an
>> API covering all of Spark that is designed to look and feel as much like
>> "normal" code for a Python programmer, another that looks and feels more
>> like "normal" Java code, another for Scala, etc. That's a lot more work and
>> support burden than the current approach where sometimes it feels like you
>> are writing "normal" code for your preferred programming environment, and
>> sometimes it feels like you are trying to interface with something foreign,
>> but underneath it hopefully isn't too hard for those writing the
>> implementation code below the APIs, and it is not too hard to maintain
>> multiple language bindings that are each fairly lightweight.
>>
>> It's a cost-benefit judgement, of course, whether APIs that are heavier
>> (in terms of implementing and maintaining) and friendlier (for end users)
>> are worth doing, and maybe some of these "friendlier" APIs can be done
>> outside of Spark itself (imo, Frameless is doing a very nice job for the
>> parts of Spark that it is currently covering --
>> https://github.com/typelevel/frameless); but what we have currently is a
>> bit too ad hoc and fragmentary for my taste.
>>
>> On Sat, Sep 15, 2018 at 10:33 AM Erik Erlandson 
>> wrote:
>>
>>> I am probably splitting hairs too finely, but I was considering the
>>> difference between improvements to the JVM side (Py4J and the Scala/Java
>>> code) that would make it easier to write the Python layer ("Python-friendly
>>> API"), and actual improvements to the Python layer itself ("friendly Python API").
>>>
>>> They're not mutually exclusive of course, and both are worth working on. But
>>> it's *possible* to improve either without the other.
>>>
>>> Stub files look like a great solution for type annotations, maybe even
>>> if only python 3 is supported.
>>>
>>> I definitely agree that any decision to drop python 2 should not be
>>> taken lightly. Anecdotally, I'm seeing an increase in python developers
>>> announcing that they are dropping support for python 2 (and loving it). As
>>> people have already pointed out, if we don't drop python 2 for spark 3.0,
>>> we're stuck with it until 4.0, which would place spark in a
>>> possibly-awkward position of supporting python 2 for some time after it
>>> goes EOL.
>>>
>>> 

saveAsTable in 2.3.2 throws IOException while 2.3.1 works fine?

2018-09-29 Thread Jacek Laskowski
Hi,

The following query fails in 2.3.2:

scala> spark.range(10).write.saveAsTable("t1")
...
2018-09-29 20:48:06 ERROR FileOutputCommitter:314 - Mkdirs failed to create
file:/user/hive/warehouse/bucketed/_temporary/0
2018-09-29 20:48:07 ERROR Utils:91 - Aborting task
java.io.IOException: Mkdirs failed to create
file:/user/hive/warehouse/bucketed/_temporary/0/_temporary/attempt_20180929204807__m_03_0
(exists=false, cwd=file:/Users/jacek/dev/apps/spark-2.3.2-bin-hadoop2.7)
at
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:455)
at
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)

While it works fine in 2.3.1.

Could anybody explain the change in behaviour in 2.3.2? A pointer to the commit
or the JIRA issue would be even nicer. Thanks.

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski
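
Not an explanation of the 2.3.1 -> 2.3.2 change, but the "Mkdirs failed" error
above points at the default warehouse location file:/user/hive/warehouse not
being writable on this machine. A quick sketch to check whether the failure is
only about that path (the warehouse directory below is an arbitrary writable
location):

  import org.apache.spark.sql.SparkSession

  // Point the warehouse at an explicitly writable directory instead of relying
  // on the default file:/user/hive/warehouse resolved on this machine.
  val spark = SparkSession.builder()
    .appName("saveAsTable-check")
    .master("local[*]")
    .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
    .getOrCreate()

  spark.range(10).write.saveAsTable("t1")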


Re: [VOTE] SPARK 2.4.0 (RC2)

2018-09-29 Thread Stavros Kontopoulos
+1

Stavros

On Sat, Sep 29, 2018 at 5:59 AM, Sean Owen  wrote:

> +1, with comments:
>
> There are 5 critical issues for 2.4, and no blockers:
> SPARK-25378 ArrayData.toArray(StringType) assume UTF8String in 2.4
> SPARK-25325 ML, Graph 2.4 QA: Update user guide for new features & APIs
> SPARK-25319 Spark MLlib, GraphX 2.4 QA umbrella
> SPARK-25326 ML, Graph 2.4 QA: Programming guide update and migration guide
> SPARK-25323 ML 2.4 QA: API: Python API coverage
>
> Xiangrui, is SPARK-25378 important enough that we need to get it into 2.4?
>
> I found two issues resolved for 2.4.1 that got into this RC, so marked
> them as resolved in 2.4.0.
>
> I checked the licenses and NOTICE and they look correct now in the source
> and binary builds.
>
> The 2.12 artifacts are as I'd expect.
>
> I ran all tests for 2.11 and 2.12 and they pass with -Pyarn
> -Pkubernetes -Pmesos -Phive -Phadoop-2.7 -Pscala-2.12.
>
>
>
>
> On Thu, Sep 27, 2018 at 10:00 PM Wenchen Fan  wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 2.4.0.
> >
> > The vote is open until October 1 PST and passes if a majority +1 PMC
> votes are cast, with
> > a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 2.4.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v2.4.0-rc2 (commit
> 42f25f309e91c8cde1814e3720099ac1e64783da):
> > https://github.com/apache/spark/tree/v2.4.0-rc2
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc2-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1287
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc2-docs/
> >
> > The list of bug fixes going into 2.4.0 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/2.4.0
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark, you can set up a virtual env, install
> > the current RC, and see if anything important breaks; in Java/Scala,
> > you can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out-of-date RC going forward).
> >
> > ===
> > What should happen to JIRA tickets still targeting 2.4.0?
> > ===
> >
> > The current list of open tickets targeted at 2.4.0 can be found at:
> > https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.4.0
> >
> > Committers should look at those and triage. Extremely important bug
> > fixes, documentation, and API tweaks that impact compatibility should
> > be worked on immediately. Everything else please retarget to an
> > appropriate release.
> >
> > ==
> > But my bug isn't fixed?
> > ==
> >
> > In order to make timely releases, we will typically not hold the
> > release unless the bug in question is a regression from the previous
> > release. That being said, if there is something which is a regression
> > that has not been correctly targeted please ping me or a committer to
> > help target the issue.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>
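
For the "add the staging repository to your project's resolvers" step quoted
above, a minimal build.sbt sketch for an sbt-based Java/Scala project might look
like the following (the resolver name is arbitrary; the URL is the staging
repository from the vote email):

  resolvers += "Apache Spark 2.4.0 RC2 staging" at
    "https://repository.apache.org/content/repositories/orgapachespark-1287/"

  libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided"

Clearing the cached org.apache.spark artifacts in ~/.ivy2 (or the Coursier
cache) before and after testing avoids building against a stale RC later, as the
email notes.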


Re: SPIP: Support Kafka delegation token in Structured Streaming

2018-09-29 Thread Jungtaek Lim
Hi Gabor,

Thanks for proposing the feature. I'm definitely interested in seeing this
feature, but honestly I'm not familiar with how Spark deals with delegation
tokens for HDFS and HBase. I'll review the doc in general, try to learn the
mechanism, and then review again based on that understanding.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Thu, Sep 27, 2018 at 8:58 PM, Gabor Somogyi wrote:

> Hi all,
>
> I am writing this e-mail to discuss delegation token support for Kafka,
> which is reported in SPARK-25501. I've prepared a SPIP for it. A PR is on
> the way...
>
> Looking forward to hearing your feedback.
>
> BR,
> G
>
>


Re: time for Apache Spark 3.0?

2018-09-29 Thread Xiao Li
Yes. We should create a SPIP for each major breaking change.

Reynold Xin wrote on Fri, Sep 28, 2018 at 11:05 PM:

> i think we should create spips for some of them, since they are pretty
> large ... i can create some tickets to start with
>
> --
> excuse the brevity and lower case due to wrist injury
>
>
> On Fri, Sep 28, 2018 at 11:01 PM Xiao Li  wrote:
>
>> Based on the above discussions, we have a "rough consensus" that the next
>> release will be 3.0. Now, we can start working on the API breaking changes
>> (e.g., the ones mentioned in the original email from Reynold).
>>
>> Cheers,
>>
>> Xiao
>>
>> Matei Zaharia wrote on Thu, Sep 6, 2018 at 2:21 PM:
>>
>>> Yes, you can start with Unstable and move to Evolving and Stable when
>>> needed. We’ve definitely had experimental features that changed across
>>> maintenance releases when they were well-isolated. If your change risks
>>> breaking stuff in stable components of Spark though, then it probably won’t
>>> be suitable for that.
>>>
>>> > On Sep 6, 2018, at 1:49 PM, Ryan Blue 
>>> wrote:
>>> >
>>> > I meant flexibility beyond the point releases. I think what Reynold
>>> was suggesting was getting v2 code out more often than the point releases
>>> every 6 months. An Evolving API can change in point releases, but maybe we
>>> should move v2 to Unstable so it can change more often? I don't really see
>>> another way to get changes out more often.
>>> >
>>> > On Thu, Sep 6, 2018 at 11:07 AM Mark Hamstra 
>>> wrote:
>>> > Yes, that is why we have these annotations in the code and the
>>> corresponding labels appearing in the API documentation:
>>> https://github.com/apache/spark/blob/master/common/tags/src/main/java/org/apache/spark/annotation/InterfaceStability.java
>>> >
>>> > As long as it is properly annotated, we can change or even eliminate
>>> an API method before the next major release. And frankly, we shouldn't be
>>> contemplating bringing in the DS v2 API (and, I'd argue, any new API)
>>> without such an annotation. There is just too much risk of not getting
>>> everything right before we see the results of the new API being more widely
>>> used, and too much cost in maintaining something we come to regret until the
>>> next major release, for us to create a new API in a fully frozen state.
>>> >
>>> >
>>> > On Thu, Sep 6, 2018 at 9:49 AM Ryan Blue 
>>> wrote:
>>> > It would be great to get more features out incrementally. For
>>> experimental features, do we have more relaxed constraints?
>>> >
>>> > On Thu, Sep 6, 2018 at 9:47 AM Reynold Xin 
>>> wrote:
>>> > +1 on 3.0
>>> >
>>> > A stable DSv2 can still evolve across major releases. DataFrame,
>>> Dataset, DSv1 and a lot of other major features were all developed
>>> throughout the 1.x and 2.x lines.
>>> >
>>> > I do want to explore ways for us to get dsv2 incremental changes out
>>> there more frequently, to get feedback. Maybe that means we apply additive
>>> changes to 2.4.x; maybe that means making another 2.5 release sooner. I
>>> will start a separate thread about it.
>>> >
>>> >
>>> >
>>> > On Thu, Sep 6, 2018 at 9:31 AM Sean Owen  wrote:
>>> > I think this doesn't necessarily mean 3.0 is coming soon (thoughts on
>>> timing? 6 months?) but simply next. Do you mean you'd prefer that change to
>>> happen before 3.x? If it's a significant change, it seems reasonable for a
>>> major version bump rather than minor. Is the concern that tying it to 3.0
>>> means you have to take a major version update to get it?
>>> >
>>> > I generally support moving on to 3.x so we can also jettison a lot of
>>> older dependencies, code, fix some long standing issues, etc.
>>> >
>>> > (BTW Scala 2.12 support, mentioned in the OP, will go in for 2.4)
>>> >
>>> > On Thu, Sep 6, 2018 at 9:10 AM Ryan Blue 
>>> wrote:
>>> > My concern is that the v2 data source API is still evolving and not
>>> very close to stable. I had hoped to have stabilized the API and behaviors
>>> for a 3.0 release. But we could also wait on that for a 4.0 release,
>>> depending on when we think that will be.
>>> >
>>> > Unless there is a pressing need to move to 3.0 for some other area, I
>>> think it would be better for the v2 sources to have a 2.5 release.
>>> >
>>> > On Thu, Sep 6, 2018 at 8:59 AM Xiao Li  wrote:
>>> > Yesterday, the 2.4 branch was created. Based on the above discussion,
>>> I think we can bump the master branch to 3.0.0-SNAPSHOT. Any concern?
>>> >
>>> >
>>> >
>>> > --
>>> > Ryan Blue
>>> > Software Engineer
>>> > Netflix
>>> >
>>> >
>>> > --
>>> > Ryan Blue
>>> > Software Engineer
>>> > Netflix
>>>
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
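
As a concrete illustration of the InterfaceStability annotations discussed above
(org.apache.spark.annotation.InterfaceStability), a new API could be introduced
as Unstable and promoted later; the trait and method below are made up for the
sketch and are not the real DSv2 signatures:

  import org.apache.spark.annotation.InterfaceStability

  // Unstable: may change or be removed in any release; it can later be promoted
  // to Evolving and finally Stable as the API settles.
  @InterfaceStability.Unstable
  trait TableCatalog {
    def loadTable(name: String): AnyRef // placeholder return type for the sketch
  }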


Re: [DISCUSS] Syntax for table DDL

2018-09-29 Thread Xiao Li
Are they consistent with the current syntax defined in SqlBase.g4? I think
we are following the Hive DDL syntax:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/Partition/Column

Ryan Blue wrote on Fri, Sep 28, 2018 at 3:47 PM:

> Hi everyone,
>
> I’m currently working on new table DDL statements for v2 tables. For
> context, the new logical plans for DataSourceV2 require a catalog interface
> so that Spark can create tables for operations like CTAS. The proposed
> TableCatalog API also includes an API for altering those tables so we can
> make ALTER TABLE statements work. I’m implementing those DDL statements,
> which will make it into upstream Spark when the TableCatalog PR is merged.
>
> Since I’m adding new SQL statements that don’t yet exist in Spark, I want
> to make sure that the syntax I’m using in our branch will match the syntax
> we add to Spark later. I'm basing this proposed syntax on PostgreSQL.
>
>    - *Update data type*: ALTER TABLE tableIdentifier ALTER COLUMN
>      qualifiedName TYPE dataType
>    - *Rename column*: ALTER TABLE tableIdentifier RENAME COLUMN
>      qualifiedName TO qualifiedName
>    - *Drop column*: ALTER TABLE tableIdentifier DROP (COLUMN | COLUMNS)
>      qualifiedNameList
>
> A few notes:
>
>    - Using qualifiedName in these rules allows updating nested types,
>      like point.x.
>    - Updates and renames can only alter one column, but drop can drop a
>      list of columns.
>    - Rename can't move types and will validate that, if the TO name is
>      qualified, the prefix matches the original field.
>    - I'm also changing ADD COLUMN to support adding fields to nested
>      columns by using qualifiedName instead of identifier.
>
> Please reply to this thread if you have suggestions based on a different
> SQL engine or want this syntax to be different for another reason. Thanks!
>
> rb
> --
> Ryan Blue
> Software Engineer
> Netflix
>
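
To make the proposal concrete, these are roughly the statements the rules above
would allow, issued through spark.sql (table and column names are made up, and
none of this is supported by Spark at the time of this thread):

  // Illustrative uses of the proposed v2 ALTER TABLE syntax.
  spark.sql("ALTER TABLE db.events ALTER COLUMN point.x TYPE double")      // update a nested type
  spark.sql("ALTER TABLE db.events RENAME COLUMN point.x TO point.long_x") // rename within the same struct
  spark.sql("ALTER TABLE db.events DROP COLUMNS created_by, updated_by")   // drop a list of columns
  spark.sql("ALTER TABLE db.events ADD COLUMN point.z double")             // add a field to a nested column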


Re: time for Apache Spark 3.0?

2018-09-29 Thread Reynold Xin
i think we should create spips for some of them, since they are pretty
large ... i can create some tickets to start with

--
excuse the brevity and lower case due to wrist injury


On Fri, Sep 28, 2018 at 11:01 PM Xiao Li  wrote:

> Based on the above discussions, we have a "rough consensus" that the next
> release will be 3.0. Now, we can start working on the API breaking changes
> (e.g., the ones mentioned in the original email from Reynold).
>
> Cheers,
>
> Xiao


Re: time for Apache Spark 3.0?

2018-09-29 Thread Xiao Li
Based on the above discussions, we have a "rough consensus" that the next
release will be 3.0. Now, we can start working on the API breaking changes
(e.g., the ones mentioned in the original email from Reynold).

Cheers,

Xiao
