Re: time for Apache Spark 3.0?

2018-06-16 Thread vaquar khan
+1 for 2.4 next, followed by 3.0.

Where can we get the Apache Spark road map for 2.4, 2.5, and 3.0?
Is it possible to share the proposed specifications for future releases, in the
same way as existing releases (https://spark.apache.org/releases/spark-release-2-3-0.html)?
Regards,
Vaquar khan

Re: time for Apache Spark 3.0?

2018-06-16 Thread vaquar khan
Please ignore the YouTube link in my last email; I am not sure how it got added.
Apologies, I am not sure how to delete it.


Re: time for Apache Spark 3.0?

2018-06-16 Thread vaquar khan
+1

https://www.youtube.com/watch?v=-ik7aJ5U6kg

Regards,
Vaquar khan

Re: time for Apache Spark 3.0?

2018-06-16 Thread Xiao Li
+1

2018-06-15 14:55 GMT-07:00 Reynold Xin :

> Yes. At this rate I think it's better to do 2.4 next, followed by 3.0.
>
>
> On Fri, Jun 15, 2018 at 10:52 AM Mridul Muralidharan 
> wrote:
>
>> I agree; I don't see a pressing need for a major version bump either.
>>
>>
>> Regards,
>> Mridul
>> On Fri, Jun 15, 2018 at 10:25 AM Mark Hamstra 
>> wrote:
>> >
>> > Changing major version numbers is not about new features or a vague
>> notion that it is time to do something that will be seen to be a
>> significant release. It is about breaking stable public APIs.
>> >
>> > I still remain unconvinced that the next version can't be 2.4.0.
>> >
>> > On Fri, Jun 15, 2018 at 1:34 AM Andy  wrote:
>> >>
>> >> Dear all:
>> >>
>> >> It has been 2 months since this topic was proposed. Any progress
>> now? 2018 is already about half over.
>> >>
>> >> I agree that the new version should bring some exciting new features.
>> How about this one:
>> >>
>> >> 6. Integrate an ML/DL framework as a core component and feature
>> (such as Angel / BigDL / …).
>> >>
>> >> 3.0 is a very important version for a good open source project. It
>> would be better to shed the historical burden and focus on new areas.
>> Spark has been widely used all over the world as a successful big data
>> framework, and it can be even better.
>> >>
>> >> Andy
>> >>
>> >>
>> >> On Thu, Apr 5, 2018 at 7:20 AM Reynold Xin 
>> wrote:
>> >>>
>> >>> There was a discussion thread on scala-contributors about Apache
>> Spark not yet supporting Scala 2.12, and that got me to think perhaps it is
>> about time for Spark to work towards the 3.0 release. By the time it comes
>> out, it will be more than 2 years since Spark 2.0.
>> >>>
>> >>> For contributors less familiar with Spark’s history, I want to give
>> more context on Spark releases:
>> >>>
>> >>> 1. Timeline: Spark 1.0 was released May 2014. Spark 2.0 was July
>> 2016. If we were to maintain the ~ 2 year cadence, it is time to work on
>> Spark 3.0 in 2018.
>> >>>
>> >>> 2. Spark’s versioning policy promises that Spark does not break
>> stable APIs in feature releases (e.g. 2.1, 2.2). API breaking changes are
>> sometimes a necessary evil, and can be done in major releases (e.g. 1.6 to
>> 2.0, 2.x to 3.0).
>> >>>
>> >>> 3. That said, a major version isn’t necessarily the playground for
>> disruptive API changes to make it painful for users to update. The main
>> purpose of a major release is an opportunity to fix things that are broken
>> in the current API and remove certain deprecated APIs.
>> >>>
>> >>> 4. Spark as a project has a culture of evolving architecture and
>> developing major new features incrementally, so major releases are not the
>> only time for exciting new features. For example, the bulk of the work in
>> the move towards the DataFrame API was done in Spark 1.3, and Continuous
>> Processing was introduced in Spark 2.3. Both were feature releases rather
>> than major releases.
>> >>>
>> >>>
>> >>> You can find more background in the thread discussing Spark 2.0:
>> http://apache-spark-developers-list.1001551.n3.nabble.com/A-proposal-for-
>> Spark-2-0-td15122.html
>> >>>
>> >>>
>> >>> The primary motivating factor IMO for a major version bump is to
>> support Scala 2.12, which requires minor API breaking changes to Spark’s
>> APIs. Similar to Spark 2.0, I think there are also opportunities for other
>> changes that we know have been biting us for a long time but can’t be
>> changed in feature releases (to be clear, I’m actually not sure they are
>> all good ideas, but I’m writing them down as candidates for consideration):
>> >>>
>> >>> 1. Support Scala 2.12.
>> >>>
>> >>> 2. Remove interfaces, configs, and modules (e.g. Bagel) deprecated in
>> Spark 2.x.
>> >>>
>> >>> 3. Shade all dependencies.
>> >>>
>> >>> 4. Change the reserved keywords in Spark SQL to be more ANSI-SQL
>> compliant, to prevent users from shooting themselves in the foot, e.g.
>> “SELECT 2 SECOND” -- is “SECOND” an interval unit or an alias? To make it
>> less painful for users to upgrade here, I’d suggest creating a flag for
>> backward compatibility mode.
>> >>>
>> >>> 5. Similar to 4, make our type coercion rule in DataFrame/SQL more
>> standard compliant, and have a flag for backward compatibility.
>> >>>
>> >>> 6. Miscellaneous other small changes documented in JIRA already (e.g.
>> “JavaPairRDD flatMapValues requires function returning Iterable, not
>> Iterator”, “Prevent column name duplication in temporary view”).
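
[Editor's note: to make items 4 and 5 above concrete, here is a minimal Scala
sketch of the two behaviors being discussed. The results noted in the comments
reflect typical Spark 2.x behavior, and the configuration key at the end is
purely hypothetical (the thread does not name an actual flag); this is an
illustration of the idea, not the proposed implementation.]

import org.apache.spark.sql.SparkSession

// Sketch of the "SELECT 2 SECOND" keyword ambiguity (item 4) and the lenient
// type coercion (item 5) that the proposal would tighten behind a compat flag.
object KeywordAndCoercionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("keyword-and-coercion-sketch")
      .master("local[*]")
      .getOrCreate()

    // Item 4: most keywords are non-reserved in Spark 2.x, so "SECOND" is
    // parsed as a column alias -- a single column named SECOND with value 2 --
    // rather than as an interval unit.
    spark.sql("SELECT 2 SECOND").show()

    // Item 5: Spark 2.x coerces the string operand to a numeric type, so this
    // evaluates (typically to 5.0) instead of being rejected, as a stricter,
    // more standard-compliant rule might require.
    spark.sql("SELECT '2' + 3 AS coerced").show()

    // Hypothetical backward-compatibility switch, only to sketch the flag both
    // items mention; this is NOT an actual Spark configuration key.
    // spark.conf.set("spark.sql.legacy.lenientParserAndCoercion", "true")

    spark.stop()
  }
}

Under stricter ANSI-style rules both statements could parse or evaluate
differently, which is why both items pair the change with a backward
compatibility mode.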
>> >>>
>> >>>
>> >>> Now the reality of a major version bump is that the world often
>> thinks in terms of what exciting features are coming. I do think there are
>> a number of major changes happening already that can be part of the 3.0
>> release, if they make it in:
>> >>>
>> >>> 1. Scala 2.12 support (listing it twice)
>> >>> 2. Continuous Processing non-experimental
>> >>> 3. Kubernetes support non-experimental
>> >>> 4. A more fleshed-out version of data source API v2 (I don’t think it
>> 

Re: Jenkins availability question

2018-06-16 Thread Hyukjin Kwon
Oops, I just noticed Shane's email. Please ignore this email.

On Sat, Jun 16, 2018 at 7:43 PM, Hyukjin Kwon wrote:

> Is Jenkins down now? I was about to investigate some issues that happened
> specifically within Jenkins.
>
> I would appreciate it if anyone could roughly confirm when it will come back,
> or whether it is actually working fine and there is a Jenkins access issue
> specific to me.
>
>


Jenkins availability question

2018-06-16 Thread Hyukjin Kwon
Is Jenkins down now? I was about to investigate some issues that happened
specifically within Jenkins.

I would appreciate it if anyone could roughly confirm when it will come back,
or whether it is actually working fine and there is a Jenkins access issue
specific to me.