[ANNOUNCEMENT] Plan for dropping Python 2 support

2019-06-03 Thread Xiangrui Meng
Hi all,

Today we announced the plan for dropping Python 2 support [1]
in Apache Spark:

As many of you already know, the Python core development team and many
widely used Python packages, such as Pandas and NumPy, will drop Python 2
support on or before 2020/01/01 [2]. Apache Spark has supported both
Python 2 and 3 since the Spark 1.4 release in 2015. However, maintaining
Python 2/3 compatibility is an increasing burden, and it essentially limits
the use of Python 3 features in Spark. Given that the end of life (EOL) of
Python 2 is approaching, we plan to eventually drop Python 2 support as
well. The current plan is as follows:

* In the next major release in 2019, we will deprecate Python 2 support.
PySpark users will see a deprecation warning if Python 2 is used (a minimal
sketch of such a check appears after this list). We will publish a
migration guide to help PySpark users migrate to Python 3.
* We will drop Python 2 support in a future release (excluding patch
releases) in 2020, after the Python 2 EOL on 2020/01/01. PySpark users will
see an error if Python 2 is used.
* For releases that support Python 2, e.g., Spark 2.4, patch releases will
continue to support Python 2. However, after the Python 2 EOL, we might not
take patches that are specific to Python 2.
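
To make this concrete, below is a minimal sketch of how such a
startup-time check could emit the warning. This is a hypothetical
illustration, not the actual PySpark implementation:

    import sys
    import warnings

    # Hypothetical sketch: warn once at startup when running on Python 2.
    if sys.version_info[0] == 2:
        warnings.warn(
            "Python 2 support is deprecated and will be removed in a "
            "future Spark release after the Python 2 EOL on 2020/01/01; "
            "please migrate to Python 3.",
            DeprecationWarning)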

Best,
Xiangrui

[1]: http://spark.apache.org/news/plan-for-dropping-python-2-support.html
[2]: https://python3statement.org/


Re: Should python-2 be supported in Spark 3.0?

2019-06-03 Thread Xiangrui Meng
I updated the Spark website and announced the plan for dropping Python 2
support there:
http://spark.apache.org/news/plan-for-dropping-python-2-support.html. I
will send an announcement email to user@ and dev@. -Xiangrui

On Fri, May 31, 2019 at 10:54 PM Felix Cheung wrote:

> Very subtle, but someone might take
>
> “We will drop Python 2 support in a future release in 2020”
>
> to mean any / the first release in 2020, whereas the next statement
> indicates that patch releases are not included in the above. It might help
> to reorder the items or clarify the wording.
>
>
> --
> *From:* shane knapp 
> *Sent:* Friday, May 31, 2019 7:38:10 PM
> *To:* Denny Lee
> *Cc:* Holden Karau; Bryan Cutler; Erik Erlandson; Felix Cheung; Mark
> Hamstra; Matei Zaharia; Reynold Xin; Sean Owen; Wenchen Fen; Xiangrui Meng;
> dev; user
> *Subject:* Re: Should python-2 be supported in Spark 3.0?
>
> +1000  ;)
>
> On Sat, Jun 1, 2019 at 6:53 AM Denny Lee  wrote:
>
>> +1
>>
>> On Fri, May 31, 2019 at 17:58 Holden Karau  wrote:
>>
>>> +1
>>>
>>> On Fri, May 31, 2019 at 5:41 PM Bryan Cutler  wrote:
>>>
 +1 and the draft sounds good

 On Thu, May 30, 2019, 11:32 AM Xiangrui Meng  wrote:

> Here is the draft announcement:
>
> ===
> Plan for dropping Python 2 support
>
> As many of you already know, the Python core development team and many
> widely used Python packages, such as Pandas and NumPy, will drop Python 2
> support on or before 2020/01/01. Apache Spark has supported both Python 2
> and 3 since the Spark 1.4 release in 2015. However, maintaining Python 2/3
> compatibility is an increasing burden, and it essentially limits the use
> of Python 3 features in Spark. Given that the end of life (EOL) of Python
> 2 is approaching, we plan to eventually drop Python 2 support as well. The
> current plan is as follows:
>
> * In the next major release in 2019, we will deprecate Python 2 support.
> PySpark users will see a deprecation warning if Python 2 is used. We will
> publish a migration guide to help PySpark users migrate to Python 3.
> * We will drop Python 2 support in a future release in 2020, after the
> Python 2 EOL on 2020/01/01. PySpark users will see an error if Python 2 is
> used.
> * For releases that support Python 2, e.g., Spark 2.4, patch releases will
> continue to support Python 2. However, after the Python 2 EOL, we might
> not take patches that are specific to Python 2.
> ===
>
> Sean helped make a pass. If it looks good, I'm going to upload it to the
> Spark website and announce it here. Let me know if you think we should do
> a VOTE instead.
>
> On Thu, May 30, 2019 at 9:21 AM Xiangrui Meng wrote:
>
>> I created https://issues.apache.org/jira/browse/SPARK-27884 to track
>> the work.
>>
>> On Thu, May 30, 2019 at 2:18 AM Felix Cheung <felixcheun...@hotmail.com> wrote:
>>
>>> We don’t usually reference a future release on the website.
>>>
>>> > Spark website and state that Python 2 is deprecated in Spark 3.0
>>>
>>> I suspect people will then ask when Spark 3.0 is coming out.
>>> Might need to provide some clarity on that.
>>>
>>
>> We can say "the next major release in 2019" instead of Spark 3.0. The
>> Spark 3.0 timeline certainly requires a new thread to discuss.
>>
>>
>>>
>>>
>>> --
>>> *From:* Reynold Xin 
>>> *Sent:* Thursday, May 30, 2019 12:59:14 AM
>>> *To:* shane knapp
>>> *Cc:* Erik Erlandson; Mark Hamstra; Matei Zaharia; Sean Owen;
>>> Wenchen Fen; Xiangrui Meng; dev; user
>>> *Subject:* Re: Should python-2 be supported in Spark 3.0?
>>>
>>> +1 on Xiangrui’s plan.
>>>
>>> On Thu, May 30, 2019 at 7:55 AM shane knapp wrote:
>>>
> I don't have a good sense of the overhead of continuing to support
> Python 2; is it large enough to consider dropping it in Spark 3.0?
>
> from the build/test side, it will actually be pretty easy to continue
> supporting python2.7 for spark 2.x, as the feature sets won't be expanding.

>>>
 that being said, i will be cracking a bottle of champagne when i
 can delete all of the ansible and anaconda configs for python2.x.  :)

>>>
>> On the development side, in a future release that drops Python 2
>> support we can remove code that maintains Python 2/3 compatibility and
>> start using Python 3-only features, which is also quite exciting.
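>>
>> As a concrete illustration, here is a hypothetical sketch (not actual
>> PySpark code) of the kind of 2/3 compatibility shim that could then be
>> deleted, shown side by side with a Python 3-only replacement (the
>> replacement itself only parses on Python 3):
>>
>>     import sys
>>
>>     # Hypothetical compat shim, removable once Python 2 support is gone.
>>     if sys.version_info[0] < 3:
>>         string_types = (str, unicode)  # noqa: F821 (Python 2 only)
>>     else:
>>         string_types = (str,)
>>
>>     def is_name(x):
>>         return isinstance(x, string_types)
>>
>>     # Python 3-only replacement: plain isinstance plus type hints.
>>     def is_name_py3(x: object) -> bool:
>>         return isinstance(x, str)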
>>
>>
>>>
 shane
 --
 Shane Knapp
 UC Berkeley EECS Research / RISELab Staff Technical Lead
 https://rise.cs.berkeley.edu

>>>
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  

Re: Support SqlStreaming in spark

2019-06-03 Thread Stavros Kontopoulos
Hi all,
From what I read, there is an effort here to globally standardize SQL
streaming (Flink people, Google, and others are working with the SQL
standardization body): https://arxiv.org/abs/1905.12133v1. Should the
Spark community be part of it?

Best,
Stavros

On Thu, Mar 28, 2019 at 12:03 PM uncleGen  wrote:

> Hi all,
>
> I have rewritten the design doc based on the previous discussion.
>
> https://docs.google.com/document/d/19degwnIIcuMSELv6BQ_1VQI5AIVcvGeqOm5xE2-aRA0
>
> Would be interested to hear what others think.
>
> Regards,
> Genmao Yu


[SS] ContinuousExecution.commit and excessive JSON serialization?

2019-06-03 Thread Jacek Laskowski
Hi,

Why does ContinuousExecution.commit serialize an offset to JSON format just
before deserializing it back (from JSON to an offset)? [1]

    // from ContinuousExecution.commit
    val offset =
      sources(0).deserializeOffset(offsetLog.get(epoch).get.offsets(0).get.json)

[1]
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala#L341
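
For reference, here is a rough Python sketch of the pattern in question,
using hypothetical names modeled loosely on Spark's Offset and
SerializedOffset classes. One possible reading: an offset recovered from
the offset log may already be a serialized wrapper, so .json can be a
stored field rather than a fresh serialization, and deserializeOffset then
normalizes it into the source-specific type:

    import json

    class SerializedOffset(object):
        # Offset as recovered from the offset log: just the raw JSON string.
        def __init__(self, raw_json):
            self.json = raw_json  # field access, not re-serialization

    class SourceOffset(object):
        # Source-specific offset reconstructed from JSON.
        def __init__(self, partition_offsets):
            self.partition_offsets = partition_offsets

    def deserialize_offset(raw_json):
        return SourceOffset(json.loads(raw_json))

    # Round trip: the generic wrapper is normalized to the concrete type.
    recovered = SerializedOffset('{"topic-0": 42}')
    offset = deserialize_offset(recovered.json)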

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming
https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
Follow me at https://twitter.com/jaceklaskowski