Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-26 Thread Ruifeng Zheng
+1

On Fri, Apr 26, 2024 at 10:26 AM Xinrong Meng  wrote:

> +1
>
> On Thu, Apr 25, 2024 at 2:08 PM Holden Karau 
> wrote:
>
>> +1
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>> On Thu, Apr 25, 2024 at 11:18 AM Maciej  wrote:
>>
>>> +1
>>>
>>> Best regards,
>>> Maciej Szymkiewicz
>>>
>>> Web: https://zero323.net
>>> PGP: A30CEF0C31A501EC
>>>
>>> On 4/25/24 6:21 PM, Reynold Xin wrote:
>>>
>>> +1
>>>
>>> On Thu, Apr 25, 2024 at 9:01 AM Santosh Pingale
>>>  
>>> wrote:
>>>
 +1

 On Thu, Apr 25, 2024, 5:41 PM Dongjoon Hyun 
 wrote:

> FYI, there is a proposal to drop Python 3.8 because its EOL is October
> 2024.
>
> https://github.com/apache/spark/pull/46228
> [SPARK-47993][PYTHON] Drop Python 3.8
>
> Since Python 3.8 is still alive and its lifecycle will overlap with
> that of Apache Spark 4.0.0, please give us your
> feedback on the PR if you have any concerns.
>
> From my side, I agree with this decision.
>
> Thanks,
> Dongjoon.
>
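
Context for the proposal above: dropping a Python version usually amounts to raising the interpreter floor that the library checks at startup. A minimal, hypothetical sketch of such a guard (not Spark's actual code; the `(3, 9)` floor simply reflects dropping 3.8):

```python
import sys

# Hypothetical new minimum after dropping Python 3.8 (SPARK-47993); illustrative only.
MIN_PYTHON = (3, 9)

def meets_minimum(version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version tuple meets the floor."""
    return tuple(version_info)[:2] >= minimum

# The running interpreter can be checked the same way.
current_ok = meets_minimum(sys.version_info)
```

Under such a guard, Python 3.8 would be rejected while 3.9 and later pass.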



Re: [PySpark]: DataFrameWriterV2.overwrite fails with spark connect

2024-04-11 Thread Ruifeng Zheng
Toki Takahashi,

Thanks for reporting this, I created
https://issues.apache.org/jira/browse/SPARK-47828 to track this bug.
I will take a look.

On Thu, Apr 11, 2024 at 10:11 PM Toki Takahashi 
wrote:

> Hi Community,
>
> I get the following error when using Spark Connect in PySpark 3.5.1
> and writing with DataFrameWriterV2.overwrite.
>
> ```
> > df.writeTo('db.table').overwrite(F.col('id')==F.lit(1))
> ...
> SparkConnectGrpcException:
> (org.apache.spark.sql.connect.common.InvalidPlanInput) Expression with
> ID: 0 is not supported
> ```
>
> I believe this is caused by the following code:
>
> https://github.com/apache/spark/blob/6e371e1df50e35d807065015525772c3c02a5995/python/pyspark/sql/connect/plan.py#L1760-L1763
>
> Is there a JIRA issue or PR regarding this error?
> If not, I can create one.
>
> Thanks,
> Toki Takahashi
>
> -----
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

-- 
Ruifeng Zheng
E-mail: zrfli...@gmail.com
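
For readers unfamiliar with the call in the report above: `DataFrameWriterV2.overwrite(condition)` replaces only the rows matching the condition and appends the new data. A minimal pure-Python sketch of those semantics (illustrative only, not Spark code):

```python
def overwrite(table, new_rows, condition):
    """Emulate DataFrameWriterV2.overwrite semantics on plain dict rows:
    drop existing rows matching `condition`, then append `new_rows`."""
    kept = [row for row in table if not condition(row)]
    return kept + list(new_rows)

table = [{"id": 1, "v": "old"}, {"id": 2, "v": "keep"}]
result = overwrite(table, [{"id": 1, "v": "new"}], lambda r: r["id"] == 1)
# result: [{"id": 2, "v": "keep"}, {"id": 1, "v": "new"}]
```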


Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Ruifeng Zheng
+1

On Mon, Apr 1, 2024 at 10:06 AM Haejoon Lee
 wrote:

> +1
>
> On Mon, Apr 1, 2024 at 10:15 AM Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: Pure Python Package in PyPI (Spark
>> Connect)
>>
>> JIRA <https://issues.apache.org/jira/browse/SPARK-47540>
>> Prototype <https://github.com/apache/spark/pull/45053>
>> SPIP doc
>> <https://docs.google.com/document/d/1Pund40wGRuB72LX6L7cliMDVoXTPR-xx4IkPmMLaZXk/edit?usp=sharing>
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>> Thanks.
>>
>

-- 
Ruifeng Zheng
E-mail: zrfli...@gmail.com


Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-12 Thread Ruifeng Zheng
+1

On Wed, Mar 13, 2024 at 4:32 AM John Zhuge  wrote:

> +1 (non-binding)
>
> On Tue, Mar 12, 2024 at 8:45 AM L. C. Hsieh  wrote:
>
>> +1
>>
>>
>> On Tue, Mar 12, 2024 at 8:20 AM Chao Sun  wrote:
>>
>>> +1
>>>
>>> On Tue, Mar 12, 2024 at 8:03 AM Xiao Li 
>>> wrote:
>>>
 +1

 On Tue, Mar 12, 2024 at 6:09 AM Holden Karau 
 wrote:

> +1
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>

>
> On Mon, Mar 11, 2024 at 7:44 PM Reynold Xin
>  wrote:
>
>> +1
>>
>>
>> On Mon, Mar 11 2024 at 7:38 PM, Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>>
>>> +1 (non-binding), thanks Gengliang!
>>>
>>> On Mon, Mar 11, 2024 at 5:46 PM Gengliang Wang 
>>> wrote:
>>>
 Hi all,

 I'd like to start the vote for SPIP: Structured Logging Framework
 for Apache Spark

 References:

- JIRA ticket

- SPIP doc

 
- Discussion thread


 Please vote on the SPIP for the next 72 hours:

 [ ] +1: Accept the proposal as an official SPIP
 [ ] +0
 [ ] -1: I don’t think this is a good idea because …

 Thanks!
 Gengliang Wang

>>>

 --


>
> --
> John Zhuge
>


Re: [VOTE] SPIP: Testing Framework for Spark UI Javascript files

2023-11-26 Thread Ruifeng Zheng
+1

On Sun, Nov 26, 2023 at 6:58 AM Gengliang Wang  wrote:

> +1
>
> On Sat, Nov 25, 2023 at 2:50 AM yangjie01 
> wrote:
>
>> +1
>>
>>
>>
>> *From:* Reynold Xin 
>> *Date:* Saturday, November 25, 2023, 14:35
>> *To:* Dongjoon Hyun 
>> *Cc:* Ye Zhou , Mridul Muralidharan <
>> mri...@gmail.com>, Kent Yao , dev 
>> *Subject:* Re: [VOTE] SPIP: Testing Framework for Spark UI Javascript files
>>
>>
>>
>> +1
>>
>>
>>
>>
>>
>>
>> On Fri, Nov 24, 2023 at 10:19 PM, Dongjoon Hyun 
>> wrote:
>>
>> +1
>>
>>
>>
>> Thanks,
>>
>> Dongjoon.
>>
>>
>>
>> On Fri, Nov 24, 2023 at 7:14 PM Ye Zhou  wrote:
>>
>> +1(non-binding)
>>
>>
>>
>> On Fri, Nov 24, 2023 at 11:16 Mridul Muralidharan 
>> wrote:
>>
>>
>>
>> +1
>>
>>
>>
>> Regards,
>>
>> Mridul
>>
>>
>>
>> On Fri, Nov 24, 2023 at 8:21 AM Kent Yao  wrote:
>>
>> Hi Spark Dev,
>>
>> Following the discussion [1], I'd like to start the vote for the SPIP [2].
>>
>> The SPIP aims to improve the test coverage and development experience for
>> Spark UI-related JavaScript code.
>>
>> This thread will be open for at least the next 72 hours.  Please vote
>> accordingly,
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>>
>> Thank you!
>> Kent Yao
>>
>> [1] https://lists.apache.org/thread/5rqrho4ldgmqlc173y2229pfll5sgkff
>> [2]
>> https://docs.google.com/document/d/1hWl5Q2CNNOjN5Ubyoa28XmpJtDyD9BtGtiEG2TT94rg/edit?usp=sharing
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
>>
>

-- 
Ruifeng Zheng
E-mail: zrfli...@gmail.com


Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-15 Thread Ruifeng Zheng
+1

On Thu, Nov 16, 2023 at 8:34 AM Ilan Filonenko  wrote:

> +1 (non-binding)
>
> On Wed, Nov 15, 2023 at 12:57 PM Xiao Li  wrote:
>
>> +1
>>
>> bo yang  wrote on Wed, Nov 15, 2023 at 05:55:
>>
>>> +1
>>>
>>> On Tue, Nov 14, 2023 at 7:18 PM huaxin gao 
>>> wrote:
>>>
>>>> +1
>>>>
>>>> On Tue, Nov 14, 2023 at 10:45 AM Holden Karau 
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On Tue, Nov 14, 2023 at 10:21 AM DB Tsai  wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>>>>>>
>>>>>> On Nov 14, 2023, at 10:14 AM, Vakaris Baškirov <
>>>>>> vakaris.bashki...@gmail.com> wrote:
>>>>>>
>>>>>> +1 (non-binding)
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 14, 2023 at 8:03 PM Chao Sun  wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> On Tue, Nov 14, 2023 at 9:52 AM L. C. Hsieh 
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > +1
>>>>>>> >
>>>>>>> > On Tue, Nov 14, 2023 at 9:46 AM Ye Zhou 
>>>>>>> wrote:
>>>>>>> > >
>>>>>>> > > +1(Non-binding)
>>>>>>> > >
>>>>>>> > > On Tue, Nov 14, 2023 at 9:42 AM L. C. Hsieh 
>>>>>>> wrote:
>>>>>>> > >>
>>>>>>> > >> Hi all,
>>>>>>> > >>
>>>>>>> > >> I’d like to start a vote for SPIP: An Official Kubernetes
>>>>>>> Operator for
>>>>>>> > >> Apache Spark.
>>>>>>> > >>
>>>>>>> > >> The proposal is to develop an official Java-based Kubernetes
>>>>>>> operator
>>>>>>> > >> for Apache Spark to automate the deployment and simplify the
>>>>>>> lifecycle
>>>>>>> > >> management and orchestration of Spark applications and Spark
>>>>>>> clusters
>>>>>>> > >> on k8s at prod scale.
>>>>>>> > >>
>>>>>>> > >> This aims to reduce the learning curve and operation overhead
>>>>>>> for
>>>>>>> > >> Spark users so they can concentrate on core Spark logic.
>>>>>>> > >>
>>>>>>> > >> Please also refer to:
>>>>>>> > >>
>>>>>>> > >>- Discussion thread:
>>>>>>> > >>
>>>>>>> https://lists.apache.org/thread/wdy7jfhf7m8jy74p6s0npjfd15ym5rxz
>>>>>>> > >>- JIRA ticket:
>>>>>>> https://issues.apache.org/jira/browse/SPARK-45923
>>>>>>> > >>- SPIP doc:
>>>>>>> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >> Please vote on the SPIP for the next 72 hours:
>>>>>>> > >>
>>>>>>> > >> [ ] +1: Accept the proposal as an official SPIP
>>>>>>> > >> [ ] +0
>>>>>>> > >> [ ] -1: I don’t think this is a good idea because …
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> > >> Thank you!
>>>>>>> > >>
>>>>>>> > >> Liang-Chi Hsieh
>>>>>>> > >>
>>>>>>> > >>
>>>>>>> -
>>>>>>> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>> > >>
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > --
>>>>>>> > >
>>>>>>> > > Zhou, Ye  周晔
>>>>>>> >
>>>>>>> >
>>>>>>> -
>>>>>>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>> >
>>>>>>>
>>>>>>> -
>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>
>>>>>>>
>>>>>>

-- 
Ruifeng Zheng
E-mail: zrfli...@gmail.com


Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-25 Thread Ruifeng Zheng
+1

On Tue, Sep 26, 2023 at 12:51 PM Hyukjin Kwon  wrote:

> Hi all,
>
> I would like to start the vote for updating documentation hosted for EOL
> and maintenance releases to improve the usability here, and in order for
> end users to read the proper and correct documentation.
>
> For discussion thread, please refer to
> https://lists.apache.org/thread/1675rzxx5x4j2x03t9x0kfph8tlys0cx.
>
> Here is one example:
> - https://github.com/apache/spark/pull/42989
> - https://github.com/apache/spark-website/pull/480
>
> Starting with my own +1.
>


Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-18 Thread Ruifeng Zheng
Thanks Yuanjian for driving this release, Congratulations!

On Mon, Sep 18, 2023 at 2:16 PM Maxim Gekk
 wrote:

> Thank you for the work, Yuanjian!
>
> On Mon, Sep 18, 2023 at 6:28 AM beliefer  wrote:
>
>> Congratulations! Apache Spark.
>>
>>
>>
>> At 2023-09-16 01:01:40, "Yuanjian Li"  wrote:
>>
>> Hi All,
>>
>> We are happy to announce the availability of *Apache Spark 3.5.0*!
>>
>> Apache Spark 3.5.0 is the sixth release of the 3.x line.
>>
>> To download Spark 3.5.0, head over to the download page:
>> https://spark.apache.org/downloads.html
>> (Please note: the PyPI upload is pending due to a size limit request;
>> we're actively following up here
>> <https://github.com/pypi/support/issues/3175> with the PyPI organization)
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-5-0.html
>>
>> We would like to acknowledge all community members for contributing to
>> this
>> release. This release would not have been possible without you.
>>
>> Best,
>> Yuanjian
>>
>>

-- 
Ruifeng Zheng
E-mail: zrfli...@gmail.com


Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Ruifeng Zheng
+1

On Tue, Sep 12, 2023 at 7:24 AM Hyukjin Kwon  wrote:

> +1
>
> On Tue, Sep 12, 2023 at 7:05 AM Xiao Li  wrote:
>
>> +1
>>
>> Xiao
>>
>>> Yuanjian Li  wrote on Mon, Sep 11, 2023 at 10:53:
>>
>>> @Peter Toth  I've looked into the details of this
>>> issue, and it appears that it's neither a regression in version 3.5.0 nor a
>>> correctness issue. It's a bug related to a new feature. I think we can fix
>>> this in 3.5.1 and list it as a known issue of the Scala client of Spark
>>> Connect in 3.5.0.
>>>
>>> Mridul Muralidharan  wrote on Sun, Sep 10, 2023 at 04:12:
>>>

 +1

 Signatures, digests, etc check out fine.
 Checked out tag and build/tested with -Phive -Pyarn -Pmesos -Pkubernetes

 Regards,
 Mridul

 On Sat, Sep 9, 2023 at 10:02 AM Yuanjian Li 
 wrote:

> Please vote on releasing the following candidate(RC5) as Apache Spark
> version 3.5.0.
>
> The vote is open until 11:59pm Pacific time Sep 11th and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.5.0
>
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.5.0-rc5 (commit
> ce5ddad990373636e94071e7cef2f31021add07b):
>
> https://github.com/apache/spark/tree/v3.5.0-rc5
>
> The release files, including signatures, digests, etc. can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-bin/
>
> Signatures used for Spark RCs can be found in this file:
>
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1449
>
> The documentation corresponding to this release can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v3.5.0-rc5-docs/
>
> The list of bug fixes going into 3.5.0 can be found at the following
> URL:
>
> https://issues.apache.org/jira/projects/SPARK/versions/12352848
>
> This release is using the release script of the tag v3.5.0-rc5.
>
>
> FAQ
>
> =
>
> How can I help test this release?
>
> =
>
> If you are a Spark user, you can help us test this release by taking
>
> an existing Spark workload and running on this release candidate, then
>
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
>
> the current RC and see if anything important breaks, in the Java/Scala
>
> you can add the staging repository to your project's resolvers and test
>
> with the RC (make sure to clean up the artifact cache before/after so
>
> you don't end up building with an out of date RC going forward).
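
The "add the staging repository to your resolvers" step above can be sketched for sbt as follows; this is an illustrative build fragment using the orgapachespark-1449 staging URL quoted in this vote email (Maven users would add an equivalent `<repository>` entry):

```scala
// build.sbt sketch: resolve Spark 3.5.0 RC artifacts from the staging repo.
// Illustrative only; remember to clean the artifact cache before and after.
resolvers += "Apache Spark RC Staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1449"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.0"
```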
>
> ===
>
> What should happen to JIRA tickets still targeting 3.5.0?
>
> ===
>
> The current list of open tickets targeted at 3.5.0 can be found at:
>
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.5.0
>
> Committers should look at those and triage. Extremely important bug
>
> fixes, documentation, and API tweaks that impact compatibility should
>
> be worked on immediately. Everything else please retarget to an
>
> appropriate release.
>
> ==
>
> But my bug isn't fixed?
>
> ==
>
> In order to make timely releases, we will typically not hold the
>
> release unless the bug in question is a regression from the previous
>
> release. That being said, if there is something which is a regression
>
> that has not been correctly targeted please ping me or a committer to
>
> help target the issue.
>
> Thanks,
>
> Yuanjian Li
>



Re: Welcome two new Apache Spark committers

2023-08-06 Thread Ruifeng Zheng
Congratulations! Peter and Xiduo!

On Mon, Aug 7, 2023 at 10:13 AM Xiao Li  wrote:

> Congratulations, Peter and Xiduo!
>
>
>
>> Debasish Das  wrote on Sun, Aug 6, 2023 at 19:08:
>
>> Congratulations Peter and Xiduo.
>>
>> On Sun, Aug 6, 2023, 7:05 PM Wenchen Fan  wrote:
>>
>>> Hi all,
>>>
>>> The Spark PMC recently voted to add two new committers. Please join me
>>> in welcoming them to their new role!
>>>
>>> - Peter Toth (Spark SQL)
>>> - Xiduo You (Spark SQL)
>>>
>>> They consistently make contributions to the project and clearly showed
>>> their expertise. We are very excited to have them join as committers.
>>>
>>


Re: LLM script for error message improvement

2023-08-02 Thread Ruifeng Zheng
+1 from my side, I'm fine to have it as a helper script

On Thu, Aug 3, 2023 at 10:53 AM Hyukjin Kwon  wrote:

> I think adding that dev tool script to improve the error message is fine.
>
> On Thu, 3 Aug 2023 at 10:24, Haejoon Lee
>  wrote:
>
>> Dear contributors, I hope you are doing well!
>>
>> I see there are contributors who are interested in working on error
>> message improvements as a persistent contribution, so I want to share an
>> LLM-based error message improvement script to help with your contributions.
>>
>> You can find details about the script at
>> https://github.com/apache/spark/pull/41711. I believe this can help your
>> error message improvement work, so I encourage you to take a look at the
>> pull request and leverage the script.
>>
>> Please let me know if you have any questions or concerns.
>>
>> Thanks all for your time and contributions!
>>
>> Best regards,
>>
>> Haejoon
>>
>


Re: Time for Spark 3.3.3 release?

2023-07-31 Thread Ruifeng Zheng
+1, thank you Yuming

On Tue, Aug 1, 2023 at 10:40 AM Yuming Wang  wrote:

> Thank you. I will prepare 3.3.3-rc1 soon.
>
> On Sun, Jul 30, 2023 at 12:15 AM Dongjoon Hyun 
> wrote:
>
>> +1
>>
>> Thank you for volunteering, Yuming.
>>
>> Dongjoon
>>
>>
>> On Fri, Jul 28, 2023 at 11:35 AM Yuming Wang  wrote:
>>
>>> Hi Spark devs,
>>>
>>> Since Apache Spark 3.3.2 tag creation (Feb 11), 60 patches have
>>> arrived at branch-3.3.
>>>
>>> Shall we make a new release, Apache Spark 3.3.3, as the third release at
>>> branch-3.3?
>>> I'd like to volunteer as the release manager for Apache Spark 3.3.3.
>>>
>>>
>>>


Re: Spark Docker Official Image is now available

2023-07-19 Thread Ruifeng Zheng
Awesome, thank you YiKun for driving this!

On Thu, Jul 20, 2023 at 9:12 AM Hyukjin Kwon  wrote:

> This is amazing, finally!
>
> On Thu, 20 Jul 2023 at 10:10, Yikun Jiang  wrote:
>
>> The spark Docker Official Image is now available:
>> https://hub.docker.com/_/spark
>>
>> $ docker run -it --rm *spark* /opt/spark/bin/spark-shell
>> $ docker run -it --rm *spark*:python3 /opt/spark/bin/pyspark
>> $ docker run -it --rm *spark*:r /opt/spark/bin/sparkR
>>
>> We had a longer review journey than we expected; if you are also
>> interested in this journey, you can see more in:
>>
>> https://github.com/docker-library/official-images/pull/13089
>>
>> Thanks to everyone who helps in the Docker and Apache Spark community!
>>
>> Some background you might want to know:
>> *- apache/spark*: https://hub.docker.com/r/apache/spark, the Apache
>> Spark docker image, published by the *Apache Spark community* when
>> Apache Spark is released, with no further updates.
>> *- spark*: https://hub.docker.com/_/spark, the Docker Official Image,
>> published by the *Docker community*, which keeps actively rebuilding it
>> for updates and security fixes.
>> - The source repo of *apache/spark *and *spark: *
>> https://github.com/apache/spark-docker
>>
>> See more in:
>> [1] [DISCUSS] SPIP: Support Docker Official Image for Spark:
>> https://lists.apache.org/thread/l1793y5224n8bqkp3s6ltgkykso4htb3
>> [2] [VOTE] SPIP: Support Docker Official Image for Spark:
>> https://lists.apache.org/thread/ro6olodm1jzdffwjx4oc7ol7oh6kshbl
>> [3] https://github.com/docker-library/official-images/pull/13089
>> [4]
>> https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o/
>> [5] https://issues.apache.org/jira/browse/SPARK-40513
>>
>> Regards,
>> Yikun
>>
>

-- 
Ruifeng Zheng
E-mail: zrfli...@gmail.com


Re: [VOTE][SPIP] Python Data Source API

2023-07-09 Thread Ruifeng Zheng
+1

On Mon, Jul 10, 2023 at 8:20 AM Jungtaek Lim 
wrote:

> +1
>
> On Sat, Jul 8, 2023 at 4:13 AM Reynold Xin 
> wrote:
>
>> +1!
>>
>>
>> On Fri, Jul 7 2023 at 11:58 AM, Holden Karau 
>> wrote:
>>
>>> +1
>>>
>>> On Fri, Jul 7, 2023 at 9:55 AM huaxin gao 
>>> wrote:
>>>
 +1

 On Fri, Jul 7, 2023 at 8:59 AM Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

> +1 for me
>
> Mich Talebzadeh,
> Solutions Architect/Engineering Lead
> Palantir Technologies Limited
> London
> United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any loss, damage or destruction of data or any other property which may
> arise from relying on this email's technical content is explicitly
> disclaimed. The author will in no case be liable for any monetary damages
> arising from such loss, damage or destruction.
>
>
>
>
> On Fri, 7 Jul 2023 at 11:05, Martin Grund
>  wrote:
>
>> +1 (non-binding)
>>
>> On Fri, Jul 7, 2023 at 12:05 AM Denny Lee 
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> On Fri, Jul 7, 2023 at 00:50 Maciej  wrote:
>>>
 +0

 Best regards,
 Maciej Szymkiewicz

 Web: https://zero323.net
 PGP: A30CEF0C31A501EC

 On 7/6/23 17:41, Xiao Li wrote:

 +1

 Xiao

 Hyukjin Kwon  wrote on Wed, Jul 5, 2023 at 17:28:

> +1.
>
> See https://youtu.be/yj7XlTB1Jvc?t=604 :-).
>
> On Thu, 6 Jul 2023 at 09:15, Allison Wang
> 
>  wrote:
>
>> Hi all,
>>
>> I'd like to start the vote for SPIP: Python Data Source API.
>>
>> The high-level summary for the SPIP is that it aims to introduce
>> a simple API in Python for Data Sources. The idea is to enable Python
>> developers to create data sources without learning Scala or dealing 
>> with
>> the complexities of the current data source APIs. This would make 
>> Spark
>> more accessible to the wider Python developer community.
>>
>> References:
>>
>>- SPIP doc
>>
>> 
>>- JIRA ticket
>>
>>- Discussion thread
>>
>>
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because __.
>>
>> Thanks,
>> Allison
>>
>
>>>
>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>


Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Ruifeng Zheng
+1

On Thu, Jun 22, 2023 at 1:11 PM Dongjoon Hyun 
wrote:

> +1
>
> Dongjoon
>
> On Wed, Jun 21, 2023 at 8:56 PM Hyukjin Kwon  wrote:
>
>> +1
>>
>> On Thu, 22 Jun 2023 at 02:20, Jacek Laskowski  wrote:
>>
>>> +0
>>>
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> 
>>> "The Internals Of" Online Books 
>>> Follow me on https://twitter.com/jaceklaskowski
>>>
>>> 
>>>
>>>
>>> On Wed, Jun 21, 2023 at 5:11 PM Amanda Liu 
>>> wrote:
>>>
 Hi all,

 I'd like to start the vote for SPIP: PySpark Test Framework.

 The high-level summary for the SPIP is that it proposes an official
 test framework for PySpark. Currently, there are only disparate open-source
 repos and blog posts for PySpark testing resources. We can streamline and
 simplify the testing process by incorporating test features, such as a
 PySpark Test Base class (which allows tests to share Spark sessions) and
 test util functions (for example, asserting dataframe and schema equality).

 *SPIP doc:*
 https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v

 *JIRA ticket:* https://issues.apache.org/jira/browse/SPARK-44042

 *Discussion thread:*
 https://lists.apache.org/thread/trwgbgn3ycoj8b8k8lkxko2hql23o41n

 Please vote on the SPIP for the next 72 hours:
 [ ] +1: Accept the proposal as an official SPIP
 [ ] +0
 [ ] -1: I don’t think this is a good idea because __.

 Thank you!

 Best,
 Amanda Liu

>>>


Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-21 Thread Ruifeng Zheng
+1

On Wed, Jun 21, 2023 at 2:26 PM huaxin gao  wrote:

> +1
>
> On Tue, Jun 20, 2023 at 11:21 PM Hyukjin Kwon 
> wrote:
>
>> +1
>>
>> On Wed, 21 Jun 2023 at 14:23, yangjie01  wrote:
>>
>>> +1
>>>
>>>
>>> On 2023/6/21 13:20, "L. C. Hsieh" <vii...@gmail.com> wrote:
>>>
>>>
>>> +1
>>>
>>>
>>> On Tue, Jun 20, 2023 at 8:48 PM Dongjoon Hyun >> > wrote:
>>> >
>>> > +1
>>> >
>>> > Dongjoon
>>> >
>>> > On 2023/06/20 02:51:32 Jia Fan wrote:
>>> > > +1
>>> > >
>>> > > > Dongjoon Hyun <dongj...@apache.org> wrote on Tue, Jun 20, 2023 at 10:41:
>>> > >
>>> > > > Please vote on releasing the following candidate as Apache Spark
>>> version
>>> > > > 3.4.1.
>>> > > >
>>> > > > The vote is open until June 23rd 1AM (PST) and passes if a
>>> majority +1 PMC
>>> > > > votes are cast, with a minimum of 3 +1 votes.
>>> > > >
>>> > > > [ ] +1 Release this package as Apache Spark 3.4.1
>>> > > > [ ] -1 Do not release this package because ...
>>> > > >
>>> > > > To learn more about Apache Spark, please see
>>> https://spark.apache.org/ 
>>> > > >
>>> > > > The tag to be voted on is v3.4.1-rc1 (commit
>>> > > > 6b1ff22dde1ead51cbf370be6e48a802daae58b6)
>>> > > > https://github.com/apache/spark/tree/v3.4.1-rc1
>>> > > >
>>> > > > The release files, including signatures, digests, etc. can be
>>> found at:
>>> > > > https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-bin/
>>> > > >
>>> > > > Signatures used for Spark RCs can be found in this file:
>>> > > > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> > > >
>>> > > > The staging repository for this release can be found at:
>>> > > >
>>> https://repository.apache.org/content/repositories/orgapachespark-1443/
>>> > > >
>>> > > > The documentation corresponding to this release can be found at:
>>> > > > https://dist.apache.org/repos/dist/dev/spark/v3.4.1-rc1-docs/
>>> > > >
>>> > > > The list of bug fixes going into 3.4.1 can be found at the
>>> following URL:
>>> > > > https://issues.apache.org/jira/projects/SPARK/versions/12352874
>>> > > >
>>> > > > This release is using the release script of the tag v3.4.1-rc1.
>>> > > >
>>> > > > FAQ
>>> > > >
>>> > > > =
>>> > > > How can I help test this release?
>>> > > > =
>>> > > >
>>> > > > If you are a Spark user, you can help us test this release by
>>> taking
>>> > > > an existing Spark workload and running on this release candidate,
>>> then
>>> > > > reporting any regressions.
>>> > > >
>>> > > > If you're working in PySpark you can set up a virtual env and
>>> install
>>> > > > the current RC and see if anything important breaks, in the
>>> Java/Scala
>>> > > > you can add the staging repository to your project's resolvers and
>>> test
>>> > > > with the RC (make sure to clean up the artifact cache before/after
>>> so
>>> > > > you don't end up building with an out of date RC going forward).
>>> > > >
>>> > > > ===
>>> > > > What should happen to JIRA tickets still targeting 3.4.1?
>>> > > > ===
>>> > > >
>>> > > > The current list of open tickets targeted at 3.4.1 can be found at:
>>> > > > https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> > > > Version/s" = 3.4.1
>>> > > >
>>> > > > Committers should look at those and triage. Extremely important bug
>>> > > > fixes, documentation, and API tweaks that impact compatibility
>>> should
>>> > > > be worked on immediately. Everything else please retarget to an
>>> > > > appropriate release.
>>> > > >
>>> > > > ==
>>> > > > But my bug isn't fixed?
>>> > > > ==
>>> > > >
>>> > > > In order to make timely releases, we will typically not hold the
>>> > > > release unless the bug in question is a regression from the
>>> previous
>>> > > > release. That being said, if there is something which is a
>>> regression
>>> > > > that has not been correctly targeted please ping me or a committer
>>> to
>>> > > > help target the issue.
>>> > > >
>>> > >
>>> >
>>> > -
>>> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>> >
>>>
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>>>
>>>
>>>
>>>


Re: [DISCUSS] SPIP: Add PySpark Test Framework

2023-06-14 Thread Ruifeng Zheng
+1 from my side

sounds good, it will be helpful to both users and contributors to improve
the test coverage

On Wed, Jun 14, 2023 at 8:27 AM Hyukjin Kwon  wrote:

> Yeah, I have been thinking about this too, and Holden did some work here
> that this SPIP will reuse. I support this.
>
> On Wed, 14 Jun 2023 at 08:10, Amanda Liu 
> wrote:
>
>> Hi all,
>>
>> I'd like to start a discussion about implementing an official PySpark
>> test framework. Currently, there's no official test framework, but only
>> various open-source repos and blog posts.
>>
>> Many of these open-source resources are very popular, which demonstrates
>> user demand for PySpark testing capabilities. spark-testing-base
>>  has 1.4k stars, and
>> chispa  has 532k downloads/month.
>> However, it can be confusing for users to piece together disparate
>> resources to write their own PySpark tests (see The Elephant in the
>> Room: How to Write PySpark Tests
>> 
>> ).
>>
>> We can streamline and simplify the testing process by incorporating test
>> features, such as a PySpark Test Base class (which allows tests to share
>> Spark sessions) and test util functions (for example, asserting dataframe
>> and schema equality).
>>
>> Please see the SPIP document attached:
>> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
>> And the JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44042
>>
>> I would appreciate it if you could share your thoughts on this proposal.
>>
>> Thank you!
>> Amanda Liu
>>
>
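
The "asserting dataframe and schema equality" utilities proposed above can be illustrated with a toy row-level comparison; this is a pure-Python sketch of the idea, not the actual API the SPIP proposes:

```python
def assert_rows_equal(actual, expected, ignore_row_order=True):
    """Toy analogue of a dataframe-equality test util: compare lists of dict
    rows, optionally ignoring row order, and report both sides on mismatch."""
    def key(row):
        return sorted(row.items())
    a = sorted(actual, key=key) if ignore_row_order else list(actual)
    e = sorted(expected, key=key) if ignore_row_order else list(expected)
    if a != e:
        raise AssertionError(f"rows differ:\n  actual:   {a}\n  expected: {e}")

# Same rows in a different order compare equal when order is ignored.
assert_rows_equal(
    [{"id": 2, "v": "b"}, {"id": 1, "v": "a"}],
    [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
)
```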


Re: Apache Spark 3.4.1 Release?

2023-06-09 Thread Ruifeng Zheng
+1

Thank you Dongjoon!


On Fri, Jun 9, 2023 at 11:54 PM Xiao Li 
wrote:

> +1
>
> On Fri, Jun 9, 2023 at 08:30 Wenchen Fan  wrote:
>
>> +1
>>
>> On Fri, Jun 9, 2023 at 8:52 PM Xinrong Meng  wrote:
>>
>>> +1. Thank you Dongjoon!
>>>
>>> Thanks,
>>>
>>> Xinrong Meng
>>>
>>> Mridul Muralidharan wrote on Fri, Jun 9, 2023 at 5:22 AM:
>>>

 +1, thanks Dongjoon !

 Regards,
 Mridul

 On Thu, Jun 8, 2023 at 7:16 PM Jia Fan 
 wrote:

> +1
>
> 
>
>
> Jia Fan
>
>
>
> On Jun 9, 2023 at 08:00, Yuming Wang  wrote:
>
> +1.
>
> On Fri, Jun 9, 2023 at 7:14 AM Chao Sun  wrote:
>
>> +1 too
>>
>> On Thu, Jun 8, 2023 at 2:34 PM kazuyuki tanimura
>>  wrote:
>> >
>> > +1 (non-binding), Thank you Dongjoon
>> >
>> > Kazu
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
> --
>
>


Re: [VOTE] Release Apache Spark 3.2.4 (RC1)

2023-04-10 Thread Ruifeng Zheng
+1 (non-binding)


Thank you for driving this release!





Ruifeng Zheng
ruife...@foxmail.com








------ Original ------
From: "Yuming Wang"

  To learn more about Apache Spark, please see https://spark.apache.org/
  
  The tag to be voted on is v3.2.4-rc1 (commit
  0ae10ac18298d1792828f1d59b652ef17462d76e)
   https://github.com/apache/spark/tree/v3.2.4-rc1
  
  The release files, including signatures, digests, etc. can be found at:
   https://dist.apache.org/repos/dist/dev/spark/v3.2.4-rc1-bin/
  
  Signatures used for Spark RCs can be found in this file:
   https://dist.apache.org/repos/dist/dev/spark/KEYS
  
  The staging repository for this release can be found at:
   https://repository.apache.org/content/repositories/orgapachespark-1442/
  
  The documentation corresponding to this release can be found at:
   https://dist.apache.org/repos/dist/dev/spark/v3.2.4-rc1-docs/
  
  The list of bug fixes going into 3.2.4 can be found at the following URL:
   https://issues.apache.org/jira/projects/SPARK/versions/12352607
  
  This release is using the release script of the tag v3.2.4-rc1.
  
  FAQ
  
  =
  How can I help test this release?
  =
  
  If you are a Spark user, you can help us test this release by taking
  an existing Spark workload and running on this release candidate, then
  reporting any regressions.
  
  If you're working in PySpark you can set up a virtual env and install
  the current RC and see if anything important breaks, in the Java/Scala
  you can add the staging repository to your project's resolvers and test
  with the RC (make sure to clean up the artifact cache before/after so
  you don't end up building with an out of date RC going forward).
  
  ===
  What should happen to JIRA tickets still targeting 3.2.4?
  ===
  
  The current list of open tickets targeted at 3.2.4 can be found at:
   https://issues.apache.org/jira/projects/SPARK and search for "Target
  Version/s" = 3.2.4
  
  Committers should look at those and triage. Extremely important bug
  fixes, documentation, and API tweaks that impact compatibility should
  be worked on immediately. Everything else please retarget to an
  appropriate release.
  
  ==
  But my bug isn't fixed?
  ==
  
  In order to make timely releases, we will typically not hold the
  release unless the bug in question is a regression from the previous
  release. That being said, if there is something which is a regression
  that has not been correctly targeted please ping me or a committer to
  help target the issue.
  
 
 -
 To unsubscribe e-mail:  dev-unsubscr...@spark.apache.org

Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-10 Thread Ruifeng Zheng
+1 (non-binding)




RuifengZheng
ruife...@foxmail.com








--Original--
From:   
 "Kent Yao" 
   http://spark.apache.org/
 
  The tag to be voted on is v3.4.0-rc7 (commit 
87a5442f7ed96b11051d8a9333476d080054e5a0):
  https://github.com/apache/spark/tree/v3.4.0-rc7
 
  The release files, including signatures, digests, etc. 
can be found at:
  
https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc7-bin/
 
  Signatures used for Spark RCs can be found in this file:
  https://dist.apache.org/repos/dist/dev/spark/KEYS
 
  The staging repository for this release can be found at:
  
https://repository.apache.org/content/repositories/orgapachespark-1441
 
  The documentation corresponding to this release can be 
found at:
  
https://dist.apache.org/repos/dist/dev/spark/v3.4.0-rc7-docs/
 
  The list of bug fixes going into 3.4.0 can be found at 
the following URL:
  
https://issues.apache.org/jira/projects/SPARK/versions/12351465
 
  This release is using the release script of the tag 
v3.4.0-rc7.
 
 
  FAQ
 
  =
  How can I help test this release?
  =
  If you are a Spark user, you can help us test this 
release by taking
  an existing Spark workload and running on this release 
candidate, then
  reporting any regressions.
 
  If you're working in PySpark you can set up a virtual env 
and install
  the current RC and see if anything important breaks, in 
the Java/Scala
  you can add the staging repository to your projects 
resolvers and test
  with the RC (make sure to clean up the artifact cache 
before/after so
  you don't end up building with an out of date RC going 
forward).
 
  ===
  What should happen to JIRA tickets still targeting 3.4.0?
  ===
  The current list of open tickets targeted at 3.4.0 can be 
found at:
  https://issues.apache.org/jira/projects/SPARK and search 
for "Target Version/s" = 3.4.0
 
  Committers should look at those and triage. Extremely 
important bug
  fixes, documentation, and API tweaks that impact 
compatibility should
  be worked on immediately. Everything else please retarget 
to an
  appropriate release.
 
  ==
  But my bug isn't fixed?
  ==
  In order to make timely releases, we will typically not 
hold the
  release unless the bug in question is a regression from 
the previous
  release. That being said, if there is something which is 
a regression
  that has not been correctly targeted please ping me or a 
committer to
  help target the issue.
 
  Thanks,
  Xinrong Meng

 
-
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: Apache Spark 3.2.2 Release?

2022-07-07 Thread Ruifeng Zheng
+1 thank you Dongjoon!




RuifengZheng
ruife...@foxmail.com








--Original--
From:   
 "Yikun Jiang"  
  


Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Ruifeng Zheng
+1 (non-binding)


Maxim, thank you for driving this release!


thanks,
ruifeng






----
??: 
   "Chao Sun"   
 
https://lists.apache.org/thread/ksoxmozgz7q728mnxl6c2z7ncmo87vls 
 
  Maxim, thank you for your dedication on these release 
candidates.
 
  Chris Nauroth
 
 
  On Mon, Jun 13, 2022 at 3:21 PM Mridul Muralidharan 
http://spark.apache.org/ 
 
  The tag to be voted on is v3.3.0-rc6 (commit 
f74867bddfbcdd4d08076db36851e88b15e66556):
  https://github.com/apache/spark/tree/v3.3.0-rc6
 
  The release files, including signatures, digests, 
etc. can be found at:
  
https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-bin/ 
 
  Signatures used for Spark RCs can be found in this 
file:
  https://dist.apache.org/repos/dist/dev/spark/KEYS 
 
  The staging repository for this release can be found 
at:
  
https://repository.apache.org/content/repositories/orgapachespark-1407 
 
  The documentation corresponding to this release can 
be found at:
  
https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc6-docs/ 
 
  The list of bug fixes going into 3.3.0 can be found 
at the following URL:
  
https://issues.apache.org/jira/projects/SPARK/versions/12350369 
 
  This release is using the release script of the tag 
v3.3.0-rc6.
 
 
  FAQ
 
  =
  How can I help test this release?
  =
  If you are a Spark user, you can help us test this 
release by taking
  an existing Spark workload and running on this 
release candidate, then
  reporting any regressions.
 
  If you're working in PySpark you can set up a virtual 
env and install
  the current RC and see if anything important breaks, 
in the Java/Scala
  you can add the staging repository to your projects 
resolvers and test
  with the RC (make sure to clean up the artifact cache 
before/after so
  you don't end up building with an out of date RC going 
forward).
 
  ===
  What should happen to JIRA tickets still targeting 
3.3.0?
  ===
  The current list of open tickets targeted at 3.3.0 
can be found at:
  https://issues.apache.org/jira/projects/SPARK 
and search for "Target Version/s" = 3.3.0
 
  Committers should look at those and triage. Extremely 
important bug
  fixes, documentation, and API tweaks that impact 
compatibility should
  be worked on immediately. Everything else please 
retarget to an
  appropriate release.
 
  ==
  But my bug isn't fixed?
  ==
  In order to make timely releases, we will typically 
not hold the
  release unless the bug in question is a regression 
from the previous
  release. That being said, if there is something which 
is a regression
  that has not been correctly targeted please ping me 
or a committer to
  help target the issue.
 
  Maxim Gekk
 
  Software Engineer
 
  Databricks, Inc.
 
 
 
  --
  Twitter: https://twitter.com/holdenkarau 
  Books (Learning Spark, High Performance Spark, etc.): 
https://amzn.to/2MaRAG9 
  YouTube Live Streams: https://www.youtube.com/user/holdenkarau
 
 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: [VOTE][SPIP] Spark Connect

2022-06-13 Thread Ruifeng Zheng
+1




----
??: 
   "huaxin gao" 
   


Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-16 Thread Ruifeng Zheng
+1, I think it is a good idea




--Original--
From:
"Hyukjin Kwon"  
  


Re: [VOTE] Spark 3.1.3 RC4

2022-02-14 Thread Ruifeng Zheng
+1 (non-binding) 



checked the release script issue Dongjoon mentioned:


curl -s https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc4-bin/spark-3.1.3-bin-hadoop2.7.tgz | tar tz | grep hadoop-common

spark-3.1.3-bin-hadoop2.7/jars/hadoop-common-2.7.4




--Original--
From:
"Sean Owen" 
   
http://spark.apache.org/

There are currently no open issues targeting 3.1.3 in Spark's
JIRA https://issues.apache.org/jira/browse (try project = SPARK AND "Target
Version/s" = "3.1.3" AND status in (Open, Reopened, "In
Progress")) at https://s.apache.org/n79dw




The tag to be voted on is v3.1.3-rc4 (commit
d1f8a503a26bcfb4e466d9accc5fa241a7933667):
https://github.com/apache/spark/tree/v3.1.3-rc4


The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc4-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at
https://repository.apache.org/content/repositories/orgapachespark-1401



The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc4-docs/

The list of bug fixes going into 3.1.3 can be found at the following URL:
https://s.apache.org/x0q9b

This release is using the release script from 3.1.3.
The release docker container was rebuilt since the previous version didn't have 
the necessary components to build the R documentation.

FAQ


=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks, in the Java/Scala
you can add the staging repository to your projects resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out of date RC going forward).

===
What should happen to JIRA tickets still targeting 3.1.3?
===

The current list of open tickets targeted at 3.1.3 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.1.3

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something that is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Note: I added an extra day to the vote since I know some folks are likely busy 
on the 14th with partner(s).






-- 
Twitter: https://twitter.com/holdenkarau

Books (Learning Spark, High Performance Spark,
etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau

Re: [ANNOUNCE] Apache Spark 3.2.1 released

2022-01-28 Thread Ruifeng Zheng
It's Great!
Congrats and thanks, huaxin!




--Original--
From:
"huaxin gao"

https://spark.apache.org/downloads.html

To view the release notes:
https://spark.apache.org/releases/spark-release-3-2-1.html

We would like to acknowledge all community members for contributing to this
release. This release would not have been possible without you.


Huaxin Gao

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-24 Thread Ruifeng Zheng
+1 (non-binding)



--Original--
From:
"Kent Yao" http://spark.apache.org/

The tag to be voted on is v3.2.1-rc2 (commit
4f25b3f71238a00508a356591553f2dfa89f8290):
https://github.com/apache/spark/tree/v3.2.1-rc2

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1398/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc2-docs/_site/

The list of bug fixes going into 3.2.1 can be found at the following URL:
https://s.apache.org/yu0cy

This release is using the release script of the tag v3.2.1-rc2.

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks, in the Java/Scala
you can add the staging repository to your projects resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out of date RC going forward).

===
What should happen to JIRA tickets still targeting 3.2.1?
===

The current list of open tickets targeted at 3.2.1 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.2.1

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.
 


-- 
John Zhuge

Re: [VOTE] Release Spark 3.2.1 (RC1)

2022-01-11 Thread Ruifeng Zheng
+1 (non-binding)

Thanks, ruifeng zheng



--Original--
From:   
 "Cheng Su" 
   
http://spark.apache.org/
 
 There are currently no issues targeting 3.2.1 (try project = SPARK AND
 "Target Version/s" = "3.2.1" AND status in (Open, Reopened, "In Progress"))
 
 The tag to be voted on is v3.2.1-rc1 (commit
 2b0ee226f8dd17b278ad11139e62464433191653):
  
https://github.com/apache/spark/tree/v3.2.1-rc1

 The release files, including signatures, digests, etc. can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-bin/
 
 Signatures used for Spark RCs can be found in this file:
 https://dist.apache.org/repos/dist/dev/spark/KEYS
 
 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1395/
 
 The documentation corresponding to this release can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-docs/
 
 The list of bug fixes going into 3.2.1 can be found at the following URL:
 https://s.apache.org/7tzik
 
 This release is using the release script of the tag v3.2.1-rc1.
 
 FAQ
 
 
 =
 How can I help test this release?
 =
 
 If you are a Spark user, you can help us test this release by taking
 an existing Spark workload and running on this release candidate, then
 reporting any regressions.
 
 If you're working in PySpark you can set up a virtual env and install
 the current RC and see if anything important breaks, in the Java/Scala
 you can add the staging repository to your projects resolvers and test
 with the RC (make sure to clean up the artifact cache before/after so
 you don't end up building with an out of date RC going forward).
 
 ===
 What should happen to JIRA tickets still targeting 3.2.1?
 ===
 
 The current list of open tickets targeted at 3.2.1 can be found at:
 https://issues.apache.org/jira/projects/SPARK and search for "Target
 Version/s" = 3.2.1
 
 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should
 be worked on immediately. Everything else please retarget to an
 appropriate release.
 
 ==
 But my bug isn't fixed?
 ==
 
 In order to make timely releases, we will typically not hold the
 release unless the bug in question is a regression from the previous
 release. That being said, if there is something which is a regression
 that has not been correctly targeted please ping me or a committer to
 help target the issue.
 
 
 
 
 
  
 


Re: [Core][Suggestion] sortWithinPartitions and aggregateWithinPartitions for RDD

2018-01-31 Thread Ruifeng Zheng
Do you mean in-memory processing? It works fine if all partitions are small,
but when some partitions don't fit in memory, it will cause an OOM.

 

 

From: Reynold Xin <r...@databricks.com>
Date: Thursday, February 1, 2018, 3:14 PM
To: Ruifeng Zheng <ruife...@foxmail.com>
Cc: <dev@spark.apache.org>
Subject: Re: [Core][Suggestion] sortWithinPartitions and aggregateWithinPartitions
for RDD

 

You can just do that with mapPartitions pretty easily can’t you?
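[Editorial note: the mapPartitions approach suggested above can be sketched roughly as follows. This is a hedged illustration, not Spark code — plain Python lists stand in for the per-partition iterators an RDD would supply, and `sort_partition` is a hypothetical function of the kind one would pass to `rdd.mapPartitions` in PySpark.]

```python
# Sketch: "sortWithinPartitions" for an RDD via mapPartitions.
# mapPartitions hands the function one iterator per partition; sorting
# that iterator keeps every record inside its original partition, so no
# shuffle (and no repartitioning) is involved.

def sort_partition(iterator):
    # Materialize this partition's records and sort them locally.
    return iter(sorted(iterator))

# Simulate a 2-partition RDD: each inner list is one partition's records.
partitions = [[3, 1, 2], [9, 7, 8]]
sorted_partitions = [list(sort_partition(iter(p))) for p in partitions]
print(sorted_partitions)  # [[1, 2, 3], [7, 8, 9]]
```

Note the caveat raised in the reply above: materializing each partition's iterator for sorting assumes every partition fits in memory.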

 

On Wed, Jan 31, 2018 at 11:08 PM Ruifeng Zheng <ruife...@foxmail.com> wrote:

HI all:

 

   1, Dataset API supports operation “sortWithinPartitions”, but in RDD API 
there is no counterpart (I know there is “repartitionAndSortWithinPartitions”, 
but I don’t want to repartition the RDD), I have to convert RDD to Dataset for 
this function. Would it make sense to add a “sortWithinPartitions” for RDD?

 

   2, In “aggregateByKey”/”reduceByKey”, I want to do some special 
operation (like aggregator compression) after local aggregation on each 
partitions. A similar case may be: compute ‘ApproximatePercentile’ for 
different keys by ”reduceByKey”, it may be helpful if 
‘QuantileSummaries#compress’ is called before network communication. So I 
wonder if it is useful to add an ‘aggregateWithinPartitions’ for RDD?

 

Regards,

Ruifeng

 

 

 

 



[Core][Suggestion] sortWithinPartitions and aggregateWithinPartitions for RDD

2018-01-31 Thread Ruifeng Zheng
HI all:

 

   1, Dataset API supports operation “sortWithinPartitions”, but in RDD API 
there is no counterpart (I know there is “repartitionAndSortWithinPartitions”, 
but I don’t want to repartition the RDD), I have to convert RDD to Dataset for 
this function. Would it make sense to add a “sortWithinPartitions” for RDD?

 

   2, In “aggregateByKey”/”reduceByKey”, I want to do some special 
operation (like aggregator compression) after local aggregation on each 
partitions. A similar case may be: compute ‘ApproximatePercentile’ for 
different keys by ”reduceByKey”, it may be helpful if 
‘QuantileSummaries#compress’ is called before network communication. So I 
wonder if it is useful to add an ‘aggregateWithinPartitions’ for RDD?
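[Editorial note: the idea in point 2 — compress each partition-local aggregator before anything would cross the network — can be sketched in plain Python. No Spark is assumed; `aggregate_partition` is the kind of function one would pass to `rdd.mapPartitions`, and its sample-capping step is only an illustrative stand-in for something like ‘QuantileSummaries#compress’, not the real implementation.]

```python
# Sketch: map-side (per-partition) aggregation with an extra compression
# step on each partial aggregator before shuffle.
from collections import defaultdict

def aggregate_partition(records, max_samples=2):
    # Local aggregation: group this partition's values by key.
    acc = defaultdict(list)
    for key, value in records:
        acc[key].append(value)
    # "Compress" each per-key aggregator before it leaves the partition,
    # so the partial state sent over the network stays bounded.
    return {k: sorted(vs)[:max_samples] for k, vs in acc.items()}

# One partition's records; the reduce side would then merge these
# compressed partials across partitions.
partition = [("a", 5), ("a", 1), ("a", 3), ("b", 2)]
partial = aggregate_partition(iter(partition))
print(partial)  # {'a': [1, 3], 'b': [2]}
```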

 

Regards,

Ruifeng