Re: KubernetesLocalDiskShuffleDataIO mount path dependency doubt.

2023-08-11 Thread Arun Ravi
Hi Dongjoon,

Thank you for sharing the details about the Old Protocol and for clearing up
my doubt. I was able to understand the difference between Spark 2 and 3. For
now, `KubernetesLocalDiskShuffleDataIO` works fine for me.


Thanks,
Arun Ravi M V
B.Tech (Batch: 2010-2014)

Computer Science and Engineering

Govt. Model Engineering College
Cochin University Of Science And Technology
Kochi
arunrav...@gmail.com
+91 9995354581
Skype : arunravimv


On Fri, 11 Aug 2023 at 23:52, Dongjoon Hyun  wrote:

> Hi, Arun.
>
> SPARK-35593 (Support shuffle data recovery on the reused PVCs) was an Apache
> Spark 3.2.0 feature whose plugin follows only the legacy Spark shuffle
> directory structure to be safe.
>
> You can see the AS-IS test coverage in the corresponding
> `KubernetesLocalDiskShuffleDataIOSuite`.
>
>
> https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala
>
> To be clear, Apache Spark keeps the supported directory structure without
> any changes for historical reasons.
>
> You can use different structures by simply implementing your own plugin
> like KubernetesLocalDiskShuffleDataIO. It's extensible.
>
> Dongjoon.
>
>
> On Fri, Aug 11, 2023 at 4:52 AM Arun Ravi  wrote:
>
>> Hi Team,
>>
>> I am using the recently released shuffle recovery feature using
>> `KubernetesLocalDiskShuffleDataIO` plugin class on Spark 3.4.1.
>>
> >> Can someone explain why the mount path has the spark-x/executor-x/ pattern
> >> dependency? I got this path detail from this PR
> >> . Is it to avoid other
> >> folders in the volume? Also, does this mean the path should use the
> >> executor ID and Spark app ID, or just the hardcoded spark-x/executor-x/?
> >> Sorry, I couldn't fully understand the reasoning for this. Any help will
> >> be super useful.
>>
>>
>> Arun Ravi M V
>> B.Tech (Batch: 2010-2014)
>>
>> Computer Science and Engineering
>>
>> Govt. Model Engineering College
>> Cochin University Of Science And Technology
>> Kochi
>> arunrav...@gmail.com
>> +91 9995354581
>> Skype : arunravimv
>>
>


Re: KubernetesLocalDiskShuffleDataIO mount path dependency doubt.

2023-08-11 Thread Dongjoon Hyun
Hi, Arun.

SPARK-35593 (Support shuffle data recovery on the reused PVCs) was an Apache
Spark 3.2.0 feature whose plugin follows only the legacy Spark shuffle
directory structure to be safe.

You can see the AS-IS test coverage in the corresponding
`KubernetesLocalDiskShuffleDataIOSuite`.

https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/shuffle/KubernetesLocalDiskShuffleDataIOSuite.scala

To be clear, Apache Spark keeps the supported directory structure without
any changes for historical reasons.

You can use different structures by simply implementing your own plugin
like KubernetesLocalDiskShuffleDataIO. It's extensible.
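
For readers following this thread: the plugin is wired in through Spark's
shuffle IO plugin configuration together with the PVC-reuse settings. A
minimal, illustrative conf sketch follows; the class name is inferred from
the test-suite path above, and the exact keys and values should be treated
as assumptions to verify against your Spark version's documentation:

```properties
# Enable the SPARK-35593 shuffle-recovery plugin (Spark 3.2+, Kubernetes only).
spark.shuffle.sort.io.plugin.class=org.apache.spark.shuffle.KubernetesLocalDiskShuffleDataIO
# Let the driver own and reuse executor PVCs so shuffle data survives executor loss.
spark.kubernetes.driver.ownPersistentVolumeClaim=true
spark.kubernetes.driver.reusePersistentVolumeClaim=true
```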

Dongjoon.


On Fri, Aug 11, 2023 at 4:52 AM Arun Ravi  wrote:

> Hi Team,
>
> I am using the recently released shuffle recovery feature using
> `KubernetesLocalDiskShuffleDataIO` plugin class on Spark 3.4.1.
>
> Can someone explain why the mount path has the spark-x/executor-x/ pattern
> dependency? I got this path detail from this PR
> . Is it to avoid other
> folders in the volume? Also, does this mean the path should use the
> executor ID and Spark app ID, or just the hardcoded spark-x/executor-x/?
> Sorry, I couldn't fully understand the reasoning for this. Any help will
> be super useful.
>
>
> Arun Ravi M V
> B.Tech (Batch: 2010-2014)
>
> Computer Science and Engineering
>
> Govt. Model Engineering College
> Cochin University Of Science And Technology
> Kochi
> arunrav...@gmail.com
> +91 9995354581
> Skype : arunravimv
>


KubernetesLocalDiskShuffleDataIO mount path dependency doubt.

2023-08-11 Thread Arun Ravi
Hi Team,

I am using the recently released shuffle recovery feature using
`KubernetesLocalDiskShuffleDataIO` plugin class on Spark 3.4.1.

Can someone explain why the mount path has the spark-x/executor-x/ pattern
dependency? I got this path detail from this PR
. Is it to avoid other folders
in the volume? Also, does this mean the path should use the executor ID and
Spark app ID, or just the hardcoded spark-x/executor-x/? Sorry, I couldn't
fully understand the reasoning for this. Any help will be super useful.
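
For illustration only, the spark-x/executor-x pattern being asked about
describes a directory layout like the following on the reused volume. Every
concrete name below (mount path, UUIDs, IDs, file names) is hypothetical:

```
/data/                              # PVC mount path (hypothetical)
  spark-7f2c.../                    # per-app directory, spark-<uuid>
    executor-3/                     # per-executor directory, executor-<id>
      blockmgr-a1b2.../             # block manager dir created by Spark
        0c/shuffle_0_12_0.data      # shuffle files found here can be
        0c/shuffle_0_12_0.index     # recovered after an executor restart
```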


Arun Ravi M V
B.Tech (Batch: 2010-2014)

Computer Science and Engineering

Govt. Model Engineering College
Cochin University Of Science And Technology
Kochi
arunrav...@gmail.com
+91 9995354581
Skype : arunravimv


Re: [VOTE] Release Apache Spark 3.3.3 (RC1)

2023-08-11 Thread Mridul Muralidharan
+1

Signatures, digests, etc. check out fine.
Checked out the tag and built/tested with -Phive -Pyarn -Pmesos -Pkubernetes.

Regards,
Mridul


On Fri, Aug 11, 2023 at 2:00 AM Cheng Pan  wrote:

> +1 (non-binding)
>
> Passed integration test with Apache Kyuubi.
>
> Thanks for driving this release.
>
> Thanks,
> Cheng Pan
>
>
> > On Aug 11, 2023, at 06:36, L. C. Hsieh  wrote:
> >
> > +1
> >
> > Thanks Yuming.
> >
> > On Thu, Aug 10, 2023 at 3:24 PM Dongjoon Hyun 
> wrote:
> >>
> >> +1
> >>
> >> Dongjoon
> >>
> >> On 2023/08/10 07:14:07 yangjie01 wrote:
> >>> +1
> >>> Thanks, Jie Yang
> >>>
> >>>
> >>> From: Yuming Wang
> >>> Date: Thursday, August 10, 2023, 13:33
> >>> To: Dongjoon Hyun
> >>> Cc: dev
> >>> Subject: Re: [VOTE] Release Apache Spark 3.3.3 (RC1)
> >>>
> >>> +1 myself.
> >>>
> >>> On Tue, Aug 8, 2023 at 12:41 AM Dongjoon Hyun wrote:
> >>> Thank you, Yuming.
> >>>
> >>> Dongjoon.
> >>>
> >>> On Mon, Aug 7, 2023 at 9:30 AM yangjie01 <yangji...@baidu.com> wrote:
> >>> Hi, Dongjoon and Yuming
> >>>
> >>> I submitted a PR a few days ago to try to fix this issue:
> >>> https://github.com/apache/spark/pull/42167
> >>> The reason for the failure is that the branch daily test and master use
> >>> the same yml file.
> >>>
> >>> Jie Yang
> >>>
> >>> From: Dongjoon Hyun <dongjoon.h...@gmail.com>
> >>> Date: Tuesday, August 8, 2023, 00:18
> >>> To: Yuming Wang <yumw...@apache.org>
> >>> Cc: dev <dev@spark.apache.org>
> >>> Subject: Re: [VOTE] Release Apache Spark 3.3.3 (RC1)
> >>>
> >>> Hi, Yuming.
> >>>
> >>> One of the community GitHub Action test pipelines is unhealthy
> consistently due to Python mypy linter.
> >>>
> >>> https://github.com/apache/spark/actions/workflows/build_branch33.yml
> >>>
> >>> It seems to be due to a pipeline difference, since the same Python mypy
> >>> linter already passes in the commit build.
> >>>
> >>> Dongjoon.
> >>>
> >>>
> >>> On Fri, Aug 4, 2023 at 8:09 PM Yuming Wang <yumw...@apache.org> wrote:
> >>> Please vote on releasing the following candidate as Apache Spark
> version 3.3.3.
> >>>
> >>> The vote is open until 11:59pm Pacific time August 10th and passes if
> a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>>
> >>> [ ] +1 Release this package as Apache Spark 3.3.3
> >>> [ ] -1 Do not release this package because ...
> >>>
> >>> To learn more about Apache Spark, please see https://spark.apache.org
> >>>
> >>> The tag to be voted on is v3.3.3-rc1 (commit
> >>> 8c2b3319c6734250ff9d72f3d7e5cab56b142195):
> >>> https://github.com/apache/spark/tree/v3.3.3-rc1
> >>>
> >>> The release files, including signatures, digests, etc. can be found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.3-rc1-bin
> >>>
> >>> Signatures used for Spark RCs can be found in this file:
> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>>
> >>> The staging repository for this release can be found at:
> >>> https://repository.apache.org/content/repositories/orgapachespark-1445
> >>>
> >>> The documentation corresponding to this release can be found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.3-rc1-docs
> >>>
> >>> The list of bug fixes going into 3.3.3 can be found at the following URL:
> >>> https://s.apache.org/rjci4
> >>>
> >>> This release is using the release script of the tag v3.3.3-rc1.
> >>>
> >>>
> >>> FAQ
> >>>
> >>> =
> >>> How can I help test this release?
> >>> =
> >>> If you are a Spark user, you can help us test this release by taking
> >>> an existing Spark workload and running on this release candidate, then
> >>> reporting any regressions.
> >>>
> >>> If you're working in PySpark you can set up a virtual env and install
> >>> the current RC and see if anything important breaks; in Java/Scala
> >>> you can add the staging repository to your project's resolvers and test
> 

Re: [VOTE] Release Apache Spark 3.3.3 (RC1)

2023-08-11 Thread Cheng Pan
+1 (non-binding)

Passed integration test with Apache Kyuubi.

Thanks for driving this release.

Thanks,
Cheng Pan


> On Aug 11, 2023, at 06:36, L. C. Hsieh  wrote:
> 
> +1
> 
> Thanks Yuming.
> 
> On Thu, Aug 10, 2023 at 3:24 PM Dongjoon Hyun  wrote:
>> 
>> +1
>> 
>> Dongjoon
>> 
>> On 2023/08/10 07:14:07 yangjie01 wrote:
>>> +1
>>> Thanks, Jie Yang
>>> 
>>> 
>>> From: Yuming Wang
>>> Date: Thursday, August 10, 2023, 13:33
>>> To: Dongjoon Hyun
>>> Cc: dev
>>> Subject: Re: [VOTE] Release Apache Spark 3.3.3 (RC1)
>>> 
>>> +1 myself.
>>> 
>>> On Tue, Aug 8, 2023 at 12:41 AM Dongjoon Hyun
>>> <dongjoon.h...@gmail.com> wrote:
>>> Thank you, Yuming.
>>> 
>>> Dongjoon.
>>> 
>>> On Mon, Aug 7, 2023 at 9:30 AM yangjie01
>>> <yangji...@baidu.com> wrote:
>>> Hi, Dongjoon and Yuming
>>> 
>>> I submitted a PR a few days ago to try to fix this issue: 
>>> https://github.com/apache/spark/pull/42167.
>>>  The reason for the failure is that the branch daily test and the master 
>>> use the same yml file.
>>> 
>>> Jie Yang
>>> 
>>> From: Dongjoon Hyun <dongjoon.h...@gmail.com>
>>> Date: Tuesday, August 8, 2023, 00:18
>>> To: Yuming Wang <yumw...@apache.org>
>>> Cc: dev <dev@spark.apache.org>
>>> Subject: Re: [VOTE] Release Apache Spark 3.3.3 (RC1)
>>> 
>>> Hi, Yuming.
>>> 
>>> One of the community GitHub Action test pipelines is unhealthy consistently 
>>> due to Python mypy linter.
>>> 
>>> https://github.com/apache/spark/actions/workflows/build_branch33.yml
>>> 
>>> It seems to be due to a pipeline difference, since the same Python mypy
>>> linter already passes in the commit build.
>>> 
>>> Dongjoon.
>>> 
>>> 
>>> On Fri, Aug 4, 2023 at 8:09 PM Yuming Wang
>>> <yumw...@apache.org> wrote:
>>> Please vote on releasing the following candidate as Apache Spark version 
>>> 3.3.3.
>>> 
>>> The vote is open until 11:59pm Pacific time August 10th and passes if a 
>>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>> 
>>> [ ] +1 Release this package as Apache Spark 3.3.3
>>> [ ] -1 Do not release this package because ...
>>> 
>>> To learn more about Apache Spark, please see 
>>> https://spark.apache.org
>>> 
>>> The tag to be voted on is v3.3.3-rc1 (commit 
>>> 8c2b3319c6734250ff9d72f3d7e5cab56b142195):
>>> https://github.com/apache/spark/tree/v3.3.3-rc1
>>> 
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.3.3-rc1-bin
>>> 
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> 
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1445
>>> 
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.3.3-rc1-docs
>>> 
>>> The list of bug fixes going into 3.3.3 can be found at the following URL:
>>> https://s.apache.org/rjci4
>>> 
>>> This release is using the release script of the tag v3.3.3-rc1.
>>> 
>>> 
>>> FAQ
>>> 
>>> =
>>> How can I help test this release?
>>> =
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>> 
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks; in Java/Scala
>>> you can add the staging repository to your project's resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out of date RC going forward).
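>>> 

The PySpark verification steps above can be sketched as a shell transcript.
The staging URL and version come from this vote thread, but the pyspark
tarball filename under that URL is an assumption, so check the actual
directory listing first:

```
# Fresh virtual env so the RC doesn't touch an existing install
python3 -m venv rc-test && source rc-test/bin/activate

# Install the RC's pyspark package from the staging area (filename assumed)
pip install "https://dist.apache.org/repos/dist/dev/spark/v3.3.3-rc1-bin/pyspark-3.3.3.tar.gz"

# Smoke test: confirm the reported version and run a trivial local job
python -c "from pyspark.sql import SparkSession
s = SparkSession.builder.master('local[1]').getOrCreate()
print(s.version)
print(s.range(100).count())
s.stop()"
```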
>>> 
>>> ===
>>> What should happen to JIRA tickets still targeting 3.3.3?
>>> ===
>>> The current list of open tickets targeted at 3.3.3