Re: Apache Spark 3.3 Release

2022-03-03 Thread Jungtaek Lim
Thanks Maxim for volunteering to drive the release! I support the plan
(March 15th) to perform a release branch cut.

Btw, would we still be open to fixes for critical/blocker issues after the
release branch cut? I have a blocker JIRA ticket with a PR open for review,
but it needs some time to gain traction and to go through actual reviews.
My guess is yes, but I would like to confirm.

On Fri, Mar 4, 2022 at 4:20 AM Dongjoon Hyun 
wrote:

> Thank you, Max, for volunteering for Apache Spark 3.3 release manager.
>
> Ya, I'm also +1 for the original plan.
>
> Dongjoon
>
> On Thu, Mar 3, 2022 at 10:52 AM Mridul Muralidharan 
> wrote:
>
>>
>> Agree with Sean, code freeze by mid March sounds good.
>>
>> Regards,
>> Mridul
>>
>> On Thu, Mar 3, 2022 at 12:47 PM Sean Owen  wrote:
>>
>>> I think it's fine to pursue the existing plan - code freeze in two weeks
>>> and try to close off key remaining issues. Final release pending on how
>>> those go, and testing, but fine to get the ball rolling.
>>>
>>> On Thu, Mar 3, 2022 at 12:45 PM Maxim Gekk
>>>  wrote:
>>>
>>>> Hello All,
>>>>
>>>> I would like to bring up the topic of the new Spark 3.3 release.
>>>> According to the public schedule at
>>>> https://spark.apache.org/versioning-policy.html, we planned to start
>>>> the code freeze and release branch cut on March 15th, 2022. Since this
>>>> date is coming soon, I would like to draw your attention to the topic
>>>> and gather any objections that you might have.
>>>>
>>>> Below is the list of ongoing and active SPIPs:
>>>>
>>>> Spark SQL:
>>>> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
>>>> - [SPARK-35801] Row-level operations in Data Source V2
>>>> - [SPARK-37166] Storage Partitioned Join
>>>>
>>>> Spark Core:
>>>> - [SPARK-20624] Add better handling for node shutdown
>>>> - [SPARK-25299] Use remote storage for persisting shuffle data
>>>>
>>>> PySpark:
>>>> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>>>>
>>>> Kubernetes:
>>>> - [SPARK-36057] Support Customized Kubernetes Schedulers
>>>>
>>>> We should probably finish any remaining work for Spark 3.3, switch to
>>>> QA mode, cut a branch, and keep everything on track. I would like to
>>>> volunteer to help drive this process.
>>>>
>>>> Best regards,
>>>> Max Gekk
>>>>
>>>


Re: Apache Spark 3.3 Release

2022-03-03 Thread Dongjoon Hyun
Thank you, Max, for volunteering for Apache Spark 3.3 release manager.

Ya, I'm also +1 for the original plan.

Dongjoon

On Thu, Mar 3, 2022 at 10:52 AM Mridul Muralidharan 
wrote:

>
> Agree with Sean, code freeze by mid March sounds good.
>
> Regards,
> Mridul
>
> On Thu, Mar 3, 2022 at 12:47 PM Sean Owen  wrote:
>
>> I think it's fine to pursue the existing plan - code freeze in two weeks
>> and try to close off key remaining issues. Final release pending on how
>> those go, and testing, but fine to get the ball rolling.
>>
>> On Thu, Mar 3, 2022 at 12:45 PM Maxim Gekk
>>  wrote:
>>
>>> Hello All,
>>>
>>> I would like to bring up the topic of the new Spark 3.3 release.
>>> According to the public schedule at
>>> https://spark.apache.org/versioning-policy.html, we planned to start
>>> the code freeze and release branch cut on March 15th, 2022. Since this
>>> date is coming soon, I would like to draw your attention to the topic
>>> and gather any objections that you might have.
>>>
>>> Below is the list of ongoing and active SPIPs:
>>>
>>> Spark SQL:
>>> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
>>> - [SPARK-35801] Row-level operations in Data Source V2
>>> - [SPARK-37166] Storage Partitioned Join
>>>
>>> Spark Core:
>>> - [SPARK-20624] Add better handling for node shutdown
>>> - [SPARK-25299] Use remote storage for persisting shuffle data
>>>
>>> PySpark:
>>> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>>>
>>> Kubernetes:
>>> - [SPARK-36057] Support Customized Kubernetes Schedulers
>>>
>>> We should probably finish any remaining work for Spark 3.3, switch to
>>> QA mode, cut a branch, and keep everything on track. I would like to
>>> volunteer to help drive this process.
>>>
>>> Best regards,
>>> Max Gekk
>>>
>>


Re: Apache Spark 3.3 Release

2022-03-03 Thread Mridul Muralidharan
Agree with Sean, code freeze by mid March sounds good.

Regards,
Mridul

On Thu, Mar 3, 2022 at 12:47 PM Sean Owen  wrote:

> I think it's fine to pursue the existing plan - code freeze in two weeks
> and try to close off key remaining issues. Final release pending on how
> those go, and testing, but fine to get the ball rolling.
>
> On Thu, Mar 3, 2022 at 12:45 PM Maxim Gekk
>  wrote:
>
>> Hello All,
>>
>> I would like to bring up the topic of the new Spark 3.3 release.
>> According to the public schedule at
>> https://spark.apache.org/versioning-policy.html, we planned to start the
>> code freeze and release branch cut on March 15th, 2022. Since this date is
>> coming soon, I would like to draw your attention to the topic and gather
>> any objections that you might have.
>>
>> Below is the list of ongoing and active SPIPs:
>>
>> Spark SQL:
>> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
>> - [SPARK-35801] Row-level operations in Data Source V2
>> - [SPARK-37166] Storage Partitioned Join
>>
>> Spark Core:
>> - [SPARK-20624] Add better handling for node shutdown
>> - [SPARK-25299] Use remote storage for persisting shuffle data
>>
>> PySpark:
>> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>>
>> Kubernetes:
>> - [SPARK-36057] Support Customized Kubernetes Schedulers
>>
>> We should probably finish any remaining work for Spark 3.3, switch to QA
>> mode, cut a branch, and keep everything on track. I would like to
>> volunteer to help drive this process.
>>
>> Best regards,
>> Max Gekk
>>
>


Re: Apache Spark 3.3 Release

2022-03-03 Thread Sean Owen
I think it's fine to pursue the existing plan - code freeze in two weeks
and try to close off key remaining issues. Final release pending on how
those go, and testing, but fine to get the ball rolling.

On Thu, Mar 3, 2022 at 12:45 PM Maxim Gekk
 wrote:

> Hello All,
>
> I would like to bring up the topic of the new Spark 3.3 release.
> According to the public schedule at
> https://spark.apache.org/versioning-policy.html, we planned to start the
> code freeze and release branch cut on March 15th, 2022. Since this date is
> coming soon, I would like to draw your attention to the topic and gather
> any objections that you might have.
>
> Below is the list of ongoing and active SPIPs:
>
> Spark SQL:
> - [SPARK-31357] DataSourceV2: Catalog API for view metadata
> - [SPARK-35801] Row-level operations in Data Source V2
> - [SPARK-37166] Storage Partitioned Join
>
> Spark Core:
> - [SPARK-20624] Add better handling for node shutdown
> - [SPARK-25299] Use remote storage for persisting shuffle data
>
> PySpark:
> - [SPARK-26413] RDD Arrow Support in Spark Core and PySpark
>
> Kubernetes:
> - [SPARK-36057] Support Customized Kubernetes Schedulers
>
> We should probably finish any remaining work for Spark 3.3, switch to QA
> mode, cut a branch, and keep everything on track. I would like to
> volunteer to help drive this process.
>
> Best regards,
> Max Gekk
>


Apache Spark 3.3 Release

2022-03-03 Thread Maxim Gekk
Hello All,

I would like to bring up the topic of the new Spark 3.3 release.
According to the public schedule at
https://spark.apache.org/versioning-policy.html, we planned to start the
code freeze and release branch cut on March 15th, 2022. Since this date is
coming soon, I would like to draw your attention to the topic and gather
any objections that you might have.

Below is the list of ongoing and active SPIPs:

Spark SQL:
- [SPARK-31357] DataSourceV2: Catalog API for view metadata
- [SPARK-35801] Row-level operations in Data Source V2
- [SPARK-37166] Storage Partitioned Join

Spark Core:
- [SPARK-20624] Add better handling for node shutdown
- [SPARK-25299] Use remote storage for persisting shuffle data

PySpark:
- [SPARK-26413] RDD Arrow Support in Spark Core and PySpark

Kubernetes:
- [SPARK-36057] Support Customized Kubernetes Schedulers

We should probably finish any remaining work for Spark 3.3, switch to QA
mode, cut a branch, and keep everything on track. I would like to volunteer
to help drive this process.

Best regards,
Max Gekk


Re: Spark Streaming | Dynamic Action Support

2022-03-03 Thread Mich Talebzadeh
In short, I don't think that is possible. However, you have the option of
shutting down Spark gracefully with a checkpoint directory enabled. That
way you can resubmit the modified code, which will pick up from the BatchID
where it left off, assuming the topic is the same. See the thread
"How to gracefully shutdown Spark Structured Streaming" in
https://lists.apache.org/list.html?u...@spark.apache.org
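
A minimal sketch of the graceful-shutdown idea, assuming a PySpark
Structured Streaming query that was started with a checkpoint location. The
marker-file convention, the function name `stop_when_marker_appears`, and
the `DemoQuery` stand-in are hypothetical and are not taken from the thread
referenced above. The only things relied on from a real `StreamingQuery` are
its `isActive` property and its `stop()` method. The sketch does not
guarantee that an in-flight micro-batch completes before the stop; the
checkpoint is what lets the resubmitted job resume.

```python
import os
import tempfile
import time


def stop_when_marker_appears(query, marker_path, poll_seconds=5.0, sleep=time.sleep):
    """Poll for a marker file and stop the streaming query once it exists.

    `query` is expected to behave like a PySpark StreamingQuery, i.e. to
    expose an `isActive` property and a `stop()` method. Returns True if
    the marker triggered the stop, False if the query ended on its own.
    """
    while query.isActive:
        if os.path.exists(marker_path):
            query.stop()
            return True
        sleep(poll_seconds)
    return False


class DemoQuery:
    """Hypothetical stand-in for a StreamingQuery so the sketch runs without Spark."""

    def __init__(self):
        self.isActive = True

    def stop(self):
        self.isActive = False


if __name__ == "__main__":
    # An operator would create this file to request a stop before
    # resubmitting the modified job against the same checkpoint directory.
    marker = os.path.join(tempfile.mkdtemp(), "stop_streaming")
    open(marker, "w").close()
    demo = DemoQuery()
    print(stop_when_marker_appears(demo, marker, poll_seconds=0.0))
```

In a real job, the object returned by `writeStream...start()` would be
passed in place of `DemoQuery`.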

HTH






 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Thu, 3 Mar 2022 at 15:49, Mich Talebzadeh 
wrote:

> What is the definition of action here?
>
>
>
>
>
>
>
> On Thu, 3 Mar 2022 at 10:56, Pappu Yadav  wrote:
>
>> Hi,
>>
>> Is there any way I can add/delete actions/jobs dynamically in a running
>> Spark Streaming job?
>> I will call an API and execute only the configured actions in the system.
>>
>> E.g., suppose there are 5 actions in the Spark application in the first
>> batch.
>> Now suppose some configuration is changed, and one action is added and one
>> is deleted.
>> How can I handle this in the Spark Streaming job without restarting the
>> application?
>>
>


Re: Spark Streaming | Dynamic Action Support

2022-03-03 Thread Mich Talebzadeh
What is the definition of action here?







On Thu, 3 Mar 2022 at 10:56, Pappu Yadav  wrote:

> Hi,
>
> Is there any way I can add/delete actions/jobs dynamically in a running
> Spark Streaming job?
> I will call an API and execute only the configured actions in the system.
>
> E.g., suppose there are 5 actions in the Spark application in the first
> batch.
> Now suppose some configuration is changed, and one action is added and one
> is deleted.
> How can I handle this in the Spark Streaming job without restarting the
> application?
>


Spark Streaming | Dynamic Action Support

2022-03-03 Thread Pappu Yadav
Hi,

Is there any way I can add/delete actions/jobs dynamically in a running
Spark Streaming job?
I will call an API and execute only the configured actions in the system.

E.g., suppose there are 5 actions in the Spark application in the first
batch.
Now suppose some configuration is changed, and one action is added and one
is deleted.
How can I handle this in the Spark Streaming job without restarting the
application?


Re: `running-on-kubernetes` page render bad in v3.2.1(latest) website

2022-03-03 Thread Yikun Jiang
It has already been fixed by: https://github.com/apache/spark/pull/35572

Sorry for the noise. Please ignore my previous email.