Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-27 Thread Qian Sun
Hi Mich,

ImageCache is an Alibaba Cloud ECI feature [1]. An image cache is a
cluster-level resource that you can use to accelerate the creation of pods
in different namespaces.

If the Spark image needs to be updated, an image cache will be created in the
cluster, and you can specify a pod annotation to use the image cache [2].
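
For illustration, a submit-time sketch (hedged: the annotation key below is an
assumption taken from the Alibaba Cloud ECI docs linked in the references and
should be verified for your environment; the master URL and image name are
placeholders):

```shell
# Sketch: select an ECI image cache for the driver and executor pods via
# Spark's standard pod-annotation configs. The annotation key/value is an
# assumption based on the Alibaba Cloud ECI documentation.
spark-submit \
  --master k8s://https://<apiserver-host>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  --conf spark.kubernetes.driver.annotation.k8s.aliyun.com/eci-image-cache=true \
  --conf spark.kubernetes.executor.annotation.k8s.aliyun.com/eci-image-cache=true \
  ...
```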


ref:
1.
https://www.alibabacloud.com/help/en/elastic-container-instance/latest/overview-of-the-image-cache-feature?spm=a2c63.p38356.0.0.19977f3e9Xpq4E#topic-2131957
2.
https://www.alibabacloud.com/help/en/ack/serverless-kubernetes/user-guide/use-image-caches-to-accelerate-the-creation-of-pods#section-3e8-8n8-hdh

On Fri, Aug 25, 2023 at 10:08 PM Mich Talebzadeh 
wrote:

> Hi Qian,
>
> How in practice have you implemented image caching for the driver and
> executor pods respectively?
>
> Thanks
>
> On Thu, 24 Aug 2023 at 02:44, Qian Sun  wrote:
>
>> Hi Mich
>>
>> I agree with your opinion that the startup time of the Spark on
>> Kubernetes cluster needs to be improved.
>>
>> Regarding fetching the image directly, I have utilized ImageCache to
>> store the images on the node, eliminating the time required to pull images
>> from a remote repository. This does indeed lead to a reduction in
>> overall time, and the effect becomes more pronounced as the size of the
>> image increases.
>>
>> Additionally, I have observed that the driver pod takes a significant
>> amount of time from running to attempting to create executor pods, with an
>> estimated time expenditure of around 75%. We can also explore optimization
>> options in this area.
>>
>> On Thu, Aug 24, 2023 at 12:58 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> On this conversation, one of the issues I brought up was the driver
>>> start-up time. This is especially true in k8s. As Spark on k8s is modeled
>>> on Spark on the standalone scheduler, Spark on k8s consists of a
>>> single driver pod (like the "master" in standalone) and a number of executors
>>> (the "workers"). When executed on k8s, the driver and executors run
>>> on separate pods
>>> <https://spark.apache.org/docs/latest/running-on-kubernetes.html>. First
>>> the driver pod is launched, then the driver pod itself launches the
>>> executor pods. From my observation, in an auto-scaling cluster, the driver
>>> pod may take up to 40 seconds, followed by the executor pods. This is a
>>> considerable time for customers and it is painfully slow. Can we actually
>>> move away from the dependency on standalone mode and try to speed up k8s
>>> cluster formation?
>>>
>>> Another naive question: when the docker image is pulled from the
>>> container registry to the driver itself, this takes finite time. The docker
>>> image for the executors could be different from the driver's
>>> docker image. Since spark-submit presents this at the time of submission,
>>> can we save time by fetching the docker images straight away?
>>>
>>> Thanks
>>>
>>> Mich
>>>
>>>
>>>view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Tue, 8 Aug 2023 at 18:25, Mich Talebzadeh 
>>> wrote:
>>>
>>>> Splendid idea. 
>>>>
>>>> Mich Talebzadeh,
>>>> Solutions Architect/Engineering Lead
>>>> London
>>>> United Kingdom
>>>>
>>>>
>>>>view my Linkedin profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>>
>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>
>>>>
>>>>
>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which may
>>>> arise from relying on this email's technical content is explicitly
>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>> arising from s

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-23 Thread Qian Sun
t;>> *日期**: *2023年8月8日 星期二 06:53
>>>>>>> *抄送**: *dev 
>>>>>>> *主题**: *[Internet]Re: Improving Dynamic Allocation Logic for Spark
>>>>>>> 4+
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On the subject of dynamic allocation, is the following message a
>>>>>>> cause for concern when running Spark on k8s?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> INFO ExecutorAllocationManager: Dynamic allocation is enabled
>>>>>>> without a shuffle service.
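
For background (a hedged sketch based on the standard Spark configuration,
not on the poster's actual setup): since Spark 3.0, dynamic allocation can
work without an external shuffle service when shuffle tracking is enabled,
so on k8s a setup along these lines is common and the INFO line above is
informational rather than an error:

```properties
# Dynamic allocation without an external shuffle service: the driver tracks
# which executors hold shuffle data and keeps them alive (illustrative sketch).
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=true
```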
>>>>>>>
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>>
>>>>>>> Solutions Architect/Engineering Lead
>>>>>>>
>>>>>>> London
>>>>>>>
>>>>>>> United Kingdom
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>view my Linkedin profile
>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>>> for any loss, damage or destruction of data or any other property which 
>>>>>>> may
>>>>>>> arise from relying on this email's technical content is explicitly
>>>>>>> disclaimed. The author will in no case be liable for any monetary 
>>>>>>> damages
>>>>>>> arising from such loss, damage or destruction.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 7 Aug 2023 at 23:42, Mich Talebzadeh <
>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> From what I have seen, Spark on a serverless cluster has a hard time
>>>>>>> getting the driver going in a timely manner
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Annotations:  autopilot.gke.io/resource-adjustment:
>>>>>>>
>>>>>>>
>>>>>>> {"input":{"containers":[{"limits":{"memory":"1433Mi"},"requests":{"cpu":"1","memory":"1433Mi"},"name":"spark-kubernetes-driver"}]},"output...
>>>>>>>
>>>>>>>   autopilot.gke.io/warden-version: 2.7.41
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This is on Spark 3.4.1 with Java 11, on both the host running
>>>>>>> spark-submit and the docker image itself
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I am not sure how relevant this is to this discussion but it looks
>>>>>>> like a kind of blocker for now. What config params can help here and 
>>>>>>> what
>>>>>>> can be done?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>>
>>>>>>> Solutions Architect/Engineering Lead
>>>>>>>
>>>>>>> London
>>>>>>>
>>>>>>> United Kingdom
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>view my Linkedin profile
>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>>> for any loss, damage or destruction of data or any other property which 
>>>>>>> may
>>>>>>> arise from relying on this email's technical content is explicitly
>>>>>>> disclaimed. The author will in no case be liable for any monetary 
>>>>>>> damages
>>>>>>> arising from such loss, damage or destruction.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, 7 Aug 2023 at 22:39, Holden Karau 
>>>>>>> wrote:
>>>>>>>
>>>>>>> Oh great point
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 7, 2023 at 2:23 PM bo yang  wrote:
>>>>>>>
>>>>>>> Thanks Holden for bringing this up!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Maybe another thing to think about is how to make dynamic allocation
>>>>>>> more friendly with Kubernetes and disaggregated shuffle storage?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 7, 2023 at 1:27 PM Holden Karau 
>>>>>>> wrote:
>>>>>>>
>>>>>>> So I'm wondering if there is interest in revisiting some of how
>>>>>>> Spark is doing its dynamic allocation for Spark 4+?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Some things that I've been thinking about:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> - Advisory user input (e.g. a way to say after X is done I know I
>>>>>>> need Y where Y might be a bunch of GPU machines)
>>>>>>>
>>>>>>> - Configurable tolerance (e.g. if we have at most Z% over target
>>>>>>> no-op)
>>>>>>>
>>>>>>> - Past runs of same job (e.g. stage X of job Y had a peak of K)
>>>>>>>
>>>>>>> - Faster executor launches (I'm a little fuzzy on what we can do
>>>>>>> here, but one area for example is that we set up and tear down an RPC
>>>>>>> connection
>>>>>>> to the driver with a blocking call, which does seem to have some locking
>>>>>>> inside of the driver at first glance)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Is this an area other folks are thinking about? Should I make an
>>>>>>> epic we can track ideas in? Or are folks generally happy with today's
>>>>>>> dynamic allocation (or just busy with other things)?
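
For context, the main knobs that today's dynamic allocation exposes, and which
the ideas above (advisory input, tolerance, history) would go beyond, look
roughly like this (a sketch; the config names are from the standard Spark
configuration docs, and the values are purely illustrative):

```properties
# Existing dynamic-allocation configuration (illustrative values)
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.maxExecutors=50
# How long a task backlog must exist before requesting more executors
spark.dynamicAllocation.schedulerBacklogTimeout=1s
# How long an executor may sit idle before being released
spark.dynamicAllocation.executorIdleTimeout=60s
```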
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>
>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>>>>>>
>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>
>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>>>>>>
>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>
>>>>>>> --
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>

-- 
Regards,
Qian Sun


Re: Executor metrics are missing on Prometheus sink

2023-02-13 Thread Qian Sun
Hi Luca,

Thanks for your reply, which is very helpful for me :)

I am trying other metrics sinks with cAdvisor to see the effect. If it
works well, I will share it with the community.

On Fri, Feb 10, 2023 at 4:26 PM Luca Canali  wrote:

> Hi Qian,
>
>
>
> Indeed the metrics available with the Prometheus servlet sink (which is
> still marked as experimental) are limited compared to the full
> instrumentation, and this is due to the way it is implemented with a
> servlet; it cannot be easily extended, from what I can see.
>
> You can use another supported metrics sink (see
> https://spark.apache.org/docs/latest/monitoring.html#metrics ) if you
> want to collect all the metrics that are exposed by Spark executors.
>
> For example, I use the graphite sink and then collect metrics into an
> InfluxDB instance (see https://github.com/cerndb/spark-dashboard )
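
As a sketch of that setup (host and port are placeholders, and the option
names follow the Spark monitoring docs; verify against your Graphite endpoint),
the Graphite sink is configured in metrics.properties along these lines:

```properties
# Sketch: Graphite sink entries in metrics.properties (placeholders below)
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=<graphite-or-telegraf-host>
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
```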
>
> An additional comment is that there is room for having more sinks
> available for Apache Spark metrics, notably for InfluxDB and for Prometheus
> (gateway), if someone is interested in working on that.
>
>
>
> Best,
>
> Luca
>
>
>
>
>
> *From:* Qian Sun 
> *Sent:* Friday, February 10, 2023 05:05
> *To:* dev ; user.spark 
> *Subject:* Executor metrics are missing on prometheus sink
>
>
>
> Setting up prometheus sink in this way:
>
> -c spark.ui.prometheus.enabled=true
>
> -c spark.executor.processTreeMetrics.enabled=true
>
> -c spark.metrics.conf=/spark/conf/metric.properties
>
> *metric.properties:*
>
> *.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
>
> *.sink.prometheusServlet.path=/metrics/prometheus
>
> Result:
>
> Both of these endpoints have some metrics
>
> :4040/metrics/prometheus
>
> :4040/metrics/executors/prometheus
>
>
>
> But the executor one misses metrics under the executor namespace
> described here:
>
>
> https://spark.apache.org/docs/latest/monitoring.html#component-instance--executor
>
>
>
> *How to expose executor metrics on Spark executor pods?*
>
>
>
> *Any help will be appreciated.*
>
> --
>
> Regards,
>
> Qian Sun
>


-- 
Regards,
Qian Sun


Executor metrics are missing on prometheus sink

2023-02-09 Thread Qian Sun
Setting up prometheus sink in this way:

-c spark.ui.prometheus.enabled=true
-c spark.executor.processTreeMetrics.enabled=true
-c spark.metrics.conf=/spark/conf/metric.properties

*metric.properties:*

*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus

Result:

Both of these endpoints have some metrics

:4040/metrics/prometheus
:4040/metrics/executors/prometheus


But the executor one misses metrics under the executor namespace described
here:
https://spark.apache.org/docs/latest/monitoring.html#component-instance--executor

*How to expose executor metrics on Spark executor pods?*

*Any help will be appreciated.*
-- 
Regards,
Qian Sun


[DISCUSS] SPIP: Introduce Chaos Experiments in Apache Spark

2022-10-24 Thread Qian Sun
Hi, all

I would like to start a discussion about introducing chaos experiments in
Apache Spark.

This SPIP proposes introducing chaos experiments in Apache Spark to
make sure Spark can withstand the unpredictability of production
environments, and to help developers more quickly identify and resolve issues
that might not be captured by unit and integration testing.

Distributed systems face complex and unpredictable production environments,
such as disk failures, machine power loss, network isolation, and that's
just the tip of the iceberg. To make distributed systems more robust, we
need a method to simulate unpredictable failures and test responses to
these failures.


After chaos experiments:

   - Increases reliability and resiliency for Apache Spark.
   - Unplanned downtime and outages are far less likely to occur due to
     proactive and constant testing.
   - Strengthens system integrity.


It will also help Apache Spark expose issues about reliability and
resiliency faster and earlier, making it easier to reproduce user-reported
production issues.

See more in SPIP DOC:
https://docs.google.com/document/d/17dpBLUJcmqqKz7LMoyr4UJgr5t5ZUS3FdwXoybbHDCE

-- 
Best!
Qian Sun


Re: Welcome Yikun Jiang as a Spark committer

2022-10-08 Thread Qian SUN
Congratulations!

Hyukjin Kwon  wrote on Sat, Oct 8, 2022 at 12:40:

> Hi all,
>
> The Spark PMC recently added Yikun Jiang as a committer on the project.
> Yikun is a major contributor to the infrastructure and GitHub Actions in
> Apache Spark, as well as Kubernetes and PySpark.
> He has put a lot of effort into stabilizing and optimizing the builds
> so we all can work together in Apache Spark more
> efficiently and effectively. He's also driving the SPIP for the Docker
> official image in Apache Spark for users and developers.
> Please join me in welcoming Yikun!
>
>

-- 
Best!
Qian SUN


Re: [VOTE] SPIP: Support Docker Official Image for Spark

2022-09-21 Thread Qian SUN
+1

Hyukjin Kwon  wrote on Thu, Sep 22, 2022 at 09:41:

> Hi all,
>
> I would like to start a vote for SPIP: "Support Docker Official Image for
> Spark"
>
> The goal of the SPIP is to add Docker Official Image(DOI)
> <https://github.com/docker-library/official-images> to ensure the Spark
> Docker images
> meet the quality standards for Docker images, to provide these Docker
> images for users
> who want to use Apache Spark via Docker image.
>
> Please also refer to:
>
> - Previous discussion in dev mailing list: [DISCUSS] SPIP: Support Docker
> Official Image for Spark
> <https://lists.apache.org/thread/l1793y5224n8bqkp3s6ltgkykso4htb3>
> - SPIP doc: SPIP: Support Docker Official Image for Spark
> <https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o>
> - JIRA: SPARK-40513 <https://issues.apache.org/jira/browse/SPARK-40513>
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
>

-- 
Best!
Qian SUN


Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-21 Thread Qian SUN
+1.
It's valuable. Can I be involved in this work?

Yikun Jiang  wrote on Mon, Sep 19, 2022 at 08:15:

> Hi, all
>
> I would like to start the discussion for supporting Docker Official Image
> for Spark.
>
> This SPIP is proposed to add Docker Official Image(DOI)
> <https://github.com/docker-library/official-images> to ensure the Spark
> Docker images meet the quality standards for Docker images, to provide
> these Docker images for users who want to use Apache Spark via Docker image.
>
> There are also several Apache projects that release the Docker Official
> Images <https://hub.docker.com/search?q=apache_filter=official>,
> such as: flink <https://hub.docker.com/_/flink>, storm
> <https://hub.docker.com/_/storm>, solr <https://hub.docker.com/_/solr>,
> zookeeper <https://hub.docker.com/_/zookeeper>, httpd
> <https://hub.docker.com/_/httpd> (with 50M+ to 1B+ downloads each).
> From the huge download statistics, we can see the real demand from users,
> and from the support of other Apache projects, we should also be able to do
> it.
>
> After support:
>
>- The Dockerfile will still be maintained by the Apache Spark community
>  and reviewed by Docker.
>- The images will be maintained by the Docker community to ensure they meet
>  the Docker community's quality standards for Docker images.
>
>
> It will also reduce the extra docker image maintenance effort (such as
> frequent rebuilding and image security updates) of the Apache Spark community.
>
> See more in SPIP DOC:
> https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o
>
> cc: Ruifeng (co-author) and Hyukjin (shepherd)
>
> Regards,
> Yikun
>


-- 
Best!
Qian SUN


Re: Welcoming three new PMC members

2022-08-11 Thread Qian SUN
Congratulations!

Xinrong Meng  wrote on Thu, Aug 11, 2022 at 00:14:

> Congratulations! Well deserved!
>
> On Wed, Aug 10, 2022 at 3:22 AM Bjørn Jørgensen 
> wrote:
>
>> Congratulations :)
>>
>> On Tue, Aug 9, 2022 at 18:40, Xiao Li  wrote:
>>
>>> Hi all,
>>>
>>> The Spark PMC recently voted to add three new PMC members. Join me in
>>> welcoming them to their new roles!
>>>
>>> New PMC members: Huaxin Gao, Gengliang Wang and Maxim Gekk
>>>
>>> The Spark PMC
>>>
>>
>>
>> --
>> Bjørn Jørgensen
>> Vestre Aspehaug 4, 6010 Ålesund
>> Norge
>>
>> +47 480 94 297
>>
>

-- 
Best!
Qian SUN


Re: Welcome Xinrong Meng as a Spark committer

2022-08-09 Thread Qian SUN
Congratulations Xinrong!

Regards,
Qian SUN

Yang,Jie(INF)  wrote on Tue, Aug 9, 2022 at 17:10:

> Congratulations!
>
>
> Regards,
>
> Yang Jie
>
>
>
>
>
> *From:* Hyukjin Kwon 
> *Date:* Tuesday, August 9, 2022 16:12
> *To:* dev 
> *Subject:* Welcome Xinrong Meng as a Spark committer
>
>
>
> Hi all,
>
>
>
> The Spark PMC recently added Xinrong Meng as a committer on the project.
> Xinrong is the major contributor of PySpark especially Pandas API on Spark.
> She has guided a lot of new contributors enthusiastically. Please join me
> in welcoming Xinrong!
>
>
>


-- 
Best!
Qian SUN


Re: Contributions and help needed in SPARK-40005

2022-08-08 Thread Qian SUN
Sure, I will do it. SPARK-40010
<https://issues.apache.org/jira/browse/SPARK-40010> is built to track
progress.

Hyukjin Kwon gurwls...@gmail.com wrote on Tue, Aug 9, 2022 at 10:58:

Please go ahead. Would be very appreciated.
>
> On Tue, 9 Aug 2022 at 11:58, Qian SUN  wrote:
>
>> Hi Hyukjin
>>
>> I would like to do some work and pick up *Window.py *if possible.
>>
>> Thanks,
>> Qian
>>
>> Hyukjin Kwon  wrote on Tue, Aug 9, 2022 at 10:41:
>>
>>> Thanks Khalid for taking a look.
>>>
>>> On Tue, 9 Aug 2022 at 00:37, Khalid Mammadov 
>>> wrote:
>>>
>>>> Hi Hyukjin
>>>> That's a great initiative; here is a PR that addresses one of those issues
>>>> that's waiting for review: https://github.com/apache/spark/pull/37408
>>>>
>>>> Perhaps, it would be also good to track these pending issues somewhere
>>>> to avoid effort duplication.
>>>>
>>>> For example, I would like to pick up *union* and *union all* if no
>>>> one has already.
>>>>
>>>> Thanks,
>>>> Khalid
>>>>
>>>>
>>>> On Mon, Aug 8, 2022 at 1:44 PM Hyukjin Kwon 
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to improve PySpark documentation especially:
>>>>>
>>>>>- Make the examples self-contained, e.g.,
>>>>>
>>>>> https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html
>>>>>- Document Parameters
>>>>>
>>>>> https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot.
>>>>>There are many APIs that miss parameters in PySpark, e.g.,
>>>>> DataFrame.union
>>>>>
>>>>> Here is one example PR I am working on:
>>>>> https://github.com/apache/spark/pull/37437
>>>>> I can't do it all by myself. Any help, review, and contributions
>>>>> would be welcome and appreciated.
>>>>>
>>>>> Thank you all in advance.
>>>>>
>>>>
>>
>> --
>> Best!
>> Qian SUN
>>
> --
Best!
Qian SUN


Re: Contributions and help needed in SPARK-40005

2022-08-08 Thread Qian SUN
Hi Hyukjin

I would like to do some work and pick up *Window.py *if possible.

Thanks,
Qian

Hyukjin Kwon  wrote on Tue, Aug 9, 2022 at 10:41:

> Thanks Khalid for taking a look.
>
> On Tue, 9 Aug 2022 at 00:37, Khalid Mammadov 
> wrote:
>
>> Hi Hyukjin
>> That's a great initiative; here is a PR that addresses one of those issues
>> that's waiting for review: https://github.com/apache/spark/pull/37408
>>
>> Perhaps, it would be also good to track these pending issues somewhere to
>> avoid effort duplication.
>>
>> For example, I would like to pick up *union* and *union all* if no
>> one has already.
>>
>> Thanks,
>> Khalid
>>
>>
>> On Mon, Aug 8, 2022 at 1:44 PM Hyukjin Kwon  wrote:
>>
>>> Hi all,
>>>
>>> I am trying to improve PySpark documentation especially:
>>>
>>>- Make the examples self-contained, e.g.,
>>>https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html
>>>- Document Parameters
>>>
>>> https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html#pandas.DataFrame.pivot.
>>>There are many APIs that miss parameters in PySpark, e.g.,
>>> DataFrame.union
>>>
>>> Here is one example PR I am working on:
>>> https://github.com/apache/spark/pull/37437
>>> I can't do it all by myself. Any help, review, and contributions
>>> would be welcome and appreciated.
>>>
>>> Thank you all in advance.
>>>
>>

-- 
Best!
Qian SUN


Re: spark driver with OOM due to org.apache.spark.status.ElementTrackingStore

2022-08-04 Thread Qian SUN
I have no other advice for this. Does the situation improve after adjusting
the parameters?

Jason Jun  wrote on Fri, Aug 5, 2022 at 06:55:

> Hi Qian,
>
> Thanks for your feedback. We're using spark ver 3.1.2, these are set :
>
> spark.ui.retainedJobs 10
> spark.ui.retainedStages 10
> spark.ui.retainedTasks 100
>
> I'll set this, spark.ui.dagGraph.retainedRootRDDs, as well.
>
> Any other advice for this?
>
> Thanks
> Jason
>
> On Wed, 3 Aug 2022 at 15:56, Qian Sun  wrote:
>
>> Hi Jason
>> LiveUI initializes ElementTrackingStore with InMemoryStore, so it has OOM
>> risk.
>>
>> /**
>>  * Create an in-memory store for a live application.
>>  */
>> def createLiveStore(
>> conf: SparkConf,
>> appStatusSource: Option[AppStatusSource] = None): AppStatusStore = {
>>   val store = new ElementTrackingStore(new InMemoryStore(), conf)
>>   val listener = new AppStatusListener(store, conf, true, appStatusSource)
>>   new AppStatusStore(store, listener = Some(listener))
>> }
>>
>> In addition to the parameters you mentioned, you can try to reduce the
>> following parameters:
>> * spark.ui.retainedTasks
>> * spark.ui.dagGraph.retainedRootRDDs
>>
>> If you have more information about this situation, it would be good.
>>
>> Best
>> Qian
>>
>>
>> On Aug 3, 2022 at 11:04 AM, Jason Jun  wrote:
>>
>> He there,
>>
>> We have a spark driver running 24x7, and we are continuously getting OOM in
>> the spark driver every 10 days.
>> I found that org.apache.spark.status.ElementTrackingStore keeps 85% of
>> heap usage after analyzing a heap dump, like this image:
>> 
>>
>> I found these parameters might be the root cause, per this JIRA ticket:
>> https://issues.apache.org/jira/browse/SPARK-26395
>>
>>- spark.ui.retainedDeadExecutors
>>- spark.ui.retainedJobs
>>- spark.ui.retainedStages
>>
>>
>> But it didn't work. OOM is delayed from 1 week to 10 days with these
>> changes.
>>
>> It would be really appreciated if anyone can give me any solutions.
>>
>> Thanks
>> Jason
>>
>> .
>>
>>
>>

-- 
Best!
Qian SUN


Re: spark driver with OOM due to org.apache.spark.status.ElementTrackingStore

2022-08-02 Thread Qian Sun
Hi Jason
LiveUI initializes ElementTrackingStore with InMemoryStore, so it has OOM risk.

/**
 * Create an in-memory store for a live application.
 */
def createLiveStore(
conf: SparkConf,
appStatusSource: Option[AppStatusSource] = None): AppStatusStore = {
  val store = new ElementTrackingStore(new InMemoryStore(), conf)
  val listener = new AppStatusListener(store, conf, true, appStatusSource)
  new AppStatusStore(store, listener = Some(listener))
}
In addition to the parameters you mentioned, you can try to reduce the 
following parameters:
* spark.ui.retainedTasks
* spark.ui.dagGraph.retainedRootRDDs
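
Putting the suggestions together, a hedged sketch of the relevant
spark-defaults entries (the config names are standard Spark UI settings;
values are illustrative, and smaller values mean less history is retained in
the UI):

```properties
# Bound the live UI store (ElementTrackingStore) by retaining less history
spark.ui.retainedJobs=10
spark.ui.retainedStages=10
spark.ui.retainedTasks=1000
spark.ui.retainedDeadExecutors=10
spark.ui.dagGraph.retainedRootRDDs=10
```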

If you have more information about this situation, it would be good.

Best
Qian


> On Aug 3, 2022 at 11:04 AM, Jason Jun  wrote:
> 
> He there,
> 
> We have a spark driver running 24x7, and we are continuously getting OOM in
> the spark driver every 10 days.
> I found that org.apache.spark.status.ElementTrackingStore keeps 85% of heap usage
> after analyzing a heap dump, like this image:
> 
> 
> I found these parameters might be the root cause, per this JIRA ticket:
> https://issues.apache.org/jira/browse/SPARK-26395 
> 
> spark.ui.retainedDeadExecutors
> spark.ui.retainedJobs
> spark.ui.retainedStages
> 
> But it didn't work. OOM is delayed from 1 week to 10 days with these changes.
> 
> It would be really appreciated if anyone can give me any solutions.
> 
> Thanks
> Jason
> 
> .



Re: [PSA] Please rebase and sync your master branch in your forked repository

2022-06-21 Thread Qian Sun
Thank you Hyukjin

> On Jun 21, 2022 at 7:45 AM, Hyukjin Kwon  wrote:
> 
> After https://github.com/apache/spark/pull/36922 
>  gets merged, it requires your 
> fork's master branch to be synced to the latest master branch in Apache 
> Spark. Otherwise, builds would not be triggered in your PR.
> 



Re: Stickers and Swag

2022-06-14 Thread Qian Sun
Good! Could these be mailed to China?

> On Jun 14, 2022 at 2:04 PM, Xiao Li  wrote:
> 
> Hi, all, 
> 
> The ASF has an official store at RedBubble 
>  that Apache Community 
> Development (ComDev) runs. If you are interested in buying Spark Swag, 70 
> products featuring the Spark logo are available: 
> https://www.redbubble.com/shop/ap/113203780 
>  
> 
> Go Spark! 
> 
> Xiao



Maven Test blocks with TransportCipherSuite

2022-05-20 Thread Qian SUN
Hi, team.

I ran the Maven command to run the unit tests and got an NPE.

command: ./build/mvn test
refer to
https://spark.apache.org/docs/latest/building-spark.html#running-tests

The NPE is as follows:
22/05/20 16:32:45.450 main WARN AbstractChannelHandlerContext: Failed to
mark a promise as failure because it has succeeded already:
DefaultChannelPromise@366ef90e(success)
java.lang.NullPointerException: null
at
org.apache.spark.network.crypto.TransportCipher$EncryptionHandler.close(TransportCipher.java:137)
~[classes/:?]
at
io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:622)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:606)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at
io.netty.channel.DefaultChannelPipeline.close(DefaultChannelPipeline.java:994)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at io.netty.channel.AbstractChannel.close(AbstractChannel.java:280)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at
io.netty.channel.embedded.EmbeddedChannel.close(EmbeddedChannel.java:568)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at
io.netty.channel.embedded.EmbeddedChannel.close(EmbeddedChannel.java:555)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at
io.netty.channel.embedded.EmbeddedChannel.finish(EmbeddedChannel.java:503)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at
io.netty.channel.embedded.EmbeddedChannel.finish(EmbeddedChannel.java:483)
~[netty-transport-4.1.77.Final.jar:4.1.77.Final]
at
org.apache.spark.network.crypto.TransportCipherSuite.testBufferNotLeaksOnInternalError(TransportCipherSuite.java:78)
~[test-classes/:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
~[?:1.8.0_291]
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
~[?:1.8.0_291]
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[?:1.8.0_291]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_291]
at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
~[junit-4.13.2.jar:4.13.2]
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
~[junit-4.13.2.jar:4.13.2]
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
~[junit-4.13.2.jar:4.13.2]
at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
~[junit-4.13.2.jar:4.13.2]
at
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
~[junit-4.13.2.jar:4.13.2]
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
~[junit-4.13.2.jar:4.13.2]
at
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
~[junit-4.13.2.jar:4.13.2]
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
~[junit-4.13.2.jar:4.13.2]
at
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:364)
~[surefire-junit4-3.0.0-M5.jar:3.0.0-M5]
at
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
~[surefire-junit4-3.0.0-M5.jar:3.0.0-M5]
at
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:237)
~[surefire-junit4-3.0.0-M5.jar:3.0.0-M5]
at
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:158)
~[surefire-junit4-3.0.0-M5.jar:3.0.0-M5]
at
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:428)
~[surefire-booter-3.0.0-M5.jar:3.0.0-M5]
at
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
~[surefire-booter-3.0.0-M5.jar:3.0.0-M5]
at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562)
~[surefire-booter-3.0.0-M5.jar:3.0.0-M5]
at
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:548)
~[surefire-booter-3.0.0-M5.jar:3.0.0-M5]


Anyone with the same exception?
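
If it helps to narrow this down, an individual suite can be run on its own;
per the building-spark page linked above, selecting a single Java test looks
roughly like this (a sketch; the module path for TransportCipherSuite is an
assumption):

```shell
# Sketch: run only the failing suite in its module
./build/mvn test -pl common/network-common \
  -DwildcardSuites=none -Dtest=TransportCipherSuite
```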

-- 
Best!
Qian SUN


Re: SIGMOD System Award for Apache Spark

2022-05-12 Thread Qian Sun
Congratulations !!!

> On May 13, 2022 at 3:44 AM, Matei Zaharia  wrote:
> 
> Hi all,
> 
> We recently found out that Apache Spark received 
>  the SIGMOD System Award this year, 
> given by SIGMOD (the ACM’s data management research organization) to 
> impactful real-world and research systems. This puts Spark in good company 
> with some very impressive previous recipients 
> . This award is 
> really an achievement by the whole community, so I wanted to say congrats to 
> everyone who contributes to Spark, whether through code, issue reports, docs, 
> or other means.
> 
> Matei



Re: PR builder not working now

2022-04-19 Thread Qian SUN
Thanks for your work!!

On Wed, 20 Apr 2022 at 07:42, Hyukjin Kwon  wrote:

> It's fixed now.
>
> On Tue, 19 Apr 2022 at 08:33, Hyukjin Kwon  wrote:
>
>> It's still persistent. I will send an email to GitHub support today
>>
>> On Wed, 13 Apr 2022 at 11:04, Dongjoon Hyun 
>> wrote:
>>
>>> Thank you for sharing that information!
>>>
>>> Bests
>>> Dongjoon.
>>>
>>>
>>> On Mon, Apr 11, 2022 at 10:29 PM Hyukjin Kwon 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> There is a bug in GitHub Actions' RESTful API (see
>>>> https://github.com/HyukjinKwon/spark/actions?query=branch%3Adebug-ga-detection
>>>> as an example).
>>>> So, currently OSS PR builder doesn't work properly with showing a
>>>> screen such as
>>>> https://github.com/apache/spark/pull/36157/checks?check_run_id=5984075130
>>>> because we rely on that.
>>>>
>>>> To check the PR builder's status, we should manually find the workflow
>>>> run in PR author's repository for now by going to:
>>>> https://github.com/[PR AUTHOR
>>>> ID]/spark/actions/workflows/build_and_test.yml
>>>>
>>>
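As a side note (not from the original thread), if the GitHub CLI (gh) is installed and authenticated, the manual lookup described above can be scripted; AUTHOR and BRANCH below are placeholders for the PR author's GitHub ID and branch name:

```shell
# List recent runs of the PR author's build_and_test workflow
gh run list --repo AUTHOR/spark --workflow build_and_test.yml --branch BRANCH --limit 5

# Follow a specific run until it completes (RUN_ID taken from the list above)
gh run watch RUN_ID --repo AUTHOR/spark
```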

-- 
Best!
Qian SUN


Re: [How To] run test suites for specific module

2022-01-24 Thread Qian SUN
Hi Shen

You can use sbt to run a specific suite.

1. Start an sbt shell:
   $ bash build/sbt
2. Switch to the module's project:
   sbt > project core
   You can get the project name from the sbt.project.name property in the
   module's pom.xml.
3. Finally, run the specific suite:
   sbt > testOnly org.apache.spark.scheduler.DAGSchedulerSuite
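For reference (not part of the original steps), the same thing can be done non-interactively from the Spark root directory; the Maven form relies on the scalatest-maven-plugin options that Spark's build exposes, and the suite name is just an example:

```shell
# sbt: run one suite in the core module without entering the sbt shell
./build/sbt "core/testOnly org.apache.spark.scheduler.DAGSchedulerSuite"

# Maven: -DwildcardSuites selects the Scala suite(s) to run,
# -Dtest=none skips the Java tests, -pl core limits the build to one module
./build/mvn test -pl core -DwildcardSuites=org.apache.spark.scheduler.DAGSchedulerSuite -Dtest=none
```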

Hope this helps
Best regards,
Qian Sun

On Tue, 25 Jan 2022 at 07:44, Fangjia Shen  wrote:

> Hello all,
>
> How do you run Spark's test suites when you want to test the correctness
> of your code? Is there a way to run a specific test suite for Spark? For
> example, running test suite XXXSuite alone, instead of every class under
> the test/ directories.
>
> Here's some background info about what I want to do: I'm a graduate
> student trying to study Spark's design and find ways to improve Spark's
> performance by doing Software/Hardware co-design. I'm relatively new to
> Maven and so far struggling to find to a way to properly run Spark's own
> test suites.
>
> Let's say I did some modifications to a XXXExec node which belongs to the
> org.apache.spark.sql package. I want to see if my design passes the test
> cases. What should I do?
>
>
> What command should I use:
>
>  ./build/mvn test  or  ./dev/run-tests ?
>
> And where should I run that command:
>
> <Spark root directory>  or  <module directory> ? - where <module directory> is
> where the modified scala file is located, e.g. "/sql/core/".
>
>
> I tried adding -Dtest=XXXSuite to mvn test but still end up running tens of
> thousands of tests. This takes way too much time and is unbearable if I'm
> just modifying a few files in a specific module.
>
> I would really appreciate any suggestion or comment.
>
>
> Best regards,
>
> Fangjia Shen
>
> Purdue University
>
>
>
>

-- 
Best!
Qian SUN


Re: [VOTE] Release Spark 3.2.1 (RC1)

2022-01-11 Thread Qian Sun
+1

Looks good. All integration tests passed.

Qian

> On Jan 11, 2022, at 2:09 AM, huaxin gao  wrote:
> 
> Please vote on releasing the following candidate as Apache Spark version 
> 3.2.1.
> 
> The vote is open until Jan. 13th at 12 PM PST (8 PM UTC) and passes if a
> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
> 
> [ ] +1 Release this package as Apache Spark 3.2.1
> [ ] -1 Do not release this package because ...
> 
> To learn more about Apache Spark, please see http://spark.apache.org/ 
> 
> 
> There are currently no issues targeting 3.2.1 (try project = SPARK AND
> "Target Version/s" = "3.2.1" AND status in (Open, Reopened, "In Progress"))
> 
> The tag to be voted on is v3.2.1-rc1 (commit
> 2b0ee226f8dd17b278ad11139e62464433191653):
> https://github.com/apache/spark/tree/v3.2.1-rc1
>  
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-bin/ 
> 
> 
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS 
> 
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1395/ 
> 
> 
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-docs/ 
> 
> 
> The list of bug fixes going into 3.2.1 can be found at the following URL:
> https://s.apache.org/7tzik 
> 
> This release is using the release script of the tag v3.2.1-rc1.
> 
> FAQ
> 
> 
> =
> How can I help test this release?
> =
> 
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
> 
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out of date RC going forward).
> 
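For example, assuming the RC bin directory contains a tarball following the usual pyspark-<version>.tar.gz naming, a quick PySpark smoke test of this candidate could look like:

```shell
# Create a clean virtual env and install the RC build of PySpark
python -m venv spark-321-rc1 && . spark-321-rc1/bin/activate
pip install "https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-bin/pyspark-3.2.1.tar.gz"

# Run a trivial local job against the candidate
python -c "from pyspark.sql import SparkSession; \
spark = SparkSession.builder.master('local[2]').getOrCreate(); \
print(spark.range(10).count()); spark.stop()"
```

For Java/Scala projects, the staging repository listed above can be added as a resolver, e.g. in sbt: resolvers += "spark-rc" at "https://repository.apache.org/content/repositories/orgapachespark-1395/".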
> ===
> What should happen to JIRA tickets still targeting 3.2.1?
> ===
> 
> The current list of open tickets targeted at 3.2.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK 
>  and search for "Target
> Version/s" = 3.2.1
> 
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
> 
> ==
> But my bug isn't fixed?
> ==
> 
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.



Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Qian Sun
My understanding is that we don’t need to do anything: the recent CVE affects
log4j-core 2.x, which is not used in Spark.

> On Dec 13, 2021, at 12:45 PM, Pralabh Kumar  wrote:
> 
> Hi developers, users,
> 
> Spark is built using log4j 1.2.17. Is there a plan to upgrade based on the
> recently detected CVE?
> 
> 
> Regards
> Pralabh kumar


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org