Re: Viewing UI for spark jobs running on K8s

2023-05-31 Thread Qian Sun
Hi Nikhil

The Spark operator supports Ingress for exposing the UIs of all running Spark
applications.

reference:
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/quick-start-guide.md#driver-ui-access-and-ingress
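
As a rough sketch (the Helm value and URL-format template below are taken from the
operator's chart and quick-start guide, so treat the exact names as assumptions to
verify against your operator version), you can install the operator with an ingress
URL format so that every SparkApplication's driver UI gets its own Ingress behind a
single ingress controller:

helm install spark-operator spark-operator/spark-operator \
  --namespace spark-operator --create-namespace \
  --set ingressUrlFormat='{{$appName}}.spark-ui.example.com'   # example.com is a placeholder domain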

On Thu, Jun 1, 2023 at 6:19 AM Nikhil Goyal  wrote:

> Hi folks,
> Is there an equivalent of the YARN RM page for Spark on Kubernetes? We can
> port-forward the UI from the driver pod for each job, but this process is
> tedious given we have multiple jobs running. Is there a clever solution for
> exposing all driver UIs in a centralized place?
>
> Thanks
> Nikhil
>

-- 
Regards,
Qian Sun


Re: Executor metrics are missing on Prometheus sink

2023-02-13 Thread Qian Sun
Hi Luca,

Thanks for your reply, which is very helpful for me :)

I am trying other metrics sinks with cAdvisor to see the effect. If it
works well, I will share it with the community.
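
For reference, here is a minimal sketch of the Graphite sink configuration Luca
mentions (the host, port, and prefix are placeholders; see the monitoring docs
linked in his reply for the full option list):

*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=spark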

On Fri, Feb 10, 2023 at 4:26 PM Luca Canali  wrote:

> Hi Qian,
>
>
>
> Indeed the metrics available with the Prometheus servlet sink (which is
> still marked as experimental) are limited compared to the full
> instrumentation; this is due to the way it is implemented as a servlet,
> and from what I can see it cannot be easily extended.
>
> You can use another supported metrics sink (see
> https://spark.apache.org/docs/latest/monitoring.html#metrics ) if you
> want to collect all the metrics that are exposed by Spark executors.
>
> For example, I use the graphite sink and then collect metrics into an
> InfluxDB instance (see https://github.com/cerndb/spark-dashboard )
>
> An additional comment is that there is room for having more sinks
> available for Apache Spark metrics, notably for InfluxDB and for Prometheus
> (gateway), if someone is interested in working on that.
>
>
>
> Best,
>
> Luca
>
>
>
>
>
> From: Qian Sun
> Sent: Friday, February 10, 2023 05:05
> To: dev; user.spark
> Subject: Executor metrics are missing on prometheus sink
>
>
>
> Setting up prometheus sink in this way:
>
> -c spark.ui.prometheus.enabled=true
>
> -c spark.executor.processTreeMetrics.enabled=true
>
> -c spark.metrics.conf=/spark/conf/metric.properties
>
> metric.properties:
>
> *.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
>
> *.sink.prometheusServlet.path=/metrics/prometheus
>
> Result:
>
> Both of these endpoints have some metrics
>
> :4040/metrics/prometheus
>
> :4040/metrics/executors/prometheus
>
>
>
> But the executor endpoint is missing the metrics under the executor
> namespace described here:
>
>
> https://spark.apache.org/docs/latest/monitoring.html#component-instance--executor
>
>
>
> How can I expose executor metrics on the Spark executor pods?
>
>
>
> Any help will be appreciated.
>
> --
>
> Regards,
>
> Qian Sun
>


-- 
Regards,
Qian Sun


Executor metrics are missing on prometheus sink

2023-02-09 Thread Qian Sun
Setting up prometheus sink in this way:

-c spark.ui.prometheus.enabled=true
-c spark.executor.processTreeMetrics.enabled=true
-c spark.metrics.conf=/spark/conf/metric.properties

metric.properties:

*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus

Result:

Both of these endpoints have some metrics

:4040/metrics/prometheus
:4040/metrics/executors/prometheus


But the executor endpoint is missing the metrics under the executor namespace
described here:
https://spark.apache.org/docs/latest/monitoring.html#component-instance--executor

How can I expose executor metrics on the Spark executor pods?

Any help will be appreciated.
-- 
Regards,
Qian Sun


Re: spark on kubernetes

2022-10-16 Thread Qian Sun
Glad to hear it!

On Sun, Oct 16, 2022 at 2:37 PM Mohammad Abdollahzade Arani <
mamadazar...@gmail.com> wrote:

> Hi Qian,
> Thanks for the reply, and I'm so sorry for the late reply.
> I found the answer. My mistake was the token conversion: I had to
> base64-decode the service account's token and certificate.
> And you are right, I have to use the service account cert to configure
> spark.kubernetes.authenticate.caCertFile.
> Thanks again. Best regards.
>
> On Sat, Oct 15, 2022 at 4:51 PM Qian Sun  wrote:
>
>> Hi Mohammad
>> Did you try this command?
>>
>>
>> ./bin/spark-submit \
>>   --master k8s://https://vm13:6443 \
>>   --class com.example.WordCounter \
>>   --conf spark.kubernetes.authenticate.driver.serviceAccountName=default \
>>   --conf spark.kubernetes.container.image=private-docker-registery/spark/spark:3.2.1-3 \
>>   --conf spark.kubernetes.namespace=default \
>>   java-word-count-1.0-SNAPSHOT.jar
>>
>> If you want to use spark.kubernetes.authenticate.caCertFile, you need to
>> point it at the service account's CA cert file instead of the API
>> server's cert file.
>>
>> On Sat, Oct 15, 2022 at 8:30 PM Mohammad Abdollahzade Arani
>> mamadazar...@gmail.com wrote:
>>
>>> I have a k8s cluster and a spark cluster.
>>> My question is as below:
>>>
>>>
>>> https://stackoverflow.com/questions/74053948/how-to-resolve-pods-is-forbidden-user-systemanonymous-cannot-watch-resourc
>>>
>>> I have searched and found lots of other similar questions on Stack
>>> Overflow without an answer, like the one below:
>>>
>>>
>>> https://stackoverflow.com/questions/61982896/how-to-fix-pods-is-forbidden-user-systemanonymous-cannot-watch-resource
>>>
>>>
>>> --
>>> Best Wishes!
>>> Mohammad Abdollahzade Arani
>>> Computer Engineering @ SBU
>>>
>>> --
>> Best!
>> Qian Sun
>>
>
>
> --
> Best Wishes!
> Mohammad Abdollahzade Arani
> Computer Engineering @ SBU
>
>

-- 
Best!
Qian Sun


Re: spark on kubernetes

2022-10-15 Thread Qian Sun
Hi Mohammad
Did you try this command?


./bin/spark-submit \
  --master k8s://https://vm13:6443 \
  --class com.example.WordCounter \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=default \
  --conf spark.kubernetes.container.image=private-docker-registery/spark/spark:3.2.1-3 \
  --conf spark.kubernetes.namespace=default \
  java-word-count-1.0-SNAPSHOT.jar

If you want to use spark.kubernetes.authenticate.caCertFile, you need to point
it at the service account's CA cert file instead of the API server's cert file.
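
For example (a hedged sketch): the paths below are the standard in-cluster
service account mount paths; if you submit from outside the cluster, point these
options at your locally saved, base64-decoded token and cert files instead.

--conf spark.kubernetes.authenticate.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
--conf spark.kubernetes.authenticate.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token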

On Sat, Oct 15, 2022 at 8:30 PM Mohammad Abdollahzade Arani
mamadazar...@gmail.com wrote:

> I have a k8s cluster and a spark cluster.
> My question is as below:
>
>
> https://stackoverflow.com/questions/74053948/how-to-resolve-pods-is-forbidden-user-systemanonymous-cannot-watch-resourc
>
> I have searched and found lots of other similar questions on Stack
> Overflow without an answer, like the one below:
>
>
> https://stackoverflow.com/questions/61982896/how-to-fix-pods-is-forbidden-user-systemanonymous-cannot-watch-resource
>
>
> --
> Best Wishes!
> Mohammad Abdollahzade Arani
> Computer Engineering @ SBU
>
> --
Best!
Qian Sun


Re: Executor heartbeats on Kubernetes

2022-10-13 Thread Qian SUN
Hi Kristopher

I believe the traditional heartbeat between the driver and executors still works
on K8s. spark.kubernetes.executor.missingPodDetectDelta relates to the executor
lifecycle instead:

When a registered executor’s POD is missing from the Kubernetes API
server’s polled list of PODs then this delta time is taken as the accepted
time difference between the registration time and the time of the polling.
After this time the POD is considered missing from the cluster and the
executor will be removed.
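
For reference, the two knobs can be tuned independently; a hedged sketch using
what I believe are the default values:

--conf spark.executor.heartbeatInterval=10s
--conf spark.kubernetes.executor.missingPodDetectDelta=30s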

Best,
Qian

On Fri, Oct 14, 2022 at 1:38 AM Kristopher Kane wrote:

> Due to settings like "spark.kubernetes.executor.missingPodDetectDelta",
> I've begun to wonder about heartbeats on Kubernetes.
>
> Do executors still conduct the traditional heartbeat to the driver
> when run on Kubernetes?
>
> Thanks,
>
> Kris
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>

-- 
Best!
Qian SUN


Re: Kyro Serializer not getting set : Spark3

2022-09-22 Thread Qian SUN
Hi rajat

I'm guessing you are setting the configuration at runtime; correct me
if I'm wrong.
Only a certain subset of Spark SQL properties (prefixed with spark.sql) can
be set at runtime; please refer to SparkConf.scala
<https://github.com/apache/spark/blob/500f3097111a6bf024acf41400660c199a150350/core/src/main/scala/org/apache/spark/SparkConf.scala#L51-L52>

Once a SparkConf object is passed to Spark, it is cloned and can no longer
be modified by the user. Spark does not support modifying the configuration
at runtime.

So the remaining options have to be set before the SparkContext is initialized:

val spark = SparkSession.builder
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()


On Fri, Sep 23, 2022 at 5:58 AM rajat kumar wrote:

> Hello Users,
>
> While using the below setting, I am getting an exception:
>   spark.conf.set("spark.serializer",
> "org.apache.spark.serializer.KryoSerializer")
>
> User class threw exception: org.apache.spark.sql.AnalysisException: Cannot
> modify the value of a Spark config: spark.serializer at
> org.apache.spark.sql.errors.QueryCompilationErrors$.cannotModifyValueOfSparkConfigError(QueryCompilationErrors.scala:2322)
>
> Can we safely skip setting it, or has the way to set it changed?
>
> Thanks
> Rajat
>
>
>

-- 
Best!
Qian SUN


Re: running pyspark on kubernetes - no space left on device

2022-09-01 Thread Qian SUN
Hi
Spark provides the spark.local.dir configuration to specify the scratch ("work")
directory on the pod. You can set spark.local.dir to your persistent volume's
mount path.
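
A hedged sketch (the claim name, volume name, and mount path are placeholders):
mount a PVC on the executor pods and point spark.local.dir at the mount path so
shuffle and spill data land on the volume instead of the pod's /tmp:

--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local.options.claimName=my-scratch-pvc
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local.mount.path=/data/spark-local
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local.mount.readOnly=false
--conf spark.local.dir=/data/spark-local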

Best regards

On Thu, Sep 1, 2022 at 9:16 PM Manoj GEORGE wrote:

> CONFIDENTIAL & RESTRICTED
>
> Hi Team,
>
>
>
> I am new to spark, so please excuse my ignorance.
>
>
>
> Currently we are trying to run PySpark on a Kubernetes cluster. The setup is
> working fine for some jobs, but when we are processing a large file (36
> GB), we run into out-of-space issues.
>
>
>
> Based on what we found on the internet, we have mapped the local dir to a
> persistent volume. This still doesn't solve the issue.
>
>
>
> I am not sure if it is still writing to the /tmp folder on the pod. Is there
> some other setting which needs to be changed for this to work?
>
>
>
> Thanks in advance.
>
>
>
>
>
>
>
> Thanks,
>
> Manoj George
>
> Manager Database Architecture
> M: +1 3522786801
>
> manoj.geo...@amadeus.com
>
> www.amadeus.com
>
>


-- 
Best!
Qian SUN


Re: Spark with Hive (Standalone) Metastore

2022-07-04 Thread Qian SUN
Hi,
Setting the Hive configuration with a spark.hive.* prefix should take effect.
Could you share your configuration?
https://github.com/apache/spark/blob/648457905c4ea7d00e3d88048c63f360045f0714/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L108-L115

And the metastore client initialization code is here:
https://github.com/apache/spark/blob/v3.0.2/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
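
For example (a hedged sketch; the metastore URI and timeout are placeholders),
properties passed with the spark.hive. prefix should be copied into the Hive
configuration with the spark. prefix stripped, so they reach the metastore client:

./bin/spark-shell \
  --conf spark.hive.metastore.uris=thrift://metastore-host:9083 \
  --conf spark.hive.metastore.client.socket.timeout=300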

On Mon, Jul 4, 2022 at 8:50 PM Ankur Khanna wrote:

> Hi,
>
>
>
> I am using spark-shell/spark-sql with the Hive Metastore (running as a
> standalone process). I am facing a problem where the custom conf I pass
> while starting spark-shell (using --conf) is not being passed on to the
> metastore with the session.
>
> I'll appreciate help on how to get the properties passed on to the
> metastore when starting a Spark session.
>
>
>
> Also, it will be super-helpful if someone can point me to the code where
> the metastore client gets initialized when I begin a new spark-shell
> session. (My assumption is that hiveConf should be passed while
> initializing the metastore client instance when a new session is started)
>
>
>
> Spark version : 3.0.2
>
> Hive version : 3.1.2
>
>
>
> Best,
>
> Ankur Khanna
>


-- 
Best!
Qian SUN


Re: Stickers and Swag

2022-06-14 Thread Qian Sun
Great! Can these be mailed to China?

> On Jun 14, 2022, at 2:04 PM, Xiao Li wrote:
> 
> Hi, all, 
> 
> The ASF has an official store at RedBubble that Apache Community
> Development (ComDev) runs. If you are interested in buying Spark swag, 70
> products featuring the Spark logo are available:
> https://www.redbubble.com/shop/ap/113203780
> 
> Go Spark! 
> 
> Xiao



Re: A scene with unstable Spark performance

2022-05-17 Thread Qian SUN
Hi. I think you need Spark dynamic resource allocation. Please refer to
https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
And if you use Spark SQL, AQE may also help:
https://spark.apache.org/docs/latest/sql-performance-tuning.html#adaptive-query-execution
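
A hedged sketch of the relevant settings (the executor counts are placeholders to
tune for your workload; shuffle tracking is needed when there is no external
shuffle service, e.g. on Kubernetes):

--conf spark.dynamicAllocation.enabled=true
--conf spark.dynamicAllocation.shuffleTracking.enabled=true
--conf spark.dynamicAllocation.minExecutors=2
--conf spark.dynamicAllocation.maxExecutors=50
--conf spark.sql.adaptive.enabled=true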

On Tue, May 17, 2022 at 10:33 PM Bowen Song wrote:

> Hi all,
>
>
>
> I find Spark performance is unstable in this scenario: we divided the jobs
> into two groups according to the job completion time. One group of jobs had
> an execution time of less than 10s, and the other group of jobs had an
> execution time from 10s to 300s. The reason for the difference is that the
> latter will scan more files, that is, the number of tasks will be larger.
> When the two groups of jobs were submitted to Spark for execution, I found
> that due to resource competition, the existence of the slower jobs made the
> original faster job take longer to return the result, which manifested as
> unstable Spark performance. The problem I want to solve is: Can we reserve
> certain resources for each of the two groups, so that the fast jobs can be
> scheduled in time, and the slow jobs will not be starved to death because
> the resources are completely allocated to the fast jobs.
>
>
>
> In this context, I need to group spark jobs, and the tasks from different
> groups of jobs can be scheduled using group reserved resources. At the
> beginning of each round of scheduling, tasks in this group will be
> scheduled first, only when there are no tasks in this group to schedule,
> its resources can be allocated to other groups to avoid idling of resources.
>
>
>
> For the consideration of resource utilization and the overhead of managing
> multiple clusters, I hope that the jobs can share the spark cluster, rather
> than creating private clusters for the groups.
>
>
>
> I've read the code for the Spark Fair Scheduler, and the implementation
> doesn't seem to meet the need to reserve resources for different groups of
> jobs.
>
>
>
> Is there a workaround that can solve this problem through the Spark Fair
> Scheduler? If it can't be solved, would you consider adding a mechanism
> like capacity scheduling?
>
>
>
> Thank you,
>
> Bowen Song
>


-- 
Best!
Qian SUN


Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Qian Sun
My understanding is that we don't need to do anything: log4j-core 2.x is not
used in Spark, so the recent CVE does not apply.

> On Dec 13, 2021, at 12:45 PM, Pralabh Kumar wrote:
> 
> Hi developers, users,
>
> Spark is built using log4j 1.2.17. Is there a plan to upgrade based on the
> recently detected CVE?
> 
> 
> Regards
> Pralabh kumar


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: creating database issue

2021-12-07 Thread Qian Sun
Hi,
 
It seems to be a Hive Metastore (HMS) question. Would you like to provide
information about the Spark version, Hive version, and Spark application
configuration?

Best

> On Dec 8, 2021, at 9:04 AM, bitfox wrote:
> 
> Sorry, I am a newbie to Spark.
>
> When I create a database in the pyspark shell following the content of the
> book Learning Spark 2.0, I get:
> 
> >>> spark.sql("CREATE DATABASE learn_spark_db")
> 21/12/08 09:01:34 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout 
> does not exist
> 21/12/08 09:01:34 WARN HiveConf: HiveConf of name hive.stats.retries.wait 
> does not exist
> 21/12/08 09:01:39 WARN ObjectStore: Version information not found in 
> metastore. hive.metastore.schema.verification is not enabled so recording the 
> schema version 2.3.0
> 21/12/08 09:01:39 WARN ObjectStore: setMetaStoreSchemaVersion called but 
> recording version is disabled: version = 2.3.0, comment = Set by MetaStore 
> pyh@185.213.174.249
> 21/12/08 09:01:40 WARN ObjectStore: Failed to get database default, returning 
> NoSuchObjectException
> 21/12/08 09:01:40 WARN ObjectStore: Failed to get database global_temp, 
> returning NoSuchObjectException
> 21/12/08 09:01:40 WARN ObjectStore: Failed to get database learn_spark_db, 
> returning NoSuchObjectException
> 
> Can you point out to me what is wrong?
> 
> Thanks.
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org