Hello,
I'm running Spark in local mode and seeing multiple threads like the one
below. Does anybody know why it isn't using a concurrent hash map?
---
"request-handler-dispatcher-19785" #107532 prio=5 os_prio=0
tid=0x7fbd78036000 nid=0x4ebf runnable [0x7fc6e83af000]
Thanks for the reply!
OK, confirmed that tasks cannot be dependent on other tasks.
From about 30k of the 80k tasks finished onwards, it starts to process 7 tasks
at once. These all finish very fast, and then it takes 7 new ones, and again,
and again, and this is how it processes the remaining 50k. Every once in a while
We want to create a Spark-based streaming data pipeline that consumes from a
source (e.g. Kinesis), applies some basic transformations, and writes the data
to a file-based sink (e.g. S3). We have thousands of different event types
coming in, and the transformations would take place on a set of
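The per-event-type routing described above can be sketched in plain Python as a stand-in for the Spark transformation step. This is only an illustration of the dispatch pattern; the event types and transformation functions here are hypothetical, not part of the pipeline described:

```python
# Sketch: route each incoming event to a transformation based on its type.
# Pure-Python stand-in for the per-type transformation step of a streaming
# pipeline; the event types and transforms below are hypothetical.

def enrich_click(event):
    # Hypothetical transformation for "click" events.
    return {**event, "enriched": True}

def flatten_purchase(event):
    # Hypothetical transformation for "purchase" events.
    return {**event, "amount_cents": int(event["amount"] * 100)}

# With thousands of event types, a dict keyed by type keeps dispatch O(1).
TRANSFORMS = {
    "click": enrich_click,
    "purchase": flatten_purchase,
}

def transform(event):
    """Apply the type-specific transformation, passing unknown types through."""
    fn = TRANSFORMS.get(event["type"])
    return fn(event) if fn else event

batch = [
    {"type": "click", "url": "/home"},
    {"type": "purchase", "amount": 12.5},
    {"type": "unknown", "x": 1},
]
out = [transform(e) for e in batch]
```

In Spark this dispatch would typically run inside a `map`/`mapPartitions` over the stream, but the lookup-table structure is the same.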
Tasks are never dependent on each other. Stages are dependent on each other.
The Spark task scheduler will make sure that it plans the tasks so that they
can run independently.
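The stage-vs-task rule above can be illustrated with a toy scheduler in pure Python (this is a simplified model, not Spark's actual scheduler): tasks within a stage run concurrently with no ordering among them, while the next stage only starts after every task of the previous stage has finished.

```python
# Toy model of the rule described above: tasks inside a stage are independent
# and run in parallel; stage N+1 starts only after every task of stage N has
# finished. This is a sketch, not Spark's scheduler.
from concurrent.futures import ThreadPoolExecutor

def run_stage(pool, tasks):
    """Run one stage: submit all its tasks at once, then wait for all of them."""
    futures = [pool.submit(t) for t in tasks]
    return [f.result() for f in futures]

def run_job(stages):
    """Run stages strictly in order; within a stage, order is unconstrained."""
    results = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for tasks in stages:
            results.append(run_stage(pool, tasks))  # barrier between stages
    return results

# Two stages: stage 1 squares some numbers, stage 2 runs a single summary task.
stage1 = [lambda i=i: i * i for i in range(4)]
stage2 = [lambda: sum(range(4))]
out = run_job([stage1, stage2])
```

The `run_stage` call acts as the barrier: nothing from `stage2` is submitted until all of `stage1` has returned.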
Out of the 80k tasks, how many are complete when you have 7 remaining? Is it
80k - 7? It could be that you have data
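A common cause of a long tail like this is data skew: a few partitions hold most of the records, so a handful of tasks run far longer than the rest. A quick way to check for it is to count records per partition; this pure-Python sketch uses a synthetic key distribution to show what a skewed layout looks like:

```python
# Sketch: detect skew by measuring how unevenly records hash into partitions.
# The key distribution below is synthetic; with real data you would count
# records per Spark partition the same way.
from collections import Counter

NUM_PARTITIONS = 54  # e.g. 27 executors x 2 cores

# Synthetic skewed keys: 90% of the records share one hot key.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

sizes = Counter(hash(k) % NUM_PARTITIONS for k in keys)
largest = max(sizes.values())
average = len(keys) / NUM_PARTITIONS
skew_ratio = largest / average  # >> 1 indicates a straggler partition
```

All 9000 "hot" records land in a single partition, so one task does most of the work while the other 53 finish almost immediately, which is exactly the long-tail pattern described above.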
Dear community,
I have a job that runs quite well for most stages: resources are consumed quite
optimally (not much memory/vcores left idle). My cluster is managed and works
well. I end up with 27 executors with 2 cores each, so I can run 54 tasks
concurrently. For many stages I see I have a high number of
Hi,
Are you implying that the tool uses Spark on Kubernetes as the
execution engine for Hive?
What version of Spark are you running on Kubernetes, and what is the
corresponding version of Hive?
Back in 2016, I did some work
Hi,
thx. Great work. Will test it
Best,
Meikel Bode
From: Kidong Lee
Sent: Friday, 10 September 2021 01:39
To: user@spark.apache.org
Subject: spark thrift server as hive on spark running on kubernetes, and more.
Hi,
Recently, I have open-sourced a tool called