Lock issue with SQLConf.getConf

2021-09-10 Thread Kohki Nishio
Hello, I'm running Spark in local mode and seeing multiple threads in the state shown below. Does anybody know why it's not using a concurrent hash map? --- "request-handler-dispatcher-19785" #107532 prio=5 os_prio=0 tid=0x7fbd78036000 nid=0x4ebf runnable [0x7fc6e83af000]

Re: Why are in 1 stage most of my executors idle: are tasks within a stage dependent of each other?

2021-09-10 Thread Joris Billen
Thanks for the reply! OK, confirmed that tasks cannot depend on other tasks. Once about 30k of the 80k tasks have finished, it starts processing only 7 tasks at a time. These all finish very fast, then it takes 7 new ones, again and again, and it processes the remaining 50k this way. Every once in a while

Multi-schema data pipeline

2021-09-10 Thread Joarley Wanzeler de Moraes
We want to create a Spark-based streaming data pipeline that consumes from a source (e.g. Kinesis), applies some basic transformations, and writes the data to a file-based sink (e.g. S3). We have thousands of different event types coming in and the transformations would take place on a set of

Re: Why are in 1 stage most of my executors idle: are tasks within a stage dependent of each other?

2021-09-10 Thread Lalwani, Jayesh
Tasks are never dependent on each other. Stages are dependent on each other. The Spark task manager will make sure that it plans the tasks so that they can run independently. Out of the 80K tasks, how many are complete when you have 7 remaining? Is it 80k - 7? It could be that you have data

Why are in 1 stage most of my executors idle: are tasks within a stage dependent of each other?

2021-09-10 Thread Joris Billen
Dear community, I have a job that runs quite well for most stages: resources are consumed close to optimally (not much memory or vcores left idle). My cluster is managed and works well. I end up with 27 executors with 2 cores each, so I can run 54 tasks. For many stages I see I have a high number of

Re: spark thrift server as hive on spark running on kubernetes, and more.

2021-09-10 Thread Mich Talebzadeh
Hi, Are you implying that the tool uses Spark on Kubernetes as the execution engine for Hive? What version of Spark are you running on Kubernetes, and what is the corresponding version of Hive? Back in 2016, I did some work

RE: spark thrift server as hive on spark running on kubernetes, and more.

2021-09-10 Thread Bode, Meikel, NMA-CFD
Hi, thanks. Great work. Will test it. Best, Meikel Bode From: Kidong Lee Sent: Friday, 10 September 2021 01:39 To: user@spark.apache.org Subject: spark thrift server as hive on spark running on kubernetes, and more. Hi, Recently, I have open-sourced a tool called