Re: [spark on yarn] spark on yarn without DFS

2019-05-23 Thread Achilleus 003
This is interesting. I would really appreciate it if you could share exactly what you changed in *core-site.xml* and *yarn-site.xml*. On Wed, May 22, 2019 at 9:14 AM Gourav Sengupta wrote: > just wondering what is the advantage of doing this? > > Regards > Gourav Sengupta > > On Wed, May 22,

Re: Executors idle, driver heap exploding and maxing only 1 cpu core

2019-05-23 Thread Nicholas Hakobian
One potential cause is the optimizer being a little overzealous in deciding whether a table can be broadcast. Have you checked the UI or query plan to see if any steps include a BroadcastHashJoin? It's possible that the optimizer thinks that it should be able to fit the

unsubscribe

2019-05-23 Thread Mun, Woyou - US

Executors idle, driver heap exploding and maxing only 1 cpu core

2019-05-23 Thread Ashic Mahtab
Hi, We have a rather long-winded Spark application, which we inherited, with many stages. When we run it on our Spark cluster, things start off well enough: workers are busy, lots of progress is made, etc. However, 30 minutes into processing, we see CPU usage of the workers drop drastically. At this

PySpark Streaming “PicklingError: Could not serialize object” when use transform operator and checkpoint enabled

2019-05-23 Thread Xilang Yan
In PySpark Streaming, if checkpointing is enabled and a stream.transform operator is used to join with another RDD, “PicklingError: Could not serialize object” is thrown. I have asked the same question on Stack Overflow: