Re: Long-Running Spark application doesn't clean old shuffle data correctly

2019-07-23 Thread Alex Landa
Hi Keith, I don't think that we keep such references, but we do experience exceptions during job execution that we catch and retry (timeouts/network issues from different data sources). Can they affect RDD cleanup? Thanks, Alex
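Not from this thread, but a workaround often suggested for shuffle files accumulating in long-running drivers: Spark's ContextCleaner deletes shuffle data only when the JVM garbage-collects the weak references to the owning RDDs, so a driver with a large heap may go hours without a GC. Forcing a periodic GC can help; the 15-minute interval below is illustrative, not a recommendation from the thread:

```properties
# Trigger a periodic System.gc() on the driver so the ContextCleaner's
# weak references fire and stale shuffle files get cleaned up.
# (spark.cleaner.periodicGC.interval defaults to 30min.)
spark.cleaner.periodicGC.interval  15min
```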

Apache Spark Log4j logging applicationId

2019-07-23 Thread Luca Borin
Hi, I would like to add the applicationId to all logs produced by Spark through Log4j. I have a cluster with several jobs running in it, so the applicationId would be useful to logically separate them. I have found a partial solution. If I change the layout of the
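One common approach (a sketch, not the thread's solution) is to push the applicationId into Log4j's mapped diagnostic context (MDC) and reference it from the layout. The `appId` key name is an assumption:

```scala
import org.apache.log4j.MDC
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Expose the applicationId to Log4j layouts as %X{appId}.
// Note: MDC state is per JVM, so executor logs only carry the id if
// the same MDC.put is done on the executors as well (e.g. in a
// mapPartitions initializer).
MDC.put("appId", spark.sparkContext.applicationId)
```

The layout can then reference it, e.g. `log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %X{appId} %p %c: %m%n`.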

Re: Spark 2.3 Dataframe groupBy operation throws IllegalArgumentException on large dataset

2019-07-23 Thread Balakumar iyer S
Hi Bobby Evans, I apologise for the delayed response. Yes, you are right, I missed pasting the complete stack trace. I have now attached the complete YARN log for the same. Thank you; it would be helpful if you could assist me with this error.

Re: Avro large binary read memory problem

2019-07-23 Thread Nicolas Paris
On Tue, Jul 23, 2019 at 05:10:19PM +, Mario Amatucci wrote: > https://spark.apache.org/docs/2.2.0/configuration.html#memory-management thanks for the pointer; however, I have tried almost every configuration, and the behavior suggests that Spark keeps things in memory instead of releasing them

RE: Avro large binary read memory problem

2019-07-23 Thread Mario Amatucci
https://spark.apache.org/docs/2.2.0/configuration.html#memory-management

Avro large binary read memory problem

2019-07-23 Thread Nicolas Paris
Hi, I have Avro files with the schema id: Long, content: Binary. The binaries are large images, up to 2 GB in size. I'd like to get a subset of rows "where id in (...)". Sadly I get memory errors even if the subset is empty. It looks like the reader stores the binary information
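Not an answer from the thread, but one mitigation sketch, assuming the problem is that the Avro reader materializes the `content` column before the filter is applied: resolve the matching ids on a projection that excludes the binary column, then join back, so the large blobs are only decoded for surviving rows. The names (`df`, `wantedIds`, the load path) and the use of Spark 2.4's built-in `avro` source are assumptions:

```scala
import org.apache.spark.sql.functions.{broadcast, col}

val df = spark.read.format("avro").load("/path/to/avro/files")
val wantedIds = Seq(1L, 2L, 3L)

// Filter on a pruned projection first, so the up-to-2 GB binaries
// are never held in memory for rows outside the id subset.
val matching = df.select("id").where(col("id").isin(wantedIds: _*))
val subset   = df.join(broadcast(matching), "id")
```

Whether this actually avoids the memory blow-up depends on the Avro reader honoring column pruning on the first projection; if it does not, splitting ids and content into separate files at write time may be the only robust fix.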