Re: Long-Running Spark application doesn't clean old shuffle data correctly

2019-07-23 Thread Alex Landa
Hi Keith, I don't think that we keep such references, but we do experience exceptions during job execution that we catch and retry (timeouts/network issues from different data sources). Can they affect RDD cleanup? Thanks, Alex
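Not from this thread, but a workaround often suggested for shuffle files accumulating in long-running drivers: Spark's ContextCleaner deletes shuffle data only when the JVM garbage-collects the weak references to the owning RDDs, so a driver with a large heap may go hours without a GC. Forcing a periodic GC can help; the 15-minute interval below is illustrative, not a recommendation from the thread:

```properties
# Trigger a periodic System.gc() on the driver so the ContextCleaner's
# weak references fire and stale shuffle files get cleaned up.
# (spark.cleaner.periodicGC.interval defaults to 30min.)
spark.cleaner.periodicGC.interval  15min
```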

Apache Spark Log4j logging applicationId

2019-07-23 Thread Luca Borin
Hi, I would like to add the applicationId to all logs produced by Spark through Log4j. I have a cluster with several jobs running in it, so the applicationId would be useful to logically separate them. I have found a partial solution. If I change the layout of the
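One common approach (a sketch, not the thread's solution) is to push the applicationId into Log4j's mapped diagnostic context (MDC) and reference it from the layout. The `appId` key name is an assumption:

```scala
import org.apache.log4j.MDC
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Expose the applicationId to Log4j layouts as %X{appId}.
// Note: MDC state is per JVM, so executor logs only carry the id if
// the same MDC.put is done on the executors as well (e.g. in a
// mapPartitions initializer).
MDC.put("appId", spark.sparkContext.applicationId)
```

The layout can then reference it, e.g. `log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %X{appId} %p %c: %m%n`.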

Re: Spark 2.3 Dataframe groupBy operation throws IllegalArgumentException on large dataset

2019-07-23 Thread Balakumar iyer S
Hi Bobby Evans, I apologise for the delayed response. Yes, you are right, I missed pasting the complete stack trace. I have now attached the complete YARN log for the same. Thank you; it would be helpful if you could assist me with this error.

Re: Avro large binary read memory problem

2019-07-23 Thread Nicolas Paris
On Tue, Jul 23, 2019 at 05:10:19PM +, Mario Amatucci wrote: > https://spark.apache.org/docs/2.2.0/configuration.html#memory-management thanks for the pointer; however, I have tried almost every configuration, and the behavior suggests that Spark keeps things in memory instead of releasing them

RE: Avro large binary read memory problem

2019-07-23 Thread Mario Amatucci
https://spark.apache.org/docs/2.2.0/configuration.html#memory-management

Avro large binary read memory problem

2019-07-23 Thread Nicolas Paris
Hi, I have Avro files with the schema id: Long, content: Binary. The binaries are large images, up to 2 GB in size. I'd like to get a subset of rows "where id in (...)". Sadly I get memory errors even if the subset is empty. It looks like the reader stores the binary information
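Not an answer from the thread, but one mitigation sketch, assuming the problem is that the Avro reader materializes the `content` column before the filter is applied: resolve the matching ids on a projection that excludes the binary column, then join back, so the large blobs are only decoded for surviving rows. The names (`df`, `wantedIds`, the load path) and the use of Spark 2.4's built-in `avro` source are assumptions:

```scala
import org.apache.spark.sql.functions.{broadcast, col}

val df = spark.read.format("avro").load("/path/to/avro/files")
val wantedIds = Seq(1L, 2L, 3L)

// Filter on a pruned projection first, so the up-to-2 GB binaries
// are never held in memory for rows outside the id subset.
val matching = df.select("id").where(col("id").isin(wantedIds: _*))
val subset   = df.join(broadcast(matching), "id")
```

Whether this actually avoids the memory blow-up depends on the Avro reader honoring column pruning on the first projection; if it does not, splitting ids and content into separate files at write time may be the only robust fix.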