Re: loop of spark jobs leads to increase in memory on worker nodes and eventually failure

2022-03-30 Thread Enrico Minack
> Wrt looping: if I want to process 3 years of data, my modest cluster will never do it in one go, I would expect? I have to break it down into smaller pieces and run that in a loop (1 day is already lots of data).

Well, that is exactly what Spark is made for. It splits the work up and…
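The "break it down into smaller pieces" approach from the quoted question can be sketched in plain Python: generate the list of daily chunks up front and drive the per-day Spark job from it. The `daily_chunks` helper and the date range are illustrative, not from the original thread.

```python
from datetime import date, timedelta

def daily_chunks(start, end):
    """Yield each day in [start, end) so a job can process one day at a time."""
    d = start
    while d < end:
        yield d
        d += timedelta(days=1)

# Three years of data, one day per loop iteration:
days = list(daily_chunks(date(2019, 1, 1), date(2022, 1, 1)))
print(len(days))  # 1096 (2020 is a leap year)
```

Each element of `days` would then be passed to the daily processing job, which is the looping pattern the original poster describes.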

Re: loop of spark jobs leads to increase in memory on worker nodes and eventually failure

2022-03-30 Thread Bjørn Jørgensen
It's quite impossible for anyone to answer your question about what is eating your memory without even knowing what language you are using. If you are using C, then it's always pointers; that's the memory issue. If you are using Python, it can be something like not using a context manager (`with`)…
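The context-manager point above can be illustrated with a minimal sketch: a toy `Resource` class (hypothetical, not from the thread) whose cleanup in `__exit__` runs deterministically when the `with` block exits, even if the body raises.

```python
class Resource:
    """Toy resource that tracks whether it has been released."""
    open_count = 0

    def __enter__(self):
        Resource.open_count += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        Resource.open_count -= 1  # released even if the body raised
        return False              # do not swallow exceptions

# Without `with`, a forgotten release leaks; with it, cleanup is guaranteed:
with Resource() as r:
    assert Resource.open_count == 1
print(Resource.open_count)  # 0 after the block exits
```

Forgetting this pattern in a long-running loop is one common way Python programs accumulate unreleased resources.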

Re: loop of spark jobs leads to increase in memory on worker nodes and eventually failure

2022-03-30 Thread Joris Billen
Thanks for the answer, much appreciated! This forum is very useful :-) I didn't know the SparkContext stays alive; I guess this is eating up memory. The eviction means that Spark knows it should clear some of the old cached data to be able to store new data. In case anyone has good articles about…

Re: loop of spark jobs leads to increase in memory on worker nodes and eventually failure

2022-03-30 Thread Sean Owen
The Spark context does not stop when a job does; it stops when you stop it. There could be many ways memory can leak. Caching, maybe, but it will evict. You should be clearing caches when they are no longer needed. I would guess it is something else your program holds on to in its logic. Also consider not…
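The "clear caches when no longer needed" advice can be sketched without a Spark cluster using the standard library's `functools.lru_cache` as a stand-in for Spark's caching (in PySpark itself the analogous calls would be `DataFrame.unpersist()` or `spark.catalog.clearCache()`). The point is the same: if each loop iteration caches results and never releases them, memory grows across iterations.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive(day):
    return day * day  # stand-in for heavy per-day processing

for day in range(3):
    expensive(day)           # fills the cache for this iteration
    # ... use the cached result while processing this day ...
    expensive.cache_clear()  # release it before the next iteration

print(expensive.cache_info().currsize)  # 0: nothing retained across iterations
```

Dropping the `cache_clear()` call makes `currsize` grow by one per iteration, which is the shape of the leak described in this thread.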

loop of spark jobs leads to increase in memory on worker nodes and eventually failure

2022-03-30 Thread Joris Billen
Hi, I have a PySpark job submitted through spark-submit that does some heavy processing for 1 day of data. It runs with no errors. I have to loop over many days, so I run this Spark job in a loop. I notice that after a couple of executions the memory is increasing on all worker nodes, and eventually this…

RE: [EXTERNAL] Re: spark ETL and spark thrift server running together

2022-03-30 Thread Alex Kosberg
Hi Christophe, Thank you for the explanation! Regards, Alex From: Christophe Préaud Sent: Wednesday, March 30, 2022 3:43 PM To: Alex Kosberg ; user@spark.apache.org Subject: [EXTERNAL] Re: spark ETL and spark thrift server running together Hi Alex, As stated in the Hive documentation

Re: spark ETL and spark thrift server running together

2022-03-30 Thread Christophe Préaud
Hi Alex, As stated in the Hive documentation (https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration): "An embedded metastore database is mainly used for unit tests. Only one process can connect to the metastore database at a time, so it is not really a…"
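Following the Hive documentation quoted above, the usual way to let more than one process (e.g. an ETL job and the Thrift server) share a metastore is to run a standalone metastore service and point both at it via `hive.metastore.uris`. A minimal `hive-site.xml` fragment, with a placeholder hostname that is not from the original thread:

```xml
<!-- hive-site.xml: point Spark at a standalone metastore service.
     "metastore-host" is a placeholder; 9083 is the conventional default port. -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
```

With a remote metastore, the single-connection limitation of the embedded (Derby) database no longer applies.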

spark ETL and spark thrift server running together

2022-03-30 Thread Alex Kosberg
Hi, Some details:
* Spark SQL (version 3.2.1)
* Driver: Hive JDBC (version 2.3.9)
* ThriftCLIService: Starting ThriftBinaryCLIService on port 1 with 5...500 worker threads
* BI tool is connected via ODBC driver

After activating Spark Thrift Server I'm unable to…

Call for Presentations now open, ApacheCon North America 2022

2022-03-30 Thread Rich Bowen
[You are receiving this because you are subscribed to one or more user or dev mailing lists of an Apache Software Foundation project.] ApacheCon draws participants at all levels to explore “Tomorrow’s Technology Today” across 300+ Apache projects and their diverse communities. ApacheCon showcases…

Unusual bug, please help me, I can do nothing!

2022-03-30 Thread spark User
Hello, I am a Spark user. I use the "spark-shell.cmd" startup command in Windows cmd. The first startup is normal; when I use "ctrl+c" to force the end of the Spark window, it can't start normally again. The error message is as follows: "Failed to initialize Spark…"