> Wrt looping: if I want to process 3 years of data, my modest cluster
> will never do it in one go, I would expect?
> I have to break it down into smaller pieces and run those in a loop
> (1 day is already a lot of data).
Well, that is exactly what Spark is made for. It splits the work up and
It's quite impossible for anyone to answer your question about what is
eating your memory without knowing which language you are using.
If you are using C, then it's almost always pointers; that's the usual memory issue.
If you are using Python, it can be something like not using a context
manager (`with`).
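To illustrate the context-manager point above: a minimal, self-contained Python sketch (the `Resource` class is a made-up stand-in for anything that must be released, e.g. a file handle or connection). The `with` statement guarantees the release step runs even if the body raises.

```python
events = []  # records the order of acquire / work / release

class Resource:
    """Hypothetical resource; stands in for a file, socket, etc."""

    def __enter__(self):
        events.append("acquired")
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on every exit path, including exceptions, so the
        # resource can never be leaked by an early return or raise.
        events.append("released")
        return False  # do not swallow exceptions

with Resource():
    events.append("working")

# events is now ["acquired", "working", "released"]
```

Without the `with` block, a forgotten explicit `close()` call is exactly the kind of slow leak being described.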
Thanks for the answer, much appreciated! This forum is very useful :-)
I didn't know the SparkContext stays alive. I guess this is eating up memory.
The eviction means that Spark knows it should clear some of the old cached
data to make room for new data. In case anyone has good articles about
The Spark context does not stop when a job does; it stops when you stop it.
There could be many ways memory can leak. Caching, maybe, though Spark will
evict cached data under memory pressure.
You should be clearing caches when they are no longer needed.
I would guess it is something else your program holds on to in its logic.
Also consider not
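A minimal sketch of the "release caches inside the loop" pattern suggested above. `CachedFrame` is a made-up stand-in for a cached Spark DataFrame, so the snippet runs without a cluster; with real PySpark you would call `df.cache()` / `df.unpersist()` the same way inside the per-day loop.

```python
class CachedFrame:
    """Stand-in for a cached DataFrame; counts frames still pinned in memory."""
    live = 0

    def cache(self):
        CachedFrame.live += 1  # real Spark: pins partitions in executor memory
        return self

    def unpersist(self):
        CachedFrame.live -= 1  # real Spark: frees those partitions


for day in ["2022-03-01", "2022-03-02", "2022-03-03"]:
    df = CachedFrame().cache()  # load + cache this day's data
    # ... heavy processing for `day` ...
    df.unpersist()              # release before the next iteration

# Nothing stays cached once the loop finishes, so worker memory
# does not grow from one iteration to the next.
```

If the unpersist call is skipped, each iteration leaves one more cached frame behind, which matches the "memory increasing on all worker nodes" symptom in the original question.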
Hi,
I have a PySpark job, submitted through spark-submit, that does some heavy
processing for 1 day of data. It runs with no errors. I have to loop over many
days, so I run this Spark job in a loop. I notice that after a couple of
executions the memory usage is increasing on all worker nodes, and eventually this
Hi Christophe,
Thank you for the explanation!
Regards,
Alex
From: Christophe Préaud
Sent: Wednesday, March 30, 2022 3:43 PM
To: Alex Kosberg ; user@spark.apache.org
Subject: [EXTERNAL] Re: spark ETL and spark thrift server running together
Hi Alex,
As stated in the Hive documentation
(https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration):
*An embedded metastore database is mainly used for unit tests. Only one process
can connect to the metastore database at a time, so it is not really a
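Given that limitation, the usual fix is to run a standalone metastore service that both the ETL job and the Thrift server connect to. A hedged `hive-site.xml` sketch; the host and port are placeholders, only `hive.metastore.uris` is the real Hive property:

```xml
<!-- Point Spark (and the Thrift server) at a shared, standalone
     metastore instead of the embedded single-process one.
     thrift://metastore-host:9083 is a placeholder address. -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
```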
Hi,
Some details:
* Spark SQL (version 3.2.1)
* Driver: Hive JDBC (version 2.3.9)
* ThriftCLIService: Starting ThriftBinaryCLIService on port 1 with 5...500 worker threads
* BI tool is connected via an ODBC driver
After activating Spark Thrift Server I'm unable to
ApacheCon draws participants at all levels to explore “Tomorrow’s
Technology Today” across 300+ Apache projects and their diverse
communities. ApacheCon showcases
Hello, I am a Spark user. I use the "spark-shell.cmd" startup command in
Windows cmd. The first startup is normal, but when I use "Ctrl+C" to
force-close the Spark window, it can't start normally again. The error
message is as follows: "Failed to initialize Spark