The properties provided earlier will work for standalone mode. For cluster
mode, the properties below need to be added to the spark-submit command:
--files "<path>/log4j.properties"     (to make the log4j properties file
available to both the driver and the executors)

(to set the extra Java options for the driver and the executors)
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:<path>/log4j.properties"
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:<path>/log4j.properties"

Regards,
Abhishek Jain

From: em...@yeikel.com <em...@yeikel.com>
Sent: Friday, February 15, 2019 7:32 AM
To: Jain, Abhishek 3. (Nokia - IN/Bangalore) <abhishek.3.j...@nokia.com>; 
'Deepak Sharma' <deepakmc...@gmail.com>
Cc: 'spark users' <user@spark.apache.org>
Subject: RE: Spark streaming filling the disk with logs

I have a quick question about this configuration, particularly this line:

log4j.appender.rolling.file=/var/log/spark/<logfilename>

Where does that path exist? On the driver, or on each executor individually?

Thank you

From: Jain, Abhishek 3. (Nokia - IN/Bangalore) <abhishek.3.j...@nokia.com>
Sent: Thursday, February 14, 2019 7:48 AM
To: Deepak Sharma <deepakmc...@gmail.com>
Cc: spark users <user@spark.apache.org>
Subject: RE: Spark streaming filling the disk with logs

++
If you can afford losing a few old logs, then you can make use of a rolling
file appender as well.

log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=5
log4j.appender.rolling.file=/var/log/spark/<logfilename>
log4j.logger.org.apache.spark=<LogLevel>

This means log4j will roll the log file once it reaches 50 MB and keep only
the 5 most recent backups. These files are saved in the /var/log/spark
directory, under the filename mentioned.
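For illustration, a filled-in version of this configuration; the file name
streaming-app.log and the WARN level are hypothetical choices, and the added
ConversionPattern follows Spark's bundled log4j.properties.template:

# Roll at 50 MB and keep the 5 most recent backups (hypothetical values)
log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=5
log4j.appender.rolling.file=/var/log/spark/streaming-app.log
log4j.logger.org.apache.spark=WARN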

Regards,
Abhishek Jain

From: Jain, Abhishek 3. (Nokia - IN/Bangalore)
Sent: Thursday, February 14, 2019 5:58 PM
To: Deepak Sharma <deepakmc...@gmail.com>
Cc: spark users <user@spark.apache.org>
Subject: RE: Spark streaming filling the disk with logs

Hi Deepak,

Spark logging can be configured for different purposes. For example, if you
want to control the spark-submit log,
“log4j.logger.org.apache.spark.repl.Main=WARN/INFO/ERROR” can be set.

Similarly, to control third-party logs:
log4j.logger.org.spark_project.jetty=<LEVEL>,
log4j.logger.org.apache.parquet=<LEVEL>, etc.

These properties can be set in the conf/log4j.properties file.
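For example, a minimal sketch of such a conf/log4j.properties; the chosen
levels are hypothetical, and the console appender and pattern follow Spark's
log4j.properties.template:

# Minimal sketch; the levels below are hypothetical choices
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Quiet the spark-submit/REPL console output
log4j.logger.org.apache.spark.repl.Main=WARN

# Quiet noisy third-party libraries
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.apache.parquet=ERROR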

Hope this helps! 😊

Regards,
Abhishek Jain

From: Deepak Sharma <deepakmc...@gmail.com>
Sent: Thursday, February 14, 2019 12:10 PM
To: spark users <user@spark.apache.org>
Subject: Spark streaming filling the disk with logs

Hi All
I am running a Spark streaming job with the below configuration:

--conf "spark.executor.extraJavaOptions=-Droot.logger=WARN,console"

But it’s still filling the disk with INFO logs.
If the logging level is set to WARN at the cluster level, then only WARN
logs get written, but that affects all the jobs.

Is there any way to get rid of INFO-level logging at the Spark streaming job
level?

Thanks
Deepak

--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
