Hello All,
I have a Spark Structured Streaming job on GCP Dataproc which picks up data
from Kafka, processes it, and pushes the results back into Kafka topics.

A couple of questions:
1. Does Spark write all log output (incl. INFO, WARN, etc.) to stderr?
What I notice is that stdout is empty, while all the logging goes to
stderr.

2. Is there a way for me to expire the data in stderr (i.e. rotate or
delete the older logs)?
Since this is a long-running streaming job, stderr fills up over time and
the nodes/VMs become unavailable.
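
For question 2, here is what I am considering trying, in case it helps frame the question: overriding the executor/driver log4j configuration with a size-bounded rolling appender, assuming Dataproc honors a custom `log4j.properties` shipped via `spark-submit` (the file name, size limits, and remaining submit arguments below are my own placeholders, not a verified fix):

```shell
# Sketch: cap container stderr growth with log4j 1.x RollingFileAppender.
# ${spark.yarn.app.container.log.dir} is the system property Spark sets to
# the YARN container log directory; maxFileSize/maxBackupIndex bound total
# disk usage per container (here ~50MB x 6 files).
cat > log4j.properties <<'EOF'
log4j.rootCategory=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.file=${spark.yarn.app.container.log.dir}/stderr
log4j.appender.rolling.maxFileSize=50MB
log4j.appender.rolling.maxBackupIndex=5
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
EOF

# Ship the file to every container and point both driver and executors at it.
spark-submit \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  my_streaming_job.py   # placeholder for the actual job
```

Is this the right approach on Dataproc, or is there a YARN-level setting for this?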

Please advise.

Here is the output of the `yarn logs` command:
```

root@versa-structured-stream-v1-w-1:/home/karanalang# yarn logs
-applicationId application_1663623368960_0008 -log_files stderr -size -500

2022-09-19 23:26:01,439 INFO client.RMProxy: Connecting to ResourceManager
at versa-structured-stream-v1-m/10.142.0.62:8032

2022-09-19 23:26:01,696 INFO client.AHSProxy: Connecting to Application
History server at versa-structured-stream-v1-m/10.142.0.62:10200

Can not find any log file matching the pattern: [stderr] for the container:
container_e01_1663623368960_0008_01_000003 within the application:
application_1663623368960_0008

Container: container_e01_1663623368960_0008_01_000002 on
versa-structured-stream-v1-w-2.c.versa-sml-googl.internal:8026

LogAggregationType: LOCAL

=======================================================================================================================

LogType:stderr

LogLastModifiedTime:Mon Sep 19 23:26:02 +0000 2022

LogLength:44309782124

LogContents:

, tenantId=3, vsnId=0, mstatsTotSentOctets=48210,
mstatsTotRecvdOctets=242351, mstatsTotSessDuration=300000,
mstatsTotSessCount=34, mstatsType=dest-stats, destIp=165.225.216.24,
mstatsAttribs=,topic=syslog.ueba-us4.v1.versa.demo3,customer=versa  type(row)
is ->  <class 'str'>

22/09/19 23:26:02 WARN
org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer: KafkaDataConsumer
is not running in UninterruptibleThread. It may hang when
KafkaDataConsumer's methods are interrupted because of KAFKA-1894

End of LogType:stderr.This log file belongs to a running container
(container_e01_1663623368960_0008_01_000002) and so may not be complete.

***********************************************************************



Container: container_e01_1663623368960_0008_01_000001 on
versa-structured-stream-v1-w-1.c.versa-sml-googl.internal:8026

LogAggregationType: LOCAL

=======================================================================================================================

LogType:stderr

LogLastModifiedTime:Mon Sep 19 22:54:55 +0000 2022

LogLength:17367929

LogContents:

on syslog.ueba-us4.v1.versa.demo3-2

22/09/19 22:52:52 INFO
org.apache.kafka.clients.consumer.internals.SubscriptionState: [Consumer
clientId=consumer-spark-kafka-source-0f984ad9-f663-4ce1-9ef1-349419f3e6ec-1714963016-executor-1,
groupId=spark-kafka-source-0f984ad9-f663-4ce1-9ef1-349419f3e6ec-1714963016-executor]
Resetting offset for partition syslog.ueba-us4.v1.versa.demo3-2 to offset
449568676.

22/09/19 22:54:55 ERROR
org.apache.spark.executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

End of LogType:stderr.

***********************************************************************

```
