I've created a Stack Overflow question for this as well.

On Mon, Sep 19, 2022 at 4:37 PM karan alang <karan.al...@gmail.com> wrote:

> Hello All,
> I have a Spark Structured Streaming job on GCP Dataproc which picks up
> data from Kafka, processes it, and pushes the data back into Kafka topics.
>
> A couple of questions:
> 1. Does Spark put all the logs (incl. INFO, WARN, etc.) into stderr?
> What I notice is that stdout is empty, while all the logging goes into
> stderr.
>
> 2. Is there a way for me to expire the data in stderr (i.e. expire the
> older logs)? Since I have a long-running streaming job, stderr fills up
> over time and the nodes/VMs become unavailable. (A sketch of what I'm
> considering is below.)
>
> Please advise.
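>
> From the Spark on YARN docs, my understanding is that Spark's default
> log4j config uses a console appender that targets System.err, which would
> explain why stdout is empty and everything lands in stderr. For the log
> growth, what I'm considering is shipping a custom log4j.properties that
> writes to a RollingFileAppender in YARN's container log dir instead. A
> minimal sketch (untested; the spark.log file name and the size/backup
> limits are just my picks):
> ```
>
> # log4j.properties: roll logs instead of letting stderr grow unbounded
> log4j.rootCategory=INFO, roll
> log4j.appender.roll=org.apache.log4j.RollingFileAppender
> # ${spark.yarn.app.container.log.dir} is substituted by Spark on YARN
> log4j.appender.roll.File=${spark.yarn.app.container.log.dir}/spark.log
> log4j.appender.roll.MaxFileSize=100MB
> log4j.appender.roll.MaxBackupIndex=10
> log4j.appender.roll.layout=org.apache.log4j.PatternLayout
> log4j.appender.roll.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
>
> # pass it to both the driver and the executors:
> spark-submit \
>   --files log4j.properties \
>   --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
>   --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
>   ...
> ```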
>
> Here is the output of the yarn logs command:
> ```
>
> root@versa-structured-stream-v1-w-1:/home/karanalang# yarn logs
> -applicationId application_1663623368960_0008 -log_files stderr -size -500
>
> 2022-09-19 23:26:01,439 INFO client.RMProxy: Connecting to ResourceManager
> at versa-structured-stream-v1-m/10.142.0.62:8032
>
> 2022-09-19 23:26:01,696 INFO client.AHSProxy: Connecting to Application
> History server at versa-structured-stream-v1-m/10.142.0.62:10200
>
> Can not find any log file matching the pattern: [stderr] for the
> container: container_e01_1663623368960_0008_01_000003 within the
> application: application_1663623368960_0008
>
> Container: container_e01_1663623368960_0008_01_000002 on
> versa-structured-stream-v1-w-2.c.versa-sml-googl.internal:8026
>
> LogAggregationType: LOCAL
>
>
> =======================================================================================================================
>
> LogType:stderr
>
> LogLastModifiedTime:Mon Sep 19 23:26:02 +0000 2022
>
> LogLength:44309782124
>
> LogContents:
>
> , tenantId=3, vsnId=0, mstatsTotSentOctets=48210,
> mstatsTotRecvdOctets=242351, mstatsTotSessDuration=300000,
> mstatsTotSessCount=34, mstatsType=dest-stats, destIp=165.225.216.24,
> mstatsAttribs=,topic=syslog.ueba-us4.v1.versa.demo3,customer=versa  type(row)
> is ->  <class 'str'>
>
> 22/09/19 23:26:02 WARN
> org.apache.spark.sql.kafka010.consumer.KafkaDataConsumer: KafkaDataConsumer
> is not running in UninterruptibleThread. It may hang when
> KafkaDataConsumer's methods are interrupted because of KAFKA-1894
>
> End of LogType:stderr.This log file belongs to a running container
> (container_e01_1663623368960_0008_01_000002) and so may not be complete.
>
> ***********************************************************************
>
>
>
> Container: container_e01_1663623368960_0008_01_000001 on
> versa-structured-stream-v1-w-1.c.versa-sml-googl.internal:8026
>
> LogAggregationType: LOCAL
>
>
> =======================================================================================================================
>
> LogType:stderr
>
> LogLastModifiedTime:Mon Sep 19 22:54:55 +0000 2022
>
> LogLength:17367929
>
> LogContents:
>
> on syslog.ueba-us4.v1.versa.demo3-2
>
> 22/09/19 22:52:52 INFO
> org.apache.kafka.clients.consumer.internals.SubscriptionState: [Consumer
> clientId=consumer-spark-kafka-source-0f984ad9-f663-4ce1-9ef1-349419f3e6ec-1714963016-executor-1,
> groupId=spark-kafka-source-0f984ad9-f663-4ce1-9ef1-349419f3e6ec-1714963016-executor]
> Resetting offset for partition syslog.ueba-us4.v1.versa.demo3-2 to offset
> 449568676.
>
> 22/09/19 22:54:55 ERROR
> org.apache.spark.executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
>
> End of LogType:stderr.
>
> ***********************************************************************
>
> ```
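>
> One more data point from the output above: stderr for container
> container_e01_1663623368960_0008_01_000002 shows LogLength:44309782124
> (~44 GB) with LogAggregationType: LOCAL, i.e. the whole file is sitting
> on the worker's local disk, which lines up with the nodes becoming
> unavailable. Besides the log4j rolling above, I'm also looking at YARN's
> rolling log aggregation for long-running apps, which (as I understand it)
> periodically uploads rotated log files off the nodes instead of waiting
> for the app to finish. A sketch of setting it at cluster create time
> (the cluster name is a placeholder):
> ```
>
> # upload logs of running apps hourly (3600s appears to be the minimum),
> # and expire aggregated logs after 7 days
> gcloud dataproc clusters create my-streaming-cluster \
>   --properties 'yarn:yarn.log-aggregation-enable=true,yarn:yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds=3600,yarn:yarn.log-aggregation.retain-seconds=604800'
> ```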
>
>
>
