Dear All, I have been struggling with an issue where a Spark Streaming job hangs once the output folder grows beyond a certain size.
Here are more details. Flume sends the data to Spark with the following agent configuration:

agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel2

# Describe/configure source1
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -f /mapr/rawdata/input/syslog

agent1.sinks.sink1.type = org.apache.spark.streaming.flume.sink.SparkSink
agent1.sinks.sink1.hostname = hostname
agent1.sinks.sink1.port = 43333
#agent1.sinks.sink1.batchSize=200

agent1.channels.channel2.type = FILE
agent1.channels.channel2.checkpointDir=/tmp/mapr/flume/java/checkpoint
agent1.channels.channel2.dataDirs=/tmp/mapr/flume/java/data
agent1.channels.channel2.capacity = 10000
agent1.channels.channel2.transactionCapacity = 1000

agent1.sources.source1.channels = channel2
agent1.sinks.sink1.channel = channel2

The streaming job runs with a batch interval of 180 seconds, and the final output is written like this:

finalTable.write.partitionBy("destzone").mode(SaveMode.Append).parquet("/rawdata/output/FirewalSyslog2")
sqlContext.sql("MSCK REPAIR TABLE network.trafficlog1")

Please help me fix this.
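For context, here is a minimal sketch of what the receiving side of such a job typically looks like, assuming the spark-streaming-flume polling receiver is used against the SparkSink configured above. The hostname, port 43333, and 180-second batch interval come from the configuration in this post; the object name, application name, and parsing step are illustrative placeholders.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FirewallSyslogStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FirewallSyslogStream") // name is illustrative
    // 180-second batch interval, as described in the post
    val ssc = new StreamingContext(conf, Seconds(180))

    // Pull events from the Flume SparkSink listening at hostname:43333
    val events = FlumeUtils.createPollingStream(ssc, "hostname", 43333)

    events.foreachRDD { rdd =>
      // Parse the syslog events into finalTable here, then append, e.g.:
      // finalTable.write.partitionBy("destzone").mode(SaveMode.Append)
      //   .parquet("/rawdata/output/FirewalSyslog2")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}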