Dear All,

I have been struggling with an issue where my Spark Streaming job hangs once the output folder path exceeds a certain size.

Here are more details:

I have Flume sending the data, with the following configuration:
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel2

# Describe/configure source1
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -f /mapr/rawdata/input/syslog


agent1.sinks.sink1.type = org.apache.spark.streaming.flume.sink.SparkSink
agent1.sinks.sink1.hostname = hostname
agent1.sinks.sink1.port = 43333
#agent1.sinks.sink1.batchSize=200

agent1.channels.channel2.type = FILE
agent1.channels.channel2.checkpointDir=/tmp/mapr/flume/java/checkpoint
agent1.channels.channel2.dataDirs=/tmp/mapr/flume/java/data
agent1.channels.channel2.capacity = 10000
agent1.channels.channel2.transactionCapacity = 1000

agent1.sources.source1.channels = channel2
agent1.sinks.sink1.channel = channel2
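

On the Spark side, the receiver is set up roughly like the sketch below (simplified; the app name and the per-batch processing are placeholders, but the host/port match the SparkSink above and the batch interval is the 180 seconds mentioned further down):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

val sparkConf = new SparkConf().setAppName("FirewallSyslogStream") // placeholder app name
val ssc = new StreamingContext(sparkConf, Seconds(180))            // 180-second batch interval

// Pull-based receiver that polls the SparkSink configured above
val flumeStream = FlumeUtils.createPollingStream(ssc, "hostname", 43333)

flumeStream.foreachRDD { rdd =>
  // per-batch processing that builds finalTable and writes it out (shown below)
}

ssc.start()
ssc.awaitTermination()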



The batch interval is 180 seconds, and here is how we write the final output:

finalTable.write
  .partitionBy("destzone")
  .mode(SaveMode.Append)
  .parquet("/rawdata/output/FirewalSyslog2")
sqlContext.sql("MSCK REPAIR TABLE network.trafficlog1")
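
For context, this write runs once per batch inside foreachRDD, roughly like the sketch below (the SyslogRow case class and parseSyslogLine are placeholders for our actual parsing; sqlContext is the existing SQLContext):

import org.apache.spark.sql.SaveMode

// placeholder row type; the real schema has many more firewall fields
case class SyslogRow(destzone: String, raw: String)

// placeholder parser; the real code extracts destzone and other fields from each syslog line
def parseSyslogLine(line: String): SyslogRow =
  SyslogRow(destzone = line.split(" ").lift(3).getOrElse("unknown"), raw = line)

flumeStream.foreachRDD { rdd =>
  // decode the Flume event bodies into raw syslog lines
  val lines = rdd.map(e => new String(e.event.getBody.array(), "UTF-8"))

  import sqlContext.implicits._
  val finalTable = lines.map(parseSyslogLine).toDF()

  // same write and repair as shown above, executed every 180-second batch
  finalTable.write
    .partitionBy("destzone")
    .mode(SaveMode.Append)
    .parquet("/rawdata/output/FirewalSyslog2")
  sqlContext.sql("MSCK REPAIR TABLE network.trafficlog1")
}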

Please help me fix this issue.
