Dear All, I have been struggling with an issue where a Spark Streaming job hangs once the output folder grows beyond a certain size.
Here are more details. Flume sends the data to Spark with the following agent configuration:

agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel2

# Describe/configure source1
agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -f /mapr/rawdata/input/syslog

agent1.sinks.sink1.type = org.apache.spark.streaming.flume.sink.SparkSink
agent1.sinks.sink1.hostname = hostname
agent1.sinks.sink1.port = 43333
#agent1.sinks.sink1.batchSize=200

agent1.channels.channel2.type = FILE
agent1.channels.channel2.checkpointDir=/tmp/mapr/flume/java/checkpoint
agent1.channels.channel2.dataDirs=/tmp/mapr/flume/java/data
agent1.channels.channel2.capacity = 10000
agent1.channels.channel2.transactionCapacity = 1000

agent1.sources.source1.channels = channel2
agent1.sinks.sink1.channel = channel2

The streaming job runs with a batch interval of 180 seconds, and the final output is written like this:

finalTable.write.partitionBy("destzone").mode(SaveMode.Append).parquet("/rawdata/output/FirewalSyslog2")
sqlContext.sql("MSCK REPAIR TABLE network.trafficlog1")

Please help me fix this.
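For context, here is a minimal sketch of what the receiving side of such a job typically looks like, assuming the spark-streaming-flume polling receiver is used against the SparkSink configured above. The hostname, port 43333, and 180-second batch interval come from the configuration in this post; the object name, application name, and parsing step are illustrative placeholders.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FirewallSyslogStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FirewallSyslogStream") // name is illustrative
    // 180-second batch interval, as described in the post
    val ssc = new StreamingContext(conf, Seconds(180))

    // Pull events from the Flume SparkSink listening at hostname:43333
    val events = FlumeUtils.createPollingStream(ssc, "hostname", 43333)

    events.foreachRDD { rdd =>
      // Parse the syslog events into finalTable here, then append, e.g.:
      // finalTable.write.partitionBy("destzone").mode(SaveMode.Append)
      //   .parquet("/rawdata/output/FirewalSyslog2")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}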