Hi all,

We have an application that pulls logs from an external server (far from the Hadoop cluster) into the Hadoop cluster. Sometimes we see a large delay (an hour or more) before the data actually appears in HDFS, even though the file has been closed and the stream variable set to null in the external application. I was under the impression that once I close the file, the data is reflected in the Hadoop cluster.

This makes handling write failures even more complicated, because the client gets the false impression that the data has already been written to HDFS. Kindly clarify whether my understanding is correct. If it is, could someone tell me what is causing the delay before the data actually shows up? And in those cases, how can we handle write failures (due to temporary issues such as a DataNode being unavailable or a disk being full), given that there seems to be no way to detect the failure on the client side?
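For context, our write path looks roughly like the sketch below (a simplified illustration, not our exact code; the path `/logs/app.log` is made up, and it needs the Hadoop client library and a reachable cluster to run). My assumption has been that `close()` makes the data visible; from the Hadoop API docs, `hflush()`/`hsync()` on `FSDataOutputStream` are the calls that push data to the DataNodes before close, and `close()` itself can throw if the pipeline fails, which is the only failure signal the client gets:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class HdfsLogWriter {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical target path for the pulled log data.
        FSDataOutputStream out = fs.create(new Path("/logs/app.log"));
        try {
            out.write("one log line\n".getBytes(StandardCharsets.UTF_8));
            // hflush(): data becomes visible to new readers (not yet durable on disk).
            // hsync(): additionally asks DataNodes to persist to disk.
            out.hsync();
        } finally {
            // close() flushes the last block and waits for pipeline acks;
            // an IOException here is the client-side sign that the write failed
            // (e.g. DataNode unavailable, disk full).
            out.close();
        }
    }
}
```

Is catching the exception from `hsync()`/`close()` the right way to detect these failures, or can `close()` return successfully while the data is still not visible?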
Thanks,
Pallavi