Hi all,

We have an application that pulls logs from an external server (far away from 
the Hadoop cluster) into HDFS. Sometimes we see a huge delay (an hour or more) 
before the data actually appears in HDFS, even though the file has been closed 
and the stream reference set to null in the external application. I was under 
the impression that once I close the file, the data is reflected in the Hadoop 
cluster. This makes it even harder to handle write failures, because the 
client gets the false impression that the data has already been written to 
HDFS. Kindly clarify whether my understanding is correct. If it is, could 
someone tell me what is causing the delay in the data actually showing up? In 
those cases, how can we deal with write failures (due to temporary issues such 
as a DataNode being unavailable or a disk being full) when there is no way to 
detect the failure on the client side?
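For context, the write path looks roughly like the sketch below. This is a simplified illustration with placeholder paths and data, not our actual code, and it assumes the Java FileSystem API from a recent Hadoop release (hflush()/hsync() on FSDataOutputStream; older releases had sync() instead). The idea is to flush explicitly before close() and then read the file status back from the NameNode, rather than trusting that close() returning means the data is durable:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LogUploader {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes fs.defaultFS in the loaded configuration points at the
        // remote cluster's NameNode.
        FileSystem fs = FileSystem.get(conf);
        Path dest = new Path("/logs/app/sample.log"); // placeholder path

        byte[] chunk = "one log line\n".getBytes("UTF-8");
        FSDataOutputStream out = fs.create(dest);
        try {
            out.write(chunk);
            // hflush() pushes buffered bytes out to the DataNode pipeline so
            // that new readers can see them; hsync() additionally asks the
            // DataNodes to persist to disk. Neither is implied by write()
            // alone, which may leave data buffered on the client.
            out.hsync();
        } finally {
            out.close();
        }

        // Positive confirmation on the client side: ask the NameNode for the
        // file length instead of assuming the write succeeded because
        // close() did not throw.
        FileStatus st = fs.getFileStatus(dest);
        if (st.getLen() != chunk.length) {
            throw new IllegalStateException(
                "Expected " + chunk.length + " bytes, found " + st.getLen());
        }
    }
}
```

The read-back check at the end is one way to avoid the false impression of success: if the length (or a checksum) does not match what was sent, the client knows to retry instead of silently moving on.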

Thanks
Pallavi
