Hello,

We have a large number of custom-generated files (not just web logs) that we
need to move from our JBoss servers to HDFS.  Our first implementation ran a
cron job every 5 minutes to move the files from the "output" directory to HDFS.
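For reference, a push job of that kind might look roughly like the Python sketch
below. The paths, the 60-second "finished writing" threshold, and the helper
names are hypothetical. It assumes the Hadoop client (`hadoop fs -put`) is
installed on the JBoss server itself, which is exactly the access our IT team
objects to.

```python
import os
import subprocess
import time

# Hypothetical settings: adjust to the real "output" directory and HDFS target.
LOCAL_OUTPUT_DIR = "/var/app/output"
HDFS_TARGET_DIR = "/data/incoming"
# Assumption: a file untouched for this many seconds is finished being written.
MIN_AGE_SECONDS = 60


def ready_files(output_dir, min_age, now=None):
    """Return regular files in output_dir that have not been modified for min_age seconds."""
    now = time.time() if now is None else now
    ready = []
    for name in sorted(os.listdir(output_dir)):
        path = os.path.join(output_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) >= min_age:
            ready.append(path)
    return ready


def put_command(local_path, hdfs_dir):
    """Build the 'hadoop fs -put' command that copies one file into hdfs_dir."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir.rstrip("/") + "/"]


def push_once(output_dir=LOCAL_OUTPUT_DIR, hdfs_dir=HDFS_TARGET_DIR):
    """One cron run on the JBoss server: copy each ready file to HDFS."""
    for path in ready_files(output_dir, MIN_AGE_SECONDS):
        result = subprocess.run(put_command(path, hdfs_dir))
        if result.returncode == 0:
            os.remove(path)  # delete the local copy only after a successful put
```

Calling `push_once()` from cron every 5 minutes reproduces the current setup.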

Is this recommended?  Our IT team tells us that, for security reasons, our JBoss
servers should not have access to HDFS.  Instead, the files must be pulled into
HDFS by other servers that do not accept traffic from the outside.  In essence,
they are asking for a layer of indirection.  Instead of:
{JBoss server} --> {HDFS}
they want it to look like:
{Separate server} <-- {JBoss server}
and then
{Separate server} --> {HDFS}
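A rough sketch of the requested pull model, run from cron on the separate
server: it first pulls the files from the JBoss server with rsync over SSH (the
separate server opens the connection, so the JBoss server never talks to HDFS),
then writes them into HDFS with `hadoop fs -put`. Host names, paths, and
function names are hypothetical, and it assumes the JBoss application only
places complete files in the output directory (for example, by writing under a
temporary name and renaming when done).

```python
import os
import subprocess

# All host names and paths below are hypothetical placeholders.
JBOSS_SOURCE = "collector@jboss01:/var/app/output/"
STAGING_DIR = "/srv/staging/jboss01/"
HDFS_TARGET_DIR = "/data/incoming/jboss01/"


def pull_command(source, staging_dir):
    """rsync over SSH, initiated by the separate server (a pull).

    --remove-source-files deletes each file on the JBoss side only after
    rsync has transferred it successfully.
    """
    return ["rsync", "-a", "--remove-source-files", source, staging_dir]


def staged_files(staging_dir):
    """Regular files currently waiting in the local staging directory."""
    return [
        os.path.join(staging_dir, name)
        for name in sorted(os.listdir(staging_dir))
        if os.path.isfile(os.path.join(staging_dir, name))
    ]


def put_command(local_path, hdfs_dir):
    """Copy one staged file into HDFS from the separate server."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def pull_then_push(source=JBOSS_SOURCE, staging_dir=STAGING_DIR, hdfs_dir=HDFS_TARGET_DIR):
    """One cron run on the separate server: pull from JBoss, then push to HDFS."""
    if subprocess.run(pull_command(source, staging_dir)).returncode != 0:
        return  # leave everything in place; the next run retries
    for local_path in staged_files(staging_dir):
        if subprocess.run(put_command(local_path, hdfs_dir)).returncode == 0:
            os.remove(local_path)  # staged copy is no longer needed
```

With this layout only the separate server needs the Hadoop client and network
access to the cluster; the JBoss server only needs to accept SSH from it.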


While I understand the reasoning in principle, having processes on our JBoss
servers write files to HDFS doesn't seem any less secure than having our Tomcat
servers access a central database, which they already do.

Can anyone comment on what the recommended approach would be?  Should our JBoss
servers push their data to HDFS, or should another server pull the data and
then write it to HDFS?

Thank you!
FT
