Github user mattf commented on the pull request:

    https://github.com/apache/spark/pull/2471#issuecomment-57629838

> @mattf don't know what you mean by "functionality that is already provided by the system". I'm not aware of HDFS having any way to automatically do housekeeping of old files.

A "system approach" means using something like logrotate, or a cleaner process run from cron. Such an approach is beneficial in a number of ways, including reducing Spark's complexity by not duplicating functionality that is already available in Spark's environment - akin to using a standard library for I/O instead of interacting with devices directly. In this case, the environment is the operating system, where tools like logrotate and cron are readily available.

As for rotating logs in HDFS - I wouldn't expect HDFS itself to provide such a feature, because it serves a specific use case built on top of HDFS. Some searching suggests there are a few existing solutions for rotating or pruning files in HDFS, and since HDFS is distributed, rotation/pruning/cleaning/purging can be done remotely and independently of Spark.
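As a rough illustration of the system approach described above, a cron-driven cleaner for a local log directory could be sketched as below. The function name, directory, and retention period are illustrative, not anything Spark ships; for logs stored in HDFS, the same retention policy would be applied with `hdfs dfs -ls` / `hdfs dfs -rm` instead of `find`, run from any host with HDFS access.

```shell
#!/bin/sh
# Illustrative cleaner: delete regular files older than a given number
# of days from a log directory. Meant to be run periodically from cron.

prune_old_logs() {
    # $1: log directory, $2: max age in days
    # -mtime +N matches files last modified more than N 24-hour periods ago.
    find "$1" -type f -mtime +"$2" -delete
}

# Example invocation; path and retention period are hypothetical:
#   prune_old_logs /var/log/spark 7
#
# Illustrative crontab entry running the cleaner daily at 03:00:
#   0 3 * * * /usr/local/bin/prune_spark_logs.sh
```

The point of the design is that the retention policy lives entirely outside Spark: changing the schedule or the retention window is a cron/script change, with no Spark code involved.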