We are probably still in the minority, but our analytics platform, which is built on Spark and HDFS, does not have MapReduce installed. Is there a distcp equivalent that uses Spark to do the copying?
Our team is trying to find the best way to replicate our HDFS data across data centers, to minimize the impact of an outage or the failure of a data center.