Github user Fokko commented on the issue: https://github.com/apache/spark/pull/20057

Hi @gatorsmile, thanks for putting it to the test. The main reasons why I personally dislike Sqoop:

- **Legacy.** The old MapReduce model should be buried in the coming years. As a data engineering consultant, I see more and more people questioning the whole Hadoop stack. With Sqoop you still need to run MapReduce jobs, and that isn't easy on other platforms such as Kubernetes.
- **Stability.** I see Sqoop jobs fail quite often, and there is no clean way to retry them atomically. For example, when running a Sqoop job from Airflow, we cannot simply retry the operation: when we import data from an RDBMS to HDFS, we first have to make sure that the target directory of the previous run has been deleted. This is also where Spark JDBC comes in; in the future we would like to delete single partitions, but this is still a work in progress.

Maybe @danielvdende can elaborate a bit on their use case.
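To illustrate the stability point: a minimal PySpark sketch of an idempotent import, where `mode("overwrite")` replaces the target partition directory so a blind Airflow retry needs no manual cleanup. The `jdbc_url`, table name, and the `partition_path` layout are illustrative assumptions, not part of the PR.

```python
def partition_path(base_path, run_date):
    # Hypothetical layout: one directory per daily partition,
    # so a retry only ever touches its own run's data.
    return f"{base_path}/dt={run_date}"


def import_table(jdbc_url, table, base_path, run_date):
    # Sketch, assuming PySpark is on the path; parameters are examples.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = (spark.read.format("jdbc")
          .option("url", jdbc_url)
          .option("dbtable", table)
          .load())
    # mode("overwrite") deletes and rewrites the target directory in one
    # step, which is what makes a simple task-level retry safe.
    df.write.mode("overwrite").parquet(partition_path(base_path, run_date))
```

With this shape, an Airflow task can just call `import_table(...)` again on failure; with Sqoop the equivalent retry first needs an explicit `hdfs dfs -rm -r` of the previous attempt's output.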