Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-45432289

As an aside, I am generally -1 on adding a lot of format-specific reading/writing code to Spark core. My view is that this is exactly what InputFormat/OutputFormat support is for - providing custom read/write functionality. It makes sense for something like Parquet with SparkSQL as the preferred format for efficiency (in much the same way that SequenceFiles are often the preferred format in Hadoop pipelines), but should Spark core contain a standardised `saveAsXXXFile` method for every format? IMO, no - the examples show how to work with common formats.

I can see why contrib modules for reading/writing structured (RDBMS-like) data via common formats for SparkSQL make sense, since there will probably be one "correct" way of doing that. But looking at the HBase PR you referenced, I don't see the value of having that live in Spark. And why is it not simply using an ```OutputFormat``` instead of custom config and writing code? (I might be missing something here, but it seems to add complexity and maintenance burden unnecessarily.)
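To illustrate the point, here is a rough sketch of writing an RDD to HBase through the stock ```TableOutputFormat``` and Spark's existing `saveAsNewAPIHadoopDataset`, rather than custom writing code. This is a minimal, untested sketch: the table name `my_table`, the column family `cf`, the qualifier `col`, and the sample data are placeholders, and an existing `SparkContext` (`sc`) and a running HBase cluster are assumed.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job

// Standard Hadoop job configuration pointing the OutputFormat
// at the target table (table name is a placeholder).
val conf = HBaseConfiguration.create()
conf.set(TableOutputFormat.OUTPUT_TABLE, "my_table")
val job = Job.getInstance(conf)
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

// Map each record to the (key, Put) pairs TableOutputFormat expects,
// then hand the write off to the OutputFormat machinery - no
// Spark-core changes or custom writer code needed.
val rdd = sc.parallelize(Seq(("row1", "value1"), ("row2", "value2")))
rdd.map { case (key, value) =>
  val put = new Put(Bytes.toBytes(key))
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value))
  (new ImmutableBytesWritable(Bytes.toBytes(key)), put)
}.saveAsNewAPIHadoopDataset(job.getConfiguration)
```

The same shape works for any storage system that ships a Hadoop ```OutputFormat```, which is the reason I'd rather keep such code out of Spark core.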