Possibly a dumb question: differences between saveAsNewAPIHadoopFile and saveAsNewAPIHadoopDataset?

2014-09-22 Thread innowireless TaeYun Kim
Hi, I'm confused by saveAsNewAPIHadoopFile and saveAsNewAPIHadoopDataset. What's the difference between the two? What are the individual use cases of the two APIs? Could you briefly describe the internal flows of the two APIs? I've used Spark for several months, but I have no experience on

Re: Possibly a dumb question: differences between saveAsNewAPIHadoopFile and saveAsNewAPIHadoopDataset?

2014-09-22 Thread Matei Zaharia
File takes a path to write to, while Dataset takes only a Hadoop Configuration (the old-API counterpart, saveAsHadoopDataset, takes a JobConf). This means that Dataset is more general (it can also save to storage systems that are not file systems, such as key-value stores), but is more annoying to use if you actually have a file.

Matei

On September 21, 2014 at
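
To make the distinction concrete, here is a minimal PySpark sketch, not taken from the thread. The helper names `file_style_conf` and `save_both_ways` are hypothetical, and the output format and Writable classes are only example choices. The idea it illustrates is that the File variant is given the path and classes as arguments, while the Dataset variant expects the same information to already be present in the configuration, here under the Hadoop 2 property names.

```python
# Hypothetical helper (not part of Spark): builds the configuration entries
# that the Dataset variant needs in order to describe a plain file write.
def file_style_conf(path, output_format_class, key_class, value_class):
    return {
        "mapreduce.job.outputformat.class": output_format_class,
        "mapreduce.job.output.key.class": key_class,
        "mapreduce.job.output.value.class": value_class,
        "mapreduce.output.fileoutputformat.outputdir": path,
    }


# Hypothetical wrapper: `pair_rdd` is assumed to be a PySpark RDD of
# (str, int) pairs; the paths are assumed not to exist yet.
def save_both_ways(pair_rdd, file_path, dataset_path):
    fmt = "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat"
    key = "org.apache.hadoop.io.Text"
    val = "org.apache.hadoop.io.IntWritable"

    # File variant: path and classes are passed directly as arguments.
    pair_rdd.saveAsNewAPIHadoopFile(file_path, fmt, keyClass=key, valueClass=val)

    # Dataset variant: everything, including the destination, lives in conf.
    pair_rdd.saveAsNewAPIHadoopDataset(
        conf=file_style_conf(dataset_path, fmt, key, val)
    )
```

For a non-file sink such as a key-value store, the configuration would carry that store's own output format and connection properties instead of an output directory, which is why only the Dataset variant fits that case.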