OK, this is a reasonable train of thought and your names seem fine. However, text is actually the persistent representation of what I was calling an extended-DRM, which should probably be called an IndexedDataset. I don’t see the difference between import and persistence, since there are never user-visible intermediate files. Also, simple CSV is currently the only supported format for IndexedDataset; others will be added as needs present themselves.
Therefore, following your train of thought, and to fit the changes you suggest for DRM naming, I’d change the IndexedDataset names to:

Package level: indexedDatasetDfsRead(src: String, schema: Schema = DefaultSchema): IndexedDataset

Method level: indexedDataset.dfsWrite(dest: String, schema: Schema = DefaultSchema)

Once read, the DRM is a CheckpointedDrm contained in the IndexedDataset. So whether you call it import/export or persistence, a user can use either the sequence file or text to read/write DRMs.

Seem reasonable?

On Sep 26, 2014, at 11:30 AM, Dmitriy Lyubimov <notificati...@github.com> wrote:

To be a bit more concrete, there's indeed a slight discrepancy between the write and read names, but semantically they are what they say they are, i.e. they are persisting a DRM to HDFS.

To be even more concrete, I am probably for simply package-level drmDfsRead() and method-level dfsWrite() names. The convention here is that all DRM-related package-level routines start with the drm prefix, so we don't easily mix these things with other things in global scope.

Now, everything else, including reading/writing CSV formats, is an export operation (as opposed to persistence). Consequently, proper names are perhaps along the lines of drmImportCSV and exportCSV respectively. Import and export emphasize the fact that the format is not native, loses a lot of coherency enforcement, and requires a lot of validation while parsing back.
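To make the naming proposed in the reply concrete, here is a minimal sketch of how a package-level indexedDatasetDfsRead and a method-level dfsWrite could fit together. Only the two signatures come from the thread. Everything else is an assumption made for illustration: Schema is reduced to a delimiter, CheckpointedDrm is a plain in-memory matrix rather than the real distributed type, the CSV layout (a header of column IDs, then one row ID plus values per line) is invented here, and the local filesystem stands in for DFS.

```scala
import java.io.{File, PrintWriter}
import java.util.regex.Pattern
import scala.io.Source

// Illustrative stand-ins only; none of these bodies are the real Mahout classes.
object IndexedDatasetSketch {

  // Hypothetical schema: only the field delimiter is modeled.
  final case class Schema(delimiter: String = ",")
  val DefaultSchema: Schema = Schema()

  // Hypothetical stand-in for CheckpointedDrm: a dense, row-major, in-memory matrix.
  final case class CheckpointedDrm(rows: Vector[Vector[Double]])

  // A matrix plus the external row and column IDs that index it.
  final case class IndexedDataset(
      rowIds: Vector[String],
      columnIds: Vector[String],
      matrix: CheckpointedDrm) {

    // Method-level writer, named as proposed in the thread.
    // Assumed layout: header line with an empty first cell followed by the
    // column IDs, then one line per row: rowId, value, value, ...
    def dfsWrite(dest: String, schema: Schema = DefaultSchema): Unit = {
      val out = new PrintWriter(new File(dest), "UTF-8")
      try {
        out.println(("" +: columnIds).mkString(schema.delimiter))
        rowIds.zip(matrix.rows).foreach { case (id, row) =>
          out.println((id +: row.map(_.toString)).mkString(schema.delimiter))
        }
      } finally out.close()
    }
  }

  // Package-level reader, named as proposed in the thread.
  def indexedDatasetDfsRead(src: String, schema: Schema = DefaultSchema): IndexedDataset = {
    val source = Source.fromFile(src, "UTF-8")
    try {
      val lines = source.getLines().toVector
      require(lines.nonEmpty, s"no header line in $src")
      // String.split takes a regex, so the delimiter is quoted;
      // limit -1 keeps empty trailing fields.
      val sep = Pattern.quote(schema.delimiter)
      val columnIds = lines.head.split(sep, -1).toVector.drop(1)
      val parsed = lines.tail.map { line =>
        val fields = line.split(sep, -1).toVector
        (fields.head, fields.tail.map(_.toDouble))
      }
      IndexedDataset(parsed.map(_._1), columnIds, CheckpointedDrm(parsed.map(_._2)))
    } finally source.close()
  }
}
```

With these stand-ins, a dataset written with ds.dfsWrite(path) can be read back with indexedDatasetDfsRead(path), and the caller never touches an intermediate file, which is the sense in which text acts as a persistent representation here rather than a separate export step.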