OK, this is a reasonable train of thought and your names seem fine. However, text 
is actually the persistent representation of what I was calling an 
extended-DRM, which should probably be called an IndexedDataset. I don’t see 
the difference between import and persistence, since there are never any 
user-visible intermediate files. Also, simple CSV is currently the only supported 
format for IndexedDataset; others will be added as needs present themselves.

Therefore, following your train of thought, and to fit the changes you suggest 
for DRM naming, I’d change the IndexedDataset names to:

Package level
indexedDatasetDfsRead(src: String, schema: Schema = DefaultSchema): IndexedDataset

Method level
indexedDataset.dfsWrite(dest: String, schema: Schema = DefaultSchema)
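A minimal sketch of how those two names might look in use. Everything except the two method names and their parameter lists is an assumption made for illustration: Schema here carries only a delimiter, IndexedDataset holds row IDs plus a plain in-memory matrix in place of the wrapped CheckpointedDrm, and the "DFS" is just a local file.

```scala
import java.io.PrintWriter
import java.util.regex.Pattern
import scala.io.Source

object NamingSketch {
  // Hypothetical stand-in: the real Schema is not described in this thread.
  final case class Schema(delimiter: String = ",")
  val DefaultSchema: Schema = Schema()

  // Hypothetical stand-in: row IDs plus a dense matrix, in place of the
  // CheckpointedDrm that the real IndexedDataset would contain.
  final case class IndexedDataset(rowIDs: Vector[String],
                                  matrix: Vector[Vector[Double]]) {
    // Method-level write, named as proposed above.
    // Writes one line per row: rowID<delim>value<delim>value...
    def dfsWrite(dest: String, schema: Schema = DefaultSchema): Unit = {
      val out = new PrintWriter(dest)
      try {
        rowIDs.zip(matrix).foreach { case (id, row) =>
          out.println((id +: row.map(_.toString)).mkString(schema.delimiter))
        }
      } finally out.close()
    }
  }

  // Package-level read, named as proposed above.
  // Parses the same line layout that dfsWrite produces.
  def indexedDatasetDfsRead(src: String,
                            schema: Schema = DefaultSchema): IndexedDataset = {
    val source = Source.fromFile(src)
    try {
      val rows = source.getLines()
        .filter(_.nonEmpty)
        .map(_.split(Pattern.quote(schema.delimiter)).toVector)
        .toVector
      IndexedDataset(rows.map(_.head), rows.map(_.tail.map(_.toDouble)))
    } finally source.close()
  }
}
```

With this shape, a caller would write with ds.dfsWrite(path) and read back with NamingSketch.indexedDatasetDfsRead(path), without ever seeing an intermediate file.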

Once read, the DRM is a CheckpointedDrm contained in the IndexedDataset. So 
whether you call it import/export or persistence, a user can use either the 
sequence file or text to read/write DRMs.

Seem reasonable?

On Sep 26, 2014, at 11:30 AM, Dmitriy Lyubimov <notificati...@github.com> wrote:

To be a bit more concrete, there's indeed a slight discrepancy between the write 
and read names, but semantically they are what they say they are, i.e. they are 
persisting a drm to hdfs.

To be even more concrete, I am probably for simply package-level drmDfsRead() 
and method-level dfsWrite() names.

The convention here is that all drm-related package-level routines start with 
the drm prefix so we don't easily mix these things with other things in global 
scope.

Now, everything else, including reading/writing CSV formats, is an import/export 
operation (as opposed to persistence). Consequently, the proper names are perhaps 
along the lines of drmImportCSV and exportCSV respectively. Import and export 
emphasize the fact that the format is not native, loses a lot of coherency 
enforcement, and requires a lot of validation while parsing back.
