wangwei created SINGA-82:
----------------------------
Summary: Refactor input layers to use a data store abstraction
Key: SINGA-82
URL: https://issues.apache.org/jira/browse/SINGA-82
Project: Singa
Issue Type: Improvement
Reporter: wangwei
Assignee: wangwei
1. Separate the data storage from Layer. Currently, SINGA implements a separate
input layer for each storage type, e.g., ShardData, CSV, LMDB. One problem is
that only read operations are provided. When users prepare the training data,
they have to learn the read/write operations of each storage themselves.
Inspired by caffe::db::DB, we can provide a storage abstraction with simple
read/write interfaces, which users call to prepare their training data. In
particular, training data is stored as (string key, string value) tuples. The
base Store class declares:
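{code}
#include <string>
using std::string;

// open modes; these enumerator names are illustrative
enum Mode { kRead, kCreate, kAppend };

class Store {
 public:
  virtual ~Store() {}
  // open the store for reading, writing or appending
  virtual bool Open(const string& source, Mode mode) = 0;
  // read the next tuple into key and value
  virtual bool Read(string* key, string* value) = 0;
  // write one tuple
  virtual bool Write(const string& key, const string& value) = 0;
};
{code}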
Each concrete storage (e.g., CSV, LMDB, image folder, or HDFS, which will be
supported soon) inherits from Store and overrides these functions.
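A minimal sketch of one concrete backend, assuming a plain-text file with one "key,value" tuple per line. The class name TextStore and the Mode enumerators (kRead, kCreate, kAppend) are hypothetical, not SINGA's actual API. In this simplified format keys must not contain commas or newlines, while values may contain commas.

```cpp
#include <cassert>
#include <fstream>
#include <string>

using std::string;

// Illustrative open modes; the names are assumptions, not SINGA's API.
enum Mode { kRead, kCreate, kAppend };

// Base class as described in the issue.
class Store {
 public:
  virtual ~Store() {}
  virtual bool Open(const string& source, Mode mode) = 0;
  virtual bool Read(string* key, string* value) = 0;
  virtual bool Write(const string& key, const string& value) = 0;
};

// Hypothetical text store: one tuple per line, stored as "key,value".
// Only the first comma separates key from value.
class TextStore : public Store {
 public:
  bool Open(const string& source, Mode mode) override {
    mode_ = mode;
    if (mode == kRead) {
      in_.open(source);
      return in_.is_open();
    }
    // kCreate truncates an existing file, kAppend keeps its contents.
    out_.open(source, mode == kAppend ? std::ios::app : std::ios::trunc);
    return out_.is_open();
  }

  bool Read(string* key, string* value) override {
    string line;
    if (mode_ != kRead || !std::getline(in_, line)) return false;
    std::size_t pos = line.find(',');
    if (pos == string::npos) return false;  // malformed line
    *key = line.substr(0, pos);
    *value = line.substr(pos + 1);
    return true;
  }

  bool Write(const string& key, const string& value) override {
    if (mode_ == kRead) return false;
    out_ << key << ',' << value << '\n';
    return static_cast<bool>(out_);
  }

 private:
  Mode mode_ = kRead;
  std::ifstream in_;
  std::ofstream out_;
};
```

With such a backend, preparing training data reduces to one Open(path, kCreate) followed by one Write per record, and the same Read loop works regardless of which subclass is behind the Store pointer.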
Consequently, a single KVInputLayer (like the SequenceFile.Reader in Hadoop)
can read from different sources by configuring the *store* field (e.g.,
store=csv).
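One way the *store* field could be resolved to a concrete backend is a name-to-factory registry. The sketch below assumes hypothetical helpers RegisterStore and CreateStore, plus a toy in-memory backend MemStore registered under an illustrative name; a real build would register entries such as "csv" or "lmdb" instead.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using std::string;

enum Mode { kRead, kCreate, kAppend };  // illustrative mode names

class Store {
 public:
  virtual ~Store() {}
  virtual bool Open(const string& source, Mode mode) = 0;
  virtual bool Read(string* key, string* value) = 0;
  virtual bool Write(const string& key, const string& value) = 0;
};

// Hypothetical in-memory backend; it ignores `source`.
class MemStore : public Store {
 public:
  bool Open(const string&, Mode mode) override {
    if (mode == kCreate) data_.clear();  // start a fresh store
    pos_ = 0;                            // rewind for reading
    return true;
  }
  bool Read(string* key, string* value) override {
    if (pos_ >= data_.size()) return false;
    *key = data_[pos_].first;
    *value = data_[pos_].second;
    ++pos_;
    return true;
  }
  bool Write(const string& key, const string& value) override {
    data_.emplace_back(key, value);
    return true;
  }

 private:
  std::vector<std::pair<string, string>> data_;
  std::size_t pos_ = 0;
};

// Registry mapping a configured store name to a factory.
using StoreFactory = std::function<std::unique_ptr<Store>()>;

std::map<string, StoreFactory>& StoreRegistry() {
  static std::map<string, StoreFactory> registry;
  return registry;
}

void RegisterStore(const string& name, StoreFactory factory) {
  StoreRegistry()[name] = std::move(factory);
}

// Returns nullptr when no backend is registered under `name`.
std::unique_ptr<Store> CreateStore(const string& name) {
  auto it = StoreRegistry().find(name);
  if (it == StoreRegistry().end()) return nullptr;
  return it->second();
}
```

The input layer then only sees a std::unique_ptr<Store>, so adding a new backend means registering one more factory rather than writing a new layer.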
With the Store class, we can implement a KVInputLayer that reads batchsize
tuples in its ComputeFeature function. Each tuple is parsed by a virtual
function whose implementation depends on the application (i.e., on the format
of the tuple).
{code}
// parse the tuple as the k-th instance for one mini-batch
virtual bool Parse(int k, const string& key, const string& tuple) = 0;
{code}
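A sketch of how ComputeFeature could pull batchsize tuples from any Store and hand each one to Parse. Rewinding the store by reopening it when it runs out of tuples is an assumed epoch policy, not something the issue specifies; VectorStore and KeyRecorderLayer are hypothetical helpers used only to exercise the layer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using std::string;

enum Mode { kRead, kCreate, kAppend };  // illustrative mode names

class Store {
 public:
  virtual ~Store() {}
  virtual bool Open(const string& source, Mode mode) = 0;
  virtual bool Read(string* key, string* value) = 0;
  virtual bool Write(const string& key, const string& value) = 0;
};

// Hypothetical in-memory store used only to drive the example.
class VectorStore : public Store {
 public:
  bool Open(const string&, Mode) override { pos_ = 0; return true; }
  bool Read(string* key, string* value) override {
    if (pos_ >= data_.size()) return false;
    *key = data_[pos_].first;
    *value = data_[pos_].second;
    ++pos_;
    return true;
  }
  bool Write(const string& key, const string& value) override {
    data_.emplace_back(key, value);
    return true;
  }

 private:
  std::vector<std::pair<string, string>> data_;
  std::size_t pos_ = 0;
};

// Generic input layer: it has no format knowledge and delegates
// every tuple to the virtual Parse().
class KVInputLayer {
 public:
  KVInputLayer(Store* store, string source, int batchsize)
      : store_(store), source_(std::move(source)), batchsize_(batchsize) {}
  virtual ~KVInputLayer() {}

  // Fill one mini-batch. When the store is exhausted, reopen it and
  // continue from the first tuple (assumed epoch-wrap policy).
  bool ComputeFeature() {
    string key, value;
    for (int k = 0; k < batchsize_; ++k) {
      if (!store_->Read(&key, &value)) {
        if (!store_->Open(source_, kRead) || !store_->Read(&key, &value))
          return false;  // empty or unreadable store
      }
      if (!Parse(k, key, value)) return false;
    }
    return true;
  }

 protected:
  // parse the tuple as the k-th instance of the mini-batch
  virtual bool Parse(int k, const string& key, const string& tuple) = 0;

 private:
  Store* store_;
  string source_;
  int batchsize_;
};

// Toy subclass that records which key landed in each batch slot.
class KeyRecorderLayer : public KVInputLayer {
 public:
  using KVInputLayer::KVInputLayer;
  std::vector<string> slots;

 protected:
  bool Parse(int k, const string& key, const string&) override {
    if (static_cast<int>(slots.size()) <= k) slots.resize(k + 1);
    slots[k] = key;
    return true;
  }
};
```

With three stored tuples and batchsize 2, the second call to ComputeFeature reads the last tuple and then wraps around to the first one.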
For example, a CSVKVInputLayer may parse the key as a line ID and the value
field as a label followed by the features. An ImageKVInputLayer may parse a
SingleLabelImageRecord from the value field.
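A possible Parse for the CSV case, assuming a numeric key holding the line ID and a value of the form "label,f1,...,fd" with a fixed feature dimension d. To stay self-contained, only the Parse hook is reproduced, in a hypothetical KVParser base; CSVKVParser and its buffer names are illustrative.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

using std::string;

// Only the parsing hook from the issue is reproduced here.
class KVParser {
 public:
  virtual ~KVParser() {}
  virtual bool Parse(int k, const string& key, const string& tuple) = 0;
};

// Hypothetical CSV parser: key = line ID, value = "label,f1,f2,...".
class CSVKVParser : public KVParser {
 public:
  CSVKVParser(int batchsize, int feature_dim)
      : dim_(feature_dim),
        line_ids(batchsize),
        labels(batchsize),
        features(batchsize * feature_dim) {}

  bool Parse(int k, const string& key, const string& tuple) override {
    line_ids[k] = std::stoi(key);  // throws on a non-numeric key
    std::istringstream in(tuple);
    string field;
    if (!std::getline(in, field, ',')) return false;  // missing label
    labels[k] = std::stoi(field);
    for (int j = 0; j < dim_; ++j) {
      if (!std::getline(in, field, ',')) return false;  // too few features
      features[k * dim_ + j] = std::stof(field);
    }
    return true;
  }

 private:
  int dim_;

 public:
  std::vector<int> line_ids;
  std::vector<int> labels;
  std::vector<float> features;  // row-major, batchsize x feature_dim
};
```

An image variant would follow the same pattern, deserializing the value field instead of splitting it on commas.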
2. There will be a set of layers for data preprocessing, e.g., normalization
and image augmentation.
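As one example of such a preprocessing step, the sketch below applies z-score normalization to each feature column of a row-major mini-batch. Computing the statistics per batch (rather than over the whole training set) and the eps constant are assumptions for illustration; NormalizeBatch is a hypothetical name, not a planned SINGA layer.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical preprocessing step: z-score normalization of a row-major
// batch (batchsize x dim), with statistics computed per feature column.
void NormalizeBatch(std::vector<float>* data, int batchsize, int dim,
                    float eps = 1e-6f) {
  for (int j = 0; j < dim; ++j) {
    // mean of column j over the batch
    double mean = 0.0;
    for (int i = 0; i < batchsize; ++i) mean += (*data)[i * dim + j];
    mean /= batchsize;
    // population variance of column j
    double var = 0.0;
    for (int i = 0; i < batchsize; ++i) {
      double d = (*data)[i * dim + j] - mean;
      var += d * d;
    }
    var /= batchsize;
    // eps guards against division by zero for constant columns
    double scale = 1.0 / std::sqrt(var + eps);
    for (int i = 0; i < batchsize; ++i) {
      float& x = (*data)[i * dim + j];
      x = static_cast<float>((x - mean) * scale);
    }
  }
}
```

Keeping each preprocessing step as its own layer lets it be stacked on top of any KVInputLayer subclass without touching the parsing code.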