[
https://issues.apache.org/jira/browse/SINGA-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
wangwei resolved SINGA-97.
--------------------------
Resolution: Fixed
HDFS support (i.e., read/write functions against files in HDFS) could be added
in the next version.
> SINGA-97 Add HDFS Store
> ------------------------
>
> Key: SINGA-97
> URL: https://issues.apache.org/jira/browse/SINGA-97
> Project: Singa
> Issue Type: New Feature
> Reporter: Anh Dinh
> Assignee: Anh Dinh
>
> This ticket implements an HDFS Store for reading data from HDFS. It
> complements the existing CSV Store, which reads data from CSV files. HDFS is
> a popular distributed file system with high (sequential) I/O throughput, so
> supporting it is necessary for SINGA to scale.
> The implementation will extend the `singa::io::Store` class, which is
> declared in `singa/io/store.h`. In particular, it will support the following
> I/O operations:
> + `bool Open(string& file, Mode mode)`
> + `bool Close()`
> + `bool Flush()`
> + `int Seek(int record_idx)`
> + `int Read(string *content)`
> + `int Write(string& content)`
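
The operations listed above could be sketched roughly as follows. This is only an illustration, not the code from `singa/io/store.h`: the `Mode` values, the `const` qualifiers, the return-value conventions, and the `MemoryStore` and `RoundTripDemo` names are all assumptions made for the example, and an in-memory vector stands in for HDFS.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of the interface listed in the ticket. The real
// singa::io::Store in singa/io/store.h may differ in its details.
enum Mode { kRead, kCreate, kAppend };  // assumed mode names

class Store {
 public:
  virtual ~Store() {}
  virtual bool Open(const std::string& file, Mode mode) = 0;
  virtual bool Close() = 0;
  virtual bool Flush() = 0;
  virtual int Seek(int record_idx) = 0;
  virtual int Read(std::string* content) = 0;
  virtual int Write(const std::string& content) = 0;
};

// In-memory stand-in for an HDFS-backed store (illustration only).
class MemoryStore : public Store {
 public:
  bool Open(const std::string& file, Mode mode) override {
    name_ = file;
    mode_ = mode;
    if (mode == kCreate) records_.clear();  // kCreate starts from empty
    pos_ = 0;
    open_ = true;
    return true;
  }
  bool Close() override { open_ = false; return true; }
  bool Flush() override { return open_; }  // nothing is buffered here

  // Assumed convention: returns the new position, or -1 if out of range.
  int Seek(int record_idx) override {
    if (!open_ || record_idx < 0 ||
        static_cast<std::size_t>(record_idx) > records_.size()) {
      return -1;
    }
    pos_ = static_cast<std::size_t>(record_idx);
    return record_idx;
  }

  // Assumed convention: 1 if a record was read, 0 at end of data or on error.
  int Read(std::string* content) override {
    if (!open_ || mode_ != kRead || pos_ >= records_.size()) return 0;
    *content = records_[pos_++];
    return 1;
  }

  // Assumed convention: 1 if the record was appended, 0 otherwise.
  int Write(const std::string& content) override {
    if (!open_ || mode_ == kRead) return 0;
    records_.push_back(content);
    return 1;
  }

  const std::string& name() const { return name_; }

 private:
  std::string name_;
  Mode mode_ = kRead;
  std::vector<std::string> records_;
  std::size_t pos_ = 0;
  bool open_ = false;
};

// Writes three records, reopens the store for reading, seeks to record 1,
// and returns the record found there.
std::string RoundTripDemo() {
  MemoryStore store;
  store.Open("demo", kCreate);
  store.Write("r0");
  store.Write("r1");
  store.Write("r2");
  store.Flush();
  store.Close();
  store.Open("demo", kRead);
  store.Seek(1);
  std::string out;
  store.Read(&out);
  store.Close();
  return out;
}
```

Calling `RoundTripDemo()` exercises all six operations in order. An HDFS-backed version would keep the same shape but delegate each call to the HDFS client library.
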
> HDFS usage in SINGA differs from that in standard MapReduce applications.
> Specifically, each SINGA worker may train on sequences of records that do
> not align with block boundaries, whereas in MapReduce each Mapper processes
> a number of complete blocks. In MapReduce, the runtime engine may fetch and
> cache an entire block over the network, knowing that the block will be
> processed in full. In SINGA, such a pre-fetching and caching strategy would
> be sub-optimal because it wastes I/O and network bandwidth on data records
> that are not used.
> We defer I/O optimization to a future ticket.
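
To make the cost of whole-block fetching concrete, here is a small back-of-the-envelope sketch. The `FetchCost` struct, the `WholeBlockCost` helper, and the 64 MB block size used in the usage note are assumptions chosen for illustration. The sketch also assumes every touched block is full, i.e. the range does not end in the file's final partial block.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types for the estimate; not part of SINGA or HDFS.
struct FetchCost {
  std::int64_t bytes_needed;   // bytes the worker actually trains on
  std::int64_t bytes_fetched;  // bytes moved if whole blocks are fetched
  std::int64_t bytes_wasted;   // fetched but never used
};

// Cost of fetching every block that overlaps the byte range [begin, end).
FetchCost WholeBlockCost(std::int64_t begin, std::int64_t end,
                         std::int64_t block_size) {
  assert(block_size > 0 && begin >= 0 && begin < end);
  const std::int64_t first_block = begin / block_size;
  const std::int64_t last_block = (end - 1) / block_size;
  const std::int64_t fetched = (last_block - first_block + 1) * block_size;
  const std::int64_t needed = end - begin;
  return FetchCost{needed, fetched, fetched - needed};
}
```

For example, with an assumed 64 MB block size, a worker whose records span bytes 48 MB to 80 MB needs 32 MB but touches two blocks, so whole-block fetching would move 128 MB and waste 96 MB. A range that starts and ends exactly on block boundaries wastes nothing, which is the MapReduce case described above.
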
> For the implementation, we choose `libhdfs3` from Pivotal as the C++ HDFS
> client. This library is written natively in C++, hence it is more optimized
> and easier to deploy than the original `libhdfs` library that ships with
> Hadoop. Finally, we test the implementation in a distributed environment set
> up from a number of Docker containers (see SINGA-11).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)