[ https://issues.apache.org/jira/browse/SINGA-130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231526#comment-15231526 ]
ASF subversion and git services commented on SINGA-130:
-------------------------------------------------------

Commit a0bdd0b85ddba7d670ab04c5de04a29c8366e868 in incubator-singa's branch refs/heads/master from [~ug93tad]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=a0bdd0b ]

SINGA-130 Data prefetching layer

Extended StoreInputLayer to support data prefetching. It maintains a buffer for (key, value) pairs read from the storage layer. In Setup(), it launches a new thread that reads data from the storage layer into the buffer. ComputeFeature() waits for this thread to finish (join) before parsing the buffered records into the data_ and aux_ fields. Finally, it launches another thread to prefetch the next batch.

In terms of memory consumption, prefetching uses an extra (batchsize * recordsize) bytes for the buffer. However, we observe no visible runtime improvement, because I/O time is very small compared to CPU time (on the order of milliseconds without prefetching, and tens of microseconds with prefetching).

> Implement a layer subclass for data prefetching
> -----------------------------------------------
>
>                 Key: SINGA-130
>                 URL: https://issues.apache.org/jira/browse/SINGA-130
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: wangwei
>            Assignee: Anh Dinh
>              Labels: data, multi-threading, prefetch
>
> Data prefetching is important for training with GPUs, because I/O becomes
> the bottleneck when the computation is very fast.
> One idea is to create a general prefetch layer that embeds the
> application-specific data loading layers.
> {code}
> PrefetchLayer::ComputeFeature() {
>   wait until the prefetch thread finishes.
>   swap the prefetch_data_ and data_ blobs.
>   if (first time)
>     load data into data_ blobs
>   spawn a new thread to call functions from data loading layers for loading
>   data into prefetch_data_.
> }
> {code}
>
> If the prefetch layer has multiple loading layers and is connected to
> multiple destination layers, then different destination layers may want data
> loaded by different loading layers. This case should be handled properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
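The following is a minimal, self-contained C++ sketch of the join-then-relaunch prefetching that the commit describes for StoreInputLayer. It illustrates the pattern only and is not SINGA's code. `Record`, `FakeStore`, `ReadBatch()` and `PrefetchingInput` are hypothetical stand-ins for the (key, value) records, the storage layer and the input layer. Parsing into `data_` and `aux_` is reduced to swapping two vectors, which mirrors the "swap the prefetch_data_ and data_ blobs" step in the issue's pseudocode.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical (key, value) record, standing in for what the storage layer returns.
using Record = std::pair<std::string, std::string>;

// Hypothetical stand-in for the storage layer: each ReadBatch() call returns
// the next batch_size records, numbered consecutively.
class FakeStore {
 public:
  std::vector<Record> ReadBatch(std::size_t batch_size) {
    std::vector<Record> batch;
    batch.reserve(batch_size);
    for (std::size_t i = 0; i < batch_size; ++i, ++next_) {
      batch.emplace_back("key" + std::to_string(next_),
                         "value" + std::to_string(next_));
    }
    return batch;
  }

 private:
  std::size_t next_ = 0;  // touched by only one worker thread at a time
};

// Hypothetical input layer: one background thread fills buffer_ while the
// caller works on the batch held in data_.
class PrefetchingInput {
 public:
  PrefetchingInput(FakeStore* store, std::size_t batch_size)
      : store_(store), batch_size_(batch_size) {}

  // Join any outstanding prefetch so no thread outlives the object.
  ~PrefetchingInput() {
    if (worker_.joinable()) worker_.join();
  }

  // Plays the role of Setup(): start prefetching the first batch.
  // Must be called exactly once, before the first ComputeFeature().
  void Setup() { Launch(); }

  // Plays the role of ComputeFeature(): wait (join) for the prefetch thread,
  // take its batch as the current one, then start prefetching the next batch.
  const std::vector<Record>& ComputeFeature() {
    worker_.join();
    data_.swap(buffer_);  // a real layer would parse records into data_/aux_
    Launch();
    return data_;
  }

 private:
  void Launch() {
    worker_ = std::thread([this] { buffer_ = store_->ReadBatch(batch_size_); });
  }

  FakeStore* store_;
  std::size_t batch_size_;
  std::vector<Record> buffer_;  // written only by the worker thread
  std::vector<Record> data_;    // used only by the calling thread
  std::thread worker_;
};
```

The caller touches `buffer_` only after `join()` and before starting the next thread. Thread creation and join already order those accesses, so the sketch needs no mutex. The price is one extra batch held in `buffer_`, which corresponds to the (batchsize * recordsize) overhead mentioned in the commit. A training loop would call `Setup()` once. It would then call `ComputeFeature()` on each iteration, and each call returns the batch that was read while the previous iteration was computing.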