Another interesting opportunity would be the development of a GPU storage tier based on GPUDirect.
On Feb 17, 2018 4:00 PM, "Patrick Stuedi" wrote:

> That's great. One of the main goals of Crail being an Apache Incubator
> project is to get more people involved in its development. I've
> been following your contributions to TensorFlow, nice work! Collaborating
> in this context (incl. MXNet) would be very interesting. There are multiple
> ways to go. Once we have the core C++ client, we could use help with the
> development of the various bindings (RDMA, TCP, for storage and RPC). Or we
> could use help leveraging Crail in TensorFlow and MXNet (param server,
> storage of the model > DRAM). Let us know where you see opportunities.
>
> On Feb 17, 2018 3:36 PM, "Bairen YI" wrote:
>
>> Hi Patrick,
>>
>> That would be fantastic. In fact, we would love to get more involved, as
>> our lab at HKUST has partnered with MLNX to co-develop datacenter-scale AI
>> software solutions (TensorFlow and Apache MXNet), and we could encourage a
>> couple of students to contribute code to Crail at this very stage if we see
>> fit. It could also bring novel system/networking research opportunities to
>> our lab.
>>
>> Let me know how we could better work together.
>>
>> Best,
>> Bairen
>>
>> > On 17 Feb 2018, at 22:19, Patrick Stuedi wrote:
>> >
>> > Hi Bairen,
>> >
>> > Your comment is spot on. The development of a C++ API for Crail is one
>> > of the top items on the roadmap, in particular to facilitate the
>> > integration into TensorFlow and serverless. In fact, I started drafting
>> > a prototype two weeks ago that I wanted to share soon. If you are
>> > interested in helping, let us know!
>> >
>> > On Feb 17, 2018 1:49 PM, "Bairen YI" wrote:
>> >
>> > Hi folks,
>> >
>> > I have been following your work for a long time, and it is great to
>> > see Crail accepted as an Apache Incubator project.
>> >
>> > I authored the GPU Direct RDMA transport for TensorFlow (
>> > https://github.com/tensorflow/tensorflow/pull/11392), and I would love
>> > to see how we could design an end-to-end zero-copy dataflow from Crail
>> > to various deep learning frameworks such as TensorFlow (
>> > https://dl.acm.org/citation.cfm?doid=3123878.3131975).
>> >
>> > Is there any roadmap for Crail as a standalone, language-independent
>> > FileSystem/Cache service with a C API? That would really ease the
>> > integration into non-JVM-based third-party systems. It does not have to
>> > be HDFS compatible if that brings extra performance cost.
>> >
>> > Best,
>> > Bairen
