[ https://issues.apache.org/jira/browse/FLINK-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777124#comment-16777124 ]
Stephan Ewen commented on FLINK-5782:
-------------------------------------

Closing this for inactivity.

> Support GPU calculations
> ------------------------
>
>                 Key: FLINK-5782
>                 URL: https://issues.apache.org/jira/browse/FLINK-5782
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.3.0
>            Reporter: Kate Eri
>            Assignee: Kate Eri
>            Priority: Minor
>
> This ticket was initiated as a continuation of the dev discussion thread:
> [New Flink team member - Kate Eri (Integration with DL4J topic)|http://mail-archives.apache.org/mod_mbox/flink-dev/201702.mbox/browser]
>
> Recently we proposed integrating [Deeplearning4J|https://deeplearning4j.org/index.html] with Apache Flink.
> Training deep learning models is a resource-demanding process, so training on CPUs can take much longer to converge than on GPUs.
> GPU usage is attractive not only for DL training but also for graph analytics and other typical data manipulations; a good overview of the related problems is given in [Accelerating Spark workloads using GPUs|https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus].
> So far the community has pointed out the following issues to consider:
> 1) Flink would like to avoid writing its own GPU support yet again, to reduce the engineering burden. That is why libraries such as [ND4J|http://nd4j.org/userguide] should be considered.
> 2) Flink currently uses [Breeze|https://github.com/scalanlp/breeze] to optimize linear algebra calculations. ND4J cannot be integrated as-is because it still does not support [sparse arrays|http://nd4j.org/userguide#faq]. Perhaps sparse-array support should simply be contributed to ND4J to enable its usage?
> 3) The calculations would have to work both with and without GPUs available. If the system detects that GPUs are available, it would ideally exploit them.
> GPU resource management could thus be incorporated into [FLINK-5131|https://issues.apache.org/jira/browse/FLINK-5131] (only a suggestion).
> 4) It was mentioned that since Flink takes care of shipping data around the cluster, it would also handle dumping data out to the GPU for calculation and loading it back. In practice, the lack of a persist method for intermediate results makes this troublesome (not because of GPUs, but because computing any sort of complex algorithm requires caching intermediate results). That is why [FLINK-1730|https://issues.apache.org/jira/browse/FLINK-1730] must be implemented to solve this problem.
> 5) It was also recommended to look at Apache Mahout, at least to learn from its experience with GPU integration, and to check:
> https://github.com/apache/mahout/tree/master/viennacl-omp
> https://github.com/apache/mahout/tree/master/viennacl
> 6) For now, GPUs are proposed only for optimizing batch calculations; GPU support for streaming should be a separate ticket, because optimizing streaming with GPUs requires additional research.
> 7) Netflix's experience with this question could also be considered: [Distributed Neural Networks with GPUs in the AWS Cloud|http://techblog.netflix.com/search/label/CUDA]
> This is considered the master ticket for GPU-related tickets.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
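The "work with and without GPUs" requirement from point 3 can be sketched as a simple backend-selection pattern. This is only an illustrative Java sketch under assumptions: `Backend`, `CpuBackend`, and `select` are hypothetical names, not Flink or ND4J APIs, and the actual hardware probe is reduced to a boolean flag so the example stays self-contained.

```java
// Illustrative sketch only: none of these names are real Flink or ND4J APIs.
public class BackendSelectionSketch {

    /** Minimal compute-backend abstraction; scale() stands in for a real kernel. */
    interface Backend {
        String name();
        double[] scale(double[] v, double a);
    }

    /** Plain-JVM fallback, always available regardless of hardware. */
    static final class CpuBackend implements Backend {
        public String name() { return "cpu"; }
        public double[] scale(double[] v, double a) {
            double[] out = new double[v.length];
            for (int i = 0; i < v.length; i++) {
                out[i] = v[i] * a;
            }
            return out;
        }
    }

    /**
     * Picks a backend. In a real system the flag would come from probing
     * CUDA/OpenCL at startup; here it is a parameter to keep the sketch
     * self-contained. Without a GPU, the CPU fallback is always returned,
     * so the same job runs on any cluster node.
     */
    static Backend select(boolean gpuAvailable) {
        if (gpuAvailable) {
            // A GPU-accelerated backend (e.g. one wrapping a CUDA library)
            // would be constructed and returned here.
        }
        return new CpuBackend();
    }

    public static void main(String[] args) {
        Backend b = select(false);
        double[] r = b.scale(new double[]{1.0, 2.0, 3.0}, 2.0);
        System.out.println(b.name() + ": " + r[0] + " " + r[1] + " " + r[2]);
        // prints: cpu: 2.0 4.0 6.0
    }
}
```

The point of the pattern is that the detection decision is made once, behind an interface, so operator code never branches on hardware; this is also roughly how ND4J swaps its native and CUDA backends at the classpath level.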