Hi Janardhan,

>> 1. Can you help me, estimate how much would it take to implement blocksparse kernels practically.

This is a difficult question to answer, as it depends on how comfortable you are with writing and optimizing sparse kernels. To implement a block-sparse kernel as per your document, one needs to know the following (each step is progressively more difficult than the previous one):

1. How to implement and compile a simple CUDA kernel.
2. How to implement a non-block sparse kernel such that its results match those of the cuSPARSE code.
3. How to optimize a non-block sparse kernel such that its performance matches that of the cuSPARSE code.
4. How to optimize a non-block sparse kernel for a given hardware such that its performance matches that of the cuSPARSE code on that hardware. This requires working knowledge of different Nvidia devices and of how to tweak SASS code.
5. How to implement a block-sparse kernel such that its results match those of the cuSPARSE code.
6. How to optimize a block-sparse kernel such that its performance matches that of the cuSPARSE code.
7. How to optimize a block-sparse kernel for a given hardware such that its performance matches that of the cuSPARSE code on that hardware.
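To make the "results match" part of steps (2) and (5) concrete, here is a minimal CPU-side sketch in plain C++. It is neither CUDA nor cuSPARSE: it only illustrates the kind of correctness check those steps imply, using a sparse matrix-vector product as a stand-in operation. The struct and function names (`Csr`, `Bsr`, `spmvCsr`, `spmvBsr`, `allClose`, `selfCheck`) are hypothetical, and the 4x4 test matrix is made up for illustration. On a GPU, the reference side would be the cuSPARSE result instead of `spmvCsr`.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CSR container: one entry per stored nonzero.
struct Csr {
    int rows;
    std::vector<int> rowPtr;   // size rows + 1
    std::vector<int> colIdx;   // column index per nonzero
    std::vector<double> vals;  // value per nonzero
};

// Hypothetical BSR container: one entry per stored dense bs x bs block
// (values inside a block are row-major).
struct Bsr {
    int blockRows;
    int bs;
    std::vector<int> rowPtr;   // size blockRows + 1
    std::vector<int> colIdx;   // block-column index per stored block
    std::vector<double> vals;  // bs * bs values per stored block
};

// Reference (non-block) sparse matrix-vector product y = A * x.
std::vector<double> spmvCsr(const Csr& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows, 0.0);
    for (int r = 0; r < a.rows; ++r)
        for (int k = a.rowPtr[r]; k < a.rowPtr[r + 1]; ++k)
            y[r] += a.vals[k] * x[a.colIdx[k]];
    return y;
}

// Block-sparse matrix-vector product y = A * x.
std::vector<double> spmvBsr(const Bsr& a, const std::vector<double>& x) {
    std::vector<double> y(static_cast<std::size_t>(a.blockRows) * a.bs, 0.0);
    for (int br = 0; br < a.blockRows; ++br) {
        for (int k = a.rowPtr[br]; k < a.rowPtr[br + 1]; ++k) {
            const double* blk = &a.vals[static_cast<std::size_t>(k) * a.bs * a.bs];
            const int colBase = a.colIdx[k] * a.bs;
            for (int i = 0; i < a.bs; ++i)
                for (int j = 0; j < a.bs; ++j)
                    y[br * a.bs + i] += blk[i * a.bs + j] * x[colBase + j];
        }
    }
    return y;
}

// Element-wise comparison with an absolute tolerance.
bool allClose(const std::vector<double>& p, const std::vector<double>& q,
              double tol = 1e-9) {
    if (p.size() != q.size()) return false;
    for (std::size_t i = 0; i < p.size(); ++i)
        if (std::fabs(p[i] - q[i]) > tol) return false;
    return true;
}

// Builds the same made-up 4x4 matrix (two 2x2 diagonal blocks) in both
// formats and checks that the two products agree.
bool selfCheck() {
    Csr a{4, {0, 2, 4, 6, 8}, {0, 1, 0, 1, 2, 3, 2, 3}, {1, 2, 3, 4, 5, 6, 7, 8}};
    Bsr b{2, 2, {0, 1, 2}, {0, 1}, {1, 2, 3, 4, 5, 6, 7, 8}};
    std::vector<double> x{1.0, 1.0, 1.0, 1.0};
    return allClose(spmvCsr(a, x), spmvBsr(b, x));
}
```

Calling `selfCheck()` from a `main` function is enough to run the comparison. The same pattern (compute with the new kernel, compute with a trusted baseline, compare within a tolerance) carries over when the new kernel is a CUDA one.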
Before attempting a block-sparse kernel, I would recommend the following:
- Pick up JIRAs that will help you through steps (1)-(3). You may want to try SYSTEMML-937 and SYSTEMML-2312/SYSTEMML-2313 first.
- You can skip (4) and (7), as they involve maintenance overhead.

For (2)-(3), you can use either SystemML or the CUDA code I sent in the earlier thread as a baseline.

>> 2. Would like to spare some time to review PRs ( ~2 PRs per week).

Sure. You may want to batch similar PRs, as it reduces turn-around time.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar

From: Janardhan <[email protected]>
To: [email protected], Niketan Pansare <[email protected]>, [email protected], Nakul Jindal <[email protected]>
Date: 05/10/2018 10:34 AM
Subject: [DISCUSS] Blocksparse kernels

Hi Niketan, Nakul, and Berthold,

1. Can you help me, estimate how much would it take to implement blocksparse kernels practically.
2. Would like to spare some time to review PRs ( ~2 PRs per week).

a. Relevant Jira: https://issues.apache.org/jira/browse/SYSTEMML-2041
b. My proposal: https://docs.google.com/document/d/1cgPdyhhG3kQZxeP1VYOnQoZuTVA2216CC_aWdurmTNw/edit?usp=sharing
c. Research paper: https://s3-us-west-2.amazonaws.com/openai-assets/blocksparse/blocksparsepaper.pdf

Thank you,
Janardhan
