I bet everyone involved is already aware of most of these issues, but just in case:
* MatAssemblyEnd_SeqAIJCUDA: What about mode=MAT_FLUSH_ASSEMBLY? What is the point of copying to the GPU in that case?
* MatAssemblyEnd_SeqAIJCUDA: the 'tempvec' cusp array is always allocated, but MatMult does not use it when there are no compressed rows. Of course, this issue is very low priority.
* MatAssemblyEnd_SeqAIJCUDA: Perhaps memory allocation on the GPU is cheap, but if the nonzero pattern does not change, we could avoid re-creating the GPU matrix from scratch.
* There are some calls that operate on assembled matrices (MatScale, MatZeroRows, MatDiagonalScale, etc.). These operations need GPU syncing. Am I missing something?
* MatShift: seqaij does not implement MatShift, so MatSetValues is called in a loop and then the matrix is re-assembled. This causes an extra copy to the GPU (keep in mind that user code has already assembled the matrix before calling MatShift). Other calls suffer from the same issue, e.g. MatDiagonalSet.
* MatGetArray: if the user updates the values, we are in trouble.

All that being said, I'm still unsure why the GPU copying was implemented in MatAssemblyEnd_SeqAIJ. What about adding a valid_GPU_data flag to Mat, setting it to false in MatAssemblyBegin, and doing the GPU copy at the time MatMult_SeqAIJ is called? Of course, such an approach would not solve all the previous issues... I'm just asking about the rationale for the current approach.

--
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169