When using hex elements at order 4 or 5, the number of non-zeros in 
the GiMMiK kernels gets quite high. 

E.g., with n=4, tgradpcoru_upts [hex] appears to have 1819 non-zeros out of 
46875 entries (i.e. about 4% non-zeros).
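
For reference, the arithmetic behind those figures can be sketched as below. This assumes the operator maps the (n+1)^3 solution points of a hex to three gradient components per point, which is consistent with the 46875 entries quoted; the non-zero count is simply the number reported above, not something computed here.

```python
# Size and density of the gradient operator on a hex at order n = 4.
# Assumption: rows = 3 gradient components per solution point,
# columns = number of solution points.
order = 4
nupts = (order + 1)**3        # 125 solution points in a hex
nrows = 3*nupts               # 375 rows
nentries = nrows*nupts        # 46875 entries in total

nnz = 1819                    # non-zero count quoted in the post
density = nnz/nentries

print(nentries, f"{density:.1%}")   # 46875 3.9%
```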

Just wondering, for high-order 3D runs, what's an appropriate way to 
replace the GiMMiK kernels with a more conventional dense matrix multiplication?

PS: even with 1819 non-zeros, the hard-wired (const) GiMMiK kernels appear 
to run fine, but at some point I guess the register pressure from the 
embedded constants must outweigh the saving from not loading the const 
mats from memory?

From class CUDAGiMMiKKernels(CUDAKernelProvider):

        # Check that A is reasonably sparse
        if np.count_nonzero(a.get()) > self.max_nnz:
            raise NotSuitableError('Matrix too dense for GiMMiK')


The default value of self.max_nnz is 512.
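
To make the question concrete, the check above boils down to something like the following rough sketch. The name pick_matmul_path is made up for illustration and is not part of PyFR or GiMMiK, and the random matrix only stands in for the real operator (same shape and non-zero count as quoted for n=4). Plain NumPy matmul stands in for whatever dense backend would actually be used.

```python
import numpy as np

def pick_matmul_path(a, max_nnz=512):
    """Hypothetical helper: mirror the sparsity check in the snippet above."""
    nnz = int(np.count_nonzero(a))
    return 'gimmik' if nnz <= max_nnz else 'dense'

# Stand-in operator: 375 x 125 with exactly 1819 non-zeros (values are dummies)
rng = np.random.default_rng(0)
a = np.zeros(375*125)
a[rng.choice(a.size, size=1819, replace=False)] = 1.0
a = a.reshape(375, 125)

# Some right-hand-side data with 8 columns
b = rng.standard_normal((125, 8))

path = pick_matmul_path(a)   # 1819 > 512, so the dense path is chosen
c = a @ b                    # dense multiplication as the fallback
```

With the default threshold this matrix is rejected for GiMMiK; raising max_nnz above 1819 in the sketch flips the choice back.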

Thanks for any pointers,
Nigel
