The current kernel framework divides inputs (e.g. arrays, chunked arrays) into 
batches and feeds them to the kernel code one batch at a time.
Would it make sense to pass the input arguments directly to the kernel instead?
I'm writing a quantile kernel, which needs to allocate a buffer to hold all the 
input values and then find the nth element at the end. For a chunked array, the 
input arrives chunk by chunk, so the kernel doesn't know the total buffer size 
and cannot allocate it all at once. It would be convenient if the kernel could 
see the raw chunked array input.
Or is there a better way to achieve this? Thanks.
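
To make the question concrete, here is a minimal sketch of the chunk-by-chunk 
approach, assuming plain double values. QuantileAccumulator, Reserve, Consume 
and Finalize are hypothetical names for illustration only; this is not the 
Arrow kernel API. The buffer grows as chunks arrive, and the optional Reserve 
call stands in for the case where the total length is known up front.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accumulator (not an Arrow API): it only illustrates growing
// one buffer across chunks and selecting the nth element at the end.
class QuantileAccumulator {
 public:
  // Optional hint: if the total length is known in advance (e.g. the kernel
  // could see the whole chunked array), allocate the buffer once.
  void Reserve(std::size_t total_length) { values_.reserve(total_length); }

  // Called once per chunk. Without the hint the vector reallocates
  // geometrically, so appends are amortized O(1) per value.
  void Consume(const double* data, std::size_t length) {
    values_.insert(values_.end(), data, data + length);
  }

  // Returns the lower quantile (no interpolation) for q in [0, 1].
  // Requires at least one consumed value.
  double Finalize(double q) {
    assert(q >= 0.0 && q <= 1.0);
    assert(!values_.empty());
    const auto index = static_cast<std::ptrdiff_t>(
        q * static_cast<double>(values_.size() - 1));
    // Partial sort: places the index-th smallest value at position index.
    std::nth_element(values_.begin(), values_.begin() + index, values_.end());
    return values_[static_cast<std::size_t>(index)];
  }

 private:
  std::vector<double> values_;
};
```

In this sketch, not knowing the total size only costs some reallocations and 
copies while the buffer grows; seeing the whole input would let the kernel 
call Reserve once instead.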
