On 7 Sep 2015, at 20:44, lonikar <loni...@gmail.com> wrote:

> 2. If the vectorization is difficult or a major effort, I am not sure how I am going to implement even a glimpse of the changes I would like to. I think I will have to be satisfied with only a partial effort. Batching rows defeats the purpose, as I have found that it consumes a considerable amount of CPU cycles, and producing one row at a time also takes away the performance benefit. What's really required is to access a large partition and produce the result partition in one shot.

Why not look at the DataFrame APIs and the back-end implementations of the things which support them? The data sources which are columnar from the outset (ORC, Parquet) are the ones where vector operations work well: you can read a set of columns, perform a parallel operation, then repeat. If you can hook up to a column structure you may get that speedup.

> I think I will have to severely limit the scope of my talk in that case. Or re-orient it to propose the changes instead of presenting the results of execution on GPU. Please suggest since you seem to have selected the talk.

It is always essential to have the core of your talk ready before you propose the talk - it's something reviewers (nothing to do with me here) mostly expect. Otherwise you are left in a panic three days before, trying to bash together some slides you will have to present to an audience that may include people who know the code better than you. I've been there - and fear I will be there again in 3 weeks' time.

Some general suggestions:

1. Assume the audience knows Spark, but not how to code for GPUs: intro that on a slide or two.
2. Cover the bandwidth problem: how much computation is needed before working with the GPU is justified.
3. Look at the body of work on Hadoop MapReduce & GPUs and the limitations (IO bandwidth, intermediate stage B/W) as well as the benefits (perf on CPU workloads, power budget).
4. Cover how that's changing: SSDs, in-memory filesystems, whether InfiniBand would help.
5. Try to demo something.
It's always nice to show something working at a talk, even if it's just on your laptop.
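
For the bandwidth point (suggestion 2), a back-of-envelope model may be enough for a slide. The sketch below is only an illustration, not a measurement: it assumes the data crosses the bus once in each direction, that transfer and compute do not overlap, and that throughput is a flat rate. The function names and every number in the usage section are hypothetical placeholders you would replace with figures from your own hardware.

```python
import math


def cpu_and_gpu_seconds(data_bytes, ops_per_byte,
                        cpu_ops_per_s, gpu_ops_per_s, link_bytes_per_s):
    """Estimate (cpu_seconds, gpu_seconds) under a simple transfer + compute model."""
    total_ops = data_bytes * ops_per_byte
    cpu_s = total_ops / cpu_ops_per_s
    # Assumption: input goes to the GPU and a same-sized result comes back,
    # so the bus is crossed twice and nothing overlaps with the compute.
    transfer_s = 2.0 * data_bytes / link_bytes_per_s
    gpu_s = transfer_s + total_ops / gpu_ops_per_s
    return cpu_s, gpu_s


def break_even_ops_per_byte(cpu_ops_per_s, gpu_ops_per_s, link_bytes_per_s):
    """Operations per byte at which the GPU path starts to beat the CPU path.

    Solves x / cpu = 2 / link + x / gpu for x.
    """
    if gpu_ops_per_s <= cpu_ops_per_s:
        return math.inf  # the GPU never catches up under this model
    return (2.0 / link_bytes_per_s) / (1.0 / cpu_ops_per_s - 1.0 / gpu_ops_per_s)


if __name__ == "__main__":
    # Hypothetical rates, chosen only to make the arithmetic easy to follow.
    cpu, gpu, link = 1e10, 1e11, 1e10
    threshold = break_even_ops_per_byte(cpu, gpu, link)
    print("break-even ops per byte:", threshold)
    for intensity in (1.0, threshold, 10.0):
        print(intensity, cpu_and_gpu_seconds(1e9, intensity, cpu, gpu, link))
```

Below the threshold the copy over the bus dominates and the CPU wins; above it the GPU's extra throughput pays for the transfer. A real workload would also need to account for overlap of copy and compute, result sizes that differ from the input, and serialization cost on the JVM side.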