[GitHub] drill issue #914: DRILL-5657: Size-aware vector writer structure

paul-rogers Thu, 16 Nov 2017 23:18:08 -0800

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/914
  
    Regarding the use of memory addresses: the only reason to use them is 
performance. To show the benefit, I reran the `PerformanceTool` class against 
the original code, the code using addresses, and a version that uses 
`DrillBuf` as @parthchandra suggested. I expected addresses to be the winner. 
That's not at all what happened.
    
    The code contains a class, `PerformanceTool`, that compares the column 
writers with the original vector mutators. Each pass loads a vector to 16 MB 
in size, and the pass is repeated 300 times. The following are the run times, 
in ms.
    
    Vector Type | Original  | New w/Address | New w/Drillbuf
    ------------ | --------  | ------------  | ------------- 
    Required | 5703 | 4034 | 1461
    Nullable | 12743 | 3645 | 3411
    Repeated | 20430 | 7226 | 2669
    
    Here:
    
    * "Original" shows the original int vector mutator class.
    * "New w/Address" shows the same exercise, using the version of the vector 
writers based on a direct memory address.
    * "New w/Drillbuf" shows the vector writers again, but using the technique 
Parth suggested: "unsafe" methods added to the `DrillBuf` class.
    
    The test is run with a pre-allocated vector (no double-and-copy 
operations). See `PerformanceTool` for details.
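    
    A minimal sketch of the shape of such a test, not the actual 
`PerformanceTool` code: it pre-allocates one 16 MB buffer, fills it with ints, 
and repeats the fill 300 times while timing the loop. The class and method 
names (`FillBenchmarkSketch`, `fill`, `timeFills`) are hypothetical, and 
`java.nio.ByteBuffer` is only a stand-in for Drill's buffer classes, so the 
timings it prints are not comparable to the table above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; this is not Drill's PerformanceTool.
public class FillBenchmarkSketch {
    static final int VECTOR_BYTES = 16 * 1024 * 1024; // 16 MB per pass
    static final int REPEAT_COUNT = 300;              // number of passes
    static final int INT_WIDTH = Integer.BYTES;       // 4 bytes per value

    // Fills the pre-allocated buffer with ints using absolute writes,
    // so no growth (double-and-copy) ever occurs. Returns the value count.
    static int fill(ByteBuffer buf) {
        int valueCount = buf.capacity() / INT_WIDTH;
        for (int i = 0; i < valueCount; i++) {
            buf.putInt(i * INT_WIDTH, i);
        }
        return valueCount;
    }

    // Runs the fill repeatCount times and returns the elapsed time in ms.
    static long timeFills(ByteBuffer buf, int repeatCount) {
        long start = System.nanoTime();
        for (int r = 0; r < repeatCount; r++) {
            fill(buf);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Allocate once, up front, in direct (off-heap) memory.
        ByteBuffer buf = ByteBuffer.allocateDirect(VECTOR_BYTES)
                                   .order(ByteOrder.nativeOrder());
        timeFills(buf, 10); // warm-up passes so the JIT can compile the loop
        System.out.println("Elapsed ms: " + timeFills(buf, REPEAT_COUNT));
    }
}
```

    Timing a single in-process loop like this is sensitive to JIT warm-up and 
to the order in which variants are run, which is one reason to treat any one 
set of numbers with some caution.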
    
    I have no explanation for why the `DrillBuf` version should be faster at 
all, let alone far faster, but I'll take it. The latest commit contains the 
code after this revision.
    
    So, thank you Parth, you were right again with what turned out to be an 
outstanding performance boost.
