[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

robbinspg Fri, 15 Apr 2016 04:51:30 -0700

Github user robbinspg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12397#discussion_r59863075
  
    --- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java
 ---
    @@ -31,6 +33,8 @@
       private byte[] buffer;
       private int offset;
       private int bitOffset; // Only used for booleans.
    +  
    +  private final static boolean bigEndianPlatform = 
ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);
    --- End diff --
    
    So the first step is to get this functional on big endian without affecting 
the little endian implementation asap for inclusion in 2.0.0.
    
    In VectorizedPlainValuesReader we could wrap the buffer once (assuming it 
never gets re-allocated) and use the ByteBuffer for the big endian access. We 
could always use the ByteBuffer rather than the byte[] even in little endian 
implementation but I do not know what performance impact that would have and as 
stated above I did not want to mess around with the little endian 
implementation at this time.
    
    Of course the methods in on/offHeapColumnVector that take the byte[] as a 
parameter could also be changed to take the byte buffer instead.
    
    There is also the unnecessary subtracting/adding the 
Platform.BYTE_ARRAY_OFFSET around a lot of the method calls. eg 
    
      public final void readIntegers(int total, ColumnVector c, int rowId) {
        c.putIntsLittleEndian(rowId, total, buffer, offset - 
Platform.BYTE_ARRAY_OFFSET);
        offset += 4 * total;
      }
    
    where the first thing that putIntsLittleEndian does is to add that back on 
to the offset.
    
    So I think for now I'm reasonably happy with this big endian support in 
this patch and we could maybe review the whole code structure later. I've 
improved the patch and will maybe make the change to 
VectorizedPlainValuesReader so the code in there only wraps the buffer once.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13745][SQL]Support columnar in memory r...

Reply via email to