[GitHub] carbondata issue #1538: [CARBONDATA-1779] GenericVectorizedReader

bhavya411 Wed, 22 Nov 2017 00:55:29 -0800

Github user bhavya411 commented on the issue:

    https://github.com/apache/carbondata/pull/1538
  
    This PR removes the Spark dependency that the Presto integration module 
needed in order to use CarbonVectorizedRecordReader. It also consolidates 
CarbonVectorizedRecordReader into a single implementation that all integration 
modules can share.
    
    Earlier versions of the Presto integration used Spark's ColumnarBatch, which 
is not good practice. This PR provides our own implementations of ColumnVector 
and VectorBatch to eliminate the Spark dependency altogether. The generic 
ColumnVector can now be used by any integration module that needs a 
VectorizedReader to speed up processing.
    
    Some core module classes were changed so that they use Java data types 
instead of Spark data types, Decimal being one of them.
    
    This PR tries to limit the changes to the core module.
    
    Newly Added Classes
    1. CarbonColumnVectorImpl: This class implements the CarbonColumnVector 
interface and provides methods to store data in a vector and to retrieve data 
from it.
    
    2. CarbonVectorBatch: This class creates a VectorizedRowBatch, which is a 
set of rows organized with each column as a CarbonVector. It is the unit of 
query execution, organized to minimize the cost per row and achieve high 
instructions-per-cycle. The major fields are public by design to allow fast and 
convenient access by the vectorized query execution code.
    
    3. StructField: This class is used to pass schema information to the Carbon 
columnar batch.
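
    To illustrate the idea behind the three classes above, here is a minimal 
sketch of a Spark-free column vector and batch in plain Java. It is not the 
code from this PR: the names SketchField, SketchColumnVector and 
SketchVectorBatch, their fields, and the sumInts helper are all hypothetical, 
and only two column types are modelled. It assumes a decimal column can be 
held as java.math.BigDecimal, in line with the move to Java data types.

```java
import java.math.BigDecimal;
import java.util.BitSet;

/** Hypothetical schema entry: a column name plus a simple type tag. */
final class SketchField {
    enum Type { INT, DECIMAL }

    final String name;
    final Type type;

    SketchField(String name, Type type) {
        this.name = name;
        this.type = type;
    }
}

/** Hypothetical column vector: one typed array per column plus a null bitmap. */
final class SketchColumnVector {
    final SketchField.Type type;
    final int[] ints;             // allocated only for INT columns
    final BigDecimal[] decimals;  // allocated only for DECIMAL columns
    final BitSet nulls;           // bit set at rowId means the value is null

    SketchColumnVector(SketchField.Type type, int capacity) {
        this.type = type;
        this.ints = type == SketchField.Type.INT ? new int[capacity] : null;
        this.decimals = type == SketchField.Type.DECIMAL ? new BigDecimal[capacity] : null;
        this.nulls = new BitSet(capacity);
    }

    void putInt(int rowId, int value) { ints[rowId] = value; }
    void putDecimal(int rowId, BigDecimal value) { decimals[rowId] = value; }
    void putNull(int rowId) { nulls.set(rowId); }

    boolean isNullAt(int rowId) { return nulls.get(rowId); }
    int getInt(int rowId) { return ints[rowId]; }
    BigDecimal getDecimal(int rowId) { return decimals[rowId]; }
}

/** Hypothetical batch: a fixed-capacity set of rows stored column by column. */
final class SketchVectorBatch {
    final SketchColumnVector[] columns;  // exposed directly for fast access
    final int capacity;
    int numRows;                         // number of valid rows currently held

    SketchVectorBatch(SketchField[] schema, int capacity) {
        this.capacity = capacity;
        this.columns = new SketchColumnVector[schema.length];
        for (int i = 0; i < schema.length; i++) {
            columns[i] = new SketchColumnVector(schema[i].type, capacity);
        }
    }

    /** Sums the non-null values of an INT column in one loop over its array. */
    long sumInts(int columnIndex) {
        SketchColumnVector col = columns[columnIndex];
        long sum = 0;
        for (int row = 0; row < numRows; row++) {
            if (!col.isNullAt(row)) {
                sum += col.ints[row];
            }
        }
        return sum;
    }
}
```

    A reader would fill each column of a batch for a block of rows and set 
numRows; the consumer then works on one column array at a time instead of 
materializing row objects, which is where the per-row cost is saved.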
