[jira] [Commented] (HIVE-15987) Replace ColumnVector.isNull boolean[] impl. with BitSet

2017-02-20 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874196#comment-15874196
 ] 

Teddy Choi commented on HIVE-15987:
---

[~gopalv] I see. Thank you for the feedback.

> Replace ColumnVector.isNull boolean[] impl. with BitSet
> ---
>
> Key: HIVE-15987
> URL: https://issues.apache.org/jira/browse/HIVE-15987
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Teddy Choi
>Assignee: Teddy Choi
>  Labels: incompatibleChange
>
> Most data operations in Hive involve null checks. The current 
> implementation of ColumnVector.isNull uses a boolean array, which takes 8 bits 
> per boolean. BitSet is a more compact representation, as it uses 1 bit per 
> boolean, backed by a long array. Logical operations on longs are also 
> much faster than on bytes, because they need fewer instructions per flag. 
> So it is expected to speed up null operations by 8x or more.
> However, there are also several cases where this change will be slower. 
> For example, simple reads will require more instructions per row. So the 
> change should include benchmark tests to show its performance impact.
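
To make the trade-off in the description concrete, here is a minimal sketch in Java. It is not taken from the Hive code base: the class name NullMaskSketch and the orNulls helpers are hypothetical, and combining two null masks with OR is only an assumed example of a "null operation". The boolean[] version touches one flag per loop iteration, while java.util.BitSet.or works on the backing long words, so each word covers 64 rows.

```java
import java.util.BitSet;

// Hypothetical illustration only; not the Hive ColumnVector implementation.
public class NullMaskSketch {

    // boolean[] masks: one flag per array element, combined one row at a time.
    static boolean[] orNulls(boolean[] a, boolean[] b) {
        boolean[] out = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] || b[i];
        }
        return out;
    }

    // BitSet masks: or() combines the backing long words, 64 rows per word.
    static BitSet orNulls(BitSet a, BitSet b) {
        BitSet out = (BitSet) a.clone(); // leave the inputs untouched
        out.or(b);
        return out;
    }

    public static void main(String[] args) {
        int rows = 1024; // assumed batch size for the example

        boolean[] x = new boolean[rows];
        boolean[] y = new boolean[rows];
        BitSet bx = new BitSet(rows);
        BitSet by = new BitSet(rows);

        // Mark row 3 as null in the first input and row 700 in the second.
        x[3] = true;
        bx.set(3);
        y[700] = true;
        by.set(700);

        boolean[] r = orNulls(x, y);
        BitSet br = orNulls(bx, by);

        // A simple read: an array load for boolean[], versus a method call
        // that computes a word index and a bit mask for BitSet.
        System.out.println("boolean[] row 700 null: " + r[700]);
        System.out.println("BitSet    row 700 null: " + br.get(700));
    }
}
```

The second half of the description shows up in the reads at the end: r[700] is a single array load, whereas br.get(700) has to locate the word and mask out the bit. Whether the bulk operations outweigh the slower reads depends on the workload, which is why the description asks for benchmarks.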



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15987) Replace ColumnVector.isNull boolean[] impl. with BitSet

2017-02-19 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874144#comment-15874144
 ] 

Gopal V commented on HIVE-15987:


-1 for the Hive-2.x branch storage-api impl. We can consider this for the 
Hive-3.0 branch, since it breaks external interfaces to ORC and third-party 
vectorized UDFs.



