[jira] [Comment Edited] (ARROW-1922) Blog post on recent improvements/changes in JAVA Vectors

2017-12-14 Thread Gonzalo Ortiz (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290492#comment-16290492
 ] 

Gonzalo Ortiz edited comment on ARROW-1922 at 12/14/17 8:03 AM:


I have a benchmark that compares Arrow 0.6 with plain ByteBuffers. It shows 
very poor results for Arrow. I could try to update it to use the latest version.

BTW, the benchmark is [here|https://github.com/gortiz/arrow-jmh]


> Blog post on recent improvements/changes in JAVA Vectors
> 
>
> Key: ARROW-1922
> URL: https://issues.apache.org/jira/browse/ARROW-1922
> Project: Apache Arrow
> Issue Type: Task
> Components: Java - Vectors
> Reporter: Siddharth Teotia
> Assignee: Siddharth Teotia
> Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-1922) Blog post on recent improvements/changes in JAVA Vectors

2017-12-14 Thread Gonzalo Ortiz (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290492#comment-16290492
 ] 

Gonzalo Ortiz commented on ARROW-1922:
--

I have a benchmark that compares Arrow 0.6 with plain ByteBuffers. It shows 
very poor results for Arrow. I could try to update it to use the latest version.



[jira] [Commented] (ARROW-1710) [Java] Decide what to do with non-nullable vectors in new vector class hierarchy

2017-10-23 Thread Gonzalo Ortiz (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214748#comment-16214748
 ] 

Gonzalo Ortiz commented on ARROW-1710:
--

I'm a little bit concerned about performance loss if non-nullable vectors are 
removed. Arrow on Java has quite bad performance so far (compared with plain 
ByteBuffers), and removing non-nullable vectors could make it even worse.

> [Java] Decide what to do with non-nullable vectors in new vector class 
> hierarchy 
> -
>
> Key: ARROW-1710
> URL: https://issues.apache.org/jira/browse/ARROW-1710
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: Java - Vectors
>Reporter: Li Jin
> Fix For: 0.8.0
>
>
> So far the consensus seems to be to remove all non-nullable vectors. 





[jira] [Commented] (ARROW-1463) [JAVA] Restructure ValueVector hierarchy to minimize compile-time generated code

2017-09-25 Thread Gonzalo Ortiz (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178694#comment-16178694
 ] 

Gonzalo Ortiz commented on ARROW-1463:
--

[~laurentgo] it is great to have that in mind. I'm really interested in how to 
be sure that operations on Arrow Vectors are, indeed, vectorized at CPU level, 
and once Panama is released we will be able to control that. But AFAIK they 
don't have a clear API (yet) that Arrow can try to get close to. They just have 
a proposal that is very likely to change.

> [JAVA] Restructure ValueVector hierarchy to minimize compile-time generated 
> code
> 
>
> Key: ARROW-1463
> URL: https://issues.apache.org/jira/browse/ARROW-1463
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Siddharth Teotia
>
> The templates used in the java package are very high maintenance and the if 
> conditions are hard to track. As started in the discussion here: 
> https://github.com/apache/arrow/pull/1012, I'd like to propose that we modify 
> the structure of the internal value vectors and code generation dynamics.
> Create new abstract base vectors:
> BaseFixedVector
> BaseVariableVector
> BaseNullableVector
> For each of these, implement all the basic functionality of a vector without 
> using templating.
> Evaluate whether to use code generation to generate specific specializations 
> of this functionality for each type where needed for performance purposes 
> (probably constrained to mutator and accessor set/get methods). Giant and 
> complex if conditions in the templates are actually worse from my perspective 
> than a small amount of hand written duplicated code since templates are much 
> harder to work with. 





[jira] [Commented] (ARROW-1463) [JAVA] Restructure ValueVector hierarchy to minimize compile-time generated code

2017-09-07 Thread Gonzalo Ortiz (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156598#comment-16156598
 ] 

Gonzalo Ortiz commented on ARROW-1463:
--

On the one hand, what [~wesmckinn] says is right: they could be the same 
structure. On the other, they are not exactly semantically equal. Non-nullable 
structures never return nulls, a guarantee that static checkers or languages 
like Kotlin can use. 

There are also two performance issues, one for each model. This is just 
theoretical reasoning (that may be wrong!); we would need to verify it with 
some JMH tests (and the results can change from one JVM to another):
* A structure that doesn't need to check the array that tracks nulls would have 
shorter and simpler methods (without branches), and therefore it is more likely 
that the JVM can optimize them. So this is a point in favor of having two 
implementations: one with nulls and one without them.
* If there are two structures (a nullable and a non-nullable) and the nullable 
one is just a decoration/delegation on the non-nullable one, nullable 
structures may behave considerably worse than we expect, for the same reason. 
As soon as the methods get complex (calling the delegate class) the JVM may or 
may not optimize them.

TL;DR: From the library developer perspective, I would like to have only one 
implementation. From a user perspective, I would like to have both 
implementations, ideally without delegation (which means repeated code in the 
nullable structure).
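
To make the two read paths concrete, here is a minimal sketch. All class names 
(PlainIntColumn, NullableIntColumn, NullabilitySketch) are hypothetical and are 
not Arrow's actual vector classes; the validity layout (one bit per value, bit 
set means "not null") is an assumption made only for this illustration. It 
shows the extra branch on the nullable path, and says nothing about how the JVM 
will actually optimize either one.

```java
import java.nio.ByteBuffer;

// Hypothetical non-nullable column: a read is one absolute load, no branch.
final class PlainIntColumn {
    private final ByteBuffer data;

    PlainIntColumn(ByteBuffer data) {
        this.data = data;
    }

    int get(int index) {
        return data.getInt(index * Integer.BYTES);
    }
}

// Hypothetical nullable column with its own copy of the read logic
// (no delegation to PlainIntColumn).
final class NullableIntColumn {
    private final ByteBuffer data;
    private final byte[] validity; // assumed layout: 1 bit per value, set = not null

    NullableIntColumn(ByteBuffer data, byte[] validity) {
        this.data = data;
        this.validity = validity;
    }

    boolean isNull(int index) {
        return (validity[index >> 3] & (1 << (index & 7))) == 0;
    }

    // The validity check adds a branch before every load.
    Integer get(int index) {
        return isNull(index) ? null : data.getInt(index * Integer.BYTES);
    }
}

final class NullabilitySketch {
    static long sumPlain(PlainIntColumn column, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += column.get(i);
        }
        return sum;
    }

    // Nulls are skipped rather than treated as zero.
    static long sumNullable(NullableIntColumn column, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            Integer value = column.get(i);
            if (value != null) {
                sum += value;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(4 * Integer.BYTES);
        for (int i = 0; i < 4; i++) {
            buffer.putInt(i * Integer.BYTES, i + 1); // values 1, 2, 3, 4
        }
        byte[] validity = {0b00001011}; // index 2 is null
        System.out.println(sumPlain(new PlainIntColumn(buffer), 4));
        System.out.println(sumNullable(new NullableIntColumn(buffer, validity), 4));
    }
}
```

Whether the branch-free version is measurably faster would still have to be 
checked with a JMH test on the JVM of interest.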



[jira] [Commented] (ARROW-1463) [JAVA] Restructure ValueVector hierarchy to minimize compile-time generated code

2017-09-06 Thread Gonzalo Ortiz (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16154874#comment-16154874
 ] 

Gonzalo Ortiz commented on ARROW-1463:
--

+1

I have found that the current hierarchy and its heavy use of delegation make 
implementations like IntVector very slow compared to ByteBuffer. I have some 
JMH benchmarks where I iterate over a buffer, reading the content as ints and 
accumulating them. With ByteBuffer, one iteration takes 17 us. With 
IntVector.Accessor it takes 100 us!
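
For illustration, the kind of loop being compared could look like the sketch 
below. This is not the actual benchmark code: IntReader, BufferReader and 
CheckedReader are hypothetical stand-ins for a layered accessor, not Arrow's 
IntVector.Accessor, and the class only checks that both paths compute the same 
sum. Real timings would need a JMH harness around the two sum methods.

```java
import java.nio.ByteBuffer;

final class IntSumSketch {
    // Direct path: read ints straight from the buffer and accumulate them.
    static long sumDirect(ByteBuffer buffer, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buffer.getInt(i * Integer.BYTES);
        }
        return sum;
    }

    // Hypothetical accessor abstraction, used only for this illustration.
    interface IntReader {
        int get(int index);
    }

    // Innermost layer: performs the actual buffer read.
    static final class BufferReader implements IntReader {
        private final ByteBuffer buffer;

        BufferReader(ByteBuffer buffer) {
            this.buffer = buffer;
        }

        @Override
        public int get(int index) {
            return buffer.getInt(index * Integer.BYTES);
        }
    }

    // Outer layer: adds a range check, then delegates to another reader.
    static final class CheckedReader implements IntReader {
        private final IntReader delegate;
        private final int count;

        CheckedReader(IntReader delegate, int count) {
            this.delegate = delegate;
            this.count = count;
        }

        @Override
        public int get(int index) {
            if (index < 0 || index >= count) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return delegate.get(index);
        }
    }

    // Delegating path: every read goes through two extra method calls.
    static long sumViaReader(IntReader reader, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += reader.get(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        int count = 1024;
        ByteBuffer buffer = ByteBuffer.allocate(count * Integer.BYTES);
        for (int i = 0; i < count; i++) {
            buffer.putInt(i * Integer.BYTES, i);
        }
        IntReader reader = new CheckedReader(new BufferReader(buffer), count);
        System.out.println(sumDirect(buffer, count));
        System.out.println(sumViaReader(reader, count));
    }
}
```

Both methods do the same arithmetic; the only difference is how many call 
layers sit between the loop and the buffer, which is the part the JIT may or 
may not inline away.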
