[ https://issues.apache.org/jira/browse/ARROW-1922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291563#comment-16291563 ]

ASF GitHub Bot commented on ARROW-1922:
---------------------------------------

siddharthteotia commented on a change in pull request #1419: ARROW-1922: Blog post on JAVA vector changes
URL: https://github.com/apache/arrow/pull/1419#discussion_r157053774
 
 

 ##########
 File path: site/_posts/2017-12-13-java-vector-improvements.md
 ##########
 @@ -0,0 +1,110 @@
+---
+layout: post
+title: "Improved Java Vector APIs"
+excerpt: "This post describes recent improvements to the Java vector code"
+date: 2017-12-13 12:50:00
+author: Siddharth Teotia
+categories: [application]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+This post describes the major improvements to the Java implementation
+of vectors.
+
+## Design Goals
+
+1. Improved maintainability and extensibility.
+2. Improved heap usage.
+3. No performance overhead on hot code paths.
+
+## Background
+
+**Improved Maintainability and Extensibility** 
+
+We use templates in several places for compile-time Java code generation of
+different vector classes, readers, writers, etc. Templates are helpful because
+developers don't have to write a lot of duplicate code.
+
+However, we realized that over time some Java templates had become extremely
+complex, with giant if-else blocks and poor code indentation and documentation.
+All this made it difficult to extend these templates to add new functionality
+or improve the existing infrastructure.
+
+So we evaluated our use of templates for compile-time code generation and,
+in some places, replaced complex templates with a small amount of duplicate
+code that is elegant, well documented, and extensible.
+
+**Improved Heap Usage**
+
+We did extensive memory analysis downstream in Dremio, where Arrow is used
+heavily for in-memory query execution on columnar data. The general conclusion
+was that Arrow Java vectors had non-negligible heap overhead and that the volume
+of objects was too high. In some places the code created objects unnecessarily
+and used structures that could be replaced with better alternatives.
+
+**No Performance Overhead on Hot Code Paths**
 
 Review comment:
   Done.
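
The Background excerpt above attributes the heap overhead to objects created unnecessarily and to structures that could be replaced with better alternatives. The Java sketch below illustrates that general idea under stated assumptions: a nullable int vector whose validity bits and values live in two flat primitive arrays owned by a single object, so reads return primitives and no per-value or per-helper objects are allocated. The names `FlatIntVectorSketch` and `FlatIntVector` and all of their methods are hypothetical; this is not Arrow's actual vector API or the design described in the blog post, and it uses on-heap arrays for simplicity, whereas real Arrow Java vectors keep their data in off-heap buffers.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Arrow's API): a nullable int vector that keeps its
 * validity bits and values in two flat primitive arrays owned by one object,
 * instead of composing separate helper objects or boxing values as Integer.
 */
public class FlatIntVectorSketch {

    static final class FlatIntVector {
        private long[] validity; // 1 bit per slot; a set bit means "value present"
        private int[] values;    // raw values; contents of null slots are ignored

        FlatIntVector(int capacity) {
            validity = new long[(capacity + 63) >>> 6];
            values = new int[capacity];
        }

        int capacity() {
            return values.length;
        }

        void set(int index, int value) {
            ensureCapacity(index + 1);
            values[index] = value;
            validity[index >>> 6] |= 1L << (index & 63);
        }

        void setNull(int index) {
            ensureCapacity(index + 1);
            validity[index >>> 6] &= ~(1L << (index & 63));
        }

        boolean isNull(int index) {
            return (validity[index >>> 6] & (1L << (index & 63))) == 0;
        }

        /** Returns a primitive int, so reading a value does not allocate. */
        int get(int index) {
            if (isNull(index)) {
                throw new IllegalStateException("value at index " + index + " is null");
            }
            return values[index];
        }

        /** Grows both arrays (at least doubling); new slots start out null. */
        private void ensureCapacity(int required) {
            if (required <= values.length) {
                return;
            }
            int newCapacity = Math.max(required, values.length * 2);
            values = Arrays.copyOf(values, newCapacity);
            validity = Arrays.copyOf(validity, (newCapacity + 63) >>> 6);
        }
    }

    public static void main(String[] args) {
        FlatIntVector vector = new FlatIntVector(4);
        vector.set(0, 10);
        vector.setNull(1);
        vector.set(2, 30);
        vector.set(5, 60); // grows the backing arrays to 8 slots

        long sum = 0;
        for (int i = 0; i < vector.capacity(); i++) {
            if (!vector.isNull(i)) {
                sum += vector.get(i);
            }
        }
        System.out.println("sum of non-null values = " + sum);
    }
}
```

Running `main` prints `sum of non-null values = 100`. Packing validity at one bit per slot in a `long[]` keeps the null bookkeeping at 1/32 of the size of the `int` values, and because `get` returns a primitive `int`, the read path never boxes.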



> Blog post on recent improvements/changes in JAVA Vectors
> --------------------------------------------------------
>
>                 Key: ARROW-1922
>                 URL: https://issues.apache.org/jira/browse/ARROW-1922
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: Java - Vectors
>            Reporter: Siddharth Teotia
>            Assignee: Siddharth Teotia
>              Labels: pull-request-available
>




