[ https://issues.apache.org/jira/browse/ARROW-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386694#comment-16386694 ]

ASF GitHub Bot commented on ARROW-2199:
---------------------------------------

icexelloss commented on a change in pull request #1646: ARROW-2199: [JAVA] 
Control the memory allocated for inner vectors in containers.
URL: https://github.com/apache/arrow/pull/1646#discussion_r172316594
 
 

 ##########
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/DensityAwareVector.java
 ##########
 @@ -0,0 +1,57 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector;
+
+/**
+ * Vector that supports density-aware initial capacity settings.
+ * We currently use this for ListVector and VarCharVector to
+ * control the amount of memory allocated.
+ *
+ * For ListVector, we have been using a multiplier of 5
+ * to compute the initial capacity of the inner data vector.
+ * For deeply nested lists and lists with many NULL values,
+ * this over-allocates upfront. Density lets us be more
+ * conservative when computing the value capacity of the
+ * inner vector.
+ *
+ * For example, a density value of 10 implies that each position in the
+ * list vector has a list of 10 values, so we provision
+ * an initial capacity of (valueCount * 10) for the inner vector.
+ * A density value of 0.1 implies that out of 10 positions in the list vector,
+ * 1 position has a list of size 1 and remaining positions are
 
 Review comment:
   I see. Thanks for the explanation.
   
   > Density is the average size of list per position in the List vector
   
   This is fine. 
   
   > A density value of 10 implies each position in the list vector has a list of 10 values.
   
   If I understand correctly, a density value of 10 could mean any of:
   * 10 sublists of 10 values each
   * 1 sublist of 100 values and 9 null sublists
   * ...
   as long as the average sublist size equals the density.
   
   Is that correct? If so, can we make that clear in the doc?
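To make the reviewer's reading concrete, here is a small illustrative sketch, assuming density means the average number of child values per list position with a null sublist counted as 0 values. The class and the `density` helper are hypothetical and not part of Arrow's API. The Javadoc's 0.1 example is cut off in the diff excerpt, so treating the remaining 9 positions as null is also an assumption (empty sublists would give the same average).

```java
import java.util.Arrays;

/**
 * Illustration only: "density" read as the average number of child values per
 * list position, counting a null sublist as 0 values. This interpretation
 * follows the review discussion; it is not copied from Arrow's code.
 */
public class DensityExample {

    // Hypothetical helper: total child values divided by number of list positions.
    static double density(Integer[] sublistSizes) {
        long totalValues = 0;
        for (Integer size : sublistSizes) {
            totalValues += (size == null) ? 0 : size; // a null sublist contributes no child values
        }
        return (double) totalValues / sublistSizes.length;
    }

    public static void main(String[] args) {
        // Case 1: 10 sublists of 10 values each.
        Integer[] uniform = new Integer[10];
        Arrays.fill(uniform, 10);

        // Case 2: 1 sublist of 100 values, 9 null sublists.
        Integer[] skewed = new Integer[10];
        skewed[0] = 100; // the other 9 entries stay null

        // Javadoc's 0.1 example, assuming the other 9 positions are null.
        Integer[] sparse = new Integer[10];
        sparse[0] = 1;

        System.out.println(density(uniform)); // 10.0
        System.out.println(density(skewed));  // 10.0
        System.out.println(density(sparse));  // 0.1
    }
}
```

Both very different layouts yield the same density, which is exactly why the doc should say "average" rather than "each position".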
   
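For the capacity side, a minimal sketch of how a density-driven initial capacity for the inner vector might be computed, following the Javadoc's (valueCount * density) rule and the issue title's requirement that the result is never less than 1. Everything here is hypothetical: the class and method names are illustrative, truncating via an `(int)` cast is an assumed rounding choice, and applying the same density at every level of a nested list is one possible reading of "propagate density throughout the vector tree", not a description of the PR's code.

```java
import java.util.Arrays;

/**
 * Sketch only: density-driven initial capacity for a list's inner vector,
 * clamped to at least 1, and propagated down a nested list<list<...>> tree.
 * Names and rounding are assumptions, not Arrow's API.
 */
public class DensityCapacitySketch {

    // Inner capacity = valueCount * density, truncated, but never below 1.
    static int innerCapacity(int valueCount, double density) {
        double requested = valueCount * density;
        if (requested >= Integer.MAX_VALUE) {
            throw new IllegalArgumentException("requested capacity too large: " + requested);
        }
        return Math.max((int) requested, 1);
    }

    // Capacity at each nesting level, assuming the same density applies at every level.
    static int[] capacitiesByLevel(int topLevelCount, double density, int depth) {
        int[] caps = new int[depth + 1];
        caps[0] = topLevelCount;
        for (int level = 1; level <= depth; level++) {
            caps[level] = innerCapacity(caps[level - 1], density);
        }
        return caps;
    }

    public static void main(String[] args) {
        System.out.println(innerCapacity(4096, 10));  // 40960
        System.out.println(innerCapacity(4096, 0.1)); // 409
        System.out.println(innerCapacity(5, 0.1));    // 1 (0.5 truncates to 0, clamped to 1)
        System.out.println(Arrays.toString(capacitiesByLevel(4096, 0.1, 4))); // [4096, 409, 40, 4, 1]
    }
}
```

Without the clamp, a small density on a deeply nested list would drive some inner capacity to 0 after a few levels, which is the case the issue title guards against.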

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is 
> never less than 1 and propagate density throughout the vector tree
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-2199
>                 URL: https://issues.apache.org/jira/browse/ARROW-2199
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java - Vectors
>            Reporter: Siddharth Teotia
>            Assignee: Siddharth Teotia
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
