[ 
https://issues.apache.org/jira/browse/ARROW-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386620#comment-16386620
 ] 

ASF GitHub Bot commented on ARROW-2199:
---------------------------------------

siddharthteotia commented on a change in pull request #1646: ARROW-2199: [JAVA] 
Control the memory allocated for inner vectors in containers.
URL: https://github.com/apache/arrow/pull/1646#discussion_r172302307
 
 

 ##########
 File path: 
java/vector/src/main/java/org/apache/arrow/vector/DensityAwareVector.java
 ##########
 @@ -0,0 +1,57 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.arrow.vector;
+
+/**
+ * Vector that support density aware initial capacity settings.
+ * We use this for ListVector and VarCharVector as of now to
+ * control the memory allocated.
+ *
+ * For ListVector, we have been using a multiplier of 5
+ * to compute the initial capacity of the inner data vector.
+ * For deeply nested lists and lists with lots of NULL values,
+ * this is over-allocation upfront. So density helps to be
+ * conservative when computing the value capacity of the
+ * inner vector.
+ *
+ * For example, a density value of 10 implies each position in the
+ * list vector has a list of 10 values. So we will provision
+ * an initial capacity of (valuecount * 10) for the inner vector.
+ * A density value of 0.1 implies out of 10 positions in the list vector,
+ * 1 position has a list of size 1 and remaining positions are
 
 Review comment:
   The doc doesn't mix that. It clearly indicates the purpose of density:
   
   For example, a
      *                density value of 10 implies each position in the list
      *                vector has a list of 10 values.
      *                A density value of 0.1 implies out of 10 positions in
      *                the list vector, 1 position has a list of size 1 and
      *                remaining positions are null (no lists) or empty lists.
      *                This helps in tightly controlling the memory we provision
      *                for inner data vector.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [JAVA] Follow up fixes for ARROW-2019. Ensure density driven capacity is 
> never less than 1 and propagate density throughout the vector tree
> -------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-2199
>                 URL: https://issues.apache.org/jira/browse/ARROW-2199
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Java - Vectors
>            Reporter: Siddharth Teotia
>            Assignee: Siddharth Teotia
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to