[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072859#comment-16072859 ] ASF GitHub Bot commented on DRILL-5517: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/840 > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Labels: ready-to-commit > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070405#comment-16070405 ] ASF GitHub Bot commented on DRILL-5517: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r125083942 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -546,238 +576,400 @@ public DateTime getObject(int index) { public ${minor.javaType!type.javaType} getPrimitiveObject(int index) { return get(index); } - + public void get(int index, ${minor.class}Holder holder){ <#if minor.class.startsWith("Decimal")> holder.scale = getField().getScale(); holder.precision = getField().getPrecision(); - holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * ${type.width}); + holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * VALUE_WIDTH); } public void get(int index, Nullable${minor.class}Holder holder){ holder.isSet = 1; - holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * ${type.width}); + holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * VALUE_WIDTH); } <#-- type.width --> - } - - /** - * ${minor.class}.Mutator implements a mutable vector of fixed width values. Elements in the - * vector are accessed by position from the logical start of the vector. Values should be pushed - * onto the vector sequentially, but may be randomly accessed. - * The width of each element is ${type.width} byte(s) - * The equivalent Java primitive is '${minor.javaType!type.javaType}' - * - * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. - */ - public final class Mutator extends BaseDataValueVector.BaseMutator { - -private Mutator(){}; - /** -* Set the element at the given index to the given value. Note that widths smaller than -* 32 bits are handled by the DrillBuf interface. -* -* @param index position of the bit to set -* @param value value to set -*/ + } + + /** + * ${minor.class}.Mutator implements a mutable vector of fixed width values. Elements in the + * vector are accessed by position from the logical start of the vector. Values should be pushed + * onto the vector sequentially, but may be randomly accessed. + * + * The width of each element is {@link #VALUE_WIDTH} (= ${type.width}) byte(s). + * The equivalent Java primitive is '${minor.javaType!type.javaType}' + * + * + * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + */ + public final class Mutator extends BaseDataValueVector.BaseMutator { + +private Mutator() {}; + +/** + * Set the element at the given index to the given value. Note that widths smaller than + * 32 bits are handled by the DrillBuf interface. + * + * @param index position of the bit to set + * @param value value to set + */ + <#if (type.width > 8)> public void set(int index, <#if (type.width > 4)>${minor.javaType!type.javaType}<#else>int value) { - data.setBytes(index * ${type.width}, value, 0, ${type.width}); + data.setBytes(index * VALUE_WIDTH, value, 0, VALUE_WIDTH); } public void setSafe(int index, <#if (type.width > 4)>${minor.javaType!type.javaType}<#else>int value) { while(index >= getValueCapacity()) { reAlloc(); } - data.setBytes(index * ${type.width}, value, 0, ${type.width}); + data.setBytes(index * VALUE_WIDTH, value, 0, VALUE_WIDTH); } - <#if (minor.class == "Interval")> -public void set(int index, int months, int days, int milliseconds){ - final int offsetIndex = index * ${type.width}; - data.setInt(offsetIndex, months); - data.setInt((offsetIndex + ${minor.daysOffset}), days); - data.setInt((offsetIndex + ${minor.millisecondsOffset}), milliseconds); +/** + * Set the value of a required or nullable vector. Enforces the value + * and size limits. + * @param index item to write + * @return true if the item was written, false if the index would --- End diff -- Fixed here and several other places. > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/j
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069417#comment-16069417 ] ASF GitHub Bot commented on DRILL-5517: --- Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/840 Minor followup comment. LGTM. +1 > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Labels: ready-to-commit > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069414#comment-16069414 ] ASF GitHub Bot commented on DRILL-5517: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r124958378 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -546,238 +576,400 @@ public DateTime getObject(int index) { public ${minor.javaType!type.javaType} getPrimitiveObject(int index) { return get(index); } - + public void get(int index, ${minor.class}Holder holder){ <#if minor.class.startsWith("Decimal")> holder.scale = getField().getScale(); holder.precision = getField().getPrecision(); - holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * ${type.width}); + holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * VALUE_WIDTH); } public void get(int index, Nullable${minor.class}Holder holder){ holder.isSet = 1; - holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * ${type.width}); + holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * VALUE_WIDTH); } <#-- type.width --> - } - - /** - * ${minor.class}.Mutator implements a mutable vector of fixed width values. Elements in the - * vector are accessed by position from the logical start of the vector. Values should be pushed - * onto the vector sequentially, but may be randomly accessed. - * The width of each element is ${type.width} byte(s) - * The equivalent Java primitive is '${minor.javaType!type.javaType}' - * - * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. - */ - public final class Mutator extends BaseDataValueVector.BaseMutator { - -private Mutator(){}; - /** -* Set the element at the given index to the given value. Note that widths smaller than -* 32 bits are handled by the DrillBuf interface. -* -* @param index position of the bit to set -* @param value value to set -*/ + } + + /** + * ${minor.class}.Mutator implements a mutable vector of fixed width values. Elements in the + * vector are accessed by position from the logical start of the vector. Values should be pushed + * onto the vector sequentially, but may be randomly accessed. + * + * The width of each element is {@link #VALUE_WIDTH} (= ${type.width}) byte(s). + * The equivalent Java primitive is '${minor.javaType!type.javaType}' + * + * + * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + */ + public final class Mutator extends BaseDataValueVector.BaseMutator { + +private Mutator() {}; + +/** + * Set the element at the given index to the given value. Note that widths smaller than + * 32 bits are handled by the DrillBuf interface. + * + * @param index position of the bit to set + * @param value value to set + */ + <#if (type.width > 8)> public void set(int index, <#if (type.width > 4)>${minor.javaType!type.javaType}<#else>int value) { - data.setBytes(index * ${type.width}, value, 0, ${type.width}); + data.setBytes(index * VALUE_WIDTH, value, 0, VALUE_WIDTH); } public void setSafe(int index, <#if (type.width > 4)>${minor.javaType!type.javaType}<#else>int value) { while(index >= getValueCapacity()) { reAlloc(); } - data.setBytes(index * ${type.width}, value, 0, ${type.width}); + data.setBytes(index * VALUE_WIDTH, value, 0, VALUE_WIDTH); } - <#if (minor.class == "Interval")> -public void set(int index, int months, int days, int milliseconds){ - final int offsetIndex = index * ${type.width}; - data.setInt(offsetIndex, months); - data.setInt((offsetIndex + ${minor.daysOffset}), days); - data.setInt((offsetIndex + ${minor.millisecondsOffset}), milliseconds); +/** + * Set the value of a required or nullable vector. Enforces the value + * and size limits. + * @param index item to write + * @return true if the item was written, false if the index would --- End diff -- What about this ? The return type in Javadoc is not consistent with the return type here and a few other places in this file. > Provide size-aware set operations in value vectors > ---
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069413#comment-16069413 ] ASF GitHub Bot commented on DRILL-5517: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r124958090 --- Diff: exec/vector/src/main/codegen/templates/NullableValueVectors.java --- @@ -674,6 +764,14 @@ public void reset(){ setCount = 0; <#if type.major = "VarLen">lastSet = -1; } + +@Override +public void exchange(ValueVector.Mutator other) { --- End diff -- ok, thanks for the explanation. > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Labels: ready-to-commit > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069344#comment-16069344 ] ASF GitHub Bot commented on DRILL-5517: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r124938101 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -546,238 +576,400 @@ public DateTime getObject(int index) { public ${minor.javaType!type.javaType} getPrimitiveObject(int index) { return get(index); } - + public void get(int index, ${minor.class}Holder holder){ <#if minor.class.startsWith("Decimal")> holder.scale = getField().getScale(); holder.precision = getField().getPrecision(); - holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * ${type.width}); + holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * VALUE_WIDTH); } public void get(int index, Nullable${minor.class}Holder holder){ holder.isSet = 1; - holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * ${type.width}); + holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * VALUE_WIDTH); } <#-- type.width --> - } - - /** - * ${minor.class}.Mutator implements a mutable vector of fixed width values. Elements in the - * vector are accessed by position from the logical start of the vector. Values should be pushed - * onto the vector sequentially, but may be randomly accessed. - * The width of each element is ${type.width} byte(s) - * The equivalent Java primitive is '${minor.javaType!type.javaType}' - * - * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. - */ - public final class Mutator extends BaseDataValueVector.BaseMutator { - -private Mutator(){}; - /** -* Set the element at the given index to the given value. Note that widths smaller than -* 32 bits are handled by the DrillBuf interface. -* -* @param index position of the bit to set -* @param value value to set -*/ + } + + /** + * ${minor.class}.Mutator implements a mutable vector of fixed width values. Elements in the + * vector are accessed by position from the logical start of the vector. Values should be pushed + * onto the vector sequentially, but may be randomly accessed. + * + * The width of each element is {@link #VALUE_WIDTH} (= ${type.width}) byte(s). + * The equivalent Java primitive is '${minor.javaType!type.javaType}' + * + * + * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + */ + public final class Mutator extends BaseDataValueVector.BaseMutator { + +private Mutator() {}; + +/** + * Set the element at the given index to the given value. Note that widths smaller than + * 32 bits are handled by the DrillBuf interface. + * + * @param index position of the bit to set + * @param value value to set + */ + <#if (type.width > 8)> public void set(int index, <#if (type.width > 4)>${minor.javaType!type.javaType}<#else>int value) { - data.setBytes(index * ${type.width}, value, 0, ${type.width}); + data.setBytes(index * VALUE_WIDTH, value, 0, VALUE_WIDTH); } public void setSafe(int index, <#if (type.width > 4)>${minor.javaType!type.javaType}<#else>int value) { while(index >= getValueCapacity()) { reAlloc(); } - data.setBytes(index * ${type.width}, value, 0, ${type.width}); + data.setBytes(index * VALUE_WIDTH, value, 0, VALUE_WIDTH); } - <#if (minor.class == "Interval")> -public void set(int index, int months, int days, int milliseconds){ - final int offsetIndex = index * ${type.width}; - data.setInt(offsetIndex, months); - data.setInt((offsetIndex + ${minor.daysOffset}), days); - data.setInt((offsetIndex + ${minor.millisecondsOffset}), milliseconds); +/** + * Set the value of a required or nullable vector. Enforces the value + * and size limits. + * @param index item to write + * @return true if the item was written, false if the index would + * overfill the vector + */ + +public void setScalar(int index, <#if (type.width > 4)>${minor.javaType!type.javaType}<#else>int value) throws VectorOverflowException { --- End diff -- Actually, I copied this code fro
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069342#comment-16069342 ] ASF GitHub Bot commented on DRILL-5517: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r124950905 --- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/VectorUtils.java --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector; + +public class VectorUtils { + + /** + * Vectors cannot be any larger than the Netty memory allocation + * block size. + */ + + private static final int ABSOLUTE_MAX_SIZE = 16 * 1024 * 1024; + private static final int ABSOLUTE_MIN_SIZE = 16 * 1024; --- End diff -- This code allows a boot-time option to limit maximum vector size. I thought it would be handy for testing, but turned out not to be necessary, so I might remove it. To prevent failures when the value is set wrong, I clamped the value to some minimum and maximum. The maximum is set by the Netty block size. The minimum I just made up: seems if we limit vector size to much smaller than 16 K we'd be unlikely to make progress. Added a comment to clarify. > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Labels: ready-to-commit > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069341#comment-16069341 ] ASF GitHub Bot commented on DRILL-5517: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r124938564 --- Diff: exec/vector/src/main/codegen/templates/NullableValueVectors.java --- @@ -674,6 +764,14 @@ public void reset(){ setCount = 0; <#if type.major = "VarLen">lastSet = -1; } + +@Override +public void exchange(ValueVector.Mutator other) { --- End diff -- Yes, indeed I am trusting the caller, in the same way we trust the caller to free a vector's buffer at the right time, to call `setSafe()` instead of `set()`, to write to the vector only once, and so on. In short, if someone does try to use `exchange()` with a vector owned by another allocator, the allocator is likely to complain about memory leaks. An alternative is to always use transfer pairs, though that adds quite a bit of complexity for this one use case. This method has a very specific usage: an operator writes to a vector, discovers that the vector overflows, and must swap buffers between two sets of vectors. This swapping is necessary because downstream operators depend on a vector instance, as does the writer, so it is necessary to swap buffers into and out of these fixed set of value vectors. Rather confusing, but the most performant solution given how vectors work today. For this particular method, confusion is understandable. This is an `exchange()` on the `Mutator` class, swapping the state that nullable vector mutators maintain. It is called from a vector `exchange()` method earlier in this file. Added a comment to help future readers. > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Labels: ready-to-commit > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069343#comment-16069343 ] ASF GitHub Bot commented on DRILL-5517: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r124937740 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -31,68 +31,98 @@ import org.apache.drill.exec.util.DecimalUtility; /** - * ${minor.class} implements a vector of fixed width values. Elements in the vector are accessed - * by position, starting from the logical start of the vector. Values should be pushed onto the - * vector sequentially, but may be randomly accessed. - * The width of each element is ${type.width} byte(s) - * The equivalent Java primitive is '${minor.javaType!type.javaType}' + * ${minor.class} implements a vector of fixed width values. Elements in the vector are accessed + * by position, starting from the logical start of the vector. Values should be pushed onto the + * vector sequentially, but may be accessed randomly. + * + * The width of each element is {@link #VALUE_WIDTH} (= ${type.width}) byte<#if type.width != 1>s. + * The equivalent Java primitive is '${minor.javaType!type.javaType}'. + * * * NB: this class is automatically generated from ${.template_name} and ValueVectorTypes.tdd using FreeMarker. */ -public final class ${minor.class}Vector extends BaseDataValueVector implements FixedWidthVector{ +public final class ${minor.class}Vector extends BaseDataValueVector implements FixedWidthVector { private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(${minor.class}Vector.class); + /** + * Width of each fixed-width value. + */ + + public static final int VALUE_WIDTH = ${type.width}; + + /** + * Maximum number of values that this fixed-width vector can hold + * and stay below the maximum vector size limit. This is the limit + * enforced when the vector is used to hold values in a repeated + * vector. + */ + + public static final int MAX_VALUE_COUNT = MAX_BUFFER_SIZE / VALUE_WIDTH; + + /** + * Maximum number of values that this fixed-width vector can hold + * and stay below the maximum vector size limit and/or stay below + * the maximum row count. This is the limit enforced when the + * vector is used to hold scalar (required or nullable) values. + */ + + public static final int MAX_SCALAR_COUNT = Math.min(MAX_ROW_COUNT, MAX_VALUE_COUNT); --- End diff -- @Added comment. For me, Eclipse sometimes opens the wrong version of this file, preventing it from finding the declaration of symbols, so the explanation is not entirely unnecessary... > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Labels: ready-to-commit > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068503#comment-16068503 ] ASF GitHub Bot commented on DRILL-5517: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r124833987 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -546,238 +576,400 @@ public DateTime getObject(int index) { public ${minor.javaType!type.javaType} getPrimitiveObject(int index) { return get(index); } - + public void get(int index, ${minor.class}Holder holder){ <#if minor.class.startsWith("Decimal")> holder.scale = getField().getScale(); holder.precision = getField().getPrecision(); - holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * ${type.width}); + holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * VALUE_WIDTH); } public void get(int index, Nullable${minor.class}Holder holder){ holder.isSet = 1; - holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * ${type.width}); + holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * VALUE_WIDTH); } <#-- type.width --> - } - - /** - * ${minor.class}.Mutator implements a mutable vector of fixed width values. Elements in the - * vector are accessed by position from the logical start of the vector. Values should be pushed - * onto the vector sequentially, but may be randomly accessed. - * The width of each element is ${type.width} byte(s) - * The equivalent Java primitive is '${minor.javaType!type.javaType}' - * - * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. - */ - public final class Mutator extends BaseDataValueVector.BaseMutator { - -private Mutator(){}; - /** -* Set the element at the given index to the given value. Note that widths smaller than -* 32 bits are handled by the DrillBuf interface. -* -* @param index position of the bit to set -* @param value value to set -*/ + } + + /** + * ${minor.class}.Mutator implements a mutable vector of fixed width values. Elements in the + * vector are accessed by position from the logical start of the vector. Values should be pushed + * onto the vector sequentially, but may be randomly accessed. + * + * The width of each element is {@link #VALUE_WIDTH} (= ${type.width}) byte(s). + * The equivalent Java primitive is '${minor.javaType!type.javaType}' + * + * + * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + */ + public final class Mutator extends BaseDataValueVector.BaseMutator { + +private Mutator() {}; + +/** + * Set the element at the given index to the given value. Note that widths smaller than + * 32 bits are handled by the DrillBuf interface. + * + * @param index position of the bit to set + * @param value value to set + */ + <#if (type.width > 8)> public void set(int index, <#if (type.width > 4)>${minor.javaType!type.javaType}<#else>int value) { - data.setBytes(index * ${type.width}, value, 0, ${type.width}); + data.setBytes(index * VALUE_WIDTH, value, 0, VALUE_WIDTH); } public void setSafe(int index, <#if (type.width > 4)>${minor.javaType!type.javaType}<#else>int value) { while(index >= getValueCapacity()) { reAlloc(); } - data.setBytes(index * ${type.width}, value, 0, ${type.width}); + data.setBytes(index * VALUE_WIDTH, value, 0, VALUE_WIDTH); } - <#if (minor.class == "Interval")> -public void set(int index, int months, int days, int milliseconds){ - final int offsetIndex = index * ${type.width}; - data.setInt(offsetIndex, months); - data.setInt((offsetIndex + ${minor.daysOffset}), days); - data.setInt((offsetIndex + ${minor.millisecondsOffset}), milliseconds); +/** + * Set the value of a required or nullable vector. Enforces the value + * and size limits. + * @param index item to write + * @return true if the item was written, false if the index would --- End diff -- Method returns a void but comment says the method returns a boolean.. > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 >
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068504#comment-16068504 ] ASF GitHub Bot commented on DRILL-5517: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r124813604 --- Diff: exec/vector/src/main/codegen/templates/NullableValueVectors.java --- @@ -674,6 +764,14 @@ public void reset(){ setCount = 0; <#if type.major = "VarLen">lastSet = -1; } + +@Override +public void exchange(ValueVector.Mutator other) { --- End diff -- How does swapping the value count exchange the buffer ? The name 'exchange' indicates more than what this method is doing. The JIRA description says the 'exchange is for vectors within a single operator' but I am not sure how that is enforced considering the ValueVector.Mutator that is passed in could belong to any ValueVector. So, essentially we are trusting the caller to use it only for specific cases. > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Labels: ready-to-commit > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068507#comment-16068507 ] ASF GitHub Bot commented on DRILL-5517: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r124710371 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -546,238 +576,400 @@ public DateTime getObject(int index) { public ${minor.javaType!type.javaType} getPrimitiveObject(int index) { return get(index); } - + public void get(int index, ${minor.class}Holder holder){ <#if minor.class.startsWith("Decimal")> holder.scale = getField().getScale(); holder.precision = getField().getPrecision(); - holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * ${type.width}); + holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * VALUE_WIDTH); } public void get(int index, Nullable${minor.class}Holder holder){ holder.isSet = 1; - holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * ${type.width}); + holder.value = data.get${(minor.javaType!type.javaType)?cap_first}(index * VALUE_WIDTH); } <#-- type.width --> - } - - /** - * ${minor.class}.Mutator implements a mutable vector of fixed width values. Elements in the - * vector are accessed by position from the logical start of the vector. Values should be pushed - * onto the vector sequentially, but may be randomly accessed. - * The width of each element is ${type.width} byte(s) - * The equivalent Java primitive is '${minor.javaType!type.javaType}' - * - * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. - */ - public final class Mutator extends BaseDataValueVector.BaseMutator { - -private Mutator(){}; - /** -* Set the element at the given index to the given value. Note that widths smaller than -* 32 bits are handled by the DrillBuf interface. -* -* @param index position of the bit to set -* @param value value to set -*/ + } + + /** + * ${minor.class}.Mutator implements a mutable vector of fixed width values. Elements in the + * vector are accessed by position from the logical start of the vector. Values should be pushed + * onto the vector sequentially, but may be randomly accessed. + * + * The width of each element is {@link #VALUE_WIDTH} (= ${type.width}) byte(s). + * The equivalent Java primitive is '${minor.javaType!type.javaType}' + * + * + * NB: this class is automatically generated from ValueVectorTypes.tdd using FreeMarker. + */ + public final class Mutator extends BaseDataValueVector.BaseMutator { + +private Mutator() {}; + +/** + * Set the element at the given index to the given value. Note that widths smaller than + * 32 bits are handled by the DrillBuf interface. + * + * @param index position of the bit to set + * @param value value to set + */ + <#if (type.width > 8)> public void set(int index, <#if (type.width > 4)>${minor.javaType!type.javaType}<#else>int value) { - data.setBytes(index * ${type.width}, value, 0, ${type.width}); + data.setBytes(index * VALUE_WIDTH, value, 0, VALUE_WIDTH); } public void setSafe(int index, <#if (type.width > 4)>${minor.javaType!type.javaType}<#else>int value) { while(index >= getValueCapacity()) { reAlloc(); } - data.setBytes(index * ${type.width}, value, 0, ${type.width}); + data.setBytes(index * VALUE_WIDTH, value, 0, VALUE_WIDTH); } - <#if (minor.class == "Interval")> -public void set(int index, int months, int days, int milliseconds){ - final int offsetIndex = index * ${type.width}; - data.setInt(offsetIndex, months); - data.setInt((offsetIndex + ${minor.daysOffset}), days); - data.setInt((offsetIndex + ${minor.millisecondsOffset}), milliseconds); +/** + * Set the value of a required or nullable vector. Enforces the value + * and size limits. + * @param index item to write + * @return true if the item was written, false if the index would + * overfill the vector + */ + +public void setScalar(int index, <#if (type.width > 4)>${minor.javaType!type.javaType}<#else>int value) throws VectorOverflowException { --- End diff -- Is there a more readable way of
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068505#comment-16068505 ] ASF GitHub Bot commented on DRILL-5517: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r124706419 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -31,68 +31,98 @@ import org.apache.drill.exec.util.DecimalUtility; /** - * ${minor.class} implements a vector of fixed width values. Elements in the vector are accessed - * by position, starting from the logical start of the vector. Values should be pushed onto the - * vector sequentially, but may be randomly accessed. - * The width of each element is ${type.width} byte(s) - * The equivalent Java primitive is '${minor.javaType!type.javaType}' + * ${minor.class} implements a vector of fixed width values. Elements in the vector are accessed + * by position, starting from the logical start of the vector. Values should be pushed onto the + * vector sequentially, but may be accessed randomly. + * + * The width of each element is {@link #VALUE_WIDTH} (= ${type.width}) byte<#if type.width != 1>s. + * The equivalent Java primitive is '${minor.javaType!type.javaType}'. + * * * NB: this class is automatically generated from ${.template_name} and ValueVectorTypes.tdd using FreeMarker. */ -public final class ${minor.class}Vector extends BaseDataValueVector implements FixedWidthVector{ +public final class ${minor.class}Vector extends BaseDataValueVector implements FixedWidthVector { private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(${minor.class}Vector.class); + /** + * Width of each fixed-width value. + */ + + public static final int VALUE_WIDTH = ${type.width}; + + /** + * Maximum number of values that this fixed-width vector can hold + * and stay below the maximum vector size limit. This is the limit + * enforced when the vector is used to hold values in a repeated + * vector. + */ + + public static final int MAX_VALUE_COUNT = MAX_BUFFER_SIZE / VALUE_WIDTH; + + /** + * Maximum number of values that this fixed-width vector can hold + * and stay below the maximum vector size limit and/or stay below + * the maximum row count. This is the limit enforced when the + * vector is used to hold scalar (required or nullable) values. + */ + + public static final int MAX_SCALAR_COUNT = Math.min(MAX_ROW_COUNT, MAX_VALUE_COUNT); --- End diff -- Where is the constant MAX_ROW_COUNT defined ? (later I saw it defined in the ValueVector interface...perhaps add a comment here) > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Labels: ready-to-commit > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068506#comment-16068506 ] ASF GitHub Bot commented on DRILL-5517: --- Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r124835442 --- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/VectorUtils.java --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.vector; + +public class VectorUtils { + + /** + * Vectors cannot be any larger than the Netty memory allocation + * block size. + */ + + private static final int ABSOLUTE_MAX_SIZE = 16 * 1024 * 1024; + private static final int ABSOLUTE_MIN_SIZE = 16 * 1024; --- End diff -- It wasn't clear why the absolute_min_size is 16K .. that's not related to Netty I presume. Couldn't it be much smaller ? > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Labels: ready-to-commit > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065681#comment-16065681 ] ASF GitHub Bot commented on DRILL-5517: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/840 Reran pre-commit tests. One unit test fails, but in an area untouched by this commit. Indeed, this appears to be a random failure as the very same tests passes on my Mac and on my Linux development VM. See DRILL-5612. > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Labels: ready-to-commit > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064001#comment-16064001 ] ASF GitHub Bot commented on DRILL-5517: --- Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/840 Rebased on master. Resolved merge conflict. > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Labels: ready-to-commit > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056840#comment-16056840 ] ASF GitHub Bot commented on DRILL-5517: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r123129202 --- Diff: exec/memory/base/src/main/java/io/netty/buffer/UnsafeDirectLittleEndian.java --- @@ -174,6 +175,40 @@ public ByteBuf setDouble(int index, double value) { return this; } + // Clone of the super class checkIndex, but this version returns a boolean rather + // than throwing an exception. + + protected boolean hasCapacity(int index, int fieldLength) { +if (fieldLength < 0) { +throw new IllegalArgumentException("length: " + fieldLength + " (expected: >= 0)"); +} +return (! (index < 0 || index > capacity() - fieldLength)); + } + + // Clone of the super class setBytes(), but with bounds checking done as a boolean, + // not assertion. + + public boolean setBytesBounded(int index, byte[] src, int srcIndex, int length) { +if (! hasCapacity(index, length)) { + return false; +} +if (length != 0) { --- End diff -- Fixed > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056841#comment-16056841 ] ASF GitHub Bot commented on DRILL-5517: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r123131374 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -806,10 +998,32 @@ public void generateTestDataAlt(int size) { } <#-- type.width --> +/** + * Backfill missing offsets from the given last written position to the + * given current write position. Used by the "new" size-safe column + * writers to allow skipping values. The set() and setSafe() + * do not fill empties. See DRILL-5529 and DRILL-. + * @param lastWrite the position of the last valid write: the offset + * to be copied forward + * @param index the current write position filling occurs up to, + * but not including, this position + */ + +public void fillEmptiesBounded(int lastWrite, int index) +throws VectorOverflowException { + for (int i = lastWrite; i < index; i++) { +<#if type.width <= 8> --- End diff -- Fixed. > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056843#comment-16056843 ] ASF GitHub Bot commented on DRILL-5517: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r123134097 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -548,6 +567,23 @@ public void setSafe(int index, ByteBuffer bytes, int start, int length) { } } +public void setScalar(int index, DrillBuf bytes, int start, int length) throws VectorOverflowException { + assert index >= 0; + + if (index >= MAX_ROW_COUNT) { +throw new VectorOverflowException(); + } + int currentOffset = offsetVector.getAccessor().get(index); + final int newSize = currentOffset + length; + if (newSize > MAX_BUFFER_SIZE) { +throw new VectorOverflowException(); + } + while (! data.setBytesBounded(currentOffset, bytes, start, length)) { --- End diff -- Sorry, where? Looks OK to me... > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056842#comment-16056842 ] ASF GitHub Bot commented on DRILL-5517: --- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r123129149 --- Diff: exec/memory/base/src/main/java/io/netty/buffer/UnsafeDirectLittleEndian.java --- @@ -174,6 +175,40 @@ public ByteBuf setDouble(int index, double value) { return this; } + // Clone of the super class checkIndex, but this version returns a boolean rather + // than throwing an exception. + + protected boolean hasCapacity(int index, int fieldLength) { +if (fieldLength < 0) { --- End diff -- Fixed > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054181#comment-16054181 ] ASF GitHub Bot commented on DRILL-5517: --- Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r118806486 --- Diff: exec/memory/base/src/main/java/io/netty/buffer/UnsafeDirectLittleEndian.java --- @@ -174,6 +175,40 @@ public ByteBuf setDouble(int index, double value) { return this; } + // Clone of the super class checkIndex, but this version returns a boolean rather + // than throwing an exception. + + protected boolean hasCapacity(int index, int fieldLength) { +if (fieldLength < 0) { --- End diff -- change this to an assertion as per our discussion? > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054183#comment-16054183 ] ASF GitHub Bot commented on DRILL-5517: --- Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r122526465 --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java --- @@ -806,10 +998,32 @@ public void generateTestDataAlt(int size) { } <#-- type.width --> +/** + * Backfill missing offsets from the given last written position to the + * given current write position. Used by the "new" size-safe column + * writers to allow skipping values. The set() and setSafe() + * do not fill empties. See DRILL-5529 and DRILL-. + * @param lastWrite the position of the last valid write: the offset + * to be copied forward + * @param index the current write position filling occurs up to, + * but not including, this position + */ + +public void fillEmptiesBounded(int lastWrite, int index) +throws VectorOverflowException { + for (int i = lastWrite; i < index; i++) { +<#if type.width <= 8> --- End diff -- You can avoid an extra op in the loop by adjusting the bounds of the induction variable. The compiler's induction variable analysis might automatically figure this one out. public void fillEmptiesBounded(int lastWrite, int index) throws VectorOverflowException { for (int i = lastWrite + 1; i <= index; i++) { setSafe(i, (int) 0);<-- one less addition in the loop and less register pressure } } > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054180#comment-16054180 ] ASF GitHub Bot commented on DRILL-5517: --- Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r122533121 --- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java --- @@ -548,6 +567,23 @@ public void setSafe(int index, ByteBuffer bytes, int start, int length) { } } +public void setScalar(int index, DrillBuf bytes, int start, int length) throws VectorOverflowException { + assert index >= 0; + + if (index >= MAX_ROW_COUNT) { +throw new VectorOverflowException(); + } + int currentOffset = offsetVector.getAccessor().get(index); + final int newSize = currentOffset + length; + if (newSize > MAX_BUFFER_SIZE) { +throw new VectorOverflowException(); + } + while (! data.setBytesBounded(currentOffset, bytes, start, length)) { --- End diff -- indentation > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16054182#comment-16054182 ] ASF GitHub Bot commented on DRILL-5517: --- Github user bitblender commented on a diff in the pull request: https://github.com/apache/drill/pull/840#discussion_r122541355 --- Diff: exec/memory/base/src/main/java/io/netty/buffer/UnsafeDirectLittleEndian.java --- @@ -174,6 +175,40 @@ public ByteBuf setDouble(int index, double value) { return this; } + // Clone of the super class checkIndex, but this version returns a boolean rather + // than throwing an exception. + + protected boolean hasCapacity(int index, int fieldLength) { +if (fieldLength < 0) { +throw new IllegalArgumentException("length: " + fieldLength + " (expected: >= 0)"); +} +return (! (index < 0 || index > capacity() - fieldLength)); + } + + // Clone of the super class setBytes(), but with bounds checking done as a boolean, + // not assertion. + + public boolean setBytesBounded(int index, byte[] src, int srcIndex, int length) { +if (! hasCapacity(index, length)) { + return false; +} +if (length != 0) { --- End diff -- Remove length check > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013258#comment-16013258 ] ASF GitHub Bot commented on DRILL-5517: --- GitHub user paul-rogers opened a pull request: https://github.com/apache/drill/pull/840 DRILL-5517: Size-aware set methods in value vectors Please see DRILL-5517 for an explanation. You can merge this pull request into a Git repository by running: $ git pull https://github.com/paul-rogers/drill DRILL-5517 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/840.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #840 commit 6d9d8f5e93c7ad0fd45242a3aba43334c9385239 Author: Paul Rogers Date: 2017-05-16T20:20:32Z DRILL-5517: Size-aware set methods in value vectors Please see DRILL-5517 for an explanation. > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013037#comment-16013037 ] Paul Rogers commented on DRILL-5517: Code reviewers can't help but notice the level of redundant code that has crept into value vectors. We now have three distinct ways to set values: * {{set()}} -- set values, throws an exception if vector is full, does not expand vector * {{setSafe()}} -- always expands vectors to fit values, with no size limits * {{setScalar()}}, {{setArrayItem()}} -- sets values, expands vectors up to a limit, throws an exception after that Then, many vectors provide multiple ways to set the data: from a {{DrillBuf}}, from a byte array, from a scalar, from an Object. The result is the proliferation of very similar code. Over the long term, we may want to control the redundancy. For now, existing code uses all the various methods. To avoid breaking existing code, the new methods are added in parallel. As code is converted to be size-aware, we have an opportunity to retire the other two variations. > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Throw a new, specific exception ({{VectorOverflowException}}) if setting > the value (and growing the vector) would exceed the vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (DRILL-5517) Provide size-aware set operations in value vectors
[ https://issues.apache.org/jira/browse/DRILL-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013023#comment-16013023 ] Paul Rogers commented on DRILL-5517: Highlights of changes: * Provide a {{setBytesBounded()}} method in DrillBuf which sets bytes if they fit in the current buffer, else returns false. (Current versions throw exceptions.) * Equivalent methods in the UDLE class. * Constant that defines the maximum vector buffer size. Also includes an experimental option to adjust this size as a system option at startup. * For all vectors with data, define new {{setScalar}} and {{setArrayItem}} methods that implement the semantics mentioned above. {{setScalar()}} is for individual values, {{setArrayItem()}} is for items that are members of an array. The semantics of the two cases are slightly different. * For all vectors with data, provide a {{copyEntry()}} method to be used when handling "overflow" rows in record batch writers. * For all vectors with data, provide an {{exchange()}} method that swaps the buffers between two vectors. Unlike the {{TransferPair}} mechanism, this exchange is for vectors within a single operator, using the same allocator, so no memory accounting is needed. Used when swapping between the "full" and "overflow" batches in the record batch writer. * In each fixed-width vector, define a constant for the value width rather than using a (generated) "magic number". * In each fixed-width vector, create a constant for the maximum number of values that fit in either 64K or the maximum vector length. This work required reviewing much existing code. A number of cosmetic cleanups were done as problems were noticed. A unit test verifies the semantics of the new methods for typical generated required, optional and repeated vectors for both the fixed-width and variable-width cases. > Provide size-aware set operations in value vectors > -- > > Key: DRILL-5517 > URL: https://issues.apache.org/jira/browse/DRILL-5517 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Paul Rogers >Assignee: Paul Rogers > Fix For: 1.11.0 > > > DRILL-5211 describes a memory fragmentation issue in Drill. The resolution is > to limit vector sizes to 16 MB (the size of Netty memory allocation "slabs.") > Effort starts by providing "size-aware" set operations in value vectors which: > * Operate as {{setSafe()}} while vectors are below 16 MB. > * Return false if setting the value (and growing the vector) would exceed the > vector limit. > The methods in value vectors then become the foundation on which we can > construct size-aware record batch "writers." -- This message was sent by Atlassian JIRA (v6.3.15#6346)