[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007504#comment-15007504 ]

ASF GitHub Bot commented on DRILL-4081:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/257

> Handle schema changes in ExternalSort
> -------------------------------------
>
> Key: DRILL-4081
> URL: https://issues.apache.org/jira/browse/DRILL-4081
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Steven Phillips
> Assignee: Steven Phillips
>
> This improvement will make use of the Union vector to handle schema changes.
> When a new schema appears, the schema will be "merged" with the previous
> schema. The result will be a new schema that uses Union type to store the
> columns where there is a type conflict. All of the batches (including the
> batches that have already arrived) will be coerced into this new schema.
> A new comparison function will be included to handle the comparison of Union
> type. Comparison of union type will work as follows:
> 1. All numeric types can be mutually compared, and will be compared using
> Drill implicit cast rules.
> 2. All other types will not be compared against other types, but only among
> values of the same type.
> 3. There will be an overall precedence of types with regards to ordering.
> This precedence is not yet defined, but will be as part of the work on this
> issue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
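The merge step in the issue description above can be sketched in plain Java, independent of Drill's classes. This is only an illustration under stated assumptions: ColumnType, SchemaMergeSketch, and the column-name-to-type-set maps are hypothetical stand-ins, not Drill's BatchSchema, MaterializedField, or MinorType.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a column's minor type; not Drill's enum.
enum ColumnType { INT, BIGINT, FLOAT8, VARCHAR }

final class SchemaMergeSketch {

  // Merges schemas given as column-name -> type-set maps (an assumed,
  // simplified representation). Each column accumulates every type seen.
  @SafeVarargs
  static Map<String, Set<ColumnType>> merge(Map<String, Set<ColumnType>>... schemas) {
    // LinkedHashMap keeps columns in first-seen order.
    Map<String, Set<ColumnType>> merged = new LinkedHashMap<>();
    for (Map<String, Set<ColumnType>> schema : schemas) {
      for (Map.Entry<String, Set<ColumnType>> column : schema.entrySet()) {
        Set<ColumnType> types = merged.get(column.getKey());
        if (types == null) {
          types = EnumSet.noneOf(ColumnType.class);
          merged.put(column.getKey(), types);
        }
        types.addAll(column.getValue());
      }
    }
    return merged;
  }

  // A column with more than one distinct type is a type conflict and
  // would be stored as a union in the merged schema.
  static boolean needsUnion(Set<ColumnType> types) {
    return types.size() > 1;
  }
}
```

Merging a schema where column "a" is INT with one where "a" is VARCHAR leaves "a" with the set {INT, VARCHAR}, so needsUnion returns true for it; a column seen with a single type stays as it is.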
[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006233#comment-15006233 ]

ASF GitHub Bot commented on DRILL-4081:
---------------------------------------

Github user jacques-n commented on the pull request:

    https://github.com/apache/drill/pull/257#issuecomment-156919949

I think it is fine if you handle the unsupported map behaviors with a clear user exception. Other than the refactoring of getFieldId (if possible), LGTM. +1. (Assuming that this functionality only changes system behavior in the case that Union vector is enabled.)
[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006151#comment-15006151 ]

ASF GitHub Bot commented on DRILL-4081:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/257#discussion_r44883694

--- Diff: exec/vector/src/main/java/org/apache/drill/exec/vector/complex/FieldIdUtil.java ---
@@ -121,4 +125,63 @@ public static TypedFieldId getFieldIdIfMatches(ValueVector vector, TypedFieldId.
       }
     }
   }
+
+  public static TypedFieldId getFieldId(ValueVector vector, int id, SchemaPath expectedPath, boolean hyper) {
--- End diff --

Is it possible to share this code (or much of it) with getFieldIdIfMatches() above? It seems like they are related (although I didn't go line by line).
[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006150#comment-15006150 ]

ASF GitHub Bot commented on DRILL-4081:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/257#discussion_r44883581

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/SchemaUtil.java ---
@@ -0,0 +1,159 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.UnionVector;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Utility class for dealing with changing schemas
+ */
+public class SchemaUtil {
+
+  /**
+   * Returns the merger of schemas. The merged schema will include the union all columns. If there is a type conflict
+   * between columns with the same schemapath but different types, the merged schema will contain a Union type.
+   * @param schemas
+   * @return
+   */
+  public static BatchSchema mergeSchemas(BatchSchema... schemas) {
+    Map<SchemaPath,Set<MinorType>> typeSetMap = Maps.newLinkedHashMap();
+
+    for (BatchSchema s : schemas) {
+      for (MaterializedField field : s) {
+        SchemaPath path = field.getPath();
+        Set<MinorType> currentTypes = typeSetMap.get(path);
+        if (currentTypes == null) {
+          currentTypes = Sets.newHashSet();
+          typeSetMap.put(path, currentTypes);
+        }
+        MinorType newType = field.getType().getMinorType();
+        if (newType == MinorType.UNION) {
+          for (MinorType subType : field.getType().getSubTypeList()) {
+            currentTypes.add(subType);
+          }
+        } else {
+          currentTypes.add(newType);
+        }
+      }
+    }
+
+    List<MaterializedField> fields = Lists.newArrayList();
+
+    for (SchemaPath path : typeSetMap.keySet()) {
+      Set<MinorType> types = typeSetMap.get(path);
+      if (types.size() > 1) {
+        MajorType.Builder builder = MajorType.newBuilder().setMinorType(MinorType.UNION).setMode(DataMode.OPTIONAL);
+        for (MinorType t : types) {
+          builder.addSubType(t);
+        }
+        MaterializedField field = MaterializedField.create(path, builder.build());
+        fields.add(field);
+      } else {
+        MaterializedField field = MaterializedField.create(path, Types.optional(types.iterator().next()));
+        fields.add(field);
+      }
+    }
+
+    SchemaBuilder schemaBuilder = new SchemaBuilder();
+    BatchSchema s = schemaBuilder.addFields(fields).setSelectionVectorMode(schemas[0].getSelectionVectorMode()).build();
+    return s;
+  }
+
+  /**
+   * Creates a copy a record batch, converting any fields as necessary to coerce it into the provided schema
+   * @param in
+   * @param toSchema
+   * @param context
+   * @return
+   */
+  public static VectorContainer coerceContainer(VectorAccessible in, BatchSchema toSchema, OperatorContext context) {
+    int r
[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006149#comment-15006149 ]

ASF GitHub Bot commented on DRILL-4081:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/257#discussion_r44883570

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/SchemaUtil.java ---
@@ -0,0 +1,159 @@
[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006145#comment-15006145 ]

ASF GitHub Bot commented on DRILL-4081:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/257#discussion_r44883499

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/SchemaUtil.java ---
@@ -0,0 +1,159 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.UnionVector;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Utility class for dealing with changing schemas
+ */
+public class SchemaUtil {
+
+  /**
+   * Returns the merger of schemas. The merged schema will include the union all columns. If there is a type conflict
+   * between columns with the same schemapath but different types, the merged schema will contain a Union type.
+   * @param schemas
+   * @return
+   */
+  public static BatchSchema mergeSchemas(BatchSchema... schemas) {
+    Map<SchemaPath,Set<MinorType>> typeSetMap = Maps.newLinkedHashMap();
+
+    for (BatchSchema s : schemas) {
+      for (MaterializedField field : s) {
+        SchemaPath path = field.getPath();
+        Set<MinorType> currentTypes = typeSetMap.get(path);
+        if (currentTypes == null) {
+          currentTypes = Sets.newHashSet();
+          typeSetMap.put(path, currentTypes);
+        }
+        MinorType newType = field.getType().getMinorType();
+        if (newType == MinorType.UNION) {
+          for (MinorType subType : field.getType().getSubTypeList()) {
+            currentTypes.add(subType);
+          }
+        } else {
+          currentTypes.add(newType);
+        }
+      }
+    }
+
+    List<MaterializedField> fields = Lists.newArrayList();
+
+    for (SchemaPath path : typeSetMap.keySet()) {
+      Set<MinorType> types = typeSetMap.get(path);
+      if (types.size() > 1) {
--- End diff --

Isn't it possible two different map types would be added above? If so, it seems like we would produce a union when we shouldn't.
[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006138#comment-15006138 ]

ASF GitHub Bot commented on DRILL-4081:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/257#discussion_r44883068

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
@@ -113,10 +115,45 @@ public boolean equals(Object obj) {
     } else if (!fields.equals(other.fields)) {
       return false;
     }
+    for (int i = 0; i < fields.size(); i++) {
+      MajorType t1 = fields.get(i).getType();
+      MajorType t2 = other.fields.get(i).getType();
+      if (t1 == null) {
+        if (t2 != null) {
+          return false;
+        }
+      } else {
+        if (!majorTypeEqual(t1, t2)) {
+          return false;
+        }
+      }
+    }
     if (selectionVectorMode != other.selectionVectorMode) {
       return false;
     }
     return true;
   }
+  /**
+   * We treat fields with same set of Subtypes as equal, even if they are in a different order
+   * @param t1
+   * @param t2
+   * @return
+   */
+  private boolean majorTypeEqual(MajorType t1, MajorType t2) {
+    if (t1.equals(t2)) {
+      return true;
+    }
+    if (!t1.getMinorType().equals(t2.getMinorType())) {
+      return false;
+    }
+    if (!t1.getMode().equals(t2.getMode())) {
+      return false;
+    }
+    if (!Sets.newHashSet(t1.getSubTypeList()).equals(Sets.newHashSet(t2.getSubTypeList()))) {
--- End diff --

Should we recursively check that the subtypes of subtypes are also set-equal, as opposed to your current direct equality (below the second level)?
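The review question above asks whether subtype equality should ignore order at every nesting level rather than only the first. A minimal sketch of such a recursive check follows. TypeNode is a hypothetical, simplified type descriptor introduced only for this illustration; it does not reflect how Drill's MajorType represents nested subtypes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical type descriptor: a name plus a list of subtypes whose
// order should not matter for equality.
final class TypeNode {
  final String name;
  final List<TypeNode> subTypes;

  TypeNode(String name, TypeNode... subTypes) {
    this.name = name;
    this.subTypes = Arrays.asList(subTypes);
  }

  // True when both nodes have the same name and their subtypes can be
  // paired up one-to-one, recursively, regardless of order.
  static boolean sameIgnoringOrder(TypeNode a, TypeNode b) {
    if (!a.name.equals(b.name) || a.subTypes.size() != b.subTypes.size()) {
      return false;
    }
    // Copy so that matched entries can be removed without touching b.
    List<TypeNode> unmatched = new ArrayList<>(b.subTypes);
    for (TypeNode child : a.subTypes) {
      boolean found = false;
      for (int i = 0; i < unmatched.size(); i++) {
        if (sameIgnoringOrder(child, unmatched.get(i))) {
          unmatched.remove(i); // int argument: removes by index
          found = true;
          break;
        }
      }
      if (!found) {
        return false;
      }
    }
    return true;
  }
}
```

The pairing is greedy, which is sufficient here because the relation being tested is an equivalence. The cost is quadratic in the number of subtypes per level, which should be acceptable for the small subtype lists a union column carries.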
[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006137#comment-15006137 ]

ASF GitHub Bot commented on DRILL-4081:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/257#discussion_r44883050

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
@@ -113,10 +115,45 @@ public boolean equals(Object obj) {
     } else if (!fields.equals(other.fields)) {
       return false;
     }
+    for (int i = 0; i < fields.size(); i++) {
+      MajorType t1 = fields.get(i).getType();
+      MajorType t2 = other.fields.get(i).getType();
+      if (t1 == null) {
+        if (t2 != null) {
+          return false;
+        }
+      } else {
+        if (!majorTypeEqual(t1, t2)) {
+          return false;
+        }
+      }
+    }
     if (selectionVectorMode != other.selectionVectorMode) {
       return false;
     }
     return true;
   }
+  /**
+   * We treat fields with same set of Subtypes as equal, even if they are in a different order
+   * @param t1
+   * @param t2
+   * @return
+   */
+  private boolean majorTypeEqual(MajorType t1, MajorType t2) {
--- End diff --

A better function name would be good here. Maybe majorTypeSubfieldSetsAreEqual?
[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006135#comment-15006135 ]

ASF GitHub Bot commented on DRILL-4081:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/257#discussion_r44882984

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BaseVectorWrapper.java ---
@@ -0,0 +1,92 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record;
+
+import org.apache.drill.common.expression.PathSegment;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.AbstractContainerVector;
+import org.apache.drill.exec.vector.complex.ListVector;
+import org.apache.drill.exec.vector.complex.UnionVector;
+
+import java.util.List;
+
+public abstract class BaseVectorWrapper implements VectorWrapper {
+
+  protected TypedFieldId getFieldIdIfMatches(ValueVector vector, int id, SchemaPath expectedPath) {
+    if (!expectedPath.getRootSegment().segmentEquals(vector.getField().getPath().getRootSegment())) {
+      return null;
+    }
+    PathSegment seg = expectedPath.getRootSegment();
+
+    if (vector instanceof UnionVector) {
--- End diff --

Need docs on all of this.
[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006134#comment-15006134 ]

ASF GitHub Bot commented on DRILL-4081:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/257#discussion_r44882959

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/ExpressionTreeMaterializer.java ---
@@ -830,4 +785,50 @@ private boolean castEqual(ExpressionPosition pos, MajorType from, MajorType to)
       }
     }
   }
+
--- End diff --

Is this big block a reorganization or actual changes?
[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006133#comment-15006133 ]

ASF GitHub Bot commented on DRILL-4081:
---
Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/257#discussion_r44882947

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/UnionFunctions.java ---
    @@ -32,16 +33,72 @@ import org.apache.drill.exec.expr.holders.UnionHolder;
     import org.apache.drill.exec.expr.holders.IntHolder;
     import org.apache.drill.exec.expr.holders.VarCharHolder;
    +import org.apache.drill.exec.resolver.TypeCastRules;
     import org.apache.drill.exec.vector.complex.impl.UnionReader;
     import org.apache.drill.exec.vector.complex.reader.FieldReader;
     import javax.inject.Inject;
    +import java.util.Set;
     /**
      * The class contains additional functions for union types in addition to those in GUnionFunctions
      */
     public class UnionFunctions {
    +  /**
    +   * Returns zero if the inputs have equivalent types. Two numeric types are considered equivalent, as are a combination
    +   * of date/timestamp. If not equivalent, returns a value determined by the numeric value of the MinorType enum
    +   */
    +  @FunctionTemplate(names = {"compareType"},
    +      scope = FunctionTemplate.FunctionScope.SIMPLE,
    +      nulls = NullHandling.INTERNAL)
    +  public static class CompareType implements DrillSimpleFunc {
    +
    +    @Param
    +    FieldReader input1;
    +    @Param
    +    FieldReader input2;
    +    @Output
    +    IntHolder out;
    +
    +    public void setup() {}
    +
    +    public void eval() {
    +      org.apache.drill.common.types.TypeProtos.MinorType type1;
    +      if (input1.isSet()) {
    +        type1 = input1.getType().getMinorType();
    +      } else {
    +        type1 = org.apache.drill.common.types.TypeProtos.MinorType.NULL;
    +      }
    +      org.apache.drill.common.types.TypeProtos.MinorType type2;
    +      if (input2.isSet()) {
    +        type2 = input2.getType().getMinorType();
    +      } else {
    +        type2 = org.apache.drill.common.types.TypeProtos.MinorType.NULL;
    +      }
    +
    +      out.value = org.apache.drill.exec.expr.fn.impl.UnionFunctions.compareTypes(type1, type2);
    +    }
    +  }
    +
    +  public static int compareTypes(MinorType type1, MinorType type2) {
    +    int typeValue1 = getTypeValue(type1);
    +    int typeValue2 = getTypeValue(type2);
    +    return typeValue1 - typeValue2;
    +  }
    +
    +  private static int getTypeValue(MinorType type) {
    +    if (TypeCastRules.isNumericType(type)) {
    --- End diff --

would be good to add comments here.
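The Javadoc in the diff above describes a comparison that returns zero for "equivalent" types (any two numeric types, or date/timestamp) and otherwise orders by an enum-derived value. The body of getTypeValue is cut off in the quoted diff, so the following is only a minimal sketch of that idea and not the code from the pull request. SketchType, NUMERIC, TEMPORAL and typeValue are hypothetical names, and the sketch does not use Drill's MinorType or TypeCastRules.

```java
import java.util.EnumSet;
import java.util.Set;

class CompareTypeSketch {

  // Hypothetical stand-in for Drill's MinorType; only a few types are listed.
  enum SketchType { NULL, INT, BIGINT, FLOAT8, DATE, TIMESTAMP, VARCHAR }

  // Assumption for this sketch: these types form one "numeric" family.
  private static final Set<SketchType> NUMERIC =
      EnumSet.of(SketchType.INT, SketchType.BIGINT, SketchType.FLOAT8);

  // Assumption for this sketch: these types form one "date/timestamp" family.
  private static final Set<SketchType> TEMPORAL =
      EnumSet.of(SketchType.DATE, SketchType.TIMESTAMP);

  // Every member of a family maps to the ordinal of one representative,
  // so equivalent types get the same value. All other types keep their
  // own ordinal.
  static int typeValue(SketchType type) {
    if (NUMERIC.contains(type)) {
      return SketchType.INT.ordinal();
    }
    if (TEMPORAL.contains(type)) {
      return SketchType.DATE.ordinal();
    }
    return type.ordinal();
  }

  // Zero for equivalent types, otherwise negative or positive according
  // to the type values.
  static int compareTypes(SketchType a, SketchType b) {
    return Integer.compare(typeValue(a), typeValue(b));
  }

  public static void main(String[] args) {
    // Two numeric types are equivalent.
    System.out.println(compareTypes(SketchType.INT, SketchType.FLOAT8) == 0);    // prints true
    // Date and timestamp are equivalent.
    System.out.println(compareTypes(SketchType.DATE, SketchType.TIMESTAMP) == 0); // prints true
    // NULL sorts before numeric types in this sketch.
    System.out.println(compareTypes(SketchType.NULL, SketchType.BIGINT) < 0);     // prints true
  }
}
```

Integer.compare is used here instead of subtracting the two values; for small ordinals both give the same sign, and the explicit comparison avoids any overflow concern if the values ever grow.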
[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006129#comment-15006129 ]

ASF GitHub Bot commented on DRILL-4081:
---
Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/257#discussion_r44882906

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/UnionFunctions.java ---
    @@ -32,16 +33,72 @@ import org.apache.drill.exec.expr.holders.UnionHolder;
     import org.apache.drill.exec.expr.holders.IntHolder;
     import org.apache.drill.exec.expr.holders.VarCharHolder;
    +import org.apache.drill.exec.resolver.TypeCastRules;
     import org.apache.drill.exec.vector.complex.impl.UnionReader;
     import org.apache.drill.exec.vector.complex.reader.FieldReader;
     import javax.inject.Inject;
    +import java.util.Set;
     /**
      * The class contains additional functions for union types in addition to those in GUnionFunctions
      */
     public class UnionFunctions {
    +  /**
    +   * Returns zero if the inputs have equivalent types. Two numeric types are considered equivalent, as are a combination
    +   * of date/timestamp. If not equivalent, returns a value determined by the numeric value of the MinorType enum
    --- End diff --

let's add an additional ordering ordinal to the minortype enum so that we match Mongo's ordering among types
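The suggestion above, an explicit ordering ordinal on the type enum, could look roughly like the following. This is a sketch under stated assumptions, not the change that was made to Drill: SketchType and sortOrder are hypothetical names, the enum is not Drill's MinorType, and the rank values follow MongoDB's documented cross-type comparison order as understood here (null, then numbers, strings, objects, arrays, binary data, booleans, dates, timestamps). The mapping of Drill-like type names onto those ranks is an assumption.

```java
import java.util.Arrays;

class TypeOrderSketch {

  // Hypothetical enum. Each constant carries an explicit sortOrder that is
  // independent of declaration order, so constants can be added or
  // reordered without changing how types sort against each other.
  enum SketchType {
    VARCHAR(3),    // strings
    INT(2),        // numbers share one rank
    FLOAT8(2),
    BIT(7),        // booleans
    NULL(1),
    MAP(4),        // objects
    LIST(5),       // arrays
    VARBINARY(6),  // binary data
    DATE(8),
    TIMESTAMP(9);

    private final int sortOrder;

    SketchType(int sortOrder) {
      this.sortOrder = sortOrder;
    }

    int sortOrder() {
      return sortOrder;
    }
  }

  // Orders two types by their explicit rank; types sharing a rank compare equal.
  static int compareTypes(SketchType a, SketchType b) {
    return Integer.compare(a.sortOrder(), b.sortOrder());
  }

  public static void main(String[] args) {
    SketchType[] types = {
        SketchType.VARCHAR, SketchType.DATE, SketchType.NULL,
        SketchType.INT, SketchType.BIT
    };
    Arrays.sort(types, TypeOrderSketch::compareTypes);
    System.out.println(Arrays.toString(types)); // prints [NULL, INT, VARCHAR, BIT, DATE]
  }
}
```

Note that this sketch ranks DATE and TIMESTAMP separately, whereas the Javadoc in the quoted diff treats date/timestamp as equivalent; which grouping to use is a design decision the thread does not settle.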
[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort
[ https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006124#comment-15006124 ]

ASF GitHub Bot commented on DRILL-4081:
---
GitHub user StevenMPhillips opened a pull request:

    https://github.com/apache/drill/pull/257

    DRILL-4081: Handle schema changes in ExternalSort

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StevenMPhillips/drill drill-4081

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/257.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #257

commit db573ffe2c15caad4a9d12dab9d28cc3e0b2f51d
Author: Steven Phillips
Date:   2015-11-13T19:27:16Z

    DRILL-4081: Handle schema changes in ExternalSort