[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort

2015-11-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007504#comment-15007504
 ] 

ASF GitHub Bot commented on DRILL-4081:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/257


> Handle schema changes in ExternalSort
> -
>
> Key: DRILL-4081
> URL: https://issues.apache.org/jira/browse/DRILL-4081
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>
> This improvement will make use of the Union vector to handle schema changes. 
> When a new schema appears, the schema will be "merged" with the previous 
> schema. The result will be a new schema that uses Union type to store the 
> columns where there is a type conflict. All of the batches (including the 
> batches that have already arrived) will be coerced into this new schema.
> A new comparison function will be included to handle the comparison of Union 
> type. Comparison of union type will work as follows:
> 1. All numeric types can be mutually compared, and will be compared using 
> Drill implicit cast rules.
> 2. All other types will not be compared against other types, but only among 
> values of the same type.
> 3. There will be an overall precedence of types with regard to ordering. 
> This precedence is not yet defined, but will be defined as part of the work on 
> this issue.
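
The "merge" step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: SketchType, SchemaMergeSketch, merge and needsUnion are hypothetical names, and a schema is reduced to a plain map from column name to type instead of Drill's BatchSchema.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for Drill's MinorType; not the real enum.
enum SketchType { INT, BIGINT, FLOAT8, VARCHAR, BIT }

final class SchemaMergeSketch {
  private SchemaMergeSketch() {}

  /**
   * Merges column-name -> type maps. The result maps each column to the set
   * of types seen for it, in first-seen column order. A set with more than
   * one entry marks a column that would need a Union type.
   */
  @SafeVarargs
  static Map<String, Set<SketchType>> merge(Map<String, SketchType>... schemas) {
    Map<String, Set<SketchType>> merged = new LinkedHashMap<>();
    for (Map<String, SketchType> schema : schemas) {
      for (Map.Entry<String, SketchType> column : schema.entrySet()) {
        merged
            .computeIfAbsent(column.getKey(), k -> EnumSet.noneOf(SketchType.class))
            .add(column.getValue());
      }
    }
    return merged;
  }

  /** True when the column saw conflicting types and would need a Union. */
  static boolean needsUnion(Set<SketchType> types) {
    return types.size() > 1;
  }
}
```

Merging {a: INT} with {a: VARCHAR} yields the set {INT, VARCHAR} for column a, so needsUnion reports true for it, while a column seen with a single type keeps that type.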



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort

2015-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006233#comment-15006233
 ] 

ASF GitHub Bot commented on DRILL-4081:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/257#issuecomment-156919949
  
I think it is fine if you handle the unsupported map behaviors with a clear 
user exception. Other than the refactoring of getFieldId (if possible), LGTM. 
+1.  (Assuming that this functionality only changes system behavior in the case 
that Union vector is enabled.)




[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort

2015-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006151#comment-15006151
 ] 

ASF GitHub Bot commented on DRILL-4081:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/257#discussion_r44883694
  
--- Diff: 
exec/vector/src/main/java/org/apache/drill/exec/vector/complex/FieldIdUtil.java 
---
@@ -121,4 +125,63 @@ public static TypedFieldId getFieldIdIfMatches(ValueVector vector, TypedFieldId.
       }
     }
   }
+
+  public static TypedFieldId getFieldId(ValueVector vector, int id, SchemaPath expectedPath, boolean hyper) {
--- End diff --

Is it possible to share this code (or much of it) with getFieldIdIfMatches() 
above? They seem to be related (although I didn't go line by line).




[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort

2015-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006150#comment-15006150
 ] 

ASF GitHub Bot commented on DRILL-4081:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/257#discussion_r44883581
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/SchemaUtil.java ---
@@ -0,0 +1,159 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.UnionVector;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Utility class for dealing with changing schemas
+ */
+public class SchemaUtil {
+
+  /**
+   * Returns the merger of schemas. The merged schema will include the union all columns. If there is a type conflict
+   * between columns with the same schemapath but different types, the merged schema will contain a Union type.
+   * @param schemas
+   * @return
+   */
+  public static BatchSchema mergeSchemas(BatchSchema... schemas) {
+    Map<SchemaPath, Set<MinorType>> typeSetMap = Maps.newLinkedHashMap();
+
+    for (BatchSchema s : schemas) {
+      for (MaterializedField field : s) {
+        SchemaPath path = field.getPath();
+        Set<MinorType> currentTypes = typeSetMap.get(path);
+        if (currentTypes == null) {
+          currentTypes = Sets.newHashSet();
+          typeSetMap.put(path, currentTypes);
+        }
+        MinorType newType = field.getType().getMinorType();
+        if (newType == MinorType.UNION) {
+          for (MinorType subType : field.getType().getSubTypeList()) {
+            currentTypes.add(subType);
+          }
+        } else {
+          currentTypes.add(newType);
+        }
+      }
+    }
+
+    List<MaterializedField> fields = Lists.newArrayList();
+
+    for (SchemaPath path : typeSetMap.keySet()) {
+      Set<MinorType> types = typeSetMap.get(path);
+      if (types.size() > 1) {
+        MajorType.Builder builder = MajorType.newBuilder().setMinorType(MinorType.UNION).setMode(DataMode.OPTIONAL);
+        for (MinorType t : types) {
+          builder.addSubType(t);
+        }
+        MaterializedField field = MaterializedField.create(path, builder.build());
+        fields.add(field);
+      } else {
+        MaterializedField field = MaterializedField.create(path, Types.optional(types.iterator().next()));
+        fields.add(field);
+      }
+    }
+
+    SchemaBuilder schemaBuilder = new SchemaBuilder();
+    BatchSchema s = schemaBuilder.addFields(fields).setSelectionVectorMode(schemas[0].getSelectionVectorMode()).build();
+    return s;
+  }
+
+  /**
+   * Creates a copy a record batch, converting any fields as necessary to coerce it into the provided schema
+   * @param in
+   * @param toSchema
+   * @param context
+   * @return
+   */
+  public static VectorContainer coerceContainer(VectorAccessible in, BatchSchema toSchema, OperatorContext context) {
+    int r

[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort

2015-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006149#comment-15006149
 ] 

ASF GitHub Bot commented on DRILL-4081:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/257#discussion_r44883570
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/SchemaUtil.java ---
@@ -0,0 +1,159 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.UnionVector;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Utility class for dealing with changing schemas
+ */
+public class SchemaUtil {
+
+  /**
+   * Returns the merger of schemas. The merged schema will include the union all columns. If there is a type conflict
+   * between columns with the same schemapath but different types, the merged schema will contain a Union type.
+   * @param schemas
+   * @return
+   */
+  public static BatchSchema mergeSchemas(BatchSchema... schemas) {
+    Map<SchemaPath, Set<MinorType>> typeSetMap = Maps.newLinkedHashMap();
+
+    for (BatchSchema s : schemas) {
+      for (MaterializedField field : s) {
+        SchemaPath path = field.getPath();
+        Set<MinorType> currentTypes = typeSetMap.get(path);
+        if (currentTypes == null) {
+          currentTypes = Sets.newHashSet();
+          typeSetMap.put(path, currentTypes);
+        }
+        MinorType newType = field.getType().getMinorType();
+        if (newType == MinorType.UNION) {
+          for (MinorType subType : field.getType().getSubTypeList()) {
+            currentTypes.add(subType);
+          }
+        } else {
+          currentTypes.add(newType);
+        }
+      }
+    }
+
+    List<MaterializedField> fields = Lists.newArrayList();
+
+    for (SchemaPath path : typeSetMap.keySet()) {
+      Set<MinorType> types = typeSetMap.get(path);
+      if (types.size() > 1) {
+        MajorType.Builder builder = MajorType.newBuilder().setMinorType(MinorType.UNION).setMode(DataMode.OPTIONAL);
+        for (MinorType t : types) {
+          builder.addSubType(t);
+        }
+        MaterializedField field = MaterializedField.create(path, builder.build());
+        fields.add(field);
+      } else {
+        MaterializedField field = MaterializedField.create(path, Types.optional(types.iterator().next()));
+        fields.add(field);
+      }
+    }
+
+    SchemaBuilder schemaBuilder = new SchemaBuilder();
+    BatchSchema s = schemaBuilder.addFields(fields).setSelectionVectorMode(schemas[0].getSelectionVectorMode()).build();
+    return s;
+  }
+
+  /**
+   * Creates a copy a record batch, converting any fields as necessary to coerce it into the provided schema
+   * @param in
+   * @param toSchema
+   * @param context
+   * @return
+   */
+  public static VectorContainer coerceContainer(VectorAccessible in, BatchSchema toSchema, OperatorContext context) {
+    int r

[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort

2015-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006145#comment-15006145
 ] 

ASF GitHub Bot commented on DRILL-4081:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/257#discussion_r44883499
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/SchemaUtil.java ---
@@ -0,0 +1,159 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import com.google.common.collect.Maps;
+import com.google.common.collect.Sets;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.expr.TypeHelper;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.UnionVector;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * Utility class for dealing with changing schemas
+ */
+public class SchemaUtil {
+
+  /**
+   * Returns the merger of schemas. The merged schema will include the union all columns. If there is a type conflict
+   * between columns with the same schemapath but different types, the merged schema will contain a Union type.
+   * @param schemas
+   * @return
+   */
+  public static BatchSchema mergeSchemas(BatchSchema... schemas) {
+    Map<SchemaPath, Set<MinorType>> typeSetMap = Maps.newLinkedHashMap();
+
+    for (BatchSchema s : schemas) {
+      for (MaterializedField field : s) {
+        SchemaPath path = field.getPath();
+        Set<MinorType> currentTypes = typeSetMap.get(path);
+        if (currentTypes == null) {
+          currentTypes = Sets.newHashSet();
+          typeSetMap.put(path, currentTypes);
+        }
+        MinorType newType = field.getType().getMinorType();
+        if (newType == MinorType.UNION) {
+          for (MinorType subType : field.getType().getSubTypeList()) {
+            currentTypes.add(subType);
+          }
+        } else {
+          currentTypes.add(newType);
+        }
+      }
+    }
+
+    List<MaterializedField> fields = Lists.newArrayList();
+
+    for (SchemaPath path : typeSetMap.keySet()) {
+      Set<MinorType> types = typeSetMap.get(path);
+      if (types.size() > 1) {
--- End diff --

Isn't it possible two different map types would be added above? If so, it 
seems like we would produce a union when we shouldn't.
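
One possible guard for map columns, in the spirit of the "clear user exception" suggested elsewhere in this thread, is sketched below. It is only an illustration under stated assumptions: ColType and MapConflictGuard are hypothetical names, the unsupported case is assumed (purely for the example) to be a column whose merged type set mixes MAP with other types, and a plain UnsupportedOperationException stands in for whatever user-facing exception Drill would raise.

```java
import java.util.Set;

// Hypothetical stand-in for Drill's MinorType; only the values needed here.
enum ColType { INT, VARCHAR, MAP }

final class MapConflictGuard {
  private MapConflictGuard() {}

  /**
   * Resolves the set of types seen for one column. Returns true when the
   * column needs a Union type and false when a single type suffices. Throws
   * when the conflict involves MAP, which this sketch assumes is unsupported.
   */
  static boolean resolveNeedsUnion(String column, Set<ColType> seen) {
    if (seen.isEmpty()) {
      throw new IllegalArgumentException("No types recorded for column " + column);
    }
    if (seen.size() == 1) {
      return false;
    }
    if (seen.contains(ColType.MAP)) {
      // Stand-in for a user-facing exception with a clear message.
      throw new UnsupportedOperationException(
          "Schema change on column '" + column + "' mixes MAP with other types: " + seen);
    }
    return true;
  }
}
```

Failing early with the column name in the message keeps the unsupported case visible to the user instead of silently producing a union that later operators cannot handle.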



[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort

2015-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006138#comment-15006138
 ] 

ASF GitHub Bot commented on DRILL-4081:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/257#discussion_r44883068
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
@@ -113,10 +115,45 @@ public boolean equals(Object obj) {
     } else if (!fields.equals(other.fields)) {
       return false;
     }
+    for (int i = 0; i < fields.size(); i++) {
+      MajorType t1 = fields.get(i).getType();
+      MajorType t2 = other.fields.get(i).getType();
+      if (t1 == null) {
+        if (t2 != null) {
+          return false;
+        }
+      } else {
+        if (!majorTypeEqual(t1, t2)) {
+          return false;
+        }
+      }
+    }
     if (selectionVectorMode != other.selectionVectorMode) {
       return false;
     }
     return true;
   }
 
+  /**
+   * We treat fields with same set of Subtypes as equal, even if they are in a different order
+   * @param t1
+   * @param t2
+   * @return
+   */
+  private boolean majorTypeEqual(MajorType t1, MajorType t2) {
+    if (t1.equals(t2)) {
+      return true;
+    }
+    if (!t1.getMinorType().equals(t2.getMinorType())) {
+      return false;
+    }
+    if (!t1.getMode().equals(t2.getMode())) {
+      return false;
+    }
+    if (!Sets.newHashSet(t1.getSubTypeList()).equals(Sets.newHashSet(t2.getSubTypeList()))) {
--- End diff --

Should we recursively check that the subtypes of subtypes are also set-equal, 
as opposed to your current direct equality (below the second level)?
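
A recursive, order-insensitive comparison of the kind asked about here could look roughly like this. It is a sketch only: TypeNode is a hypothetical descriptor whose sub-types can themselves carry sub-types, which is an assumption made for the example and not how the MajorType in the diff is shown to be structured.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;

// Hypothetical nested type descriptor, used only for this illustration.
final class TypeNode {
  final String minorType;
  final String mode;
  final List<TypeNode> subTypes;

  TypeNode(String minorType, String mode, List<TypeNode> subTypes) {
    this.minorType = minorType;
    this.mode = mode;
    this.subTypes = subTypes;
  }

  // Sub-types are compared as sets. HashSet.equals calls back into this
  // method for each element, so the set comparison applies at every level.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof TypeNode)) {
      return false;
    }
    TypeNode other = (TypeNode) o;
    return minorType.equals(other.minorType)
        && mode.equals(other.mode)
        && new HashSet<>(subTypes).equals(new HashSet<>(other.subTypes));
  }

  // Hashing the sub-types as a set keeps hashCode independent of their order
  // and of duplicates, so it stays consistent with equals.
  @Override
  public int hashCode() {
    return Objects.hash(minorType, mode, new HashSet<>(subTypes));
  }
}
```

With this definition, a list of union(INT, VARCHAR) equals a list of union(VARCHAR, INT), because the ordering difference sits one level down and is still ignored.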




[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort

2015-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006137#comment-15006137
 ] 

ASF GitHub Bot commented on DRILL-4081:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/257#discussion_r44883050
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
@@ -113,10 +115,45 @@ public boolean equals(Object obj) {
     } else if (!fields.equals(other.fields)) {
       return false;
     }
+    for (int i = 0; i < fields.size(); i++) {
+      MajorType t1 = fields.get(i).getType();
+      MajorType t2 = other.fields.get(i).getType();
+      if (t1 == null) {
+        if (t2 != null) {
+          return false;
+        }
+      } else {
+        if (!majorTypeEqual(t1, t2)) {
+          return false;
+        }
+      }
+    }
     if (selectionVectorMode != other.selectionVectorMode) {
       return false;
     }
     return true;
   }
 
+  /**
+   * We treat fields with same set of Subtypes as equal, even if they are in a different order
+   * @param t1
+   * @param t2
+   * @return
+   */
+  private boolean majorTypeEqual(MajorType t1, MajorType t2) {
--- End diff --

A better function name would be good here. Maybe 
majorTypeSubfieldSetsAreEqual?




[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort

2015-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006135#comment-15006135
 ] 

ASF GitHub Bot commented on DRILL-4081:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/257#discussion_r44882984
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/BaseVectorWrapper.java
 ---
@@ -0,0 +1,92 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record;
+
+import org.apache.drill.common.expression.PathSegment;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.AbstractContainerVector;
+import org.apache.drill.exec.vector.complex.ListVector;
+import org.apache.drill.exec.vector.complex.UnionVector;
+
+import java.util.List;
+
+public abstract class BaseVectorWrapper implements VectorWrapper {
+
+  protected TypedFieldId getFieldIdIfMatches(ValueVector vector, int id, SchemaPath expectedPath) {
+    if (!expectedPath.getRootSegment().segmentEquals(vector.getField().getPath().getRootSegment())) {
+      return null;
+    }
+    PathSegment seg = expectedPath.getRootSegment();
+
+    if (vector instanceof UnionVector) {
--- End diff --

Need docs on all of this.




[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort

2015-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006134#comment-15006134
 ] 

ASF GitHub Bot commented on DRILL-4081:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/257#discussion_r44882959
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/ExpressionTreeMaterializer.java
 ---
@@ -830,4 +785,50 @@ private boolean castEqual(ExpressionPosition pos, MajorType from, MajorType to)
       }
     }
   }
+
--- End diff --

Is this big block a reorganization or actual changes?




[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort

2015-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006133#comment-15006133
 ] 

ASF GitHub Bot commented on DRILL-4081:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/257#discussion_r44882947
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/UnionFunctions.java
 ---
@@ -32,16 +33,72 @@
 import org.apache.drill.exec.expr.holders.UnionHolder;
 import org.apache.drill.exec.expr.holders.IntHolder;
 import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.resolver.TypeCastRules;
 import org.apache.drill.exec.vector.complex.impl.UnionReader;
 import org.apache.drill.exec.vector.complex.reader.FieldReader;
 
 import javax.inject.Inject;
+import java.util.Set;
 
 /**
  * The class contains additional functions for union types in addition to those in GUnionFunctions
  */
 public class UnionFunctions {
 
+  /**
+   * Returns zero if the inputs have equivalent types. Two numeric types are considered equivalent, as are a combination
+   * of date/timestamp. If not equivalent, returns a value determined by the numeric value of the MinorType enum
+   */
+  @FunctionTemplate(names = {"compareType"},
+      scope = FunctionTemplate.FunctionScope.SIMPLE,
+      nulls = NullHandling.INTERNAL)
+  public static class CompareType implements DrillSimpleFunc {
+
+    @Param
+    FieldReader input1;
+    @Param
+    FieldReader input2;
+    @Output
+    IntHolder out;
+
+    public void setup() {}
+
+    public void eval() {
+      org.apache.drill.common.types.TypeProtos.MinorType type1;
+      if (input1.isSet()) {
+        type1 = input1.getType().getMinorType();
+      } else {
+        type1 = org.apache.drill.common.types.TypeProtos.MinorType.NULL;
+      }
+      org.apache.drill.common.types.TypeProtos.MinorType type2;
+      if (input2.isSet()) {
+        type2 = input2.getType().getMinorType();
+      } else {
+        type2 = org.apache.drill.common.types.TypeProtos.MinorType.NULL;
+      }
+
+      out.value = org.apache.drill.exec.expr.fn.impl.UnionFunctions.compareTypes(type1, type2);
+    }
+  }
+
+  public static int compareTypes(MinorType type1, MinorType type2) {
+    int typeValue1 = getTypeValue(type1);
+    int typeValue2 = getTypeValue(type2);
+    return typeValue1 - typeValue2;
+  }
+
+  private static int getTypeValue(MinorType type) {
+    if (TypeCastRules.isNumericType(type)) {

It would be good to add comments here.
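
For illustration only, here is a minimal sketch of how a commented type-grouping helper might read. The diff above is cut off at `getTypeValue`, so the body below is not the code from the pull request: `SketchType` and `CompareTypeSketch` are hypothetical stand-ins for `MinorType` and `UnionFunctions`, and the grouping (numeric types together, date/timestamp together, everything else by its own ordinal) is taken only from the Javadoc quoted in the diff.

```java
// Hypothetical stand-in for Drill's MinorType; the members are illustrative only.
enum SketchType { NULL, INT, BIGINT, FLOAT8, DATE, TIMESTAMP, VARCHAR }

final class CompareTypeSketch {

  /** Returns zero when both types fall into the same comparison group. */
  static int compareTypes(SketchType type1, SketchType type2) {
    // The group values are small non-negative ints, so subtraction cannot overflow.
    return getTypeValue(type1) - getTypeValue(type2);
  }

  /**
   * Maps a type to the value used for cross-type ordering.
   * Numeric types collapse to one value so they compare as equivalent,
   * as do DATE and TIMESTAMP; every other type keeps its own ordinal.
   */
  private static int getTypeValue(SketchType type) {
    switch (type) {
      case INT:
      case BIGINT:
      case FLOAT8:
        // All numeric types are mutually comparable: use one representative.
        return SketchType.INT.ordinal();
      case DATE:
      case TIMESTAMP:
        // Date and timestamp are treated as one group.
        return SketchType.DATE.ordinal();
      default:
        // Any other type is only equivalent to itself.
        return type.ordinal();
    }
  }

  public static void main(String[] args) {
    System.out.println(compareTypes(SketchType.INT, SketchType.FLOAT8));     // prints 0
    System.out.println(compareTypes(SketchType.DATE, SketchType.TIMESTAMP)); // prints 0
    System.out.println(compareTypes(SketchType.NULL, SketchType.VARCHAR) < 0); // prints true
  }
}
```

A comment per group, as above, makes it clear which types are meant to compare as equal without reading the cast rules.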




[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort

2015-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006129#comment-15006129
 ] 

ASF GitHub Bot commented on DRILL-4081:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/257#discussion_r44882906
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/UnionFunctions.java
 ---
@@ -32,16 +33,72 @@
 import org.apache.drill.exec.expr.holders.UnionHolder;
 import org.apache.drill.exec.expr.holders.IntHolder;
 import org.apache.drill.exec.expr.holders.VarCharHolder;
+import org.apache.drill.exec.resolver.TypeCastRules;
 import org.apache.drill.exec.vector.complex.impl.UnionReader;
 import org.apache.drill.exec.vector.complex.reader.FieldReader;
 
 import javax.inject.Inject;
+import java.util.Set;
 
 /**
  * The class contains additional functions for union types in addition to those in GUnionFunctions
  */
 public class UnionFunctions {
 
+  /**
+   * Returns zero if the inputs have equivalent types. Two numeric types are considered equivalent, as are a combination
+   * of date/timestamp. If not equivalent, returns a value determined by the numeric value of the MinorType enum
--- End diff --

Let's add an additional ordering ordinal to the MinorType enum so that we 
match Mongo's ordering among types.
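
As a rough sketch of what such an ordinal could look like: the enum below is hypothetical and is not Drill's `MinorType`. The rank values follow MongoDB's documented comparison order across BSON types (Null, then Numbers, String, Object, Array, BinData, ObjectId, Boolean, Date, and so on). The mapping from the Drill-style names to BSON types is an assumption made for illustration, as are the names `OrderedType` and `sortOrder`.

```java
// Hypothetical enum for illustration; this is not Drill's MinorType.
// Ranks follow MongoDB's comparison order across BSON types:
// MinKey(1) < Null(2) < Numbers(3) < String(4) < Object(5) < Array(6)
// < BinData(7) < ObjectId(8) < Boolean(9) < Date(10) < ...
enum OrderedType {
  NULL(2),
  INT(3), BIGINT(3), FLOAT8(3),  // all numeric types share one rank
  VARCHAR(4),                    // assumed to correspond to a BSON String
  MAP(5),                        // assumed to correspond to a BSON Object
  LIST(6),                       // assumed to correspond to a BSON Array
  BIT(9),                        // assumed to correspond to a BSON Boolean
  DATE(10);

  // Explicit ordering rank, kept separate from the declaration ordinal.
  private final int sortOrder;

  OrderedType(int sortOrder) {
    this.sortOrder = sortOrder;
  }

  int sortOrder() {
    return sortOrder;
  }

  /** Orders two types by rank; types with the same rank compare as equal. */
  static int compareTypes(OrderedType a, OrderedType b) {
    return Integer.compare(a.sortOrder, b.sortOrder);
  }
}

final class OrderedTypeDemo {
  public static void main(String[] args) {
    System.out.println(OrderedType.compareTypes(OrderedType.INT, OrderedType.FLOAT8));      // prints 0
    System.out.println(OrderedType.compareTypes(OrderedType.NULL, OrderedType.INT) < 0);    // prints true
    System.out.println(OrderedType.compareTypes(OrderedType.BIT, OrderedType.VARCHAR) > 0); // prints true
  }
}
```

Keeping the rank in its own field means the ordering does not change if enum members are added or reordered. Note that MongoDB ranks Date and Timestamp separately, whereas the Javadoc in the diff treats them as equivalent, so the two approaches would need to be reconciled.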




[jira] [Commented] (DRILL-4081) Handle schema changes in ExternalSort

2015-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15006124#comment-15006124
 ] 

ASF GitHub Bot commented on DRILL-4081:
---

GitHub user StevenMPhillips opened a pull request:

https://github.com/apache/drill/pull/257

DRILL-4081: Handle schema changes in ExternalSort



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StevenMPhillips/drill drill-4081

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/257.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #257


commit db573ffe2c15caad4a9d12dab9d28cc3e0b2f51d
Author: Steven Phillips 
Date:   2015-11-13T19:27:16Z

DRILL-4081: Handle schema changes in ExternalSort



