[jira] [Commented] (DRILL-7360) Refactor WatchService code in Drillbit class

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914525#comment-16914525
 ] 

ASF GitHub Bot commented on DRILL-7360:
---

sohami commented on pull request #1848: DRILL-7360: Refactor WatchService in 
Drillbit class and fix concurrency issues
URL: https://github.com/apache/drill/pull/1848#discussion_r317244970
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/Drillbit.java
 ##
 @@ -371,65 +376,91 @@ private void javaPropertiesToSystemOptions() {
 }
   }
 
-
-  // Polls for graceful file to check if graceful shutdown is triggered from 
the script.
+  /**
+   * Polls for graceful file to check if graceful shutdown is triggered from 
the script.
+   */
   private static class GracefulShutdownThread extends Thread {
 
+private static final String DRILL_HOME = "DRILL_HOME";
+private static final String GRACEFUL_SIGFILE = "GRACEFUL_SIGFILE";
+private static final String NOT_SUPPORTED_MESSAGE = "Graceful shutdown 
from command line will not be supported.";
+
 private final Drillbit drillbit;
 private final StackTrace stackTrace;
-GracefulShutdownThread(final Drillbit drillbit, final StackTrace 
stackTrace) {
+
+GracefulShutdownThread(Drillbit drillbit, StackTrace stackTrace) {
   this.drillbit = drillbit;
   this.stackTrace = stackTrace;
+
+  setName("Drillbit-Graceful-Shutdown#" + getName());
 }
 
 @Override
 public void run () {
   try {
 pollShutdown(drillbit);
-  } catch (InterruptedException  e) {
-logger.debug("Interrupted GracefulShutdownThread");
+  } catch (InterruptedException e) {
+drillbit.interruptPollShutdown = false;
+logger.debug("Graceful Shutdown thread was interrupted", e);
   } catch (IOException e) {
-throw new RuntimeException("Caught exception while polling for 
gracefulshutdown\n" + stackTrace, e);
+throw new RuntimeException("Exception while polling for graceful 
shutdown\n" + stackTrace, e);
   }
 }
 
-/*
- * Poll for the graceful file, if the file is found cloase the drillbit. 
In case if the DRILL_HOME path is not
- * set, graceful shutdown will not be supported from the command line.
+/**
+ * Poll for the graceful file, if the file is found or modified, close the 
Drillbit.
+ * In case if the {@link #DRILL_HOME} or {@link #GRACEFUL_SIGFILE} 
environment variables are not set,
+ * graceful shutdown will not be supported from the command line.
+ *
+ * @param drillbit current Drillbit
  */
 private void pollShutdown(Drillbit drillbit) throws IOException, 
InterruptedException {
-  final String drillHome = System.getenv("DRILL_HOME");
-  final String gracefulFile = System.getenv("GRACEFUL_SIGFILE");
-  final Path drillHomePath;
+  String drillHome = System.getenv(DRILL_HOME);
+  String gracefulFile = System.getenv(GRACEFUL_SIGFILE);
+  Path drillHomePath;
   if (drillHome == null || gracefulFile == null) {
-logger.warn("Cannot access graceful file. Graceful shutdown from 
command line will not be supported.");
+if (logger.isWarnEnabled()) {
+  StringBuilder builder = new StringBuilder(NOT_SUPPORTED_MESSAGE);
+  if (drillHome == null) {
+builder.append(" ").append(DRILL_HOME).append(" is not set.");
+  }
+  if (gracefulFile == null) {
+builder.append(" ").append(GRACEFUL_SIGFILE).append(" is not 
set.");
+  }
+  logger.warn(builder.toString());
+}
 return;
   }
   try {
 drillHomePath = Paths.get(drillHome);
+if (!Files.exists(drillHomePath)) {
+  logger.warn("{} path [{}] does not exist. {}", DRILL_HOME, 
drillHomePath, NOT_SUPPORTED_MESSAGE);
+  return;
+}
   } catch (InvalidPathException e) {
-logger.warn("Cannot access graceful file. Graceful shutdown from 
command line will not be supported.");
+logger.warn("Unable to construct {}} path [{}]: {}. {}",
+  DRILL_HOME, drillHome, e.getMessage(), NOT_SUPPORTED_MESSAGE);
 return;
   }
-  boolean triggered_shutdown = false;
-  WatchKey wk = null;
-  try (final WatchService watchService = 
drillHomePath.getFileSystem().newWatchService()) {
-drillHomePath.register(watchService, 
StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_CREATE);
-while (!triggered_shutdown) {
-  wk = watchService.take();
-  for (WatchEvent event : wk.pollEvents()) {
-final Path changed = (Path) event.context();
-if (changed != null && changed.endsWith(gracefulFile)) {
-  drillbit.interruptPollShutdown = false;
-  triggered_shutdown = true;
-  drillbit.close();
-  break;
+
+  try 

[jira] [Updated] (DRILL-7360) Refactor WatchService code in Drillbit class

2019-08-23 Thread Volodymyr Vysotskyi (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7360:
---
Labels: ready-to-commit  (was: )

> Refactor WatchService code in Drillbit class
> 
>
> Key: DRILL-7360
> URL: https://issues.apache.org/jira/browse/DRILL-7360
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> Refactor WatchService to use proper code (see 
> https://docs.oracle.com/javase/tutorial/essential/io/notification.html for 
> details) in the Drillbit class and fix concurrency issues connected with 
> variables being assigned from different threads.
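For reference, the WatchService pattern from the tutorial linked above looks roughly like the following. This is an illustrative standalone sketch, not Drill's actual Drillbit code; the class name, arguments, and marker-file handling are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class GracefulFileWatcher {
  public static void main(String[] args) throws Exception {
    Path dir = Paths.get(args[0]);   // e.g. the DRILL_HOME directory
    String marker = args[1];         // e.g. the graceful-shutdown signal file name
    try (WatchService watchService = dir.getFileSystem().newWatchService()) {
      dir.register(watchService,
          StandardWatchEventKinds.ENTRY_CREATE,
          StandardWatchEventKinds.ENTRY_MODIFY);
      while (true) {
        WatchKey key = watchService.take();      // blocks until events are available
        for (WatchEvent<?> event : key.pollEvents()) {
          Path changed = (Path) event.context(); // relative path of the created/modified entry
          if (changed != null && changed.endsWith(marker)) {
            System.out.println("Signal file detected, initiating shutdown");
            return;
          }
        }
        if (!key.reset()) {                      // key invalid: watched directory is gone
          return;
        }
      }
    }
  }
}
```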



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7253) Read Hive struct w/o nulls

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914446#comment-16914446
 ] 

ASF GitHub Bot commented on DRILL-7253:
---

vvysotskyi commented on pull request #1847: DRILL-7253: Read Hive struct w/o 
nulls
URL: https://github.com/apache/drill/pull/1847#discussion_r317200826
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
 ##
 @@ -215,7 +215,18 @@ public LogicalExpression visitCall(RexCall call) {
 switch(call.getKind()){
 case CAST:
   return getDrillCastFunctionFromOptiq(call);
-case LIKE:
+case ROW:
+  List<RelDataTypeField> fieldList = call.getType().getFieldList();
+  List<RexNode> oldOperands = call.getOperands();
+  List<LogicalExpression> newOperands = new ArrayList<>();
+  for (int i = 0; i < oldOperands.size(); i++) {
+RexLiteral nameOperand = 
getRexBuilder().makeLiteral(fieldList.get(i).getName());
+RexNode valueOperand = call.operands.get(i);
+newOperands.add(nameOperand.accept(this));
+newOperands.add(valueOperand.accept(this));
+  }
+  return 
FunctionCallFactory.createExpression(call.op.getName().toLowerCase(), 
newOperands);
+  case LIKE:
 
 Review comment:
   ```suggestion
   case LIKE:
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Read Hive struct w/o nulls
> --
>
> Key: DRILL-7253
> URL: https://issues.apache.org/jira/browse/DRILL-7253
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7253) Read Hive struct w/o nulls

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914447#comment-16914447
 ] 

ASF GitHub Bot commented on DRILL-7253:
---

vvysotskyi commented on pull request #1847: DRILL-7253: Read Hive struct w/o 
nulls
URL: https://github.com/apache/drill/pull/1847#discussion_r317209323
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SpecialFunctions.java
 ##
 @@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.expr.fn.impl;
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.vector.complex.reader.FieldReader;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+
+/**
+ * Container class for functions with {@link 
org.apache.calcite.sql.SqlSyntax#SPECIAL} syntax.
+ */
+public class SpecialFunctions {
+
+  private SpecialFunctions() {
+  }
+
+  /**
+   * Implementation of ROW(col1, col2, ..., colN) constructor function. Most 
often it's
+   * used to reconstruct struct columns after Calcite's flattening.
+   */
+  @FunctionTemplate(name = "row", scope = 
FunctionTemplate.FunctionScope.SIMPLE, isVarArg = true)
 
 Review comment:
   It would be useful to point out that this function expects its arguments in 
pairs (field name and field value), ...
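For illustration, the DrillOptiq hunk earlier in this digest interleaves field names with field values, so a struct constructor such as ROW(1, 'a') over fields (c1, c2) effectively becomes a call like row('c1', 1, 'c2', 'a'). Below is a minimal sketch of that pairing over plain lists; the types are hypothetical stand-ins, not Drill's RexNode/LogicalExpression machinery.

```java
import java.util.ArrayList;
import java.util.List;

class RowOperandPairing {
  // Hypothetical stand-in for the interleaving done for the ROW case:
  // each struct field name is immediately followed by the matching value operand.
  static List<Object> interleave(List<String> fieldNames, List<Object> values) {
    List<Object> operands = new ArrayList<>();
    for (int i = 0; i < values.size(); i++) {
      operands.add(fieldNames.get(i)); // name operand
      operands.add(values.get(i));     // value operand
    }
    return operands;
  }

  public static void main(String[] args) {
    // ROW(1, "a") over fields (c1, c2) -> [c1, 1, c2, a]
    System.out.println(interleave(List.of("c1", "c2"), List.of(1, "a")));
  }
}
```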
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Read Hive struct w/o nulls
> --
>
> Key: DRILL-7253
> URL: https://issues.apache.org/jira/browse/DRILL-7253
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7253) Read Hive struct w/o nulls

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914440#comment-16914440
 ] 

ASF GitHub Bot commented on DRILL-7253:
---

vvysotskyi commented on pull request #1847: DRILL-7253: Read Hive struct w/o 
nulls
URL: https://github.com/apache/drill/pull/1847#discussion_r317199453
 
 

 ##
 File path: 
contrib/storage-hive/core/src/test/java/org/apache/drill/exec/hive/complex_types/TestHiveStructs.java
 ##
 @@ -0,0 +1,357 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.hive.complex_types;
+
+import java.math.BigDecimal;
+import java.nio.file.Paths;
+
+import org.apache.drill.categories.HiveStorageTest;
+import org.apache.drill.categories.SlowTest;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.hive.HiveTestFixture;
+import org.apache.drill.exec.hive.HiveTestUtilities;
+import org.apache.drill.exec.util.JsonStringHashMap;
+import org.apache.drill.exec.util.StoragePluginTestUtils;
+import org.apache.drill.exec.util.Text;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.TestBuilder;
+import org.apache.hadoop.hive.ql.Driver;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+
+import static java.util.Arrays.asList;
+import static org.apache.drill.exec.expr.fn.impl.DateUtility.parseBest;
+import static org.apache.drill.exec.expr.fn.impl.DateUtility.parseLocalDate;
+import static org.hamcrest.CoreMatchers.containsString;
+import static org.junit.Assert.assertThat;
+
+@Category({SlowTest.class, HiveStorageTest.class})
+public class TestHiveStructs extends ClusterTest {
+
+  private static final JsonStringHashMap<String, Object> STR_N0_ROW_1 = 
TestBuilder.mapOf(
+  "f_int", -3000, "f_string", new Text("AbbBBa"), "f_varchar", new 
Text("-c54g"), "f_char", new Text("Th"),
+  "f_tinyint", -128, "f_smallint", -32768, "f_decimal", new 
BigDecimal("375098.406"), "f_boolean", true,
+  "f_bigint", -9223372036854775808L, "f_float", -32.058f, "f_double", 
-13.241563769628,
+  "f_date", parseLocalDate("2018-10-21"),
+  "f_timestamp", parseBest("2018-10-21 04:51:36"));
+
+  private static final JsonStringHashMap<String, Object> STR_N0_ROW_2 = 
TestBuilder.mapOf(
+  "f_int", 33000, "f_string", new Text("ZzZzZz"), "f_varchar", new 
Text("-+-+1"), "f_char", new Text("hh"),
+  "f_tinyint", 127, "f_smallint", 32767, "f_decimal", new 
BigDecimal("500.500"), "f_boolean", true,
+  "f_bigint", 798798798798798799L, "f_float", 102.058f, "f_double", 
111.241563769628,
+  "f_date", parseLocalDate("2019-10-21"),
+  "f_timestamp", parseBest("2019-10-21 05:51:31"));
+
+  private static final JsonStringHashMap<String, Object> STR_N0_ROW_3 = 
TestBuilder.mapOf(
+  "f_int", 9199, "f_string", new Text("z x cz"), "f_varchar", new 
Text(")(*1`"), "f_char", new Text("za"),
+  "f_tinyint", 57, "f_smallint", 1010, "f_decimal", new 
BigDecimal("2.302"), "f_boolean", false,
+  "f_bigint", 101010L, "f_float", 12.2001f, "f_double", 1.0001,
+  "f_date", parseLocalDate("2010-01-01"),
+  "f_timestamp", parseBest("2000-02-02 01:10:09"));
+
+  private static final JsonStringHashMap<String, Object> STR_N2_ROW_1 = 
TestBuilder.mapOf("a",
+  TestBuilder.mapOf("b", TestBuilder.mapOf("c", 1000, "k", "Z")));
+
+  private static final JsonStringHashMap<String, Object> STR_N2_ROW_2 = 
TestBuilder.mapOf(
+  "a", TestBuilder.mapOf("b", TestBuilder.mapOf("c", 2000, "k", "X")));
+
+  private static final JsonStringHashMap<String, Object> STR_N2_ROW_3 = 
TestBuilder.mapOf(
+  "a", TestBuilder.mapOf("b", TestBuilder.mapOf("c", 3000, "k", "C")));
+
+  private static HiveTestFixture hiveTestFixture;
+
+  @BeforeClass
+  public static void setUp() throws Exception {
+startCluster(ClusterFixture.builder(dirTestWatcher)
+
.sessionOption(ExecConstants.HIVE_OPTIMIZE_PARQUET_SCAN_WITH_NATIVE_READER, 
true)
+.configProperty("drill.exec.rpc.user.timeout", 999)
 
 Review comment:
   Could you please explain the reason for this change?
 

[jira] [Commented] (DRILL-7253) Read Hive struct w/o nulls

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914445#comment-16914445
 ] 

ASF GitHub Bot commented on DRILL-7253:
---

vvysotskyi commented on pull request #1847: DRILL-7253: Read Hive struct w/o 
nulls
URL: https://github.com/apache/drill/pull/1847#discussion_r317206540
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SplitUpComplexExpressions.java
 ##
 @@ -44,103 +44,108 @@
 
 public class SplitUpComplexExpressions extends BasePrelVisitor {
 
-  RelDataTypeFactory factory;
-  DrillOperatorTable table;
-  FunctionImplementationRegistry funcReg;
+  private final RelDataTypeFactory factory;
+  private final RexBuilder rexBuilder;
+  private final FunctionImplementationRegistry funcReg;
 
-  public SplitUpComplexExpressions(RelDataTypeFactory factory, 
DrillOperatorTable table, FunctionImplementationRegistry funcReg) {
-super();
+  public SplitUpComplexExpressions(RelDataTypeFactory factory, 
FunctionImplementationRegistry funcReg, RexBuilder rexBuilder) {
 this.factory = factory;
-this.table = table;
 this.funcReg = funcReg;
+this.rexBuilder = rexBuilder;
   }
 
   @Override
-  public Prel visitPrel(Prel prel, Object value) throws RelConversionException 
{
-List children = Lists.newArrayList();
-for(Prel child : prel){
-  child = child.accept(this, null);
+  public Prel visitPrel(Prel prel, Object unused) throws 
RelConversionException {
+List children = new ArrayList<>();
+for (Prel child : prel) {
+  child = child.accept(this, unused);
   children.add(child);
 }
 return (Prel) prel.copy(prel.getTraitSet(), children);
   }
 
 
   @Override
-  public Prel visitProject(ProjectPrel project, Object unused) throws 
RelConversionException {
+  public Prel visitProject(final ProjectPrel project, Object unused) throws 
RelConversionException {
+final Prel oldInput = (Prel) project.getInput(0);
+RelNode newInput = oldInput.accept(this, unused);
 
-// Apply the rule to the child
-RelNode originalInput = ((Prel)project.getInput(0)).accept(this, null);
-project = (ProjectPrel) project.copy(project.getTraitSet(), 
Lists.newArrayList(originalInput));
-
-List exprList = new ArrayList<>();
-
-List relDataTypes = new ArrayList<>();
-List origRelDataTypes = new ArrayList<>();
-int i = 0;
-final int lastColumnReferenced = 
PrelUtil.getLastUsedColumnReference(project.getProjects());
+ProjectPrel newProject = (ProjectPrel) project.copy(project.getTraitSet(), 
Lists.newArrayList(newInput));
 
+final int lastColumnReferenced = 
PrelUtil.getLastUsedColumnReference(newProject.getProjects());
 if (lastColumnReferenced == -1) {
-  return project;
+  return newProject;
 }
 
-final int lastRexInput = lastColumnReferenced + 1;
-RexVisitorComplexExprSplitter exprSplitter = new 
RexVisitorComplexExprSplitter(factory, funcReg, lastRexInput);
 
-for (RexNode rex : project.getChildExps()) {
-  origRelDataTypes.add(project.getRowType().getFieldList().get(i));
-  i++;
-  exprList.add(rex.accept(exprSplitter));
+List projectFields = 
newProject.getRowType().getFieldList();
+List origRelDataTypes = new ArrayList<>();
+List exprList = new ArrayList<>();
+final int lastRexInput = lastColumnReferenced + 1;
+RexVisitorComplexExprSplitter exprSplitter = new 
RexVisitorComplexExprSplitter(funcReg, rexBuilder, lastRexInput);
+int i = 0;
+for (RexNode rex : newProject.getChildExps()) {
+  RelDataTypeField originField = projectFields.get(i++);
+  RexNode splitRex = rex.accept(exprSplitter);
+  origRelDataTypes.add(originField);
+  exprList.add(splitRex);
 }
-List complexExprs = exprSplitter.getComplexExprs();
 
-if (complexExprs.size() == 1 && 
findTopComplexFunc(project.getChildExps()).size() == 1) {
-  return project;
+final List complexExprs = exprSplitter.getComplexExprs();
+if (complexExprs.size() == 1 && 
findTopComplexFunc(newProject.getChildExps()).size() == 1) {
+  return newProject;
 }
 
-ProjectPrel childProject;
 
-List allExprs = new ArrayList<>();
-int exprIndex = 0;
-List fieldNames = originalInput.getRowType().getFieldNames();
-for (int index = 0; index < lastRexInput; index++) {
-  RexBuilder builder = new RexBuilder(factory);
-  allExprs.add(builder.makeInputRef( new RelDataTypeDrillImpl(new 
RelDataTypeHolder(), factory), index));
 
-  if (fieldNames.get(index).contains(SchemaPath.DYNAMIC_STAR)) {
-relDataTypes.add(new RelDataTypeFieldImpl(fieldNames.get(index), 
allExprs.size(), factory.createSqlType(SqlTypeName.ANY)));
-  } else {
-relDataTypes.add(new RelDataTypeFieldImpl("EXPR$" + exprIndex, 
allExprs.size(), factory.createSqlType(SqlTypeName.ANY)));
-exprIndex++;
-  }

[jira] [Commented] (DRILL-7253) Read Hive struct w/o nulls

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1691#comment-1691
 ] 

ASF GitHub Bot commented on DRILL-7253:
---

vvysotskyi commented on pull request #1847: DRILL-7253: Read Hive struct w/o 
nulls
URL: https://github.com/apache/drill/pull/1847#discussion_r317208189
 
 

 ##
 File path: exec/vector/src/main/codegen/templates/HolderReaderImpl.java
 ##
 @@ -319,7 +319,9 @@ public void copyAsField(String name, MapWriter writer) {
 <#else>
   <#if !(minor.class == "Decimal9" || minor.class == "Decimal18")>
   public void copyAsValue(${minor.class?cap_first}Writer writer) {
-writer.write${minor.class}(<#list fields as field>holder.${field.name}<#if field_has_next>, </#if></#list>);
+if (isSet()) {
 
 Review comment:
   Could you please also add this check to the `copyAsField` methods?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Read Hive struct w/o nulls
> --
>
> Key: DRILL-7253
> URL: https://issues.apache.org/jira/browse/DRILL-7253
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7253) Read Hive struct w/o nulls

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914441#comment-16914441
 ] 

ASF GitHub Bot commented on DRILL-7253:
---

vvysotskyi commented on pull request #1847: DRILL-7253: Read Hive struct w/o 
nulls
URL: https://github.com/apache/drill/pull/1847#discussion_r317206684
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/SplitUpComplexExpressions.java
 ##
 @@ -44,103 +44,108 @@
 
 public class SplitUpComplexExpressions extends BasePrelVisitor {
 
-  RelDataTypeFactory factory;
-  DrillOperatorTable table;
-  FunctionImplementationRegistry funcReg;
+  private final RelDataTypeFactory factory;
+  private final RexBuilder rexBuilder;
+  private final FunctionImplementationRegistry funcReg;
 
-  public SplitUpComplexExpressions(RelDataTypeFactory factory, 
DrillOperatorTable table, FunctionImplementationRegistry funcReg) {
-super();
+  public SplitUpComplexExpressions(RelDataTypeFactory factory, 
FunctionImplementationRegistry funcReg, RexBuilder rexBuilder) {
 this.factory = factory;
-this.table = table;
 this.funcReg = funcReg;
+this.rexBuilder = rexBuilder;
   }
 
   @Override
-  public Prel visitPrel(Prel prel, Object value) throws RelConversionException 
{
-List children = Lists.newArrayList();
-for(Prel child : prel){
-  child = child.accept(this, null);
+  public Prel visitPrel(Prel prel, Object unused) throws 
RelConversionException {
+List children = new ArrayList<>();
+for (Prel child : prel) {
+  child = child.accept(this, unused);
   children.add(child);
 }
 return (Prel) prel.copy(prel.getTraitSet(), children);
   }
 
 
   @Override
-  public Prel visitProject(ProjectPrel project, Object unused) throws 
RelConversionException {
+  public Prel visitProject(final ProjectPrel project, Object unused) throws 
RelConversionException {
+final Prel oldInput = (Prel) project.getInput(0);
+RelNode newInput = oldInput.accept(this, unused);
 
-// Apply the rule to the child
-RelNode originalInput = ((Prel)project.getInput(0)).accept(this, null);
-project = (ProjectPrel) project.copy(project.getTraitSet(), 
Lists.newArrayList(originalInput));
-
-List exprList = new ArrayList<>();
-
-List relDataTypes = new ArrayList<>();
-List origRelDataTypes = new ArrayList<>();
-int i = 0;
-final int lastColumnReferenced = 
PrelUtil.getLastUsedColumnReference(project.getProjects());
+ProjectPrel newProject = (ProjectPrel) project.copy(project.getTraitSet(), 
Lists.newArrayList(newInput));
 
+final int lastColumnReferenced = 
PrelUtil.getLastUsedColumnReference(newProject.getProjects());
 if (lastColumnReferenced == -1) {
-  return project;
+  return newProject;
 }
 
-final int lastRexInput = lastColumnReferenced + 1;
-RexVisitorComplexExprSplitter exprSplitter = new 
RexVisitorComplexExprSplitter(factory, funcReg, lastRexInput);
 
-for (RexNode rex : project.getChildExps()) {
-  origRelDataTypes.add(project.getRowType().getFieldList().get(i));
-  i++;
-  exprList.add(rex.accept(exprSplitter));
+List projectFields = 
newProject.getRowType().getFieldList();
+List origRelDataTypes = new ArrayList<>();
+List exprList = new ArrayList<>();
+final int lastRexInput = lastColumnReferenced + 1;
+RexVisitorComplexExprSplitter exprSplitter = new 
RexVisitorComplexExprSplitter(funcReg, rexBuilder, lastRexInput);
+int i = 0;
+for (RexNode rex : newProject.getChildExps()) {
+  RelDataTypeField originField = projectFields.get(i++);
+  RexNode splitRex = rex.accept(exprSplitter);
+  origRelDataTypes.add(originField);
+  exprList.add(splitRex);
 }
-List complexExprs = exprSplitter.getComplexExprs();
 
-if (complexExprs.size() == 1 && 
findTopComplexFunc(project.getChildExps()).size() == 1) {
-  return project;
+final List complexExprs = exprSplitter.getComplexExprs();
+if (complexExprs.size() == 1 && 
findTopComplexFunc(newProject.getChildExps()).size() == 1) {
+  return newProject;
 }
 
-ProjectPrel childProject;
 
-List allExprs = new ArrayList<>();
-int exprIndex = 0;
-List fieldNames = originalInput.getRowType().getFieldNames();
-for (int index = 0; index < lastRexInput; index++) {
-  RexBuilder builder = new RexBuilder(factory);
-  allExprs.add(builder.makeInputRef( new RelDataTypeDrillImpl(new 
RelDataTypeHolder(), factory), index));
 
-  if (fieldNames.get(index).contains(SchemaPath.DYNAMIC_STAR)) {
-relDataTypes.add(new RelDataTypeFieldImpl(fieldNames.get(index), 
allExprs.size(), factory.createSqlType(SqlTypeName.ANY)));
-  } else {
-relDataTypes.add(new RelDataTypeFieldImpl("EXPR$" + exprIndex, 
allExprs.size(), factory.createSqlType(SqlTypeName.ANY)));
-exprIndex++;
-  }

[jira] [Commented] (DRILL-7253) Read Hive struct w/o nulls

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914443#comment-16914443
 ] 

ASF GitHub Bot commented on DRILL-7253:
---

vvysotskyi commented on pull request #1847: DRILL-7253: Read Hive struct w/o 
nulls
URL: https://github.com/apache/drill/pull/1847#discussion_r317200659
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/SpecialFunctions.java
 ##
 @@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.expr.fn.impl;
+
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.vector.complex.reader.FieldReader;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+
+/**
+ * Container class for functions with {@link 
org.apache.calcite.sql.SqlSyntax#SPECIAL} syntax.
+ */
+public class SpecialFunctions {
+
+  private SpecialFunctions() {
+  }
+
+  /**
+   * Implementation of ROW(col1, col2, ..., colN) constructor function. Most 
often it's
+   * used to reconstruct struct columns after Calcite's flattening.
+   */
+  @FunctionTemplate(name = "row", scope = 
FunctionTemplate.FunctionScope.SIMPLE, isVarArg = true)
 
 Review comment:
   Should this function be exposed to the end user? If not, then we should mark 
it as `isInternal`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Read Hive struct w/o nulls
> --
>
> Key: DRILL-7253
> URL: https://issues.apache.org/jira/browse/DRILL-7253
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7253) Read Hive struct w/o nulls

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914442#comment-16914442
 ] 

ASF GitHub Bot commented on DRILL-7253:
---

vvysotskyi commented on pull request #1847: DRILL-7253: Read Hive struct w/o 
nulls
URL: https://github.com/apache/drill/pull/1847#discussion_r317202299
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/visitor/RexVisitorComplexExprSplitter.java
 ##
 @@ -85,21 +83,21 @@ public RexNode visitCorrelVariable(RexCorrelVariable 
correlVariable) {
 
   @Override
   public RexNode visitCall(RexCall call) {
-
-String functionName = call.getOperator().getName();
-
-List newOps = new ArrayList<>();
+final List newOps = new ArrayList<>();
 
 Review comment:
   This list population may be replaced with a stream `collect`.
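The rewrite the reviewer has in mind is presumably along these lines; a generic sketch, since the quoted hunk does not show the element types.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

class CollectExample {
  // Hypothetical stand-in for rewriting each operand with a visitor and
  // collecting the results, instead of populating the list in a for loop.
  static <T> List<T> rewriteAll(List<T> operands, UnaryOperator<T> visitor) {
    return operands.stream()
        .map(visitor)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(rewriteAll(List.of(1, 2, 3), n -> n * 10)); // [10, 20, 30]
  }
}
```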
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Read Hive struct w/o nulls
> --
>
> Key: DRILL-7253
> URL: https://issues.apache.org/jira/browse/DRILL-7253
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914390#comment-16914390
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317130089
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java
 ##
 @@ -364,8 +365,8 @@ private TransferPair 
getFlattenFieldTransferPair(FieldReference reference) {
 final ValueVector flattenField = 
incoming.getValueAccessorById(vectorClass, 
fieldId.getFieldIds()).getValueVector();
 
 TransferPair tp = null;
-if (flattenField instanceof RepeatedMapVector) {
-  tp = 
((RepeatedMapVector)flattenField).getTransferPairToSingleMap(reference.getAsNamePart().getName(),
 oContext.getAllocator());
+if (flattenField instanceof AbstractRepeatedMapVector) {
 
 Review comment:
   Does it mean that we will be able to flatten a DictVector?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >.
> So an implementation of such a value vector would be useful for Arrow too.
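As an editorial illustration of the keys/values/offsets layout described above, here is a minimal sketch using plain Java arrays in place of real value vectors (all names hypothetical). It also shows why key lookup is linear in the number of entries per row, which is the search problem the description mentions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapVectorSketch {
  // Three parallel structures: keys, values, and per-row offsets into them.
  // Entries [OFFSETS[i], OFFSETS[i + 1]) belong to row i.
  static final String[] KEYS    = {"a", "b", "a", "c", "b"};
  static final int[]    VALUES  = {1, 2, 10, 30, 20};
  static final int[]    OFFSETS = {0, 2, 2, 5}; // row 0: 2 entries, row 1: empty, row 2: 3 entries

  static Map<String, Integer> readRow(int row) {
    Map<String, Integer> map = new LinkedHashMap<>();
    for (int i = OFFSETS[row]; i < OFFSETS[row + 1]; i++) { // linear scan: lookup by key is O(entries)
      map.put(KEYS[i], VALUES[i]);
    }
    return map;
  }

  public static void main(String[] args) {
    for (int row = 0; row < OFFSETS.length - 1; row++) {
      System.out.println("row " + row + " -> " + readRow(row));
    }
  }
}
```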



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914403#comment-16914403
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317162013
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetTableMetadataUtils.java
 ##
 @@ -379,14 +382,8 @@ public static Object getValue(Object value, 
PrimitiveType.PrimitiveTypeName prim
   }
 
   private static Integer getInt(Object value) {
-if (value instanceof Integer) {
-  return (Integer) value;
-} else if (value instanceof Long) {
-  return ((Long) value).intValue();
-} else if (value instanceof Float) {
-  return ((Float) value).intValue();
-} else if (value instanceof Double) {
-  return ((Double) value).intValue();
+if (value instanceof Number) {
 
 Review comment:
   Thanks!
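For reference, the simplification agreed on above collapses the `instanceof` chain into a single `Number` check, roughly as follows. This is a sketch: the quoted hunk cuts off before the new method body, so the fall-through behavior shown here is an assumption.

```java
class GetIntSketch {
  // Any Number (Integer, Long, Float, Double, ...) can supply intValue() directly.
  static Integer getInt(Object value) {
    if (value instanceof Number) {
      return ((Number) value).intValue();
    }
    return null; // assumption: handling of non-numeric inputs is not shown in the quoted hunk
  }

  public static void main(String[] args) {
    System.out.println(getInt(42L));   // 42
    System.out.println(getInt(3.9d));  // 3
    System.out.println(getInt("n/a")); // null
  }
}
```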
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >.
> So an implementation of such a value vector would be useful for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914396#comment-16914396
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317152531
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/TypedFieldId.java
 ##
 @@ -42,6 +45,7 @@
   final boolean isHyperReader;
   final boolean isListVector;
   final PathSegment remainder;
+  private final Map types;
 
 Review comment:
   Please make other fields also `private`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >.
> So an implementation of such a value vector would be useful for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914399#comment-16914399
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317196664
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/complex/impl/RepeatedMapWriter.java
 ##
 @@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.complex.impl;
+
+import org.apache.drill.exec.expr.holders.RepeatedMapHolder;
+import org.apache.drill.exec.vector.complex.RepeatedMapVector;
+import org.apache.drill.exec.vector.complex.writer.FieldWriter;
+
+public class RepeatedMapWriter extends AbstractRepeatedMapWriter<RepeatedMapVector> {
+
+  public RepeatedMapWriter(RepeatedMapVector container, FieldWriter parent, 
boolean unionEnabled) {
+super(container, parent, unionEnabled);
+  }
+
+  public RepeatedMapWriter(RepeatedMapVector container, FieldWriter parent) {
+this(container, parent, false);
+  }
+
+  @Override
+  public void start() {
+// update the repeated vector to state that there is current+1 objects.
+final RepeatedMapHolder h = new RepeatedMapHolder();
+final RepeatedMapVector map = container;
+final RepeatedMapVector.Mutator mutator = map.getMutator();
+
+// Make sure that the current vector can support the end position of this 
list.
+if(container.getValueCapacity() <= idx()) {
 
 Review comment:
   ```suggestion
   if (container.getValueCapacity() <= idx()) {
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >.
> So an implementation of such a value vector would be useful for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914397#comment-16914397
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317140836
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/expr/EvaluationVisitor.java
 ##
 @@ -523,14 +516,43 @@ private HoldingContainer 
visitValueVectorReadExpression(ValueVectorReadExpressio
 eval.add(expr.invoke("setPosition").arg(recordIndex));
 int listNum = 0;
 
+JVar valueIndex = eval.decl(generator.getModel().INT, "valueIndex", 
JExpr.lit(-1));
+
+int depth = 0;
+boolean isMap = e.getFieldId().isMap(depth);
+
 while (seg != null) {
   if (seg.isArray()) {
+
 // stop once we get to the last segment and the final type is 
neither complex nor repeated (map, list, repeated list).
 // In case of non-complex and non-repeated type, we return Holder, 
in stead of FieldReader.
 if (seg.isLastPath() && !complex && !repeated && !listVector) {
   break;
 }
 
+depth++;
+
+if (isMap) {
+  JExpression keyExpr = 
JExpr.lit(seg.getArraySegment().getIndex());
+
+  JVar dictReader = generator.declareClassField("dictReader", 
generator.getModel()._ref(BaseReader.DictReader.class));
 
 Review comment:
   Could you please also use FieldReader here to be consistent with the code in 
this method?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >.
> So an implementation of such a value vector would be useful for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914406#comment-16914406
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317156892
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ##
 @@ -713,4 +715,63 @@ public static boolean 
containsComplexColumn(ParquetMetadata footer, List<SchemaPath> columns) {
+   * NOTE: current implementation cares about {@link OriginalType#MAP} 
only
+   * converting it to {@link 
org.apache.drill.common.types.TypeProtos.MinorType#DICT}.
+   * Other original types are converted to {@code null}.
+   *
+   * @param originalTypes list of Parquet's types
+   * @return list containing either {@code null} or type with minor
+   * type {@link 
org.apache.drill.common.types.TypeProtos.MinorType#DICT} values
+   */
+  public static List getComplexTypes(List 
originalTypes) {
 
 Review comment:
   Is it possible to use `BitSet` here?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >.
> So an implementation of such a value vector would be useful for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914392#comment-16914392
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317153368
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/avro/AvroRecordReader.java
 ##
 @@ -207,8 +209,11 @@ private void process(final Object value, final Schema 
schema, final String field
 final GenericArray array = (GenericArray) value;
 Schema elementSchema = array.getSchema().getElementType();
 Type elementType = elementSchema.getType();
-if (elementType == Schema.Type.RECORD || elementType == 
Schema.Type.MAP){
+if (elementType == Schema.Type.RECORD){
 
 Review comment:
   ```suggestion
   if (elementType == Schema.Type.RECORD) {
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >.
> So an implementation of such a value vector would be useful for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914394#comment-16914394
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317151040
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/TypedFieldId.java
 ##
 @@ -42,6 +45,7 @@
   final boolean isHyperReader;
   final boolean isListVector;
   final PathSegment remainder;
+  private final Map types;
 
 Review comment:
   Could you please add a comment that this field is used to determine whether a 
map is placed at a specific depth. Also, would it be simpler to use `BitSet` 
instead of `Map` for these purposes?
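A `BitSet`-based alternative along the lines the reviewer suggests might look like this (hypothetical field and method names, not Drill's actual `TypedFieldId` code):

```java
import java.util.BitSet;

class DepthFlagsSketch {
  // One bit per path depth: a set bit marks that a DICT (map) sits at that depth.
  private final BitSet dictAtDepth = new BitSet();

  void markDict(int depth) {
    dictAtDepth.set(depth);
  }

  boolean isDict(int depth) {
    return dictAtDepth.get(depth); // false for any depth that was never marked
  }

  public static void main(String[] args) {
    DepthFlagsSketch flags = new DepthFlagsSketch();
    flags.markDict(1);
    System.out.println(flags.isDict(0)); // false
    System.out.println(flags.isDict(1)); // true
  }
}
```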
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >.
> So an implementation of such a value vector would be useful for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914391#comment-16914391
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317152242
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/TypedFieldId.java
 ##
 @@ -110,6 +116,11 @@ public MajorType getIntermediateType() {
 return intermediateType;
   }
 
+  public boolean isMap(int depth) {
 
 Review comment:
   I'm a little bit confused by the naming: in some places map still means map, but 
in others it means dict. Should we use dict where possible to avoid confusion 
and add corresponding comments in the places where map means dict?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >.
> So an implementation of such a value vector would be useful for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914405#comment-16914405
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317193969
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/complex/AbstractRepeatedMapVector.java
 ##
 @@ -0,0 +1,524 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.complex;
+
+import io.netty.buffer.DrillBuf;
+
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+
+import org.apache.commons.lang3.ArrayUtils;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.expr.BasicTypeHelper;
+import org.apache.drill.exec.expr.holders.RepeatedValueHolder;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.memory.AllocationManager.BufferLedger;
+import org.apache.drill.exec.proto.UserBitShared.SerializedField;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.util.CallBack;
+import org.apache.drill.exec.vector.AddOrGetResult;
+import org.apache.drill.exec.vector.AllocationHelper;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.VectorDescriptor;
+
+public abstract class AbstractRepeatedMapVector extends AbstractMapVector 
implements RepeatedValueVector {
+
+  protected final UInt4Vector offsets; // offsets to start of each record 
(considering record indices are 0-indexed)
+  protected final EmptyValuePopulator emptyPopulator;
+
+  protected AbstractRepeatedMapVector(MaterializedField field, BufferAllocator 
allocator, CallBack callBack) {
+this(field, new UInt4Vector(BaseRepeatedValueVector.OFFSETS_FIELD, 
allocator), callBack);
+  }
+
+  protected AbstractRepeatedMapVector(MaterializedField field, UInt4Vector 
offsets, CallBack callBack) {
+super(field, offsets.getAllocator(), callBack);
+this.offsets = offsets;
+this.emptyPopulator = new EmptyValuePopulator(offsets);
+  }
+
+  @Override
+  public UInt4Vector getOffsetVector() { return offsets; }
+
+  @Override
+  public ValueVector getDataVector() {
+throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public <T extends ValueVector> AddOrGetResult<T> 
addOrGetVector(VectorDescriptor descriptor) {
+throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public void setInitialCapacity(int numRecords) {
+offsets.setInitialCapacity(numRecords + 1);
+for (final ValueVector v : this) {
+  v.setInitialCapacity(numRecords * 
RepeatedValueVector.DEFAULT_REPEAT_PER_RECORD);
+}
+  }
+
+  public void allocateNew(int groupCount, int innerValueCount) {
+clear();
+try {
+  allocateOffsetsNew(groupCount);
+  for (ValueVector v : getChildren()) {
+AllocationHelper.allocatePrecomputedChildCount(v, groupCount, 50, 
innerValueCount);
+  }
+} catch (OutOfMemoryException e){
+  clear();
+  throw e;
+}
+reset();
+  }
+
+  abstract void reset();
+
+  public void allocateOffsetsNew(int groupCount) {
+offsets.allocateNew(groupCount + 1);
+offsets.zeroVector();
+  }
+
+  public Iterator<String> fieldNameIterator() {
+return getChildFieldNames().iterator();
+  }
+
+  @Override
+  public List<ValueVector> getPrimitiveVectors() {
+final List<ValueVector> primitiveVectors = super.getPrimitiveVectors();
+primitiveVectors.add(offsets);
+return primitiveVectors;
+  }
+
+  @Override
+  public int getBufferSize() {
+if (getValueCount() == 0) {
+  return 0;
+}
+return offsets.getBufferSize() + super.getBufferSize();
+  }
+
+  @Override
+  public int getAllocatedSize() {
+return offsets.getAllocatedSize() + super.getAllocatedSize();
+  }
+
+  @Override
+  public int 

[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914393#comment-16914393
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317178005
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillParquetReader.java
 ##
 @@ -179,6 +189,31 @@ private static MessageType getProjection(MessageType 
schema,
 return projection;
   }
 
+  /**
+   * Get type from the supplied {@code type} corresponding to given {@code 
segment}.
+   * @param type type to extract field corresponding to segment
+   * @param segment segment which type will be returned
+   * @return type corresponding to the {@code segment} or {@code null} if 
there is no field found in {@code type}.
+   */
+  private static Type getType(Type type, PathSegment segment) {
+Type result = null;
+if (type != null && !type.isPrimitive()) {
+  GroupType groupType = type.asGroupType();
+  if (segment.isNamed()) {
+String fieldName = segment.getNameSegment().getPath();
+Optional<Type> foundType = groupType.getFields().stream()
 
 Review comment:
  Is it required to declare `foundType` here, since the method call chain could 
simply be continued?
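
  For illustration, a hedged sketch of that suggestion (the filter predicate is an 
assumption, since the quoted diff is cut off before the actual condition):

{code:java}
// Sketch only: continues the stream chain instead of declaring an intermediate
// Optional. The equalsIgnoreCase predicate is assumed; the quoted diff ends
// before the real filter condition.
import org.apache.drill.common.expression.PathSegment;
import org.apache.parquet.schema.GroupType;
import org.apache.parquet.schema.Type;

class GetTypeSketch {
  static Type getType(Type type, PathSegment segment) {
    if (type == null || type.isPrimitive() || !segment.isNamed()) {
      return null;
    }
    GroupType groupType = type.asGroupType();
    String fieldName = segment.getNameSegment().getPath();
    return groupType.getFields().stream()
        .filter(f -> f.getName().equalsIgnoreCase(fieldName))
        .findFirst()
        .orElse(null);   // no intermediate foundType variable needed
  }
}
{code}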
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector for storing the keys of each map
> valuesVector - vector for storing the values of each map
> offsetsVector - vector for storing the start indexes of the maps
> So it is not very hard to create such a Map vector, but there is a major issue 
> with this representation: it is hard to search map values by key in such a 
> vector, so some advanced techniques need to be investigated to make such a 
> search efficient, or other, more suitable options found to represent the map 
> datatype in the world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map as List< Struct<key:key_type, 
> value:value_type> >. So an implementation of the value vector would be useful 
> for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914401#comment-16914401
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317193686
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/complex/AbstractRepeatedMapVector.java
 ##
 @@ -0,0 +1,524 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.complex;
+
+import io.netty.buffer.DrillBuf;
+
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+
+import org.apache.commons.lang3.ArrayUtils;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.expr.BasicTypeHelper;
+import org.apache.drill.exec.expr.holders.RepeatedValueHolder;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.memory.AllocationManager.BufferLedger;
+import org.apache.drill.exec.proto.UserBitShared.SerializedField;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.util.CallBack;
+import org.apache.drill.exec.vector.AddOrGetResult;
+import org.apache.drill.exec.vector.AllocationHelper;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.VectorDescriptor;
+
+public abstract class AbstractRepeatedMapVector extends AbstractMapVector 
implements RepeatedValueVector {
+
+  protected final UInt4Vector offsets; // offsets to start of each record 
(considering record indices are 0-indexed)
+  protected final EmptyValuePopulator emptyPopulator;
+
+  protected AbstractRepeatedMapVector(MaterializedField field, BufferAllocator 
allocator, CallBack callBack) {
+this(field, new UInt4Vector(BaseRepeatedValueVector.OFFSETS_FIELD, 
allocator), callBack);
+  }
+
+  protected AbstractRepeatedMapVector(MaterializedField field, UInt4Vector 
offsets, CallBack callBack) {
+super(field, offsets.getAllocator(), callBack);
+this.offsets = offsets;
+this.emptyPopulator = new EmptyValuePopulator(offsets);
+  }
+
+  @Override
+  public UInt4Vector getOffsetVector() { return offsets; }
+
+  @Override
+  public ValueVector getDataVector() {
+throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public <T extends ValueVector> AddOrGetResult<T> 
addOrGetVector(VectorDescriptor descriptor) {
+throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public void setInitialCapacity(int numRecords) {
+offsets.setInitialCapacity(numRecords + 1);
+for (final ValueVector v : this) {
+  v.setInitialCapacity(numRecords * 
RepeatedValueVector.DEFAULT_REPEAT_PER_RECORD);
+}
+  }
+
+  public void allocateNew(int groupCount, int innerValueCount) {
+clear();
+try {
+  allocateOffsetsNew(groupCount);
+  for (ValueVector v : getChildren()) {
+AllocationHelper.allocatePrecomputedChildCount(v, groupCount, 50, 
innerValueCount);
+  }
+} catch (OutOfMemoryException e){
+  clear();
+  throw e;
+}
+reset();
+  }
+
+  abstract void reset();
+
+  public void allocateOffsetsNew(int groupCount) {
+offsets.allocateNew(groupCount + 1);
+offsets.zeroVector();
+  }
+
+  public Iterator<String> fieldNameIterator() {
+return getChildFieldNames().iterator();
+  }
+
+  @Override
+  public List<ValueVector> getPrimitiveVectors() {
+final List<ValueVector> primitiveVectors = super.getPrimitiveVectors();
+primitiveVectors.add(offsets);
+return primitiveVectors;
+  }
+
+  @Override
+  public int getBufferSize() {
+if (getValueCount() == 0) {
+  return 0;
+}
+return offsets.getBufferSize() + super.getBufferSize();
+  }
+
+  @Override
+  public int getAllocatedSize() {
+return offsets.getAllocatedSize() + super.getAllocatedSize();
+  }
+
+  @Override
+  public int 

[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914404#comment-16914404
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317196269
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/complex/impl/RepeatedDictReaderImpl.java
 ##
 @@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.complex.impl;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.exec.expr.holders.RepeatedDictHolder;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.RepeatedDictVector;
+import org.apache.drill.exec.vector.complex.reader.FieldReader;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+
+public class RepeatedDictReaderImpl extends AbstractFieldReader {
+
+  private static final int NO_VALUES = Integer.MAX_VALUE - 1;
 
 Review comment:
  I think it would be better to use a negative value there.
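
  A tiny sketch of that alternative, assuming the constant is only ever compared 
against real (0-based, non-negative) indexes:

{code:java}
// Sketch of the reviewer's suggestion: a negative sentinel can never collide
// with a valid 0-based index, whereas Integer.MAX_VALUE - 1 theoretically could.
class SentinelSketch {
  private static final int NO_VALUES = -1;

  // hypothetical usage: indexes are 0-based, so the sentinel is unambiguous
  static boolean hasValues(int currentOffset) {
    return currentOffset != NO_VALUES;
  }
}
{code}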
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector for storing the keys of each map
> valuesVector - vector for storing the values of each map
> offsetsVector - vector for storing the start indexes of the maps
> So it is not very hard to create such a Map vector, but there is a major issue 
> with this representation: it is hard to search map values by key in such a 
> vector, so some advanced techniques need to be investigated to make such a 
> search efficient, or other, more suitable options found to represent the map 
> datatype in the world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map as List< Struct<key:key_type, 
> value:value_type> >. So an implementation of the value vector would be useful 
> for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914400#comment-16914400
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317194371
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java
 ##
 @@ -250,6 +251,54 @@ public void exchange(ValueVector other) {
 offsets.exchange(target.offsets);
   }
 
+  protected abstract class BaseRepeatedValueVectorTransferPair<T extends 
BaseRepeatedValueVector> implements TransferPair {
+
+protected final T target;
+protected final TransferPair[] children;
+
+protected BaseRepeatedValueVectorTransferPair(T target) {
+  this.target = Preconditions.checkNotNull(target);
+  if (target.getDataVector() == DEFAULT_DATA_VECTOR) {
+
target.addOrGetVector(VectorDescriptor.create(getDataVector().getField()));
+target.getDataVector().allocateNew();
+  }
+  this.children = new TransferPair[] {
+  getOffsetVector().makeTransferPair(target.getOffsetVector()),
+  getDataVector().makeTransferPair(target.getDataVector())
+  };
+}
+
+@Override
+public void transfer() {
+  for (TransferPair child : children) {
+child.transfer();
+  }
+}
+
+@Override
+public ValueVector getTo() {
+  return target;
+}
+
+@Override
+public void splitAndTransfer(int startIndex, int length) {
+  target.allocateNew();
+  for (int i = 0; i < length; i++) {
+copyValueSafe(startIndex + i, i);
+  }
+}
+
+protected void copyValueSafe(int destIndex, int start, int end) {
+  final TransferPair vectorTransfer = children[1];
+  int newIndex = target.getOffsetVector().getAccessor().get(destIndex);
+  //todo: make this a bulk copy.
 
 Review comment:
   ```suggestion
 // TODO: make this a bulk copy.
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector for storing the keys of each map
> valuesVector - vector for storing the values of each map
> offsetsVector - vector for storing the start indexes of the maps
> So it is not very hard to create such a Map vector, but there is a major issue 
> with this representation: it is hard to search map values by key in such a 
> vector, so some advanced techniques need to be investigated to make such a 
> search efficient, or other, more suitable options found to represent the map 
> datatype in the world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map as List< Struct<key:key_type, 
> value:value_type> >. So an implementation of the value vector would be useful 
> for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914408#comment-16914408
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317194671
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/complex/DictVector.java
 ##
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.complex;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.expr.holders.RepeatedValueHolder;
+import org.apache.drill.exec.expr.holders.DictHolder;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.util.CallBack;
+import org.apache.drill.exec.util.JsonStringHashMap;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.impl.SingleDictReaderImpl;
+
+/**
+ * A {@link ValueVector} holding key-value pairs.
+ * This vector is essentially a {@link RepeatedMapVector} but with 
constraints:
+ * it may have 2 children only, named {@link #FIELD_KEY_NAME} and {@link 
#FIELD_VALUE_NAME}.
+ * The {@link #FIELD_KEY_NAME} can be of primitive type only and its values 
should not be {@code null},
+ * while the other, {@link #FIELD_VALUE_NAME}, field can be either of 
primitive or complex type.
+ *
+ * This vector has its own {@link 
org.apache.drill.exec.vector.complex.reader.FieldReader} and
+ * {@link org.apache.drill.exec.vector.complex.writer.FieldWriter} to ensure 
data is read and written correctly.
+ * In addition, the reader is responsible for getting a value for a given key.
+ *
+ * Additionally, {@code Object} representation is changed in {@link 
Accessor#getObject(int)}
+ * to represent it as {@link JsonStringHashMap} with appropriate {@code key} 
and {@code value} types.
+ *
+ * (The structure corresponds to Java's notion of {@link Map}).
+ *
+ * @see SingleDictReaderImpl reader corresponding to the vector
+ * @see org.apache.drill.exec.vector.complex.impl.SingleDictWriter writer 
corresponding to the vector
+ */
+public final class DictVector extends AbstractRepeatedMapVector {
+
+  public final static MajorType TYPE = Types.optional(MinorType.DICT);
+
+  public static final String FIELD_KEY_NAME = "key";
+  public static final String FIELD_VALUE_NAME = "value";
+  public static final List<String> fieldNames = 
Collections.unmodifiableList(Arrays.asList(FIELD_KEY_NAME, FIELD_VALUE_NAME));
 
 Review comment:
   There is no need to wrap `Arrays.asList`
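
  A short illustration of the trade-off (names reuse the quoted constants; which 
form to keep depends on whether set() must also be rejected):

{code:java}
// Sketch of the alternatives. Arrays.asList returns a fixed-size list backed by
// the array (add/remove throw, but set() is still allowed); the unmodifiableList
// wrapper additionally blocks set().
import java.util.Arrays;
import java.util.List;

class FieldNamesSketch {
  static final String FIELD_KEY_NAME = "key";
  static final String FIELD_VALUE_NAME = "value";

  // reviewer's suggestion: no wrapper, fixed-size only
  static final List<String> FIXED_SIZE =
      Arrays.asList(FIELD_KEY_NAME, FIELD_VALUE_NAME);

  // original form: also rejects set()
  // static final List<String> FULLY_UNMODIFIABLE =
  //     Collections.unmodifiableList(Arrays.asList(FIELD_KEY_NAME, FIELD_VALUE_NAME));
}
{code}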
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector for storing the keys of each map
> valuesVector - vector for storing the values of each map
> offsetsVector - vector for storing the start indexes of the maps
> So it is not very hard to create such a Map vector, but there is a major issue 
> with this representation. It's hard 

[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914402#comment-16914402
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317196848
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/complex/impl/RepeatedMapWriter.java
 ##
 @@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.complex.impl;
+
+import org.apache.drill.exec.expr.holders.RepeatedMapHolder;
+import org.apache.drill.exec.vector.complex.RepeatedMapVector;
+import org.apache.drill.exec.vector.complex.writer.FieldWriter;
+
+public class RepeatedMapWriter extends 
AbstractRepeatedMapWriter<RepeatedMapVector> {
+
+  public RepeatedMapWriter(RepeatedMapVector container, FieldWriter parent, 
boolean unionEnabled) {
+super(container, parent, unionEnabled);
+  }
+
+  public RepeatedMapWriter(RepeatedMapVector container, FieldWriter parent) {
+this(container, parent, false);
+  }
+
+  @Override
+  public void start() {
+// update the repeated vector to state that there is current+1 objects.
+final RepeatedMapHolder h = new RepeatedMapHolder();
+final RepeatedMapVector map = container;
+final RepeatedMapVector.Mutator mutator = map.getMutator();
+
+// Make sure that the current vector can support the end position of this 
list.
+if(container.getValueCapacity() <= idx()) {
+  mutator.setValueCount(idx()+1);
 
 Review comment:
   ```suggestion
 mutator.setValueCount(idx() + 1);
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector for storing the keys of each map
> valuesVector - vector for storing the values of each map
> offsetsVector - vector for storing the start indexes of the maps
> So it is not very hard to create such a Map vector, but there is a major issue 
> with this representation: it is hard to search map values by key in such a 
> vector, so some advanced techniques need to be investigated to make such a 
> search efficient, or other, more suitable options found to represent the map 
> datatype in the world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map as List< Struct<key:key_type, 
> value:value_type> >. So an implementation of the value vector would be useful 
> for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914395#comment-16914395
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317181670
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/record/vector/TestDictVector.java
 ##
 @@ -0,0 +1,460 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.vector;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.categories.VectorTest;
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.exec.ExecTest;
+import org.apache.drill.exec.expr.holders.NullableBigIntHolder;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.memory.RootAllocatorFactory;
+import org.apache.drill.exec.proto.UserBitShared;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.RecordBatchLoader;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.WritableBatch;
+import org.apache.drill.exec.vector.complex.DictVector;
+import org.apache.drill.exec.vector.complex.impl.SingleDictWriter;
+import org.apache.drill.exec.vector.complex.reader.BaseReader;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.drill.test.TestBuilder;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.rules.ExpectedException;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+
+import static org.hamcrest.CoreMatchers.containsString;
+import static org.junit.Assert.assertEquals;
+
+@Category(VectorTest.class)
+public class TestDictVector extends ExecTest {
+
+  @Rule
+  public ExpectedException thrown = ExpectedException.none();
+
+  private BufferAllocator allocator;
+
+  @Before
+  public void setUp() {
+allocator = RootAllocatorFactory.newRoot(DrillConfig.create());
+  }
+
+  @After
+  public void tearDown(){
+allocator.close();
+  }
+
+  @Test
+  public void testVectorCreation() {
+MaterializedField field = MaterializedField.create("map", DictVector.TYPE);
+try (DictVector mapVector = new DictVector(field, allocator, null)) {
+  mapVector.allocateNew();
+
+  List<Map<Object, Object>> maps = Arrays.asList(
+  TestBuilder.mapOfObject(4f, 1L, 5.3f, 2L, 0.3f, 3L, -0.2f, 4L, 
102.07f, 5L),
 
 Review comment:
   Please rework these tests to use generics if possible.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector for storing the keys of each map
> valuesVector - vector for storing the values of each map
> offsetsVector - vector for storing the start indexes of the maps
> So it is not very hard to create such a Map vector, but there is a major issue 
> with this representation: it is hard to search map values by key in such a 
> vector, so some advanced techniques need to be investigated to make such a 
> search efficient, or other, more suitable options found to represent the map 
> datatype in the world of vectors.
> After a question about maps, Apache Arrow developers 

[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914407#comment-16914407
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

vvysotskyi commented on pull request #1829: DRILL-7096: Develop vector for 
canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317196771
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/complex/impl/RepeatedMapWriter.java
 ##
 @@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.complex.impl;
+
+import org.apache.drill.exec.expr.holders.RepeatedMapHolder;
+import org.apache.drill.exec.vector.complex.RepeatedMapVector;
+import org.apache.drill.exec.vector.complex.writer.FieldWriter;
+
+public class RepeatedMapWriter extends 
AbstractRepeatedMapWriter<RepeatedMapVector> {
+
+  public RepeatedMapWriter(RepeatedMapVector container, FieldWriter parent, 
boolean unionEnabled) {
+super(container, parent, unionEnabled);
+  }
+
+  public RepeatedMapWriter(RepeatedMapVector container, FieldWriter parent) {
+this(container, parent, false);
+  }
+
+  @Override
+  public void start() {
+// update the repeated vector to state that there is current+1 objects.
+final RepeatedMapHolder h = new RepeatedMapHolder();
+final RepeatedMapVector map = container;
+final RepeatedMapVector.Mutator mutator = map.getMutator();
+
+// Make sure that the current vector can support the end position of this 
list.
+if(container.getValueCapacity() <= idx()) {
+  mutator.setValueCount(idx()+1);
+}
+
+map.getAccessor().get(idx(), h);
+if (h.start >= h.end) {
+  container.getMutator().startNewValue(idx());
+}
+currentChildIndex = container.getMutator().add(idx());
+for(final FieldWriter w : fields.values()) {
 
 Review comment:
   ```suggestion
   for (FieldWriter w : fields.values()) {
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector for storing the keys of each map
> valuesVector - vector for storing the values of each map
> offsetsVector - vector for storing the start indexes of the maps
> So it is not very hard to create such a Map vector, but there is a major issue 
> with this representation: it is hard to search map values by key in such a 
> vector, so some advanced techniques need to be investigated to make such a 
> search efficient, or other, more suitable options found to represent the map 
> datatype in the world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they just have a logical Map type 
> definition where they define Map as List< Struct<key:key_type, 
> value:value_type> >. So an implementation of the value vector would be useful 
> for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7361) Add Map (Dict) support for schema provisioning using file

2019-08-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7361:

Fix Version/s: Future

> Add Map (Dict) support for schema provisioning using file
> -
>
> Key: DRILL-7361
> URL: https://issues.apache.org/jira/browse/DRILL-7361
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Priority: Major
> Fix For: Future
>
>
> Once Dict is added to row set framework, schema commands must be able to 
> process this type.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7361) Add Map (Dict) support for schema provisioning using file

2019-08-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7361:

Affects Version/s: 1.17.0

> Add Map (Dict) support for schema provisioning using file
> -
>
> Key: DRILL-7361
> URL: https://issues.apache.org/jira/browse/DRILL-7361
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.17.0
>Reporter: Arina Ielchiieva
>Priority: Major
>
> Once Dict is added to row set framework, schema commands must be able to 
> process this type.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7361) Add Map (Dict) support for schema provisioning using file

2019-08-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7361:

Summary: Add Map (Dict) support for schema provisioning using file  (was: 
Add Map (Dict) support for schema file provisioning)

> Add Map (Dict) support for schema provisioning using file
> -
>
> Key: DRILL-7361
> URL: https://issues.apache.org/jira/browse/DRILL-7361
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Arina Ielchiieva
>Priority: Major
>
> Once Dict is added to row set framework, schema commands must be able to 
> process this type.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (DRILL-7361) Add Map (Dict) support for schema file provisioning

2019-08-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva reassigned DRILL-7361:
---

Assignee: (was: Arina Ielchiieva)

> Add Map (Dict) support for schema file provisioning
> ---
>
> Key: DRILL-7361
> URL: https://issues.apache.org/jira/browse/DRILL-7361
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Arina Ielchiieva
>Priority: Major
>
> Once Dict is added to row set framework, schema commands must be able to 
> process this type.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7361) Add Map (Dict) support for schema file provisioning

2019-08-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7361:

Reviewer:   (was: Volodymyr Vysotskyi)

> Add Map (Dict) support for schema file provisioning
> ---
>
> Key: DRILL-7361
> URL: https://issues.apache.org/jira/browse/DRILL-7361
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Arina Ielchiieva
>Priority: Major
>
> Once Dict is added to row set framework, schema commands must be able to 
> process this type.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (DRILL-7361) Add Map (Dict) support for schema file provisioning

2019-08-23 Thread Arina Ielchiieva (Jira)
Arina Ielchiieva created DRILL-7361:
---

 Summary: Add Map (Dict) support for schema file provisioning
 Key: DRILL-7361
 URL: https://issues.apache.org/jira/browse/DRILL-7361
 Project: Apache Drill
  Issue Type: New Feature
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva


Once Dict is added to row set framework, schema commands must be able to 
process this type.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7360) Refactor WatchService code in Drillbit class

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914308#comment-16914308
 ] 

ASF GitHub Bot commented on DRILL-7360:
---

arina-ielchiieva commented on issue #1848: DRILL-7360: Refactor WatchService in 
Drillbit class and fix concurrency issues
URL: https://github.com/apache/drill/pull/1848#issuecomment-524341918
 
 
   @vvysotskyi please review.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor WatchService code in Drillbit class
> 
>
> Key: DRILL-7360
> URL: https://issues.apache.org/jira/browse/DRILL-7360
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Refactor WatchService to use proper code (see 
> https://docs.oracle.com/javase/tutorial/essential/io/notification.html for 
> details) in Drillbit class and fix concurrency issues connected with 
> variables being assigned from different threads.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7360) Refactor WatchService code in Drillbit class

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914306#comment-16914306
 ] 

ASF GitHub Bot commented on DRILL-7360:
---

arina-ielchiieva commented on pull request #1848: DRILL-7360: Refactor 
WatchService in Drillbit class and fix concurrency issues
URL: https://github.com/apache/drill/pull/1848
 
 
   See [DRILL-7360](https://issues.apache.org/jira/browse/DRILL-7360) for 
details.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Refactor WatchService code in Drillbit class
> 
>
> Key: DRILL-7360
> URL: https://issues.apache.org/jira/browse/DRILL-7360
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Refactor WatchService to use proper code (see 
> https://docs.oracle.com/javase/tutorial/essential/io/notification.html for 
> details) in Drillbit class and fix concurrency issues connected with 
> variables being assigned from different threads.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7360) Refactor WatchService code in Drillbit class

2019-08-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7360:

Description: Refactor WatchService to use proper code (see 
https://docs.oracle.com/javase/tutorial/essential/io/notification.html for 
details) in Drillbit class and fix concurrency issues connected with variables 
being assigned from different threads.  (was: Refactor WatchService to use proper 
code (see 
https://docs.oracle.com/javase/tutorial/essential/io/notification.html for 
details) and fix concurrency issues connected with variables being assigned from 
different threads.)

> Refactor WatchService code in Drillbit class
> 
>
> Key: DRILL-7360
> URL: https://issues.apache.org/jira/browse/DRILL-7360
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.16.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Major
> Fix For: 1.17.0
>
>
> Refactor WatchService to use proper code (see 
> https://docs.oracle.com/javase/tutorial/essential/io/notification.html for 
> details) in Drillbit class and fix concurrency issues connected with 
> variables being assigned from different threads.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (DRILL-7360) Refactor WatchService code in Drillbit class

2019-08-23 Thread Arina Ielchiieva (Jira)
Arina Ielchiieva created DRILL-7360:
---

 Summary: Refactor WatchService code in Drillbit class
 Key: DRILL-7360
 URL: https://issues.apache.org/jira/browse/DRILL-7360
 Project: Apache Drill
  Issue Type: Task
Affects Versions: 1.16.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.17.0


Refactor WatchService to use proper code (see 
https://docs.oracle.com/javase/tutorial/essential/io/notification.html for 
details) and fix concurrency issues connected with variables being assigned from 
different threads.
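
For reference, a minimal sketch of the WatchService pattern from the linked 
tutorial (generic Java; the directory path and event kinds are illustrative, not 
the actual Drillbit usage):

{code:java}
// Minimal WatchService loop per the Oracle tutorial referenced above.
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class WatchLoopSketch {
  public static void main(String[] args) throws Exception {
    Path dir = Paths.get(args.length > 0 ? args[0] : ".");
    try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
      dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE,
          StandardWatchEventKinds.ENTRY_MODIFY);
      while (true) {
        WatchKey key = watcher.take();                 // blocks until events arrive
        for (WatchEvent<?> event : key.pollEvents()) {
          if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
            continue;                                  // some events may have been lost
          }
          Path changed = (Path) event.context();
          System.out.println(event.kind() + ": " + changed);
        }
        if (!key.reset()) {                            // directory no longer accessible
          break;
        }
      }
    }
  }
}
{code}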



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-08-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7359:

Fix Version/s: 1.17.0

> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-08-23 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7359:

Reviewer: Paul Rogers

> Add support for DICT type in RowSet Framework
> -
>
> Key: DRILL-7359
> URL: https://issues.apache.org/jira/browse/DRILL-7359
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Bohdan Kazydub
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7326) Support repeated lists for CTAS parquet format

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914253#comment-16914253
 ] 

ASF GitHub Bot commented on DRILL-7326:
---

ihuzenko commented on pull request #1844: DRILL-7326: Support repeated lists 
for CTAS parquet format
URL: https://github.com/apache/drill/pull/1844#discussion_r317124843
 
 

 ##
 File path: 
exec/java-exec/src/main/codegen/templates/EventBasedRecordWriter.java
 ##
 @@ -96,7 +96,7 @@ private void initFieldWriters() throws IOException {
 
   public static abstract class FieldConverter {
 protected static final String LIST = "list";
-protected static final String ELEMENT = "element";
+public static final String ELEMENT = "element";
 
 Review comment:
   @KazydubB , done. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support repeated lists for CTAS parquet format
> --
>
> Key: DRILL-7326
> URL: https://issues.apache.org/jira/browse/DRILL-7326
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.16.0
>Reporter: Pavel Semenov
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> *STEPS TO REPRODUCE*
>  # Create a json file which has a doubly nested array as a value, e.g.
> {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]]
>  {code}
>  # Use CTAS to create a table in Drill from the created json file
>  # Observe the result
> *EXPECTED RESULT*
>  Table is created
> *ACTUAL RESULT*
>  UnsupportedOperationException appears on attempting to create the table
> *ADDITIONAL INFO*
>  It is possible to create a table with a *single* nested array
>  Error log
> {code:java}
> Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010]
> (java.lang.UnsupportedOperationException) Unsupported type LIST
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.record.AbstractRecordBatch.next():126
>  org.apache.drill.exec.record.AbstractRecordBatch.next():116
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>  org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1669
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748 (state=,code=0)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7326) Support repeated lists for CTAS parquet format

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914249#comment-16914249
 ] 

ASF GitHub Bot commented on DRILL-7326:
---

ihuzenko commented on pull request #1844: DRILL-7326: Support repeated lists 
for CTAS parquet format
URL: https://github.com/apache/drill/pull/1844#discussion_r317121315
 
 

 ##
 File path: 
exec/java-exec/src/main/codegen/templates/EventBasedRecordWriter.java
 ##
 @@ -96,7 +96,7 @@ private void initFieldWriters() throws IOException {
 
   public static abstract class FieldConverter {
 protected static final String LIST = "list";
-protected static final String ELEMENT = "element";
+public static final String ELEMENT = "element";
 
 Review comment:
  The only thing here is that the constants should be part of the parent 
```ParquetOutputRecordWriter```.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support repeated lists for CTAS parquet format
> --
>
> Key: DRILL-7326
> URL: https://issues.apache.org/jira/browse/DRILL-7326
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.16.0
>Reporter: Pavel Semenov
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> *STEPS TO REPRODUCE*
>  # Create a json file which has a doubly nested array as a value, e.g.
> {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]]
>  {code}
>  # Use CTAS to create a table in Drill from the created json file
>  # Observe the result
> *EXPECTED RESULT*
>  Table is created
> *ACTUAL RESULT*
>  UnsupportedOperationException appears on attempting to create the table
> *ADDITIONAL INFO*
>  It is possible to create a table with a *single* nested array
>  Error log
> {code:java}
> Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010]
> (java.lang.UnsupportedOperationException) Unsupported type LIST
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.record.AbstractRecordBatch.next():126
>  org.apache.drill.exec.record.AbstractRecordBatch.next():116
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>  org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1669
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748 (state=,code=0)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7326) Support repeated lists for CTAS parquet format

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914230#comment-16914230
 ] 

ASF GitHub Bot commented on DRILL-7326:
---

ihuzenko commented on pull request #1844: DRILL-7326: Support repeated lists 
for CTAS parquet format
URL: https://github.com/apache/drill/pull/1844#discussion_r317115031
 
 

 ##
 File path: 
exec/java-exec/src/main/codegen/templates/EventBasedRecordWriter.java
 ##
 @@ -96,7 +96,7 @@ private void initFieldWriters() throws IOException {
 
   public static abstract class FieldConverter {
 protected static final String LIST = "list";
-protected static final String ELEMENT = "element";
+public static final String ELEMENT = "element";
 
 Review comment:
   ok. 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support repeated lists for CTAS parquet format
> --
>
> Key: DRILL-7326
> URL: https://issues.apache.org/jira/browse/DRILL-7326
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.16.0
>Reporter: Pavel Semenov
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> *STEPS TO REPRODUCE*
>  # Create a json file which has a doubly nested array as a value, e.g.
> {code:java}"stringx2":[["asd","фывфы","asg"],["as","acz","gte"],["as","tdf","dsd"]]
>  {code}
>  # Use CTAS to create a table in Drill from the created json file
>  # Observe the result
> *EXPECTED RESULT*
>  Table is created
> *ACTUAL RESULT*
>  UnsupportedOperationException appears on attempting to create the table
> *ADDITIONAL INFO*
>  It is possible to create a table with a *single* nested array
>  Error log
> {code:java}
> Error: SYSTEM ERROR: UnsupportedOperationException: Unsupported type LIST
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: c48c6154-30a1-49c8-ac3b-7c2f898a7f4e on node1.cluster.com:31010]
> (java.lang.UnsupportedOperationException) Unsupported type LIST
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.getType():295
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.newSchema():226
>  org.apache.drill.exec.store.parquet.ParquetRecordWriter.updateSchema():211
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.setupNewSchema():160
>  org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():108
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.record.AbstractRecordBatch.next():126
>  org.apache.drill.exec.record.AbstractRecordBatch.next():116
>  org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext():63
>  
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():141
>  org.apache.drill.exec.record.AbstractRecordBatch.next():186
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>  org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():83
>  org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():296
>  org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():283
>  java.security.AccessController.doPrivileged():-2
>  javax.security.auth.Subject.doAs():422
>  org.apache.hadoop.security.UserGroupInformation.doAs():1669
>  org.apache.drill.exec.work.fragment.FragmentExecutor.run():283
>  org.apache.drill.common.SelfCleaningRunnable.run():38
>  java.util.concurrent.ThreadPoolExecutor.runWorker():1149
>  java.util.concurrent.ThreadPoolExecutor$Worker.run():624
>  java.lang.Thread.run():748 (state=,code=0)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7253) Read Hive struct w/o nulls

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914174#comment-16914174
 ] 

ASF GitHub Bot commented on DRILL-7253:
---

arina-ielchiieva commented on pull request #1847: DRILL-7253: Read Hive struct 
w/o nulls
URL: https://github.com/apache/drill/pull/1847#discussion_r317081991
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillCalciteSqlOperatorWrapper.java
 ##
 @@ -136,4 +138,14 @@ public void unparse(
   int rightPrec) {
 operator.unparse(writer, call, leftPrec, rightPrec);
   }
+
+  @Override
+  public RelDataType inferReturnType(SqlOperatorBinding opBinding) {
+if (operator instanceof SqlRowOperator) {
+  return operator.inferReturnType(opBinding);
+} else {
+  return super.inferReturnType(opBinding);
+}
+  }
+
 
 Review comment:
   Nit: new line
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Read Hive struct w/o nulls
> --
>
> Key: DRILL-7253
> URL: https://issues.apache.org/jira/browse/DRILL-7253
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7253) Read Hive struct w/o nulls

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914175#comment-16914175
 ] 

ASF GitHub Bot commented on DRILL-7253:
---

arina-ielchiieva commented on pull request #1847: DRILL-7253: Read Hive struct 
w/o nulls
URL: https://github.com/apache/drill/pull/1847#discussion_r317080996
 
 

 ##
 File path: 
contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/writers/complex/HiveStructWriter.java
 ##
 @@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.hive.writers.complex;
+
+import org.apache.drill.exec.store.hive.writers.HiveValueWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.hadoop.hive.serde2.objectinspector.StructField;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+
+public class HiveStructWriter implements HiveValueWriter {
+
+  private final StructObjectInspector structObjectInspector;
+
+  private final StructField[] structFields;
+
+  private final HiveValueWriter[] fieldWriters;
+
+  private final BaseWriter.MapWriter structWriter;
+
+  public HiveStructWriter(StructObjectInspector structObjectInspector, 
StructField[] structFields, HiveValueWriter[] fieldWriters, 
BaseWriter.MapWriter structWriter) {
+this.structObjectInspector = structObjectInspector;
+this.structFields = structFields;
+this.fieldWriters = fieldWriters;
+this.structWriter = structWriter;
+  }
+
+  @Override
+  public void write(Object value) {
+structWriter.start();
+for (int fieldIdx = 0; fieldIdx < structFields.length; fieldIdx++) {
+  Object fieldValue = structObjectInspector.getStructFieldData(value, 
structFields[fieldIdx]);
+  if (fieldValue == null) {
+throw new UnsupportedOperationException("Null is not supported in Hive 
struct!");
 
 Review comment:
   ```suggestion
   throw new UnsupportedOperationException("Null is not supported in 
Hive struct");
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Read Hive struct w/o nulls
> --
>
> Key: DRILL-7253
> URL: https://issues.apache.org/jira/browse/DRILL-7253
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914168#comment-16914168
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317079385
 
 

 ##
 File path: 
logical/src/main/java/org/apache/drill/common/expression/SchemaPath.java
 ##
 @@ -74,6 +74,16 @@ public static SchemaPath getCompoundPath(String... strings) 
{
 return new SchemaPath(s);
   }
 
+  public static SchemaPath getCompoundPath(int to, String... strings) {
 
 Review comment:
   Please add java doc with example and test into 
`org.apache.drill.common.expression.SchemaPathTest`
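
   For reference, the requested javadoc example could be as simple as the following usage of the existing varargs overload (the semantics of the new int parameter are not stated in this thread, so they are not illustrated; class name is a placeholder):

```java
import org.apache.drill.common.expression.SchemaPath;

// Hypothetical snippet showing what the requested javadoc example might contain.
public class SchemaPathExample {
  public static void main(String[] args) {
    // Builds the compound column path `a`.`b`.`c` as a chain of NameSegments.
    SchemaPath path = SchemaPath.getCompoundPath("a", "b", "c");
    System.out.println(path);
  }
}
```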
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After question about maps, Apache Arrow developers responded that for Java 
> they don't have real Map vector, for now they just have logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >. So implementation of value vector would be useful for
> Arrow too.
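
For readers skimming the archive, the keys/values/offsets layout described above can be pictured with plain Java arrays. This is only an illustration of the idea (and of why key lookup needs a scan), not Drill's actual vector classes:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: models the keys/values/offsets layout with plain arrays.
public class OffsetsLayoutSketch {

  public static void main(String[] args) {
    // Two maps stored back to back: {a=1, b=2, c=3} and {d=4}.
    String[] keys    = {"a", "b", "c", "d"};
    int[]    values  = {1, 2, 3, 4};
    // offsets[i]..offsets[i+1] delimits the i-th map's entries.
    int[]    offsets = {0, 3, 4};

    for (int mapIndex = 0; mapIndex < offsets.length - 1; mapIndex++) {
      Map<String, Integer> map = new LinkedHashMap<>();
      for (int i = offsets[mapIndex]; i < offsets[mapIndex + 1]; i++) {
        map.put(keys[i], values[i]);
      }
      System.out.println("map " + mapIndex + ": " + map);
      // Looking a value up by key means scanning the slice, which is the
      // search problem the description mentions.
    }
  }
}
```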



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914162#comment-16914162
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317075663
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata_V4.java
 ##
 @@ -400,11 +405,23 @@ public Object deserializeKey(String key, 
com.fasterxml.jackson.databind.Deserial
* Note: Since the structure of column metadata hasn't changed from v3, 
ColumnMetadata_v4 extends ColumnMetadata_v3
*/
   public static class ColumnMetadata_v4 extends Metadata_V3.ColumnMetadata_v3 {
+
+List<OriginalType> originalTypes;
+
 public ColumnMetadata_v4() {
 }
 
 public ColumnMetadata_v4(String[] name, PrimitiveType.PrimitiveTypeName 
primitiveType, Object minValue, Object maxValue, Long nulls) {
+  this(name, primitiveType, minValue, maxValue, nulls, new ArrayList<>());
 
 Review comment:
   Collections.emptyList()?
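
   For context on the nit, a minimal comparison of the two choices (illustrative only):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EmptyListNit {
  public static void main(String[] args) {
    List<String> allocated = new ArrayList<>();    // new mutable list allocated on every call
    List<String> shared = Collections.emptyList(); // shared immutable singleton, no allocation

    System.out.println(allocated.size() + " " + shared.size()); // prints: 0 0
    // shared.add("x") would throw UnsupportedOperationException, which is acceptable
    // for a default value that is never meant to be modified.
  }
}
```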
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After question about maps, Apache Arrow developers responded that for Java 
> they don't have real Map vector, for now they just have logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >. So implementation of value vector would be useful for
> Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914166#comment-16914166
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317078570
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/complex/DictVector.java
 ##
 @@ -0,0 +1,321 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.complex;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.expr.holders.RepeatedValueHolder;
+import org.apache.drill.exec.expr.holders.DictHolder;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.util.CallBack;
+import org.apache.drill.exec.util.JsonStringHashMap;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.impl.SingleDictReaderImpl;
+
+/**
+ * A {@link ValueVector} holding key-value pairs.
+ * This vector is essentially a {@link RepeatedMapVector} but with 
constraints:
+ * it may have 2 children only, named {@link #FIELD_KEY_NAME} and {@link 
#FIELD_VALUE_NAME}.
+ * The {@link #FIELD_KEY_NAME} can be of primitive type only and its values 
should not be {@code null},
+ * while the other, {@link #FIELD_VALUE_NAME}, field can be either of 
primitive or complex type.
+ *
+ * This vector has its own {@link 
org.apache.drill.exec.vector.complex.reader.FieldReader} and
+ * {@link org.apache.drill.exec.vector.complex.writer.FieldWriter} to ensure 
data is read and written correctly.
+ * In addition, the reader is responsible for getting a value for a given key.
+ *
+ * Additionally, {@code Object} representation is changed in {@link 
Accessor#getObject(int)}
+ * to represent it as {@link JsonStringHashMap} with appropriate {@code key} 
and {@code value} types.
+ *
+ * (The structure corresponds to Java's notion of {@link Map}).
+ *
+ * @see SingleDictReaderImpl reader corresponding to the vector
+ * @see org.apache.drill.exec.vector.complex.impl.SingleDictWriter writer 
corresponding to the vector
+ */
+public final class DictVector extends AbstractRepeatedMapVector {
+
+  public final static MajorType TYPE = Types.optional(MinorType.DICT);
+
+  public static final String FIELD_KEY_NAME = "key";
+  public static final String FIELD_VALUE_NAME = "value";
+  public static final List<String> fieldNames = 
Collections.unmodifiableList(Arrays.asList(FIELD_KEY_NAME, FIELD_VALUE_NAME));
+
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(DictVector.class);
 
 Review comment:
   Imports...
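
   The nit presumably asks for imported types instead of fully qualified names; a minimal sketch of that style (class name is a placeholder):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class LoggerImportSketch {
  // Same declaration as in the diff above, written with imports instead of
  // fully qualified org.slf4j names.
  private static final Logger logger = LoggerFactory.getLogger(LoggerImportSketch.class);

  public static void main(String[] args) {
    logger.debug("illustration only");
  }
}
```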
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such 

[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914164#comment-16914164
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317076269
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet2/DrillMapGroupConverter.java
 ##
 @@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.parquet2;
+
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.exec.physical.impl.OutputMutator;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.parquet.ParquetReaderUtility;
+import org.apache.drill.exec.vector.complex.DictVector;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter.DictWriter;
+import org.apache.parquet.io.api.Converter;
+import org.apache.parquet.schema.GroupType;
+import org.apache.parquet.schema.Type;
+
+import java.util.Collections;
+
+class DrillMapGroupConverter extends DrillParquetGroupConverter {
+
+  private final DictWriter writer;
+
+  DrillMapGroupConverter(OutputMutator mutator, DictWriter mapWriter, 
GroupType schema,
+OptionManager options, 
ParquetReaderUtility.DateCorruptionStatus containsCorruptedDates) {
+super(mutator, mapWriter, options, containsCorruptedDates);
+writer = mapWriter;
+
+GroupType type = schema.getType(0).asGroupType();
+Converter innerConverter = new KeyValueGroupConverter(mutator, type, 
options, containsCorruptedDates);
+converters.add(innerConverter);
+  }
+
+  @Override
+  public void start() {
+writer.start();
+  }
+
+  @Override
+  public void end() {
+writer.end();
+  }
+
+  private class KeyValueGroupConverter extends DrillParquetGroupConverter {
+
+private static final int INDEX_KEY = 0;
+private static final int INDEX_VALUE = 1;
+
+KeyValueGroupConverter(OutputMutator mutator, GroupType schema, 
OptionManager options,
+   ParquetReaderUtility.DateCorruptionStatus 
containsCorruptedDates) {
+  super(mutator, writer, options, containsCorruptedDates);
+
+  Type keyType = schema.getType(INDEX_KEY);
+  if (!keyType.isPrimitive()) {
+throw new DrillRuntimeException("Map supports primitive key only. 
Found: " + keyType);
 
 Review comment:
   Map or Dict?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After question about maps, Apache Arrow developers responded that for Java 
> they don't have real Map vector, for now they just have logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >. So implementation of value vector 

[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914167#comment-16914167
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317078013
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/record/metadata/AbstractMapColumnMetadata.java
 ##
 @@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.metadata;
+
+import org.apache.drill.common.types.TypeProtos.DataMode;
+import org.apache.drill.common.types.TypeProtos.MajorType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.MaterializedField;
+
+import java.util.stream.Collectors;
+
+/**
+ * Describes a base column type for map, dict, repeated map and repeated dict. 
All are tuples that have a tuple
+ * schema as part of the column definition.
+ */
+public abstract class AbstractMapColumnMetadata extends AbstractColumnMetadata 
{
+
+  TupleMetadata parentTuple;
 
 Review comment:
   Why not protected?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After question about maps, Apache Arrow developers responded that for Java 
> they don't have real Map vector, for now they just have logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >. So implementation of value vector would be useful for
> Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914165#comment-16914165
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317078387
 
 

 ##
 File path: 
exec/vector/src/main/java/org/apache/drill/exec/vector/complex/AbstractRepeatedMapVector.java
 ##
 @@ -0,0 +1,524 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.vector.complex;
+
+import io.netty.buffer.DrillBuf;
+
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+
+import org.apache.commons.lang3.ArrayUtils;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.expr.BasicTypeHelper;
+import org.apache.drill.exec.expr.holders.RepeatedValueHolder;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.memory.AllocationManager.BufferLedger;
+import org.apache.drill.exec.proto.UserBitShared.SerializedField;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.util.CallBack;
+import org.apache.drill.exec.vector.AddOrGetResult;
+import org.apache.drill.exec.vector.AllocationHelper;
+import org.apache.drill.exec.vector.SchemaChangeCallBack;
+import org.apache.drill.exec.vector.UInt4Vector;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.VectorDescriptor;
+
+public abstract class AbstractRepeatedMapVector extends AbstractMapVector 
implements RepeatedValueVector {
+
+  protected final UInt4Vector offsets; // offsets to start of each record 
(considering record indices are 0-indexed)
+  protected final EmptyValuePopulator emptyPopulator;
+
+  protected AbstractRepeatedMapVector(MaterializedField field, BufferAllocator 
allocator, CallBack callBack) {
+this(field, new UInt4Vector(BaseRepeatedValueVector.OFFSETS_FIELD, 
allocator), callBack);
+  }
+
+  protected AbstractRepeatedMapVector(MaterializedField field, UInt4Vector 
offsets, CallBack callBack) {
+super(field, offsets.getAllocator(), callBack);
+this.offsets = offsets;
+this.emptyPopulator = new EmptyValuePopulator(offsets);
+  }
+
+  @Override
+  public UInt4Vector getOffsetVector() { return offsets; }
+
+  @Override
+  public ValueVector getDataVector() {
+throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public <T extends ValueVector> AddOrGetResult<T> 
addOrGetVector(VectorDescriptor descriptor) {
+throw new UnsupportedOperationException();
+  }
+
+  @Override
+  public void setInitialCapacity(int numRecords) {
+offsets.setInitialCapacity(numRecords + 1);
+for (final ValueVector v : this) {
+  v.setInitialCapacity(numRecords * 
RepeatedValueVector.DEFAULT_REPEAT_PER_RECORD);
+}
+  }
+
+  public void allocateNew(int groupCount, int innerValueCount) {
+clear();
+try {
+  allocateOffsetsNew(groupCount);
+  for (ValueVector v : getChildren()) {
+AllocationHelper.allocatePrecomputedChildCount(v, groupCount, 50, 
innerValueCount);
+  }
+} catch (OutOfMemoryException e){
 
 Review comment:
   ```suggestion
   } catch (OutOfMemoryException e) {
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map 

[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914157#comment-16914157
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317075408
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/MetadataVersion.java
 ##
 @@ -145,10 +145,16 @@ public int compareTo(MetadataVersion o) {
  */
 public static final String V3_3 = "3.3";
 
-  /**
-   *  Version 4.0: Split the metadata cache file into summary and file metadata
-   */
-  public static final String V4 = "4.0";
+/**
+ *  Version 4.0: Split the metadata cache file into summary and file 
metadata
+ */
+public static final String V4 = "4.0";
+
+/**
+ *  Version 4.1: Added parents' original types in {@link 
Metadata_V4.ColumnTypeMetadata_v4}
 
 Review comment:
   Please provide an example...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After question about maps, Apache Arrow developers responded that for Java 
> they don't have real Map vector, for now they just have logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >. So implementation of value vector would be useful for
> Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914160#comment-16914160
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317079385
 
 

 ##
 File path: 
logical/src/main/java/org/apache/drill/common/expression/SchemaPath.java
 ##
 @@ -74,6 +74,16 @@ public static SchemaPath getCompoundPath(String... strings) 
{
 return new SchemaPath(s);
   }
 
+  public static SchemaPath getCompoundPath(int to, String... strings) {
 
 Review comment:
   Please add java doc with example.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After question about maps, Apache Arrow developers responded that for Java 
> they don't have real Map vector, for now they just have logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >. So implementation of value vector would be useful for
> Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914155#comment-16914155
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317073940
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordWriter.java
 ##
 @@ -85,6 +84,22 @@
   private static final int MAXIMUM_RECORD_COUNT_FOR_CHECK = 1;
   private static final int BLOCKSIZE_MULTIPLE = 64 * 1024;
 
+  /**
+   * Name of nested group for Parquet's {@code MAP} type.
+   * @see <a href="https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps">MAP
 logical type</a>
+   */
+  private static final String GROUP_KEY_VALUE_NAME = "key_value";
+  /**
+   * Name of nested group for Parquet's {@code LIST} type.
+   * @see <a href="https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#lists">LIST
 logical type</a>
+   */
+  private static final String GROUP_LIST_NAME = "list";
 
 Review comment:
   Recently Igor was adding similar constants in 
https://github.com/apache/drill/pull/1844, please revisit.
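
   For readers unfamiliar with these constants: per the Parquet LogicalTypes spec, MAP nests its entries in a repeated group named key_value and LIST nests its elements in a repeated group named list. A small sketch using parquet-mr's schema parser (illustrative field names, not the PR's code):

```java
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class ParquetLogicalTypeGroups {
  public static void main(String[] args) {
    // MAP: the repeated inner group is named "key_value".
    MessageType mapSchema = MessageTypeParser.parseMessageType(
        "message m {\n"
      + "  required group my_map (MAP) {\n"
      + "    repeated group key_value {\n"
      + "      required binary key (UTF8);\n"
      + "      optional int32 value;\n"
      + "    }\n"
      + "  }\n"
      + "}");

    // LIST: the repeated inner group is named "list".
    MessageType listSchema = MessageTypeParser.parseMessageType(
        "message l {\n"
      + "  required group my_list (LIST) {\n"
      + "    repeated group list {\n"
      + "      optional int32 element;\n"
      + "    }\n"
      + "  }\n"
      + "}");

    System.out.println(mapSchema);
    System.out.println(listSchema);
  }
}
```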
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After question about maps, Apache Arrow developers responded that for Java 
> they don't have real Map vector, for now they just have logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >. So implementation of value vector would be useful for
> Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914152#comment-16914152
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317072548
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchLoader.java
 ##
 @@ -116,7 +116,7 @@ public boolean load(RecordBatchDef def, DrillBuf buf) 
throws SchemaChangeExcepti
 
 // If the field is a map, check if the map schema changed.
 
 Review comment:
   Update comment to reflect new logic.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After question about maps, Apache Arrow developers responded that for Java 
> they don't have real Map vector, for now they just have logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >. So implementation of value vector would be useful for
> Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914156#comment-16914156
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317075152
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetTableMetadataUtils.java
 ##
 @@ -573,19 +585,30 @@ public static OriginalType 
getOriginalType(MetadataBase.ParquetTableMetadataBase
 for (MetadataBase.ParquetFileMetadata file : 
parquetTableMetadata.getFiles()) {
   // row groups in the file have the same schema, so using the first one
  Map<SchemaPath, TypeProtos.MajorType> fileColumns = 
getFileFields(parquetTableMetadata, file);
-  fileColumns.forEach((columnPath, type) -> {
-TypeProtos.MajorType majorType = columns.get(columnPath);
-if (majorType == null) {
-  columns.put(columnPath, type);
-} else {
-  TypeProtos.MinorType leastRestrictiveType = 
TypeCastRules.getLeastRestrictiveType(Arrays.asList(majorType.getMinorType(), 
type.getMinorType()));
-  if (leastRestrictiveType != majorType.getMinorType()) {
-columns.put(columnPath, type);
-  }
-}
-  });
+  fileColumns.forEach((columnPath, type) -> putType(columns, columnPath, 
type));
+}
+return columns;
+  }
+
+  static Map<SchemaPath, TypeProtos.MajorType> 
resolveIntermediateFields(MetadataBase.ParquetTableMetadataBase 
parquetTableMetadata) {
+LinkedHashMap<SchemaPath, TypeProtos.MajorType> columns = new 
LinkedHashMap<>();
 
 Review comment:
   Can we use `Map columns = ...`?
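
   As an aside, the diff replaces the inline lambda with a putType(...) helper whose body is not quoted here; based on the removed lines above it presumably looks like the following sketch (not the PR's verbatim code):

```java
import java.util.Arrays;
import java.util.Map;

import org.apache.drill.common.expression.SchemaPath;
import org.apache.drill.common.types.TypeProtos;
import org.apache.drill.exec.resolver.TypeCastRules;

class PutTypeSketch {

  // Records a column's type, keeping the least restrictive type when the same
  // column appears with different types across files.
  static void putType(Map<SchemaPath, TypeProtos.MajorType> columns,
                      SchemaPath columnPath, TypeProtos.MajorType type) {
    TypeProtos.MajorType majorType = columns.get(columnPath);
    if (majorType == null) {
      columns.put(columnPath, type);
    } else {
      TypeProtos.MinorType leastRestrictiveType = TypeCastRules.getLeastRestrictiveType(
          Arrays.asList(majorType.getMinorType(), type.getMinorType()));
      if (leastRestrictiveType != majorType.getMinorType()) {
        columns.put(columnPath, type);
      }
    }
  }
}
```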
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After question about maps, Apache Arrow developers responded that for Java 
> they don't have real Map vector, for now they just have logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >. So implementation of value vector would be useful for
> Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914158#comment-16914158
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317077225
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/record/vector/TestDictVector.java
 ##
 @@ -0,0 +1,460 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.record.vector;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.categories.VectorTest;
+import org.apache.drill.common.config.DrillConfig;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.exec.ExecTest;
+import org.apache.drill.exec.expr.holders.NullableBigIntHolder;
+import org.apache.drill.exec.memory.BufferAllocator;
+import org.apache.drill.exec.memory.RootAllocatorFactory;
+import org.apache.drill.exec.proto.UserBitShared;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.RecordBatchLoader;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.WritableBatch;
+import org.apache.drill.exec.vector.complex.DictVector;
+import org.apache.drill.exec.vector.complex.impl.SingleDictWriter;
+import org.apache.drill.exec.vector.complex.reader.BaseReader;
+import org.apache.drill.exec.vector.complex.writer.BaseWriter;
+import org.apache.drill.test.TestBuilder;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import org.junit.rules.ExpectedException;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+
+import static org.hamcrest.CoreMatchers.containsString;
+import static org.junit.Assert.assertEquals;
+
+@Category(VectorTest.class)
+public class TestDictVector extends ExecTest {
+
+  @Rule
+  public ExpectedException thrown = ExpectedException.none();
+
+  private BufferAllocator allocator;
+
+  @Before
+  public void setUp() {
+allocator = RootAllocatorFactory.newRoot(DrillConfig.create());
+  }
+
+  @After
+  public void tearDown(){
+allocator.close();
+  }
+
+  @Test
+  public void testVectorCreation() {
+MaterializedField field = MaterializedField.create("map", DictVector.TYPE);
+try (DictVector mapVector = new DictVector(field, allocator, null)) {
+  mapVector.allocateNew();
+
+  List<Map<Object, Object>> maps = Arrays.asList(
+  TestBuilder.mapOfObject(4f, 1L, 5.3f, 2L, 0.3f, 3L, -0.2f, 4L, 
102.07f, 5L),
+  TestBuilder.mapOfObject(45f, 6L, 9.2f, 7L),
+  TestBuilder.mapOfObject(4.01f, 8L, 9.2f, 9L, -2.3f, 10L),
+  TestBuilder.mapOfObject(),
+  TestBuilder.mapOfObject(11f, 11L, 9.73f, 12L, 0.03f, 13L)
+  );
+
+  BaseWriter.DictWriter mapWriter = new SingleDictWriter(mapVector, null);
+  int index = 0;
+  for (Map<Object, Object> map : maps) {
+mapWriter.setPosition(index++);
+mapWriter.start();
+for (Map.Entry<Object, Object> entry : map.entrySet()) {
+  mapWriter.startKeyValuePair();
+  mapWriter.float4(DictVector.FIELD_KEY_NAME).writeFloat4((float) 
entry.getKey());
+  mapWriter.bigInt(DictVector.FIELD_VALUE_NAME).writeBigInt((long) 
entry.getValue());
+  mapWriter.endKeyValuePair();
+}
+mapWriter.end();
+  }
+
+  BaseReader.DictReader mapReader = mapVector.getReader();
+  index = 0;
+  for (Map<Object, Object> map : maps) {
+mapReader.setPosition(index++);
+for (Map.Entry<Object, Object> entry : map.entrySet()) {
+  mapReader.next();
+  Float actualKey = 
mapReader.reader(DictVector.FIELD_KEY_NAME).readFloat();
+  Long actualValue = 
mapReader.reader(DictVector.FIELD_VALUE_NAME).readLong();
+  Assert.assertEquals(entry.getKey(), actualKey);
 
 Review comment:
   Use static imports...
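
   A tiny illustration of the static-import style being asked for:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class StaticImportStyle {
  @Test
  public void example() {
    // reads as assertEquals(...) rather than Assert.assertEquals(...)
    assertEquals(4, 2 + 2);
  }
}
```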
 

This is an automated message from the 

[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914161#comment-16914161
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317076122
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/metadata/Metadata_V4.java
 ##
 @@ -400,11 +405,23 @@ public Object deserializeKey(String key, 
com.fasterxml.jackson.databind.Deserial
* Note: Since the structure of column metadata hasn't changed from v3, 
ColumnMetadata_v4 extends ColumnMetadata_v3
*/
   public static class ColumnMetadata_v4 extends Metadata_V3.ColumnMetadata_v3 {
+
+List<OriginalType> originalTypes;
+
 public ColumnMetadata_v4() {
 }
 
 public ColumnMetadata_v4(String[] name, PrimitiveType.PrimitiveTypeName 
primitiveType, Object minValue, Object maxValue, Long nulls) {
+  this(name, primitiveType, minValue, maxValue, nulls, new ArrayList<>());
+}
+
+public ColumnMetadata_v4(String[] name, PrimitiveType.PrimitiveTypeName 
primitiveType, Object minValue, Object maxValue, Long nulls, List<OriginalType> 
originalTypes) {
   super(name, primitiveType, minValue, maxValue, nulls);
+  this.originalTypes = Collections.unmodifiableList(originalTypes);
 
 Review comment:
   `this.parentTypes = new ArrayList<>(parentTypes);` you wrap into list, here 
you wrap in unmodifiable... looks like different approach for similar cases...
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After question about maps, Apache Arrow developers responded that for Java 
> they don't have real Map vector, for now they just have logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >. So implementation of value vector would be useful for
> Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914154#comment-16914154
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317073646
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java
 ##
 @@ -713,4 +715,63 @@ public static boolean 
containsComplexColumn(ParquetMetadata footer, ListNOTE: current implementation cares about {@link OriginalType#MAP} 
only
+   * converting it to {@link 
org.apache.drill.common.types.TypeProtos.MinorType#DICT}.
+   * Other original types are converted to {@code null}.
+   *
+   * @param originalTypes list of Parquet's types
+   * @return list containing either {@code null} or type with minor
+   * type {@link 
org.apache.drill.common.types.TypeProtos.MinorType#DICT} values
+   */
+  public static List<TypeProtos.MajorType> getComplexTypes(List<OriginalType> 
originalTypes) {
+List<TypeProtos.MajorType> result = new ArrayList<>();
+if (originalTypes == null) {
+  return result;
+}
+for (OriginalType type : originalTypes) {
+  if (type == OriginalType.MAP) {
+TypeProtos.MajorType drillType = TypeProtos.MajorType.newBuilder()
+.setMinorType(TypeProtos.MinorType.DICT)
+.setMode(TypeProtos.DataMode.OPTIONAL)
+.build();
+result.add(drillType);
+  } else {
+result.add(null);
+  }
+}
+
+return result;
+  }
+
+  /**
+   * Checks whether group field matches pattern for Logical Map type:
+   *
+   * <map-repetition> group <name> (MAP) {
 
 Review comment:
   Use proper html formatting.
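
   One way to format that javadoc pattern properly, using a pre block and escaped angle brackets (the exact pattern text is assumed from the Parquet spec, and the method name below is just a placeholder):

```java
public class JavadocFormattingSketch {
  /**
   * Checks whether a group field matches the pattern for the logical MAP type:
   *
   * <pre>
   * &lt;map-repetition&gt; group &lt;name&gt; (MAP) {
   *   repeated group key_value {
   *     required &lt;key-type&gt; key;
   *     &lt;value-repetition&gt; &lt;value-type&gt; value;
   *   }
   * }
   * </pre>
   */
  static void mapPatternDocExample() {
    // placeholder: only the javadoc formatting above is the point of this sketch
  }
}
```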
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using combination of three 
> value vectors:
> keysVector - vector for storing keys of each map
> valuesVector - vector for storing values of each map
> offsetsVector - vector for storing of start indexes of next each map
> So it's not very hard to create such Map vector, but there is a major issue 
> with such map representation. It's hard to search maps values by key in such 
> vector, need to investigate some advanced techniques to make such search 
> efficient. Or find other more suitable options to represent map datatype in 
> world of vectors.
> After question about maps, Apache Arrow developers responded that for Java 
> they don't have real Map vector, for now they just have logical Map type 
> definition where they define Map like: List< Struct<key:key_type, value:value_type> >. So implementation of value vector would be useful for
> Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914159#comment-16914159
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317077102
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/FieldIdUtil.java
 ##
 @@ -196,18 +206,48 @@ public static TypedFieldId getFieldId(ValueVector 
vector, int id, SchemaPath exp
 return getFieldIdIfMatchesUnion((UnionVector) vector, builder, false, 
seg.getChild());
   }
 } else if (vector instanceof ListVector) {
-  ListVector list = (ListVector) vector;
   builder.intermediateType(vector.getField().getType());
   builder.addId(id);
-  return getFieldIdIfMatches(list, builder, true, 
expectedPath.getRootSegment().getChild());
-} else
-if (vector instanceof AbstractContainerVector) {
+  return getFieldIdIfMatches(vector, builder, true, 
expectedPath.getRootSegment().getChild(), 1);
+} else if (vector instanceof DictVector) {
+  MajorType vectorType = vector.getField().getType();
+  builder.intermediateType(vectorType);
+  builder.addId(id);
+  if (seg.isLastPath()) {
+builder.finalType(vectorType);
+return builder.build();
+  } else {
+PathSegment child = seg.getChild();
+builder.remainder(child);
+return getFieldIdIfMatches(vector, builder, false, 
expectedPath.getRootSegment().getChild(), 0);
+  }
+} else if (vector instanceof AbstractContainerVector) {
   // we're looking for a multi path.
-  AbstractContainerVector c = (AbstractContainerVector) vector;
   builder.intermediateType(vector.getField().getType());
   builder.addId(id);
-  return getFieldIdIfMatches(c, builder, true, 
expectedPath.getRootSegment().getChild());
-
+  return getFieldIdIfMatches(vector, builder, true, 
expectedPath.getRootSegment().getChild(), 1);
+} else if (vector instanceof RepeatedDictVector) {
+  MajorType vectorType = vector.getField().getType();
+  builder.intermediateType(vectorType);
+  builder.addId(id);
+  if (seg.isLastPath()) {
+builder.finalType(vectorType);
+return builder.build();
+  } else {
+PathSegment child = seg.getChild();
+if (!child.isArray()) {
+  // repeated map is accessed not by index, ignore?
 
 Review comment:
   ? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector for storing the keys of each map
> valuesVector - vector for storing the values of each map
> offsetsVector - vector for storing the start index of each next map
> So it is not very hard to create such a Map vector, but there is a major issue 
> with this representation: it is hard to search map values by key in such a 
> vector, so we need to investigate some advanced techniques to make such a search 
> efficient, or find other, more suitable options to represent the map datatype in 
> the world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they only have a logical Map type 
> definition where Map is defined as List< Struct<key: key_type, value: value_type> >. 
> So an implementation of such a value vector would be useful for Arrow too.
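
To make the keys/values/offsets layout described above concrete, here is a 
minimal, self-contained sketch (plain Java arrays rather than Drill value 
vectors; class and method names are illustrative, not Drill's DictVector API) 
of the representation and of the per-row linear key search that makes lookup 
by key expensive:

import java.util.OptionalInt;

/** Illustrative only: maps with int keys/values stored as three flat arrays. */
public class MapColumnSketch {
  // Row i owns entries [offsets[i], offsets[i + 1]) in keys/values.
  private final int[] offsets;
  private final int[] keys;
  private final int[] values;

  public MapColumnSketch(int[] offsets, int[] keys, int[] values) {
    this.offsets = offsets;
    this.keys = keys;
    this.values = values;
  }

  /** Lookup by key is a linear scan over the entries belonging to the row. */
  public OptionalInt get(int row, int key) {
    for (int i = offsets[row]; i < offsets[row + 1]; i++) {
      if (keys[i] == key) {
        return OptionalInt.of(values[i]);
      }
    }
    return OptionalInt.empty();
  }

  public static void main(String[] args) {
    // Two rows: {1: 10, 2: 20} and {7: 70}
    MapColumnSketch col = new MapColumnSketch(
        new int[] {0, 2, 3},
        new int[] {1, 2, 7},
        new int[] {10, 20, 70});
    System.out.println(col.get(0, 2)); // OptionalInt[20]
    System.out.println(col.get(1, 2)); // OptionalInt.empty
  }
}

Each get() call is linear in the number of entries of that row, which is the 
search cost the description above says needs a more advanced technique or a 
different layout to avoid.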



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7253) Read Hive struct w/o nulls

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914120#comment-16914120
 ] 

ASF GitHub Bot commented on DRILL-7253:
---

ihuzenko commented on pull request #1847: DRILL-7253: Read Hive struct w/o nulls
URL: https://github.com/apache/drill/pull/1847
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Read Hive struct w/o nulls
> --
>
> Key: DRILL-7253
> URL: https://issues.apache.org/jira/browse/DRILL-7253
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.17.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (DRILL-7359) Add support for DICT type in RowSet Framework

2019-08-23 Thread Bohdan Kazydub (Jira)
Bohdan Kazydub created DRILL-7359:
-

 Summary: Add support for DICT type in RowSet Framework
 Key: DRILL-7359
 URL: https://issues.apache.org/jira/browse/DRILL-7359
 Project: Apache Drill
  Issue Type: New Feature
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


Add support for new DICT data type (see DRILL-7096) in RowSet Framework



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914074#comment-16914074
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317043649
 
 

 ##
 File path: exec/java-exec/src/main/codegen/templates/AbstractRecordWriter.java
 ##
 @@ -71,6 +71,16 @@ public FieldConverter getNewRepeatedListConverter(int fieldId, String fieldName,
     throw new UnsupportedOperationException("Doesn't support writing RepeatedList");
   }
 
+  @Override
+  public FieldConverter getNewDictConverter(int fieldId, String fieldName, FieldReader reader) {
+    throw new UnsupportedOperationException("Doesn't support writing TrueMap");
 
 Review comment:
   Dict?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector for storing the keys of each map
> valuesVector - vector for storing the values of each map
> offsetsVector - vector for storing the start index of each next map
> So it is not very hard to create such a Map vector, but there is a major issue 
> with this representation: it is hard to search map values by key in such a 
> vector, so we need to investigate some advanced techniques to make such a search 
> efficient, or find other, more suitable options to represent the map datatype in 
> the world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they only have a logical Map type 
> definition where Map is defined as List< Struct<key: key_type, value: value_type> >. 
> So an implementation of such a value vector would be useful for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914073#comment-16914073
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

arina-ielchiieva commented on pull request #1829: DRILL-7096: Develop vector 
for canonical Map
URL: https://github.com/apache/drill/pull/1829#discussion_r317043601
 
 

 ##
 File path: exec/java-exec/src/main/codegen/templates/AbstractRecordWriter.java
 ##
 @@ -71,6 +71,16 @@ public FieldConverter getNewRepeatedListConverter(int fieldId, String fieldName,
     throw new UnsupportedOperationException("Doesn't support writing RepeatedList");
   }
 
+  @Override
+  public FieldConverter getNewDictConverter(int fieldId, String fieldName, FieldReader reader) {
+    throw new UnsupportedOperationException("Doesn't support writing TrueMap");
+  }
+
+  @Override
+  public FieldConverter getNewRepeatedDictConverter(int fieldId, String fieldName, FieldReader reader) {
+    throw new UnsupportedOperationException("Doesn't support writing RepeatedTrueMap");
 
 Review comment:
   Repeated Dict?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector for storing the keys of each map
> valuesVector - vector for storing the values of each map
> offsetsVector - vector for storing the start index of each next map
> So it is not very hard to create such a Map vector, but there is a major issue 
> with this representation: it is hard to search map values by key in such a 
> vector, so we need to investigate some advanced techniques to make such a search 
> efficient, or find other, more suitable options to represent the map datatype in 
> the world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they only have a logical Map type 
> definition where Map is defined as List< Struct<key: key_type, value: value_type> >. 
> So an implementation of such a value vector would be useful for Arrow too.
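
For comparison with the Arrow definition quoted above, here is an equally 
small sketch (plain Java records, not Arrow's API) of the same kind of data 
viewed through the logical List< Struct<key, value> > shape, i.e. one list of 
key/value structs per row:

import java.util.List;

/** Illustrative only: the List< Struct<key, value> > view of a map column. */
public class LogicalMapShapeSketch {
  record Entry(int key, int value) { }    // plays the role of Struct<key, value>
  record MapRow(List<Entry> entries) { }  // one List<...> per row

  public static void main(String[] args) {
    // Same two rows as in the earlier array sketch: {1: 10, 2: 20} and {7: 70}
    List<MapRow> column = List.of(
        new MapRow(List.of(new Entry(1, 10), new Entry(2, 20))),
        new MapRow(List.of(new Entry(7, 70))));
    column.forEach(row -> System.out.println(row.entries()));
  }
}

The columnar keys/values/offsets layout and this logical list-of-structs view 
describe the same data; the open question in this issue is which physical 
layout makes key lookup efficient.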



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (DRILL-7096) Develop vector for canonical Map

2019-08-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914050#comment-16914050
 ] 

ASF GitHub Bot commented on DRILL-7096:
---

KazydubB commented on issue #1829: DRILL-7096: Develop vector for canonical 
Map
URL: https://github.com/apache/drill/pull/1829#issuecomment-524225454
 
 
   @ihuzenko I've addressed review comments, please take a look
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Develop vector for canonical Map
> -
>
> Key: DRILL-7096
> URL: https://issues.apache.org/jira/browse/DRILL-7096
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Igor Guzenko
>Assignee: Bohdan Kazydub
>Priority: Major
> Fix For: 1.17.0
>
>
> Canonical Map datatype can be represented using a combination of three 
> value vectors:
> keysVector - vector for storing the keys of each map
> valuesVector - vector for storing the values of each map
> offsetsVector - vector for storing the start index of each next map
> So it is not very hard to create such a Map vector, but there is a major issue 
> with this representation: it is hard to search map values by key in such a 
> vector, so we need to investigate some advanced techniques to make such a search 
> efficient, or find other, more suitable options to represent the map datatype in 
> the world of vectors.
> After a question about maps, Apache Arrow developers responded that for Java 
> they don't have a real Map vector; for now they only have a logical Map type 
> definition where Map is defined as List< Struct<key: key_type, value: value_type> >. 
> So an implementation of such a value vector would be useful for Arrow too.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)