[jira] [Commented] (HIVE-21240) JSON SerDe Re-Write

2019-04-19 Thread slim bouguerra (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822250#comment-16822250
 ] 

slim bouguerra commented on HIVE-21240:
---

I have left some comments; can you please address them and update the pull 
request and patch?

> JSON SerDe Re-Write
> ---
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0, 3.1.1
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.10.patch, HIVE-21240.11.patch, HIVE-21240.11.patch, 
> HIVE-21240.11.patch, HIVE-21240.11.patch, HIVE-21240.2.patch, 
> HIVE-21240.3.patch, HIVE-21240.4.patch, HIVE-21240.5.patch, 
> HIVE-21240.6.patch, HIVE-21240.7.patch, HIVE-21240.9.patch, 
> HIVE-24240.8.patch, kafka_storage_handler.diff
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use Jackson Tree parser instead of manually parsing
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * Current JSON parser accepts, but does not apply, custom timestamp formats 
> in most cases
> * Added some unit tests
> * Added cache for column-name to column-index searches, currently O(n) for 
> each row processed, for each column in the row (see the sketch below)
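
For illustration, the column-index cache in the last bullet can be as simple as 
a one-time HashMap build (a minimal sketch with hypothetical names, not the 
actual patch code), replacing the per-row, per-column O(n) scan:

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the column-name -> column-index cache idea.
public class ColumnIndexCache {

  private final Map<String, Integer> indexByName = new HashMap<>();

  public ColumnIndexCache(List<String> columnNames) {
    // Build the lookup table once, instead of scanning the column list
    // for every field of every row.
    for (int i = 0; i < columnNames.size(); i++) {
      indexByName.put(columnNames.get(i).toLowerCase(), i);
    }
  }

  /** Returns the column index, or -1 for unknown fields. */
  public int indexOf(String fieldName) {
    return indexByName.getOrDefault(fieldName.toLowerCase(), -1);
  }
}
{code}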



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21240) JSON SerDe Re-Write

2019-04-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21240?focusedWorklogId=230306&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230306
 ]

ASF GitHub Bot logged work on HIVE-21240:
-

Author: ASF GitHub Bot
Created on: 19/Apr/19 22:59
Start Date: 19/Apr/19 22:59
Worklog Time Spent: 10m 
  Work Description: b-slim commented on pull request #530: HIVE-21240: JSON 
SerDe Deserialize Re-Write
URL: https://github.com/apache/hive/pull/530#discussion_r277108764
 
 

 ##
 File path: 
serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java
 ##
 @@ -0,0 +1,532 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.serde2.json;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.charset.StandardCharsets;
+import java.time.ZoneId;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Map.Entry;
+import java.util.Set;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.hive.common.type.Date;
+import org.apache.hadoop.hive.common.type.HiveChar;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.hive.common.type.HiveVarchar;
+import org.apache.hadoop.hive.common.type.Timestamp;
+import org.apache.hadoop.hive.common.type.TimestampTZ;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructField;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.typeinfo.BaseCharTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TimestampLocalTZTypeInfo;
+import org.apache.hive.common.util.TimestampParser;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.node.JsonNodeType;
+import com.fasterxml.jackson.databind.node.TextNode;
+import com.google.common.base.Preconditions;
+
+/**
+ * This class converts JSON strings into Java or Hive Primitive objects.
+ *
+ * Supported types are:
+ * <table>
+ * <tr>
+ * <th>JSON Type</th><th>Java Type</th><th>Notes</th>
+ * </tr>
+ * <tr>
+ * <td>Object</td><td>java.util.List</td><td>Each element may be a different type</td>
+ * </tr>
+ * <tr>
+ * <td>Array</td><td>java.util.List</td><td>Each element is the same type</td>
+ * </tr>
+ * <tr>
+ * <td>Map</td><td>java.util.Map</td><td>Keys must be the same primitive type; every value is the same type</td>
+ * </tr>
+ * </table>
+ */
+public class HiveJsonReader {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(HiveJsonReader.class);
+
+  private final Map<Pair<StructObjectInspector, String>, StructField> discoveredFields =
+      new HashMap<>();
+
+  private final Set<Pair<StructObjectInspector, String>> discoveredUnknownFields =
+      new HashSet<>();
+
+  private final EnumSet<Feature> features = EnumSet.noneOf(Feature.class);
+
+  private final ObjectMapper objectMapper;
+
+  private final TimestampParser tsParser;
+  private BinaryEncoding binaryEncoding;
+  private final ObjectInspector oi;
+
+  /**
+   * Enumeration that defines all on/off features for this reader.
+   */
+  public enum Feature {
+    COL_INDEX_PARSING, PRIMITIVE_TO_WRITABLE, IGNORE_UKNOWN_FIELDS
 
 Review comment:
   can you please document those features?
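
As background: the "Jackson Tree parser" approach referenced in the JIRA 
description works roughly as below (a standalone sketch, not the patch's 
actual HiveJsonReader code):

{code:java}
import java.util.Iterator;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JacksonTreeDemo {
  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    // Each input line handed to the SerDe is one complete JSON object.
    JsonNode root = mapper.readTree("{\"name\":\"john\",\"age\":30}");
    Iterator<Map.Entry<String, JsonNode>> fields = root.fields();
    while (fields.hasNext()) {
      Map.Entry<String, JsonNode> field = fields.next();
      // Each value carries its JSON node type, so conversion to the Hive
      // type can be driven by the ObjectInspector plus the node type.
      System.out.println(field.getKey() + " -> " + field.getValue().getNodeType());
    }
  }
}
{code}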
 

This is an automated message from the Apache 

[jira] [Work logged] (HIVE-21240) JSON SerDe Re-Write

2019-04-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21240?focusedWorklogId=230307&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230307
 ]

ASF GitHub Bot logged work on HIVE-21240:
-

Author: ASF GitHub Bot
Created on: 19/Apr/19 23:00
Start Date: 19/Apr/19 23:00
Worklog Time Spent: 10m 
  Work Description: b-slim commented on pull request #530: HIVE-21240: JSON 
SerDe Deserialize Re-Write
URL: https://github.com/apache/hive/pull/530#discussion_r277108929
 
 

 ##
 File path: 
serde/src/java/org/apache/hadoop/hive/serde2/json/HiveJsonReader.java
 ##
 @@ -0,0 +1,532 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.serde2.json;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.charset.StandardCharsets;
+import java.time.ZoneId;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Map.Entry;
+import java.util.Set;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.hadoop.hive.common.type.Date;
+import org.apache.hadoop.hive.common.type.HiveChar;
+import org.apache.hadoop.hive.common.type.HiveDecimal;
+import org.apache.hadoop.hive.common.type.HiveVarchar;
+import org.apache.hadoop.hive.common.type.Timestamp;
+import org.apache.hadoop.hive.common.type.TimestampTZ;
+import org.apache.hadoop.hive.serde2.SerDeException;
+import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructField;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.typeinfo.BaseCharTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.PrimitiveTypeInfo;
+import org.apache.hadoop.hive.serde2.typeinfo.TimestampLocalTZTypeInfo;
+import org.apache.hive.common.util.TimestampParser;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.fasterxml.jackson.databind.JsonNode;
+import com.fasterxml.jackson.databind.ObjectMapper;
+import com.fasterxml.jackson.databind.node.JsonNodeType;
+import com.fasterxml.jackson.databind.node.TextNode;
+import com.google.common.base.Preconditions;
+
+/**
+ * This class converts JSON strings into Java or Hive Primitive objects.
+ *
+ * Supported types are:
+ * <table>
+ * <tr>
+ * <th>JSON Type</th><th>Java Type</th><th>Notes</th>
+ * </tr>
+ * <tr>
+ * <td>Object</td><td>java.util.List</td><td>Each element may be a different type</td>
+ * </tr>
+ * <tr>
+ * <td>Array</td><td>java.util.List</td><td>Each element is the same type</td>
+ * </tr>
+ * <tr>
+ * <td>Map</td><td>java.util.Map</td><td>Keys must be the same primitive type; every value is the same type</td>
+ * </tr>
+ * </table>
+ */
+public class HiveJsonReader {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(HiveJsonReader.class);
+
+  private final Map<Pair<StructObjectInspector, String>, StructField> discoveredFields =
+      new HashMap<>();
+
+  private final Set<Pair<StructObjectInspector, String>> discoveredUnknownFields =
+      new HashSet<>();
+
+  private final EnumSet<Feature> features = EnumSet.noneOf(Feature.class);
+
+  private final ObjectMapper objectMapper;
+
+  private final TimestampParser tsParser;
+  private BinaryEncoding binaryEncoding;
+  private final ObjectInspector oi;
+
+  /**
+   * Enumeration that defines all on/off features for this reader.
+   */
+  public enum Feature {
+    COL_INDEX_PARSING, PRIMITIVE_TO_WRITABLE, IGNORE_UKNOWN_FIELDS
+  }
+
+  /**
+   * Constructor using the default Hive timestamp parser.
+   *
+   * @param oi ObjectInspector for all the fields in the JSON object
+   */
+  public H

[jira] [Work logged] (HIVE-21240) JSON SerDe Re-Write

2019-04-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21240?focusedWorklogId=230303&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230303
 ]

ASF GitHub Bot logged work on HIVE-21240:
-

Author: ASF GitHub Bot
Created on: 19/Apr/19 22:54
Start Date: 19/Apr/19 22:54
Worklog Time Spent: 10m 
  Work Description: b-slim commented on pull request #530: HIVE-21240: JSON 
SerDe Deserialize Re-Write
URL: https://github.com/apache/hive/pull/530#discussion_r277108220
 
 

 ##
 File path: serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java
 ##
 @@ -142,227 +197,21 @@ public Object deserialize(Writable blob) throws SerDeException {
    * and generate a Text representation of the object.
    */
   @Override
-  public Writable serialize(Object obj, ObjectInspector objInspector)
-      throws SerDeException {
-    StringBuilder sb = new StringBuilder();
-    try {
+  public Writable serialize(final Object obj,
 
 Review comment:
   is this split onto two lines for a reason, or is it GitHub rendering?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 230303)
Time Spent: 1h  (was: 50m)

> JSON SerDe Re-Write
> ---
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0, 3.1.1
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.10.patch, HIVE-21240.11.patch, HIVE-21240.11.patch, 
> HIVE-21240.11.patch, HIVE-21240.11.patch, HIVE-21240.2.patch, 
> HIVE-21240.3.patch, HIVE-21240.4.patch, HIVE-21240.5.patch, 
> HIVE-21240.6.patch, HIVE-21240.7.patch, HIVE-21240.9.patch, 
> HIVE-24240.8.patch, kafka_storage_handler.diff
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use Jackson Tree parser instead of manually parsing
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * Current JSON parser accepts, but does not apply, custom timestamp formats 
> in most cases
> * Added some unit tests
> * Added cache for column-name to column-index searches, currently O(n) for 
> each row processed, for each column in the row



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21240) JSON SerDe Re-Write

2019-04-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21240?focusedWorklogId=230302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230302
 ]

ASF GitHub Bot logged work on HIVE-21240:
-

Author: ASF GitHub Bot
Created on: 19/Apr/19 22:53
Start Date: 19/Apr/19 22:53
Worklog Time Spent: 10m 
  Work Description: b-slim commented on pull request #530: HIVE-21240: JSON 
SerDe Deserialize Re-Write
URL: https://github.com/apache/hive/pull/530#discussion_r277108143
 
 

 ##
 File path: serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java
 ##
 @@ -63,76 +43,151 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-@SerDeSpec(schemaProps = {serdeConstants.LIST_COLUMNS,
-    serdeConstants.LIST_COLUMN_TYPES,
-    serdeConstants.TIMESTAMP_FORMATS })
-
+/**
+ * Hive SerDe for processing JSON formatted data. This is typically paired with
+ * the TextInputFormat, and therefore each line provided to this SerDe must be a
+ * single, complete JSON object.
+ * Example:
+ * <pre>
+ * {"name":"john","age":30}
+ * {"name":"sue","age":32}
+ * </pre>
+ */
+@SerDeSpec(schemaProps = { serdeConstants.LIST_COLUMNS,
+    serdeConstants.LIST_COLUMN_TYPES, serdeConstants.TIMESTAMP_FORMATS,
+    JsonSerDe.BINARY_FORMAT, JsonSerDe.IGNORE_EXTRA })
 public class JsonSerDe extends AbstractSerDe {
 
   private static final Logger LOG = LoggerFactory.getLogger(JsonSerDe.class);
+
+  public static final String BINARY_FORMAT = "json.binary.format";
+  public static final String IGNORE_EXTRA = "text.ignore.extra.fields";
+  public static final String NULL_EMPTY_LINES = "text.null.empty.line";
+
   private List<String> columnNames;
 
-  private HiveJsonStructReader structReader;
+  private BinaryEncoding binaryEncoding;
+  private boolean nullEmptyLines;
+
+  private HiveJsonReader jsonReader;
+  private HiveJsonWriter jsonWriter;
   private StructTypeInfo rowTypeInfo;
+  private StructObjectInspector soi;
 
+  /**
+   * Initialize the SerDe. By default, items being deserialized are expected to
+   * be wrapped in Hadoop Writable objects, and objects being serialized are
+   * expected to be Java primitive objects.
+   */
   @Override
-  public void initialize(Configuration conf, Properties tbl)
-      throws SerDeException {
-    List<TypeInfo> columnTypes;
+  public void initialize(final Configuration conf, final Properties tbl)
+      throws SerDeException {
+    initialize(conf, tbl, true);
+  }
+
+  /**
+   * Initialize the SerDe.
+   *
+   * @param conf System properties; can be null at compile time
+   * @param tbl table properties
+   * @param writeablePrimitivesDeserialize true if outputs are Hadoop Writable
+   */
+  public void initialize(final Configuration conf, final Properties tbl,
+      final boolean writeablePrimitivesDeserialize) {
+
     LOG.debug("Initializing JsonSerDe: {}", tbl.entrySet());
 
     // Get column names
-    String columnNameProperty = tbl.getProperty(serdeConstants.LIST_COLUMNS);
-    final String columnNameDelimiter = tbl.containsKey(serdeConstants.COLUMN_NAME_DELIMITER) ? tbl
-        .getProperty(serdeConstants.COLUMN_NAME_DELIMITER)
-        : String.valueOf(SerDeUtils.COMMA);
-    // all table column names
-    if (columnNameProperty.isEmpty()) {
-      columnNames = Collections.emptyList();
-    } else {
-      columnNames = Arrays.asList(columnNameProperty.split(columnNameDelimiter));
-    }
+    final String columnNameProperty =
+        tbl.getProperty(serdeConstants.LIST_COLUMNS);
+    final String columnNameDelimiter = tbl.getProperty(
+        serdeConstants.COLUMN_NAME_DELIMITER, String.valueOf(SerDeUtils.COMMA));
+
+    this.columnNames = columnNameProperty.isEmpty() ? Collections.emptyList()
+        : Arrays.asList(columnNameProperty.split(columnNameDelimiter));
 
     // all column types
-    String columnTypeProperty = tbl.getProperty(serdeConstants.LIST_COLUMN_TYPES);
-    if (columnTypeProperty.isEmpty()) {
-      columnTypes = Collections.emptyList();
-    } else {
-      columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
-    }
+    final String columnTypeProperty =
+        tbl.getProperty(serdeConstants.LIST_COLUMN_TYPES);
+
+    final List<TypeInfo> columnTypes =
+        columnTypeProperty.isEmpty() ? Collections.emptyList()
+            : TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
 
     LOG.debug("columns: {}, {}", columnNameProperty, columnNames);
     LOG.debug("types: {}, {} ", columnTypeProperty, columnTypes);
 
     assert (columnNames.size() == columnTypes.size());
 
-    rowTypeInfo = (StructTypeInfo)
-        TypeInfoFactory.getStructTypeInfo(columnNames, columnTypes);
+    final String nullEmpty = tbl.getProperty(NULL_EMPTY_LINES, "false");
+    this.nullEmptyLines = Boolean.parseBoolean(nullEmpty);
+
+    this.rowTypeInfo = (StructTypeInfo) TypeInfoFactory
+        .getStructTypeInfo(columnNames, columnTypes);
+
+    this.soi 
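
The property handling shown above follows a simple getProperty-with-default 
idiom; a standalone sketch of the same pattern (illustrative keys only):

{code:java}
import java.util.Properties;

public class SerDePropsDemo {
  public static void main(String[] args) {
    Properties tbl = new Properties();
    tbl.setProperty("text.null.empty.line", "true");

    // Same idiom as initialize() above: read the table property with a
    // default, then parse it into the typed field.
    boolean nullEmptyLines =
        Boolean.parseBoolean(tbl.getProperty("text.null.empty.line", "false"));
    System.out.println("nullEmptyLines = " + nullEmptyLines);
  }
}
{code}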

[jira] [Work logged] (HIVE-21240) JSON SerDe Re-Write

2019-04-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21240?focusedWorklogId=230300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230300
 ]

ASF GitHub Bot logged work on HIVE-21240:
-

Author: ASF GitHub Bot
Created on: 19/Apr/19 22:51
Start Date: 19/Apr/19 22:51
Worklog Time Spent: 10m 
  Work Description: b-slim commented on pull request #530: HIVE-21240: JSON 
SerDe Deserialize Re-Write
URL: https://github.com/apache/hive/pull/530#discussion_r277107870
 
 

 ##
 File path: serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java
 ##
 @@ -63,76 +43,151 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-@SerDeSpec(schemaProps = {serdeConstants.LIST_COLUMNS,
-    serdeConstants.LIST_COLUMN_TYPES,
-    serdeConstants.TIMESTAMP_FORMATS })
-
+/**
+ * Hive SerDe for processing JSON formatted data. This is typically paired with
+ * the TextInputFormat, and therefore each line provided to this SerDe must be a
+ * single, complete JSON object.
+ * Example:
+ * <pre>
+ * {"name":"john","age":30}
+ * {"name":"sue","age":32}
+ * </pre>
+ */
+@SerDeSpec(schemaProps = { serdeConstants.LIST_COLUMNS,
+    serdeConstants.LIST_COLUMN_TYPES, serdeConstants.TIMESTAMP_FORMATS,
+    JsonSerDe.BINARY_FORMAT, JsonSerDe.IGNORE_EXTRA })
 public class JsonSerDe extends AbstractSerDe {
 
   private static final Logger LOG = LoggerFactory.getLogger(JsonSerDe.class);
+
+  public static final String BINARY_FORMAT = "json.binary.format";
+  public static final String IGNORE_EXTRA = "text.ignore.extra.fields";
+  public static final String NULL_EMPTY_LINES = "text.null.empty.line";
+
   private List<String> columnNames;
 
-  private HiveJsonStructReader structReader;
+  private BinaryEncoding binaryEncoding;
+  private boolean nullEmptyLines;
+
+  private HiveJsonReader jsonReader;
+  private HiveJsonWriter jsonWriter;
   private StructTypeInfo rowTypeInfo;
+  private StructObjectInspector soi;
 
+  /**
+   * Initialize the SerDe. By default, items being deserialized are expected to
+   * be wrapped in Hadoop Writable objects, and objects being serialized are
+   * expected to be Java primitive objects.
+   */
   @Override
-  public void initialize(Configuration conf, Properties tbl)
-      throws SerDeException {
-    List<TypeInfo> columnTypes;
+  public void initialize(final Configuration conf, final Properties tbl)
+      throws SerDeException {
+    initialize(conf, tbl, true);
+  }
+
+  /**
+   * Initialize the SerDe.
+   *
+   * @param conf System properties; can be null at compile time
+   * @param tbl table properties
+   * @param writeablePrimitivesDeserialize true if outputs are Hadoop Writable
+   */
+  public void initialize(final Configuration conf, final Properties tbl,
+      final boolean writeablePrimitivesDeserialize) {
+
     LOG.debug("Initializing JsonSerDe: {}", tbl.entrySet());
 
     // Get column names
-    String columnNameProperty = tbl.getProperty(serdeConstants.LIST_COLUMNS);
-    final String columnNameDelimiter = tbl.containsKey(serdeConstants.COLUMN_NAME_DELIMITER) ? tbl
-        .getProperty(serdeConstants.COLUMN_NAME_DELIMITER)
-        : String.valueOf(SerDeUtils.COMMA);
-    // all table column names
-    if (columnNameProperty.isEmpty()) {
-      columnNames = Collections.emptyList();
-    } else {
-      columnNames = Arrays.asList(columnNameProperty.split(columnNameDelimiter));
-    }
+    final String columnNameProperty =
+        tbl.getProperty(serdeConstants.LIST_COLUMNS);
+    final String columnNameDelimiter = tbl.getProperty(
+        serdeConstants.COLUMN_NAME_DELIMITER, String.valueOf(SerDeUtils.COMMA));
+
+    this.columnNames = columnNameProperty.isEmpty() ? Collections.emptyList()
+        : Arrays.asList(columnNameProperty.split(columnNameDelimiter));
 
     // all column types
-    String columnTypeProperty = tbl.getProperty(serdeConstants.LIST_COLUMN_TYPES);
-    if (columnTypeProperty.isEmpty()) {
-      columnTypes = Collections.emptyList();
-    } else {
-      columnTypes = TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
-    }
+    final String columnTypeProperty =
+        tbl.getProperty(serdeConstants.LIST_COLUMN_TYPES);
+
+    final List<TypeInfo> columnTypes =
+        columnTypeProperty.isEmpty() ? Collections.emptyList()
+            : TypeInfoUtils.getTypeInfosFromTypeString(columnTypeProperty);
 
     LOG.debug("columns: {}, {}", columnNameProperty, columnNames);
     LOG.debug("types: {}, {} ", columnTypeProperty, columnTypes);
 
     assert (columnNames.size() == columnTypes.size());
 
-    rowTypeInfo = (StructTypeInfo)
-        TypeInfoFactory.getStructTypeInfo(columnNames, columnTypes);
+    final String nullEmpty = tbl.getProperty(NULL_EMPTY_LINES, "false");
+    this.nullEmptyLines = Boolean.parseBoolean(nullEmpty);
+
+    this.rowTypeInfo = (StructTypeInfo) TypeInfoFactory
+        .getStructTypeInfo(columnNames, columnTypes);
+
+    this.soi 

[jira] [Work logged] (HIVE-21240) JSON SerDe Re-Write

2019-04-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21240?focusedWorklogId=230299&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230299
 ]

ASF GitHub Bot logged work on HIVE-21240:
-

Author: ASF GitHub Bot
Created on: 19/Apr/19 22:47
Start Date: 19/Apr/19 22:47
Worklog Time Spent: 10m 
  Work Description: b-slim commented on pull request #530: HIVE-21240: JSON 
SerDe Deserialize Re-Write
URL: https://github.com/apache/hive/pull/530#discussion_r277107457
 
 

 ##
 File path: ql/src/test/queries/clientpositive/kafka_storage_handler.q
 ##
 @@ -140,6 +140,7 @@ CREATE EXTERNAL TABLE kafka_table_2
 `country` string,`continent` string, `namespace` string, `newPage` boolean, 
`unpatrolled` boolean,
 `anonymous` boolean, `robot` boolean, added int, deleted int, delta bigint)
 STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
+WITH SERDEPROPERTIES ("timestamp.formats"="yyyy-MM-dd\'T\'HH:mm:ss\'Z\'")
 
 Review comment:
   thanks for fixing this.
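
For reference, the format string added above parses ISO-style UTC timestamps; 
a quick standalone check with java.time (illustrative only, not Hive's 
TimestampParser itself):

{code:java}
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampFormatDemo {
  public static void main(String[] args) {
    // The same pattern as the SERDEPROPERTIES value, with 'Z' as a literal.
    DateTimeFormatter fmt =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'");
    LocalDateTime ts = LocalDateTime.parse("2013-08-31T01:02:33Z", fmt);
    System.out.println(ts); // 2013-08-31T01:02:33
  }
}
{code}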
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 230299)
Time Spent: 0.5h  (was: 20m)

> JSON SerDe Re-Write
> ---
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0, 3.1.1
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.10.patch, HIVE-21240.11.patch, HIVE-21240.11.patch, 
> HIVE-21240.11.patch, HIVE-21240.11.patch, HIVE-21240.2.patch, 
> HIVE-21240.3.patch, HIVE-21240.4.patch, HIVE-21240.5.patch, 
> HIVE-21240.6.patch, HIVE-21240.7.patch, HIVE-21240.9.patch, 
> HIVE-24240.8.patch, kafka_storage_handler.diff
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use Jackson Tree parser instead of manually parsing
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * Current JSON parser accepts, but does not apply, custom timestamp formats 
> in most cases
> * Added some unit tests
> * Added cache for column-name to column-index searches, currently O(n) for 
> each row processed, for each column in the row



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (HIVE-21240) JSON SerDe Re-Write

2019-04-19 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21240?focusedWorklogId=230298&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-230298
 ]

ASF GitHub Bot logged work on HIVE-21240:
-

Author: ASF GitHub Bot
Created on: 19/Apr/19 22:46
Start Date: 19/Apr/19 22:46
Worklog Time Spent: 10m 
  Work Description: b-slim commented on pull request #530: HIVE-21240: JSON 
SerDe Deserialize Re-Write
URL: https://github.com/apache/hive/pull/530#discussion_r277107357
 
 

 ##
 File path: 
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFJsonRead.java
 ##
 @@ -156,10 +158,8 @@ public void testUndeclaredStructField() throws Exception {
   ObjectInspector[] arguments = buildArguments("struct");
   udf.initialize(arguments);
 
-  Object res = udf.evaluate(evalArgs("{\"b\":null}"));
-  assertTrue(res instanceof Object[]);
-  Object o[] = (Object[]) res;
-  assertEquals(null, o[0]);
+  // Invalid - should throw Exception
+  udf.evaluate(evalArgs("{\"b\":null}"));
 
 Review comment:
   I am not sure why this was changed; it seems different from the old 
behavior. Can you please explain more?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 230298)
Time Spent: 20m  (was: 10m)

> JSON SerDe Re-Write
> ---
>
> Key: HIVE-21240
> URL: https://issues.apache.org/jira/browse/HIVE-21240
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 4.0.0, 3.1.1
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-21240.1.patch, HIVE-21240.1.patch, 
> HIVE-21240.10.patch, HIVE-21240.11.patch, HIVE-21240.11.patch, 
> HIVE-21240.11.patch, HIVE-21240.11.patch, HIVE-21240.2.patch, 
> HIVE-21240.3.patch, HIVE-21240.4.patch, HIVE-21240.5.patch, 
> HIVE-21240.6.patch, HIVE-21240.7.patch, HIVE-21240.9.patch, 
> HIVE-24240.8.patch, kafka_storage_handler.diff
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The JSON SerDe has a few issues; I will link them to this JIRA.
> * Use Jackson Tree parser instead of manually parsing
> * Added support for base-64 encoded data (the expected format when using JSON)
> * Added support to skip blank lines (returns all columns as null values)
> * Current JSON parser accepts, but does not apply, custom timestamp formats 
> in most cases
> * Added some unit tests
> * Added cache for column-name to column-index searches, currently O(n) for 
> each row processed, for each column in the row



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21629) Monitor qtest progress realtime on console

2019-04-19 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822113#comment-16822113
 ] 

Vineet Garg commented on HIVE-21629:


I usually tail hive.log to see the progress. I agree that adding some realtime 
monitoring would be more elegant.

> Monitor qtest progress realtime on console
> --
>
> Key: HIVE-21629
> URL: https://issues.apache.org/jira/browse/HIVE-21629
> Project: Hive
>  Issue Type: Bug
>Reporter: Laszlo Bodor
>Priority: Major
>
> While running a qtest, or running multiple qtests with the same driver, 
> user/dev can only see top level message for a long time:
> {code}
> [INFO] Running org.apache.hadoop.hive.cli.TestCliDriver
> {code}
> It would be helpful to introduce a cli argument to enable some realtime 
> monitoring. The challenge basically is that the tests run in separate 
> surefire JVMs, which basically log into surefire report files, and everything 
> which is logged with System.out can be found in that file:
> {code}
> itests/qtest/target/surefire-reports/org.apache.hadoop.hive.cli.TestCliDriver-output.txt
> {code}
> so, I would expect the same or similar behavior that I would get if I tailed 
> this file in a separate terminal (tail -f ...)
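
As a rough in-process equivalent of that tail -f behavior, a watcher could 
poll the report file for appended bytes (a minimal sketch; the path is the 
one quoted above):

{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;

public class TailFollower {
  public static void main(String[] args)
      throws IOException, InterruptedException {
    String path = "itests/qtest/target/surefire-reports/"
        + "org.apache.hadoop.hive.cli.TestCliDriver-output.txt";
    try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
      long pos = file.length(); // start at the end, like tail -f
      while (true) {
        long len = file.length();
        if (len > pos) { // new bytes were appended
          file.seek(pos);
          String line;
          while ((line = file.readLine()) != null) {
            System.out.println(line);
          }
          pos = file.getFilePointer();
        }
        Thread.sleep(500); // poll interval
      }
    }
  }
}
{code}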



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21632) Hive should not push partition columns to the Parquet predicate, even if the data file contains the partition column

2019-04-19 Thread Vineet Garg (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822110#comment-16822110
 ] 

Vineet Garg commented on HIVE-21632:


Duplicate of HIVE-21599?

> Hive should not push partition columns to the Parquet predicate, even if the 
> data file contains the partition column
> 
>
> Key: HIVE-21632
> URL: https://issues.apache.org/jira/browse/HIVE-21632
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Priority: Minor
>
> If there is a partitioned Parquet table in Hive, and the data file in one of 
> the partitions (not correctly) contains the partition column as well, 
> filtering on the partition column will return no rows if the Parquet 
> predicate pushdown is enabled. If the PPD is disabled, the rows will return 
> correctly.
> The reason it doesn't work: if PPD is switched on, Hive sends Parquet the 
> predicate 'partition_column = ...' together with a requested schema that 
> doesn't contain the partition column. When the data is read from Parquet, 
> the column is skipped because the requested schema doesn't contain it, yet 
> the filter predicate is still applied, so an empty result set is returned.
> I think if the rows are returned correctly without PPD, they should be 
> returned with PPD as well. Hive should omit the partition column from the 
> Parquet predicate.
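
Conceptually, the proposed fix amounts to filtering partition columns out of 
the predicate before pushdown (a sketch of the idea only, not Hive's actual 
internals):

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionPredicatePruner {

  /**
   * Keeps only predicate columns that actually belong to the file schema;
   * partition columns are resolved from metastore metadata, not file data,
   * so pushing them to Parquet can wrongly filter out every row.
   */
  static List<String> pushableColumns(List<String> predicateColumns,
      Set<String> partitionColumns) {
    List<String> pushable = new ArrayList<>();
    for (String col : predicateColumns) {
      if (!partitionColumns.contains(col)) {
        pushable.add(col);
      }
    }
    return pushable;
  }

  public static void main(String[] args) {
    List<String> predicateColumns = Arrays.asList("part_col", "name");
    Set<String> partitionColumns = new HashSet<>(Arrays.asList("part_col"));
    System.out.println(pushableColumns(predicateColumns, partitionColumns));
    // -> [name]
  }
}
{code}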



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-20615) CachedStore: Background refresh thread bug fixes

2019-04-19 Thread Daniel Dai (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-20615:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 4.0.0
   Status: Resolved  (was: Patch Available)

Unit test failures are not related. Patch pushed to master. Thanks Vaibhav!

> CachedStore: Background refresh thread bug fixes
> 
>
> Key: HIVE-20615
> URL: https://issues.apache.org/jira/browse/HIVE-20615
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 3.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Daniel Dai
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-20615.1.patch, HIVE-20615.1.patch, 
> HIVE-20615.1.patch, HIVE-20615.1.patch, HIVE-20615.1.patch, 
> HIVE-20615.1.patch, HIVE-21625.2.patch
>
>
> Regression introduced in HIVE-18264. Fixes background thread starting and 
> refreshing of the table cache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21635) Break up DDLTask - extract Workload Management related operations

2019-04-19 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822006#comment-16822006
 ] 

Hive QA commented on HIVE-21635:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12966484/HIVE-21635.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 15939 tests 
executed
*Failed tests:*
{noformat}
TestReplAcidTablesWithJsonMessage - did not produce a TEST-*.xml file (likely 
timed out) (batchId=256)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_wm] 
(batchId=74)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[resourceplan]
 (batchId=174)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16998/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16998/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16998/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12966484 - PreCommit-HIVE-Build

> Break up DDLTask - extract Workload Management related operations
> -
>
> Key: HIVE-21635
> URL: https://issues.apache.org/jira/browse/HIVE-21635
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21635.01.patch
>
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to have everything cut into more 
> handleable classes under the package  org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc), so 
> the amount of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * right now let's ignore the issue of having some operations handled by 
> DDLTask which are not actual DDL operations (lock, unlock, desc...)
> In the interim time when there are two DDLTask and DDLWork classes in the 
> code base the new ones in the new package are called DDLTask2 and DDLWork2 
> thus avoiding the usage of fully qualified class names where both the old and 
> the new classes are in use.
> Step #6: extract all the workload management related operations from the old 
> DDLTask, and move them under the new package.
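
To visualize the target layout, the per-operation pattern described above 
could look like this (hypothetical names, a hedged sketch rather than the 
actual HIVE-21635 code):

{code:java}
// One immutable "desc" (request) class per DDL operation ...
final class ShowDatabasesDesc {
  private final String pattern;

  ShowDatabasesDesc(String pattern) {
    this.pattern = pattern;
  }

  String getPattern() {
    return pattern;
  }
}

// ... and one small operation class that executes it, so the generic
// DDL task stays agnostic to the concrete operations.
final class ShowDatabasesOperation {
  private final ShowDatabasesDesc desc;

  ShowDatabasesOperation(ShowDatabasesDesc desc) {
    this.desc = desc;
  }

  int execute() {
    // ... list databases matching desc.getPattern() ...
    return 0; // 0 = success, matching Hive task return conventions
  }
}
{code}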



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-21635) Break up DDLTask - extract Workload Management related operations

2019-04-19 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-21635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821976#comment-16821976
 ] 

Hive QA commented on HIVE-21635:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  4m  
9s{color} | {color:blue} ql in master has 2256 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
43s{color} | {color:red} ql: The patch generated 5 new + 291 unchanged - 22 
fixed = 296 total (was 313) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 43s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-16998/dev-support/hive-personality.sh
 |
| git revision | master / bb71ce5 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16998/yetus/diff-checkstyle-ql.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-16998/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Break up DDLTask - extract Workload Management related operations
> -
>
> Key: HIVE-21635
> URL: https://issues.apache.org/jira/browse/HIVE-21635
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21635.01.patch
>
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to have everything cut into more 
> handleable classes under the package  org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc), so 
> the amount of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * right now let's ignore the issue of having some operations handled by 
> DDLTask which are not actual DDL operations (lock, unlock, desc...)
> In the interim time when there are two DDLTask and DDLWork classes in the 
> code base the new ones in the new package are called DDLTask2 and DDLWork2 
> thus avoiding the usage of fully qualified class names where both the old and 
> the new classes are in use.

[jira] [Updated] (HIVE-21635) Break up DDLTask - extract Workload Management related operations

2019-04-19 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21635:
--
Attachment: HIVE-21635.01.patch

> Break up DDLTask - extract Workload Management related operations
> -
>
> Key: HIVE-21635
> URL: https://issues.apache.org/jira/browse/HIVE-21635
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21635.01.patch
>
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to have everything cut into more 
> handleable classes under the package  org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc), so 
> the amount of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * right now let's ignore the issue of having some operations handled by 
> DDLTask which are not actual DDL operations (lock, unlock, desc...)
> In the interim time when there are two DDLTask and DDLWork classes in the 
> code base the new ones in the new package are called DDLTask2 and DDLWork2 
> thus avoiding the usage of fully qualified class names where both the old and 
> the new classes are in use.
> Step #6: extract all the workload management related operations from the old 
> DDLTask, and move them under the new package.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21635) Break up DDLTask - extract Workload Management related operations

2019-04-19 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21635:
--
Status: Patch Available  (was: Open)

> Break up DDLTask - extract Workload Management related operations
> -
>
> Key: HIVE-21635
> URL: https://issues.apache.org/jira/browse/HIVE-21635
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
> Attachments: HIVE-21635.01.patch
>
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to have everything cut into more 
> handleable classes under the package  org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc), so 
> the amount of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * right now let's ignore the issue of having some operations handled by 
> DDLTask which are not actual DDL operations (lock, unlock, desc...)
> In the interim time when there are two DDLTask and DDLWork classes in the 
> code base the new ones in the new package are called DDLTask2 and DDLWork2 
> thus avoiding the usage of fully qualified class names where both the old and 
> the new classes are in use.
> Step #6: extract all the workload management related operations from the old 
> DDLTask, and move them under the new package.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-21635) Break up DDLTask - extract Workload Management related operations

2019-04-19 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely updated HIVE-21635:
--
Description: 
DDLTask is a huge class, more than 5000 lines long. The related DDLWork is also 
a huge class, which has a field for each DDL operation it supports. The goal is 
to refactor these in order to have everything cut into more handleable classes 
under the package  org.apache.hadoop.hive.ql.exec.ddl:
 * have a separate class for each operation
 * have a package for each operation group (database ddl, table ddl, etc), so 
the amount of classes under a package is more manageable
 * make all the requests (DDLDesc subclasses) immutable
 * DDLTask should be agnostic to the actual operations
 * right now let's ignore the issue of having some operations handled by 
DDLTask which are not actual DDL operations (lock, unlock, desc...)

In the interim time when there are two DDLTask and DDLWork classes in the code 
base the new ones in the new package are called DDLTask2 and DDLWork2 thus 
avoiding the usage of fully qualified class names where both the old and the 
new classes are in use.

Step #6: extract all the workload management related operations from the old 
DDLTask, and move them under the new package.

  was:
DDLTask is a huge class, more than 5000 lines long. The related DDLWork is also 
a huge class, which has a field for each DDL operation it supports. The goal is 
to refactor these in order to have everything cut into more handleable classes 
under the package  org.apache.hadoop.hive.ql.exec.ddl:
 * have a separate class for each operation
 * have a package for each operation group (database ddl, table ddl, etc), so 
the amount of classes under a package is more manageable
 * make all the requests (DDLDesc subclasses) immutable
 * DDLTask should be agnostic to the actual operations
 * right now let's ignore the issue of having some operations handled by 
DDLTask which are not actual DDL operations (lock, unlock, desc...)

In the interim time when there are two DDLTask and DDLWork classes in the code 
base the new ones in the new package are called DDLTask2 and DDLWork2 thus 
avoiding the usage of fully qualified class names where both the old and the 
new classes are in use.

Step #5: extract all the privilege related operations from the old DDLTask, and 
move them under the new package.


> Break up DDLTask - extract Workload Management related operations
> -
>
> Key: HIVE-21635
> URL: https://issues.apache.org/jira/browse/HIVE-21635
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to have everything cut into more 
> handleable classes under the package  org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc), so 
> the amount of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * right now let's ignore the issue of having some operations handled by 
> DDLTask which are not actual DDL operations (lock, unlock, desc...)
> In the interim time when there are two DDLTask and DDLWork classes in the 
> code base the new ones in the new package are called DDLTask2 and DDLWork2 
> thus avoiding the usage of fully qualified class names where both the old and 
> the new classes are in use.
> Step #6: extract all the workload management related operations from the old 
> DDLTask, and move them under the new package.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HIVE-21635) Break up DDLTask - extract Workload Management related operations

2019-04-19 Thread Miklos Gergely (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Gergely reassigned HIVE-21635:
-


> Break up DDLTask - extract Workload Management related operations
> -
>
> Key: HIVE-21635
> URL: https://issues.apache.org/jira/browse/HIVE-21635
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 3.1.1
>Reporter: Miklos Gergely
>Assignee: Miklos Gergely
>Priority: Major
>  Labels: refactor-ddl
> Fix For: 4.0.0
>
>
> DDLTask is a huge class, more than 5000 lines long. The related DDLWork is 
> also a huge class, which has a field for each DDL operation it supports. The 
> goal is to refactor these in order to have everything cut into more 
> handleable classes under the package  org.apache.hadoop.hive.ql.exec.ddl:
>  * have a separate class for each operation
>  * have a package for each operation group (database ddl, table ddl, etc), so 
> the amount of classes under a package is more manageable
>  * make all the requests (DDLDesc subclasses) immutable
>  * DDLTask should be agnostic to the actual operations
>  * right now let's ignore the issue of having some operations handled by 
> DDLTask which are not actual DDL operations (lock, unlock, desc...)
> In the interim time when there are two DDLTask and DDLWork classes in the 
> code base the new ones in the new package are called DDLTask2 and DDLWork2 
> thus avoiding the usage of fully qualified class names where both the old and 
> the new classes are in use.
> Step #5: extract all the privilege related operations from the old DDLTask, 
> and move them under the new package.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-20615) CachedStore: Background refresh thread bug fixes

2019-04-19 Thread Hive QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-20615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821740#comment-16821740
 ] 

Hive QA commented on HIVE-20615:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12966436/HIVE-21625.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15940 tests 
executed
*Failed tests:*
{noformat}
TestReplicationScenariosAcidTables - did not produce a TEST-*.xml file (likely 
timed out) (batchId=258)
org.apache.hadoop.hive.ql.TestTxnCommandsWithSplitUpdateAndVectorization.testMergeOnTezEdges
 (batchId=318)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/16997/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16997/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16997/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12966436 - PreCommit-HIVE-Build

> CachedStore: Background refresh thread bug fixes
> 
>
> Key: HIVE-20615
> URL: https://issues.apache.org/jira/browse/HIVE-20615
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 3.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Daniel Dai
>Priority: Major
> Attachments: HIVE-20615.1.patch, HIVE-20615.1.patch, 
> HIVE-20615.1.patch, HIVE-20615.1.patch, HIVE-20615.1.patch, 
> HIVE-20615.1.patch, HIVE-21625.2.patch
>
>
> Regression introduced in HIVE-18264. Fixes background thread starting and 
> refreshing of the table cache.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)