[jira] [Work logged] (HIVE-24526) Get grouped locations of external table data using metatool.

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24526?focusedWorklogId=524869&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524869
 ]

ASF GitHub Bot logged work on HIVE-24526:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 07:54
Start Date: 16/Dec/20 07:54
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1768:
URL: https://github.com/apache/hive/pull/1768#discussion_r544076223



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/metatool/MetaToolTaskListExtTblLocs.java
##
@@ -0,0 +1,439 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.tools.metatool;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.metastore.ObjectStore;
+import org.apache.hadoop.hive.metastore.TableType;
+import org.apache.hadoop.hive.metastore.Warehouse;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.thrift.TException;
+import org.codehaus.jettison.json.JSONException;
+import org.codehaus.jettison.json.JSONArray;
+import org.codehaus.jettison.json.JSONObject;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.util.*;

Review comment:
   Remove wildcard import.
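   
   For example (a sketch; the exact replacement set depends on which java.util types the class actually uses):
   
   {code:java}
   import java.util.HashMap;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;
   {code}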

##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/tools/metatool/MetaToolTaskListExtTblLocs.java
##
@@ -0,0 +1,439 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.metastore.tools.metatool;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.metastore.ObjectStore;
+import org.apache.hadoop.hive.metastore.TableType;
+import org.apache.hadoop.hive.metastore.Warehouse;
+import org.apache.hadoop.hive.metastore.api.Table;
+import org.apache.hadoop.hive.metastore.utils.MetaStoreUtils;
+import org.apache.thrift.TException;
+import org.codehaus.jettison.json.JSONException;
+import org.codehaus.jettison.json.JSONArray;
+import org.codehaus.jettison.json.JSONObject;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.util.*;
+
+public class MetaToolTaskListExtTblLocs extends MetaToolTask {
+
+  private static final Logger LOG = 
LoggerFactory.getLogger(MetaToolTaskListExtTblLocs.class);
+  private final HashMap> coverageList = new 
HashMap<>();

Review comment:
   Replace reference type with Map.
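   
   For example, declaring against the interface while constructing the concrete type (the type parameters below are illustrative, since the generics were stripped from this excerpt by the mail formatting):
   
   {code:java}
   private final Map<String, Set<String>> coverageList = new HashMap<>();
   {code}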


[jira] [Work logged] (HIVE-24539) OrcInputFormat schema generation should respect column delimiter

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24539?focusedWorklogId=524830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524830
 ]

ASF GitHub Bot logged work on HIVE-24539:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 04:52
Start Date: 16/Dec/20 04:52
Worklog Time Spent: 10m 
  Work Description: pgaref commented on pull request #1783:
URL: https://github.com/apache/hive/pull/1783#issuecomment-745762841


   Hey @abstractdog @maheshk114  can you please take a look?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524830)
Time Spent: 20m  (was: 10m)

> OrcInputFormat schema generation should respect column delimiter
> 
>
> Key: HIVE-24539
> URL: https://issues.apache.org/jira/browse/HIVE-24539
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> OrcInputFormat currently generates the schema using the given configuration 
> and the default delimiter – that causes inconsistencies when column names 
> contain commas.
> We should follow a similar approach to 
> [OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]
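> For reference, the delimiter-aware parsing on the output side looks roughly 
> like the sketch below (the property names are from serdeConstants; the 
> surrounding wiring is illustrative, not the actual patch):
> {code:java}
> String columnNameProperty = tableProperties.getProperty(serdeConstants.LIST_COLUMNS);
> String columnNameDelimiter = tableProperties.containsKey(serdeConstants.COLUMN_NAME_DELIMITER)
>     ? tableProperties.getProperty(serdeConstants.COLUMN_NAME_DELIMITER)
>     : String.valueOf(SerDeUtils.COMMA);
> List<String> columnNames = new ArrayList<>();
> if (columnNameProperty != null && !columnNameProperty.isEmpty()) {
>   columnNames.addAll(Arrays.asList(columnNameProperty.split(columnNameDelimiter)));
> }
> {code}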



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24544) HBase Timestamp filter never gets converted to a timerange filter

2020-12-15 Thread Fabien Carrion (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabien Carrion updated HIVE-24544:
--
Description: 
When I was trying to apply a timestamp filter, I got the wrong data.

On a table like this

CREATE EXTERNAL TABLE t1 (key string, v string, ts timestamp) STORED BY 
'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES 
("hbase.columns.mapping" = ":key,cf:v,:timestamp") TBLPROPERTIES 
("hbase.table.name" = "t1", "hbase.table.default.storage.type" = "binary", 
"external.table.purge" = "false");

A request such as:

select key, ts from t1 where ts >= '2020-12-01 00:00:00' and ts < '2020-12-02 
00:00:00';

returns values with ts < '2020-12-01 00:00:00'

After investigation, it looks like the timestamp filter is never used in the 
HiveHBaseTableInputFormat.getRecordReader method, which is used to create the 
actual MapReduce job.

But it is used in the HiveHBaseTableInputFormat.getSplitsInternal method, which 
is used to create the map tasks.

So I copied the code from the second method into the first.

I attached a small patch. It is a little hacky and I am not sure it respects the 
philosophy of the component, but it works.
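
Conceptually, the patch makes getRecordReader honor the same pushdown; the 
relevant HBase call is Scan.setTimeRange. A minimal sketch (the class and method 
names below are illustrative, not the patch itself):

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Scan;

public class TimeRangeSketch {
  // Apply the pushed-down timestamp predicate to the Scan used by the record
  // reader, mirroring what getSplitsInternal already does for split planning.
  static Scan withTimeRange(Scan scan, long minStampMs, long maxStampMs) throws IOException {
    // HBase treats the range as [min, max): timestamps >= min and < max match.
    scan.setTimeRange(minStampMs, maxStampMs);
    return scan;
  }
}
{code}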

 

  was:
When I was trying to apply a timestamp filter, I got the wrong data. A request 
such as:

```

select key, ts from t1 where ts >= '2020-12-01 00:00:00' and ts < '2020-12-02 
00:00:00';

```

returns values with ts < '2020-12-01 00:00:00'

After investigation, it looks like the timestamp filter is never used in the 
HiveHBaseTableInputFormat.getRecordReader method, which is used to create the 
actual MapReduce job.

But it is used in the HiveHBaseTableInputFormat.getSplitsInternal method, which 
is used to create the map tasks.

So I copied the code from the second method into the first.

I attached a small patch. It is a little hacky and I am not sure it respects the 
philosophy of the component, but it works.

 


> HBase Timestamp filter never gets converted to a timerange filter
> -
>
> Key: HIVE-24544
> URL: https://issues.apache.org/jira/browse/HIVE-24544
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Reporter: Fabien Carrion
>Priority: Minor
> Attachments: timerange.patch
>
>
> When I was trying to apply a timestamp filter, I got the wrong data.
> On a table like this
> CREATE EXTERNAL TABLE t1 (key string, v string, ts timestamp) STORED BY 
> 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES 
> ("hbase.columns.mapping" = ":key,cf:v,:timestamp") TBLPROPERTIES 
> ("hbase.table.name" = "t1", "hbase.table.default.storage.type" = "binary", 
> "external.table.purge" = "false");
> A request such as:
> select key, ts from t1 where ts >= '2020-12-01 00:00:00' and ts < '2020-12-02 
> 00:00:00';
> returns values with ts < '2020-12-01 00:00:00'
> After investigation, it looks like the timestamp filter is never used in the 
> HiveHBaseTableInputFormat.getRecordReader method, which is used to create the 
> actual MapReduce job.
> But it is used in the HiveHBaseTableInputFormat.getSplitsInternal method, 
> which is used to create the map tasks.
> So I copied the code from the second method into the first.
> I attached a small patch. It is a little hacky and I am not sure it respects 
> the philosophy of the component, but it works.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=524797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524797
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 01:55
Start Date: 16/Dec/20 01:55
Worklog Time Spent: 10m 
  Work Description: dataproc-metastore opened a new pull request #1787:
URL: https://github.com/apache/hive/pull/1787


   Supersedes/Copy of #1740 and #1777 
   
   What changes were proposed in this pull request?
   Refactor HiveMetastore.HMSHandler into its own class
   Why are the changes needed?
   This will pave the way for cleaner changes, since we no longer have the 
driver class bundled with the 10,000-line HMSHandler in one file, so there is a 
clearer separation of duties.
   
   Does this PR introduce any user-facing change?
   No
   
   How was this patch tested?
   Existing unit tests, building/running manually
   No additional tests were added since this was a pure refactoring
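   
   As a rough illustration of the shape of the change (members elided; only the 
nesting moves):
   
   {code:java}
   // Before: the Thrift-facing handler nested inside the service class.
   class HiveMetastore {
     static class HMSHandler { /* ~10,000 lines of Thrift handler logic */ }
   }
   
   // After: the handler extracted to its own top-level file, HMSHandler.java,
   // leaving HiveMetastore with only startup/driver concerns.
   class HMSHandler { /* same logic, now separately readable and reviewable */ }
   {code}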
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524797)
Time Spent: 2h 50m  (was: 2h 40m)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java, the majority of the code is the Thrift 
> interface rather than the actual logic behind starting the Hive metastore; 
> the interface should be moved out into a separate file to clean things up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24470) Separate HiveMetastore Thrift and Driver logic

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24470?focusedWorklogId=524796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524796
 ]

ASF GitHub Bot logged work on HIVE-24470:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 01:55
Start Date: 16/Dec/20 01:55
Worklog Time Spent: 10m 
  Work Description: dataproc-metastore closed pull request #1787:
URL: https://github.com/apache/hive/pull/1787


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524796)
Time Spent: 2h 40m  (was: 2.5h)

> Separate HiveMetastore Thrift and Driver logic
> --
>
> Key: HIVE-24470
> URL: https://issues.apache.org/jira/browse/HIVE-24470
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Cameron Moberg
>Assignee: Cameron Moberg
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> In the file HiveMetastore.java, the majority of the code is the Thrift 
> interface rather than the actual logic behind starting the Hive metastore; 
> the interface should be moved out into a separate file to clean things up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24208) LLAP: query job stuck due to race conditions

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24208?focusedWorklogId=524785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524785
 ]

ASF GitHub Bot logged work on HIVE-24208:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 00:52
Start Date: 16/Dec/20 00:52
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1534:
URL: https://github.com/apache/hive/pull/1534


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524785)
Time Spent: 50m  (was: 40m)

> LLAP: query job stuck due to race conditions
> 
>
> Key: HIVE-24208
> URL: https://issues.apache.org/jira/browse/HIVE-24208
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.3.4
>Reporter: Yuriy Baltovskyy
>Assignee: Yuriy Baltovskyy
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When issuing an LLAP query, sometimes the Tez job on the LLAP server never 
> ends and never returns the data reader.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24244) NPE during Atlas metadata replication

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24244?focusedWorklogId=524783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524783
 ]

ASF GitHub Bot logged work on HIVE-24244:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 00:52
Start Date: 16/Dec/20 00:52
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1563:
URL: https://github.com/apache/hive/pull/1563


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524783)
Time Spent: 1h 20m  (was: 1h 10m)

> NPE during Atlas metadata replication
> -
>
> Key: HIVE-24244
> URL: https://issues.apache.org/jira/browse/HIVE-24244
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24244.01.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24197) Check for write transactions for the db under replication at a frequent interval

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24197?focusedWorklogId=524784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524784
 ]

ASF GitHub Bot logged work on HIVE-24197:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 00:52
Start Date: 16/Dec/20 00:52
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1523:
URL: https://github.com/apache/hive/pull/1523


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524784)
Time Spent: 0.5h  (was: 20m)

> Check for write transactions for the db under replication at a frequent 
> interval
> 
>
> Key: HIVE-24197
> URL: https://issues.apache.org/jira/browse/HIVE-24197
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24197.01.patch, HIVE-24197.02.patch, 
> HIVE-24197.03.patch, HIVE-24197.04.patch, HIVE-24197.05.patch, 
> HIVE-24197.06.patch, HIVE-24197.07.patch, HIVE-24197.08.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23684) Large underestimation in NDV stats when input and join cardinality ratio is big

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23684?focusedWorklogId=524772&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524772
 ]

ASF GitHub Bot logged work on HIVE-23684:
-

Author: ASF GitHub Bot
Created on: 16/Dec/20 00:07
Start Date: 16/Dec/20 00:07
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 opened a new pull request #1786:
URL: https://github.com/apache/hive/pull/1786


   …rdinality ratio is big
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524772)
Remaining Estimate: 0h
Time Spent: 10m

> Large underestimation in NDV stats when input and join cardinality ratio is 
> big
> ---
>
> Key: HIVE-23684
> URL: https://issues.apache.org/jira/browse/HIVE-23684
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Large underestimations of NDV values may occur after a join operation since 
> the current logic will decrease the original NDV values proportionally.
> The 
> [code|https://github.com/apache/hive/blob/1271d08a3c51c021fa710449f8748b8cdb12b70f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2558]
>  compares the number of rows of each relation before the join with the number 
> of rows after the join and extracts a ratio for each side. Based on this 
> ratio it adapts (reduces) the NDV accordingly.
> Consider for instance the following query:
> {code:sql}
> select inv_warehouse_sk
>  , inv_item_sk
>  , stddev_samp(inv_quantity_on_hand) stdev
>  , avg(inv_quantity_on_hand) mean
> from inventory
>, date_dim
> where inv_date_sk = d_date_sk
>   and d_year = 1999
>   and d_moy = 2
> group by inv_warehouse_sk, inv_item_sk;
> {code}
> For the sake of the discussion, I outline below some relevant stats (from 
> TPCDS30tb):
>  T(inventory) = 1627857000
>  T(date_dim) = 73049
>  T(inventory JOIN date_dim[d_year=1999 AND d_moy=2]) = 24948000
>  V(inventory, inv_date_sk) = 261
>  V(inventory, inv_item_sk) = 42
>  V(inventory, inv_warehouse_sk) = 27
>  V(date_dim, d_date_sk) = 73049
> For instance, in this query the join between inventory and date_dim has ~24M 
> rows while inventory has ~1.5B, so the NDVs of the columns coming from 
> inventory are reduced by a factor of ~65, and we end up with V(JOIN, 
> inv_item_sk) = ~6K while the real one is 231000.
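> A back-of-the-envelope version of that reduction (the per-column NDV below is 
> hypothetical, chosen only so the arithmetic reproduces the ~6K figure; the 
> row counts come from the stats above):
> {code:java}
> long rowsBefore = 1_627_857_000L;                  // T(inventory)
> long rowsAfter  = 24_948_000L;                     // T(join result)
> double ratio    = (double) rowsAfter / rowsBefore; // ~0.0153, i.e. ~1/65
> long ndvItemSk  = 420_000L;                        // hypothetical V(inventory, inv_item_sk)
> long ndvAfter   = (long) (ndvItemSk * ratio);      // ~6,400 -- far below the real 231000
> {code}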



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23684) Large underestimation in NDV stats when input and join cardinality ratio is big

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23684:
--
Labels: pull-request-available  (was: )

> Large underestimation in NDV stats when input and join cardinality ratio is 
> big
> ---
>
> Key: HIVE-23684
> URL: https://issues.apache.org/jira/browse/HIVE-23684
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Large underestimations of NDV values may occur after a join operation since 
> the current logic will decrease the original NDV values proportionally.
> The 
> [code|https://github.com/apache/hive/blob/1271d08a3c51c021fa710449f8748b8cdb12b70f/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L2558]
>  compares the number of rows of each relation before the join with the number 
> of rows after the join and extracts a ratio for each side. Based on this 
> ratio it adapts (reduces) the NDV accordingly.
> Consider for instance the following query:
> {code:sql}
> select inv_warehouse_sk
>  , inv_item_sk
>  , stddev_samp(inv_quantity_on_hand) stdev
>  , avg(inv_quantity_on_hand) mean
> from inventory
>, date_dim
> where inv_date_sk = d_date_sk
>   and d_year = 1999
>   and d_moy = 2
> group by inv_warehouse_sk, inv_item_sk;
> {code}
> For the sake of the discussion, I outline below some relevant stats (from 
> TPCDS30tb):
>  T(inventory) = 1627857000
>  T(date_dim) = 73049
>  T(inventory JOIN date_dim[d_year=1999 AND d_moy=2]) = 24948000
>  V(inventory, inv_date_sk) = 261
>  V(inventory, inv_item_sk) = 42
>  V(inventory, inv_warehouse_sk) = 27
>  V(date_dim, d_date_sk) = 73049
> For instance, in this query the join between inventory and date_dim has ~24M 
> rows while inventory has ~1.5B, so the NDVs of the columns coming from 
> inventory are reduced by a factor of ~65, and we end up with V(JOIN, 
> inv_item_sk) = ~6K while the real one is 231000.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24534) Prevent comparisons between characters and decimals types when strict checks enabled

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24534?focusedWorklogId=524762&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524762
 ]

ASF GitHub Bot logged work on HIVE-24534:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 23:05
Start Date: 15/Dec/20 23:05
Worklog Time Spent: 10m 
  Work Description: zabetak opened a new pull request #1780:
URL: https://github.com/apache/hive/pull/1780


   ### What changes were proposed in this pull request?
   Throw an error when `hive.strict.checks.type.safety=true` and the query 
contains comparison between decimals and character types.
   
   ### Why are the changes needed?
   To fail-fast and avoid unexpected query results. Examples in the JIRA.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Queries relying on comparisons between decimal and character types will fail.
   
   ### How was this patch tested?
   `mvn clean test -Dtest=TestNegativeCliDriver 
-Dqfile="strict_type_decimal_char_00.q,strict_type_decimal_char_01.q,strict_type_decimal_string_00.q,strict_type_decimal_string_01.q,strict_type_decimal_string_02.q,strict_type_decimal_varchar_00.q,strict_type_decimal_varchar_01.q,strict_type_decimal_varchar_02.q,strict_type_decimal_varchar_03.q,strict_type_decimal_varchar_04.q,strict_type_decimal_varchar_05.q,strict_type_decimal_varchar_06.q,strict_type_decimal_varchar_07.q,strict_type_decimal_varchar_08.q"
 -Dtest.output.overwrite`
   `mvn test -Dtest=TestDecimalStringValidation`



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524762)
Time Spent: 1h 20m  (was: 1h 10m)

> Prevent comparisons between characters and decimals types when strict checks 
> enabled
> 
>
> Key: HIVE-24534
> URL: https://issues.apache.org/jira/browse/HIVE-24534
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When we compare decimal and character types, implicit conversions take place 
> that can lead to unexpected and surprising results. 
> {code:sql}
> create table t_str (str_col string);
> insert into t_str values ('1208925742523269458163819');
> select * from t_str where str_col=1208925742523269479013976;
> {code}
> The SELECT query brings up one row although the filtering value is not the 
> same as the one present in the string column of the table. The problem is 
> that both types are converted to doubles and, due to loss of precision, the 
> values are deemed equal.
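> The precision loss is easy to reproduce directly (a minimal, self-contained 
> check; not Hive code):
> {code:java}
> public class PrecisionLoss {
>   public static void main(String[] args) {
>     // Both values exceed double's 53-bit mantissa (the ulp near 1.2e24 is ~2.7e8),
>     // so they round to the same double and compare equal:
>     System.out.println(Double.parseDouble("1208925742523269458163819")
>                     == Double.parseDouble("1208925742523269479013976")); // true
>   }
> }
> {code}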
> Even if we change the implicit conversion to use another type (HIVE-24528), 
> there are always some cases that may lead to unexpected results. 
> The goal of this issue is to prevent comparisons between decimal and 
> character types when hive.strict.checks.type.safety is enabled and throw an 
> error. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24534) Prevent comparisons between characters and decimals types when strict checks enabled

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24534?focusedWorklogId=524760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524760
 ]

ASF GitHub Bot logged work on HIVE-24534:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 23:05
Start Date: 15/Dec/20 23:05
Worklog Time Spent: 10m 
  Work Description: zabetak closed pull request #1780:
URL: https://github.com/apache/hive/pull/1780


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524760)
Time Spent: 1h  (was: 50m)

> Prevent comparisons between characters and decimals types when strict checks 
> enabled
> 
>
> Key: HIVE-24534
> URL: https://issues.apache.org/jira/browse/HIVE-24534
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When we compare decimal and character types, implicit conversions take place 
> that can lead to unexpected and surprising results. 
> {code:sql}
> create table t_str (str_col string);
> insert into t_str values ('1208925742523269458163819');
> select * from t_str where str_col=1208925742523269479013976;
> {code}
> The SELECT query brings up one row although the filtering value is not the 
> same as the one present in the string column of the table. The problem is 
> that both types are converted to doubles and, due to loss of precision, the 
> values are deemed equal.
> Even if we change the implicit conversion to use another type (HIVE-24528), 
> there are always some cases that may lead to unexpected results. 
> The goal of this issue is to prevent comparisons between decimal and 
> character types when hive.strict.checks.type.safety is enabled and throw an 
> error. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24534) Prevent comparisons between characters and decimals types when strict checks enabled

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24534?focusedWorklogId=524761&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524761
 ]

ASF GitHub Bot logged work on HIVE-24534:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 23:05
Start Date: 15/Dec/20 23:05
Worklog Time Spent: 10m 
  Work Description: zabetak commented on pull request #1780:
URL: https://github.com/apache/hive/pull/1780#issuecomment-745622154


   Close and reopen to re-trigger tests



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524761)
Time Spent: 1h 10m  (was: 1h)

> Prevent comparisons between characters and decimals types when strict checks 
> enabled
> 
>
> Key: HIVE-24534
> URL: https://issues.apache.org/jira/browse/HIVE-24534
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When we compare decimal and character types, implicit conversions take place 
> that can lead to unexpected and surprising results. 
> {code:sql}
> create table t_str (str_col string);
> insert into t_str values ('1208925742523269458163819');
> select * from t_str where str_col=1208925742523269479013976;
> {code}
> The SELECT query brings up one row although the filtering value is not the 
> same as the one present in the string column of the table. The problem is 
> that both types are converted to doubles and, due to loss of precision, the 
> values are deemed equal.
> Even if we change the implicit conversion to use another type (HIVE-24528), 
> there are always some cases that may lead to unexpected results. 
> The goal of this issue is to prevent comparisons between decimal and 
> character types when hive.strict.checks.type.safety is enabled and throw an 
> error. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24394) Enable printing explain to console at query start

2020-12-15 Thread Johan Gustavsson (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249893#comment-17249893
 ] 

Johan Gustavsson commented on HIVE-24394:
-

Thanks for taking a look at this and thank you for the feedback [~kgyrtkirk]. I 
think that in normal cases using the standard HS2 web UI in combination with 
the Tez UI makes most queries very debuggable. The problem is that, due to our 
internal security policies, our end users can't be granted access to these 
services. The only information end users can get access to is the query they 
submit, the console log and the result of the query. On top of this, most of 
the users are submitting their queries through workflows using temporary 
intermediate tables, so making a temporary change to do a one-off explain print 
is sometimes not easy for them. With this change we default to having the 
explain output printed to the console for all queries, making it easier to go 
back and investigate unexpected behaviors.

> Enable printing explain to console at query start
> -
>
> Key: HIVE-24394
> URL: https://issues.apache.org/jira/browse/HIVE-24394
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Query Processor
>Affects Versions: 2.3.7, 3.1.2
>Reporter: Johan Gustavsson
>Assignee: Jesus Camacho Rodriguez
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently there is a hive.log.explain.output option that prints the extended 
> explain to the log. While this is helpful for internal investigations, it 
> limits the information that is available to users. So we should add an option 
> to print the non-extended explain to the console, for general user 
> consumption, to make it easier for users to debug queries and workflows 
> without having to resubmit queries with explain.
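> A minimal sketch of such a gate (the property name for the new console switch 
> is hypothetical; hive.log.explain.output is the existing log-only option):
> {code:java}
> import java.io.PrintStream;
> import java.util.Properties;
> 
> public class ExplainAtStartSketch {
>   // At query start: print the (non-extended) EXPLAIN to the client console when enabled.
>   static void maybePrintExplain(Properties conf, String explainText, PrintStream console) {
>     if (Boolean.parseBoolean(conf.getProperty("hive.explain.console.output", "false"))) {
>       console.println(explainText);
>     }
>   }
> }
> {code}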



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24543) Support SAML 2.0 as an authentication mechanism

2020-12-15 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-24543:
--


> Support SAML 2.0 as an authentication mechanism
> ---
>
> Key: HIVE-24543
> URL: https://issues.apache.org/jira/browse/HIVE-24543
> Project: Hive
>  Issue Type: New Feature
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> With cloud-based deployments, SAML 2.0-based authentication support in HS2 
> will be greatly useful with federated or external identity providers like 
> Okta, PingIdentity or Azure AD.
> This authentication mechanism can initially be supported only in HTTP 
> transport mode in HiveServer2, since the SAML 2.0 protocol is primarily 
> designed for web clients.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23553) Upgrade ORC version to 1.6.6

2020-12-15 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-23553:
--
Summary: Upgrade ORC version to 1.6.6  (was: Bump ORC version to 1.6.6)

> Upgrade ORC version to 1.6.6
> 
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on the ORC 1.5.X line, and in order to take 
> advantage of the latest ORC improvements, such as column encryption, we have 
> to bump to 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=Html&projectId=12318320
> Even though the ORC reader could work out of the box, Hive LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store File Footers, Tails 
> and streams, and to un/compress RG data. As there were many internal changes 
> from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23553) Bump ORC version to 1.6.6

2020-12-15 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-23553:
--
Summary: Bump ORC version to 1.6.6  (was: Bump ORC version to 1.6)

> Bump ORC version to 1.6.6
> -
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on the ORC 1.5.X line, and in order to take 
> advantage of the latest ORC improvements, such as column encryption, we have 
> to bump to 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=Html&projectId=12318320
> Even though the ORC reader could work out of the box, Hive LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store File Footers, Tails 
> and streams, and to un/compress RG data. As there were many internal changes 
> from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23553) Bump ORC version to 1.6

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?focusedWorklogId=524547&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524547
 ]

ASF GitHub Bot logged work on HIVE-23553:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 16:42
Start Date: 15/Dec/20 16:42
Worklog Time Spent: 10m 
  Work Description: pgaref opened a new pull request #1785:
URL: https://github.com/apache/hive/pull/1785


   ### What changes were proposed in this pull request?
   Bump apache ORC version to 1.6.6
   
   ### Why are the changes needed?
   So Hive can take advantage of the latest features and bug fixes
   
   
   ### Does this PR introduce _any_ user-facing change?
   no
   
   
   ### How was this patch tested?
   Internal tests + q files



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524547)
Remaining Estimate: 0h
Time Spent: 10m

> Bump ORC version to 1.6
> ---
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on the ORC 1.5.X line, and in order to take 
> advantage of the latest ORC improvements, such as column encryption, we have 
> to bump to 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=Html&projectId=12318320
> Even though the ORC reader could work out of the box, Hive LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store File Footers, Tails 
> and streams, and to un/compress RG data. As there were many internal changes 
> from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23553) Bump ORC version to 1.6

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23553:
--
Labels: pull-request-available  (was: )

> Bump ORC version to 1.6
> ---
>
> Key: HIVE-23553
> URL: https://issues.apache.org/jira/browse/HIVE-23553
> Project: Hive
>  Issue Type: Improvement
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  Apache Hive is currently on the ORC 1.5.X line, and in order to take 
> advantage of the latest ORC improvements, such as column encryption, we have 
> to bump to 1.6.X.
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343288&styleName=Html&projectId=12318320
> Even though the ORC reader could work out of the box, Hive LLAP depends 
> heavily on internal ORC APIs, e.g., to retrieve and store File Footers, Tails 
> and streams, and to un/compress RG data. As there were many internal changes 
> from 1.5 to 1.6 (input stream offsets, relative BufferChunks, etc.) the 
> upgrade is not straightforward.
> This Umbrella Jira tracks this upgrade effort.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24274) Implement Query Text based MaterializedView rewrite

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24274?focusedWorklogId=524533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524533
 ]

ASF GitHub Bot logged work on HIVE-24274:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 16:28
Start Date: 15/Dec/20 16:28
Worklog Time Spent: 10m 
  Work Description: kasakrisz closed pull request #1561:
URL: https://github.com/apache/hive/pull/1561


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524533)
Time Spent: 4h 10m  (was: 4h)

> Implement Query Text based MaterializedView rewrite
> ---
>
> Key: HIVE-24274
> URL: https://issues.apache.org/jira/browse/HIVE-24274
> Project: Hive
>  Issue Type: Improvement
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Besides the way queries are currently rewritten to use materialized views in 
> Hive, this project provides an alternative:
> Compare the query text with the stored query text of the materialized views. 
> If we find a match, the original query's logical plan can be replaced by a 
> scan on the materialized view.
> - Only materialized views which are enabled for rewrite can participate.
> - Use the existing *HiveMaterializedViewsRegistry* through the *Hive* object 
> by adding a lookup method by query text.
> - There might be more than one materialized view with the same query text. In 
> this case choose the first valid one; see the sketch after this list.
> - Validation can be done by calling 
> *Hive.validateMaterializedViewsFromRegistry()*.
> - The scope of this first patch is limited to rewriting queries whose entire 
> text can be matched.
> - Use the expanded query text (fully qualified column and table names) for 
> comparison.
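> A minimal sketch of the lookup (all names are hypothetical, not Hive's API; 
> validity checking is elided):
> {code:java}
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
> 
> public class QueryTextLookupSketch {
>   static class View {
>     final String expandedText; final String name; final boolean rewriteEnabled;
>     View(String t, String n, boolean r) { expandedText = t; name = n; rewriteEnabled = r; }
>   }
> 
>   // Index rewrite-enabled views by expanded query text; on a full-text match
>   // the plan becomes a scan of the returned view (first valid match wins).
>   static String lookup(List<View> registry, String expandedQuery) {
>     Map<String, String> byText = new HashMap<>();
>     for (View v : registry) {
>       if (v.rewriteEnabled) {
>         byText.putIfAbsent(v.expandedText, v.name);
>       }
>     }
>     return byText.get(expandedQuery); // null -> fall back to normal planning
>   }
> }
> {code}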



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24542) Prepare Guava for Upgrades

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24542?focusedWorklogId=524489&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524489
 ]

ASF GitHub Bot logged work on HIVE-24542:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:50
Start Date: 15/Dec/20 14:50
Worklog Time Spent: 10m 
  Work Description: belugabehr opened a new pull request #1784:
URL: https://github.com/apache/hive/pull/1784


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524489)
Remaining Estimate: 0h
Time Spent: 10m

> Prepare Guava for Upgrades
> --
>
> Key: HIVE-24542
> URL: https://issues.apache.org/jira/browse/HIVE-24542
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive is currently using some Guava methods that are removed in future 
> versions. Also, in some projects, the version of Guava being used is 
> implicitly inherited from other projects even though Hive has a defined 
> version; be explicit about it.
> These actions will make upgrading Guava versions easier in the future.
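> For illustration, the kind of call-site change involved (this Guava method is 
> deprecated in favor of the JDK equivalent and, to my knowledge, dropped in 
> newer releases; the actual Hive call sites may differ):
> {code:java}
> import java.util.Collections;
> import java.util.Iterator;
> 
> public class GuavaPrep {
>   // before: Iterator<String> it = com.google.common.collect.Iterators.emptyIterator();
>   // after, using the JDK equivalent that survives a Guava upgrade:
>   Iterator<String> it = Collections.emptyIterator();
> }
> {code}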



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24542) Prepare Guava for Upgrades

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24542:
--
Labels: pull-request-available  (was: )

> Prepare Guava for Upgrades
> --
>
> Key: HIVE-24542
> URL: https://issues.apache.org/jira/browse/HIVE-24542
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive is currently using some Guava methods that are removed in future 
> versions. Also, in some projects, the version of Guava being used is 
> implicitly inherited from other projects even though Hive has a defined 
> version; be explicit about it.
> These actions will make upgrading Guava versions easier in the future.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24542) Prepare Guava for Upgrades

2020-12-15 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor reassigned HIVE-24542:
-


> Prepare Guava for Upgrades
> --
>
> Key: HIVE-24542
> URL: https://issues.apache.org/jira/browse/HIVE-24542
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>
> Hive is currently using some Guava methods that are removed in future 
> versions. Also, in some projects, the version of Guava being used is 
> implicitly inherited from other projects even though Hive has a defined 
> version; be explicit about it.
> These actions will make upgrading Guava versions easier in the future.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24530) Potential NPE in FileSinkOperator.closeRecordwriters method

2020-12-15 Thread Marta Kuczora (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora resolved HIVE-24530.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.
Thanks a lot [~szita] for the review!

> Potential NPE in FileSinkOperator.closeRecordwriters method
> ---
>
> Key: HIVE-24530
> URL: https://issues.apache.org/jira/browse/HIVE-24530
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> During testing, an NPE occurred in the FileSinkOperator.closeRecordwriters 
> method.
> After investigating, it turned out there was an underlying IOException while 
> executing the FileSinkOperator.process method. It got caught by the following 
> code part:
> {noformat}
> } catch (IOException e) {
>   closeWriters(true);
>   throw new HiveException(e);
> } catch (SerDeException e) {
>   closeWriters(true);
>   throw new HiveException(e);
> }
> {noformat}
> First the closeWriters method was called:
> {noformat}
>   private void closeWriters(boolean abort) throws HiveException {
> fpaths.closeWriters(true);
> closeRecordwriters(true);
>   }
>   private void closeRecordwriters(boolean abort) {
> for (RecordWriter writer : rowOutWriters) {
>   try {
> LOG.info("Closing {} on exception", writer);
> writer.close(abort);
>   } catch (IOException e) {
> LOG.error("Error closing rowOutWriter" + writer, e);
>   }
> }
> {noformat}
> If the writers had been closed successfully, a HiveException would have been 
> thrown with the original IOException.
> But when the IOException occurred, the writers in rowOutWriters were not 
> yet initialised, so an NPE occurred. This was very misleading, as the NPE was 
> not the real issue: the original IOException was hidden.
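> A null guard in closeRecordwriters avoids masking the original exception 
> (sketch only; the committed fix may differ):
> {noformat}
>   private void closeRecordwriters(boolean abort) {
>     if (rowOutWriters == null) {
>       return; // writers were never initialised; keep the original exception visible
>     }
>     for (RecordWriter writer : rowOutWriters) {
>       try {
>         LOG.info("Closing {} on exception", writer);
>         writer.close(abort);
>       } catch (IOException e) {
>         LOG.error("Error closing rowOutWriter" + writer, e);
>       }
>     }
>   }
> {noformat}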



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24530) Potential NPE in FileSinkOperator.closeRecordwriters method

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24530?focusedWorklogId=524479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524479
 ]

ASF GitHub Bot logged work on HIVE-24530:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:40
Start Date: 15/Dec/20 14:40
Worklog Time Spent: 10m 
  Work Description: kuczoram merged pull request #1775:
URL: https://github.com/apache/hive/pull/1775


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524479)
Time Spent: 40m  (was: 0.5h)

> Potential NPE in FileSinkOperator.closeRecordwriters method
> ---
>
> Key: HIVE-24530
> URL: https://issues.apache.org/jira/browse/HIVE-24530
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> During testing, an NPE occurred in the FileSinkOperator.closeRecordwriters 
> method.
> After investigating, it turned out there was an underlying IOException while 
> executing the FileSinkOperator.process method. It got caught by the following 
> code part:
> {noformat}
> } catch (IOException e) {
>   closeWriters(true);
>   throw new HiveException(e);
> } catch (SerDeException e) {
>   closeWriters(true);
>   throw new HiveException(e);
> }
> {noformat}
> First the closeWriters method was called:
> {noformat}
>   private void closeWriters(boolean abort) throws HiveException {
> fpaths.closeWriters(true);
> closeRecordwriters(true);
>   }
>   private void closeRecordwriters(boolean abort) {
> for (RecordWriter writer : rowOutWriters) {
>   try {
> LOG.info("Closing {} on exception", writer);
> writer.close(abort);
>   } catch (IOException e) {
> LOG.error("Error closing rowOutWriter" + writer, e);
>   }
> }
> {noformat}
> If the writers had been closed successfully, a HiveException would have been 
> thrown with the original IOException.
> But when the IOException occurred, the writers in rowOutWriters were not 
> yet initialised, so an NPE occurred. This was very misleading, as the NPE was 
> not the real issue: the original IOException was hidden.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524471&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524471
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:33
Start Date: 15/Dec/20 14:33
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543397112



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
##
@@ -100,20 +101,19 @@ public void configure(JobConf job) {
 isTagged = gWork.getNeedsTagging();
 try {
   keyTableDesc = gWork.getKeyDesc();
-  inputKeyDeserializer = ReflectionUtils.newInstance(keyTableDesc
-  .getDeserializerClass(), null);
-  SerDeUtils.initializeSerDe(inputKeyDeserializer, null, 
keyTableDesc.getProperties(), null);
+  AbstractSerDe serDe = ReflectionUtils.newInstance(keyTableDesc
+  .getSerDeClass(), null);
+  serDe.initialize(null, keyTableDesc.getProperties(), null);
+  inputKeyDeserializer = serDe;

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524471)
Time Spent: 3.5h  (was: 3h 20m)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes 
> are designed.
> Simplify, and consolidate more functionality into {{AbstractSerDe}}.  Remove 
> functionality that is not commonly used.  Remove methods that were deprecated 
> in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which provides a single type implementing 
> both the {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.
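> The intended shape, following that analogy (signatures abbreviated and 
> illustrative, not Hive's exact API):
> {code:java}
> public abstract class AbstractSerDe implements Deserializer, Serializer {
>   // Single initialization entry point; subclasses implement both directions,
>   // the way ByteChannel unifies ReadableByteChannel and WritableByteChannel.
>   public abstract void initialize(Configuration conf, Properties tbl) throws SerDeException;
> }
> {code}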



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524470
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:31
Start Date: 15/Dec/20 14:31
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543395685



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
##
@@ -273,15 +273,14 @@ protected void initializeOp(Configuration hconf) throws 
HiveException {
 try {
   this.hconf = hconf;
 
-  scriptOutputDeserializer = conf.getScriptOutputInfo()
-  .getDeserializerClass().newInstance();
-  SerDeUtils.initializeSerDe(scriptOutputDeserializer, hconf,
- conf.getScriptOutputInfo().getProperties(), 
null);
+  AbstractSerDe outputSerde = 
conf.getScriptOutputInfo().getSerDeClass().newInstance();
+  outputSerde.initialize(hconf, 
conf.getScriptOutputInfo().getProperties(), null);
 
-  scriptInputSerializer = (Serializer) conf.getScriptInputInfo()
-  .getDeserializerClass().newInstance();
-  scriptInputSerializer.initialize(hconf, conf.getScriptInputInfo()
-  .getProperties());
+  AbstractSerDe inputSerde = 
conf.getScriptInputInfo().getSerDeClass().newInstance();

Review comment:
   Done.

##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java
##
@@ -407,7 +407,7 @@ public void initEmptyInputChildren(List> 
children, Configuration hco
   StructObjectInspector soi = null;
   PartitionDesc partDesc = 
conf.getAliasToPartnInfo().get(tsOp.getConf().getAlias());
   Configuration newConf = 
tableNameToConf.get(partDesc.getTableDesc().getTableName());
-  Deserializer serde = partDesc.getTableDesc().getDeserializer();
+  Deserializer serde = partDesc.getTableDesc().getSerDe();

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524470)
Time Spent: 3h 20m  (was: 3h 10m)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Rework how {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} classes 
> are designed.
> Simplify, and consolidate more functionality into {{AbstractSerDe}}.  Remove 
> functionality that is not commonly used.  Remove methods that were deprecated 
> in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which provides a single type implementing 
> both the {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524466
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:29
Start Date: 15/Dec/20 14:29
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543393854



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecReducer.java
##
@@ -100,20 +101,19 @@ public void configure(JobConf job) {
 isTagged = gWork.getNeedsTagging();
 try {
   keyTableDesc = gWork.getKeyDesc();
-  inputKeyDeserializer = ReflectionUtils.newInstance(keyTableDesc
-  .getDeserializerClass(), null);
-  SerDeUtils.initializeSerDe(inputKeyDeserializer, null, keyTableDesc.getProperties(), null);
+  AbstractSerDe serDe = ReflectionUtils.newInstance(keyTableDesc
+  .getSerDeClass(), null);
+  serDe.initialize(null, keyTableDesc.getProperties(), null);
+  inputKeyDeserializer = serDe;
   keyObjectInspector = inputKeyDeserializer.getObjectInspector();
   valueTableDesc = new TableDesc[gWork.getTagToValueDesc().size()];
   for (int tag = 0; tag < gWork.getTagToValueDesc().size(); tag++) {
 // We should initialize the SerDe with the TypeInfo when available.
 valueTableDesc[tag] = gWork.getTagToValueDesc().get(tag);
-inputValueDeserializer[tag] = ReflectionUtils.newInstance(
-valueTableDesc[tag].getDeserializerClass(), null);
-SerDeUtils.initializeSerDe(inputValueDeserializer[tag], null,
-   valueTableDesc[tag].getProperties(), null);
-valueObjectInspector[tag] = inputValueDeserializer[tag]
-.getObjectInspector();
+AbstractSerDe sd = ReflectionUtils.newInstance(valueTableDesc[tag].getSerDeClass(), null);
+sd.initialize(null, valueTableDesc[tag].getProperties(), null);
+inputValueDeserializer[tag] = sd;
+valueObjectInspector[tag] = inputValueDeserializer[tag].getObjectInspector();

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524466)
Time Spent: 3h  (was: 2h 50m)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524467&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524467
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:29
Start Date: 15/Dec/20 14:29
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543394455



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/ScriptOperator.java
##
@@ -273,15 +273,14 @@ protected void initializeOp(Configuration hconf) throws HiveException {
 try {
   this.hconf = hconf;
 
-  scriptOutputDeserializer = conf.getScriptOutputInfo()
-  .getDeserializerClass().newInstance();
-  SerDeUtils.initializeSerDe(scriptOutputDeserializer, hconf,
- conf.getScriptOutputInfo().getProperties(), null);
+  AbstractSerDe outputSerde = conf.getScriptOutputInfo().getSerDeClass().newInstance();

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524467)
Time Spent: 3h 10m  (was: 3h)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524465
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:28
Start Date: 15/Dec/20 14:28
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543393380



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
##
@@ -154,10 +157,11 @@ public void init(JobConf job, OutputCollector output, Reporter reporter) throws
   for (int tag = 0; tag < gWork.getTagToValueDesc().size(); tag++) {
 // We should initialize the SerDe with the TypeInfo when available.
 valueTableDesc[tag] = gWork.getTagToValueDesc().get(tag);
-inputValueDeserializer[tag] = ReflectionUtils.newInstance(
-valueTableDesc[tag].getDeserializerClass(), null);
-SerDeUtils.initializeSerDe(inputValueDeserializer[tag], null,
-valueTableDesc[tag].getProperties(), null);
+
+AbstractSerDe sd = ReflectionUtils.newInstance(valueTableDesc[tag].getSerDeClass(), null);
+sd.initialize(null, valueTableDesc[tag].getProperties(), null);
+
+inputValueDeserializer[tag] = sd;

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524465)
Time Spent: 2h 50m  (was: 2h 40m)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24477) Separate production and test code in TxnDbUtil

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24477?focusedWorklogId=524463&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524463
 ]

ASF GitHub Bot logged work on HIVE-24477:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:26
Start Date: 15/Dec/20 14:26
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on pull request #1731:
URL: https://github.com/apache/hive/pull/1731#issuecomment-745321236


   @miklosgergely could you merge this? thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524463)
Time Spent: 0.5h  (was: 20m)

> Separate production and test code in TxnDbUtil
> --
>
> Key: HIVE-24477
> URL: https://issues.apache.org/jira/browse/HIVE-24477
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This class was created as a test utility, but it lives in a production 
> package, since it is used in multiple projects. It is now an unfortunate mix 
> of test and production utilities.
> The production code could be moved to the TxnUtils class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524464&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524464
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:26
Start Date: 15/Dec/20 14:26
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543391931



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkReduceRecordHandler.java
##
@@ -132,10 +133,12 @@ public void init(JobConf job, OutputCollector output, Reporter reporter) throws
 isTagged = gWork.getNeedsTagging();
 try {
   keyTableDesc = gWork.getKeyDesc();
-  inputKeyDeserializer = ReflectionUtils.newInstance(keyTableDesc
-.getDeserializerClass(), null);
-  SerDeUtils.initializeSerDe(inputKeyDeserializer, null, keyTableDesc.getProperties(), null);
-  keyObjectInspector = inputKeyDeserializer.getObjectInspector();
+  AbstractSerDe serde = ReflectionUtils.newInstance(keyTableDesc
+.getSerDeClass(), null);
+  serde.initialize(null, keyTableDesc.getProperties(), null);
+  keyObjectInspector = serde.getObjectInspector();
+
+  inputKeyDeserializer = serde;

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524464)
Time Spent: 2h 40m  (was: 2.5h)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524462
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:24
Start Date: 15/Dec/20 14:24
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543390071



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DynamicValueRegistryTez.java
##
@@ -104,8 +105,8 @@ public void init(RegistryConf conf) throws Exception {
   RuntimeValuesInfo runtimeValuesInfo = rct.baseWork.getInputSourceToRuntimeValuesInfo().get(inputSourceName);
 
   // Setup deserializer/obj inspectors for the incoming data source
-  Deserializer deserializer = ReflectionUtils.newInstance(runtimeValuesInfo.getTableDesc().getDeserializerClass(), null);
-  deserializer.initialize(rct.conf, runtimeValuesInfo.getTableDesc().getProperties());
+  AbstractSerDe deserializer = ReflectionUtils.newInstance(runtimeValuesInfo.getTableDesc().getSerDeClass(), null);

Review comment:
   Resolved.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524462)
Time Spent: 2.5h  (was: 2h 20m)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524461&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524461
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:24
Start Date: 15/Dec/20 14:24
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543389456



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java
##
@@ -152,10 +152,11 @@ void init(JobConf jconf, Operator<?> reducer, boolean vectorized, TableDesc keyT
 this.tag = tag;
 
 try {
-  inputKeyDeserializer = ReflectionUtils.newInstance(keyTableDesc
-  .getDeserializerClass(), null);
-  SerDeUtils.initializeSerDe(inputKeyDeserializer, null, keyTableDesc.getProperties(), null);
-  keyObjectInspector = inputKeyDeserializer.getObjectInspector();
+  AbstractSerDe serde = ReflectionUtils.newInstance(keyTableDesc.getSerDeClass(), null);
+  serde.initialize(null, keyTableDesc.getProperties(), null);
+
+  inputKeyDeserializer = serde;

Review comment:
   Done.  I had it this way because `inputKeyDeserializer` is a 
`Deserializer`, which does not have an `initialize` method.  However, I 
changed the type to `AbstractSerDe` to address your comments here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524461)
Time Spent: 2h 20m  (was: 2h 10m)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524458&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524458
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:19
Start Date: 15/Dec/20 14:19
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543383695



##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/TablePropertyEnrichmentOptimizer.java
##
@@ -115,13 +115,13 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Obje
   String deserializerClassName = null;
   try {
 deserializerClassName = tableScanDesc.getTableMetadata().getSd().getSerdeInfo().getSerializationLib();
-Deserializer deserializer = ReflectionUtil.newInstance(
+AbstractSerDe deserializer = ReflectionUtil.newInstance(

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524458)
Time Spent: 2h 10m  (was: 2h)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24534) Prevent comparisons between characters and decimals types when strict checks enabled

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24534?focusedWorklogId=524459&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524459
 ]

ASF GitHub Bot logged work on HIVE-24534:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:19
Start Date: 15/Dec/20 14:19
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #1780:
URL: https://github.com/apache/hive/pull/1780#discussion_r543384168



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/type/TypeCheckProcFactory.java
##
@@ -801,6 +797,22 @@ private boolean unSafeCompareWithBigInt(TypeInfo otherTypeInfo, TypeInfo bigintC
   }
   return false;
 }
+
+private boolean isDecimalCharacterComparison(TypeInfo type1, TypeInfo type2) {
+  if (type1 instanceof PrimitiveTypeInfo && type2 instanceof PrimitiveTypeInfo) {
+Set<EnumSet<PrimitiveObjectInspector.PrimitiveCategory>> comparisons = new HashSet<>(3);
+comparisons.add(EnumSet.of(
+PrimitiveObjectInspector.PrimitiveCategory.DECIMAL, PrimitiveObjectInspector.PrimitiveCategory.CHAR));
+comparisons.add(EnumSet.of(
+PrimitiveObjectInspector.PrimitiveCategory.DECIMAL, PrimitiveObjectInspector.PrimitiveCategory.VARCHAR));
+comparisons.add(EnumSet.of(

Review comment:
   Fixed in 
https://github.com/apache/hive/pull/1780/commits/ab7432b6f8af27755acc28b2034fd2d3870c0207. 
Overall there is more refactoring to be done, but I want to keep that in a 
separate issue.
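
The EnumSet-of-pairs approach in the patch works because set equality ignores operand order, so one entry covers both {{decimal = char}} and {{char = decimal}}. A standalone sketch (the enum here is a stand-in for {{PrimitiveCategory}}, not Hive's type):

{code:java}
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

public class PairCheckDemo {
  enum Category { DECIMAL, CHAR, VARCHAR, STRING }

  public static void main(String[] args) {
    Set<EnumSet<Category>> banned = new HashSet<>();
    banned.add(EnumSet.of(Category.DECIMAL, Category.CHAR));
    banned.add(EnumSet.of(Category.DECIMAL, Category.VARCHAR));
    banned.add(EnumSet.of(Category.DECIMAL, Category.STRING));
    // Membership does not depend on which side of the comparison each type is on.
    System.out.println(banned.contains(EnumSet.of(Category.CHAR, Category.DECIMAL))); // true
  }
}
{code}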





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524459)
Time Spent: 50m  (was: 40m)

> Prevent comparisons between characters and decimals types when strict checks 
> enabled
> 
>
> Key: HIVE-24534
> URL: https://issues.apache.org/jira/browse/HIVE-24534
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When we compare decimal and character types, implicit conversions take place 
> that can lead to unexpected and surprising results.
> {code:sql}
> create table t_str (str_col string);
> insert into t_str values ('1208925742523269458163819');
> select * from t_str where str_col=1208925742523269479013976;
> {code}
> The SELECT query returns one row even though the filtering value is not the 
> same as the one present in the string column of the table. The problem is 
> that both sides are converted to doubles, and due to loss of precision the 
> values are deemed equal.
> Even if we change the implicit conversion to use another type (HIVE-24528), 
> there are always some cases that may lead to unexpected results.
> The goal of this issue is to prevent comparisons between decimal and 
> character types when hive.strict.checks.type.safety is enabled, and to throw 
> an error instead.
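
The precision loss is easy to reproduce in plain Java (a sketch, not from the issue itself; the behavior follows from IEEE-754: near 1.2e24 adjacent doubles are 2^27 apart, far more than the gap between these two literals):

{code:java}
public class DoublePrecisionDemo {
  public static void main(String[] args) {
    // The value stored in the string column, coerced the way the comparison does it.
    double stored = Double.parseDouble("1208925742523269458163819");
    // The different literal used in the WHERE clause.
    double filter = 1208925742523269479013976d;
    // Prints true: both round to the same double, so the row is (wrongly) returned.
    System.out.println(stored == filter);
  }
}
{code}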



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524456
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:16
Start Date: 15/Dec/20 14:16
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543379546



##
File path: serde/src/java/org/apache/hadoop/hive/serde2/SerDe.java
##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.serde2;
+
+/**
+ * A Hive Serializer/Deserializer.
+ */
+public interface SerDe {
+
+  /**
+   * Returns statistics collected when serializing

Review comment:
   Thanks. Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524456)
Time Spent: 2h  (was: 1h 50m)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524454&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524454
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:15
Start Date: 15/Dec/20 14:15
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543378128



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableDummyOperator.java
##
@@ -45,8 +44,8 @@ protected void initializeOp(Configuration hconf) throws HiveException {
 super.initializeOp(hconf);
 TableDesc tbl = this.getConf().getTbl();
 try {
-  Deserializer serde = tbl.getDeserializerClass().newInstance();
-  SerDeUtils.initializeSerDe(serde, hconf, tbl.getProperties(), null);
+  AbstractSerDe serde = tbl.getSerDeClass().newInstance();

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524454)
Time Spent: 1h 50m  (was: 1h 40m)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524451&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524451
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:14
Start Date: 15/Dec/20 14:14
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543376678



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
##
@@ -179,18 +178,18 @@ protected void initializeOp(Configuration hconf) throws HiveException {
 }
 try {
   TableDesc keyTableDesc = conf.getKeyTblDesc();
-  AbstractSerDe keySerde = (AbstractSerDe) ReflectionUtils.newInstance(keyTableDesc.getDeserializerClass(),
+  AbstractSerDe keySerde = (AbstractSerDe) ReflectionUtils.newInstance(keyTableDesc.getSerDeClass(),

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524451)
Time Spent: 1.5h  (was: 1h 20m)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524453
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:14
Start Date: 15/Dec/20 14:14
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543377449



##
File path: hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InternalUtil.java
##
@@ -143,18 +142,16 @@ private static ObjectInspector getObjectInspector(TypeInfo type) throws IOExcept
   //TODO this has to find a better home, it's also hardcoded as default in hive would be nice
   // if the default was decided by the serde
   static void initializeOutputSerDe(AbstractSerDe serDe, Configuration conf, OutputJobInfo jobInfo)
-throws SerDeException {
-SerDeUtils.initializeSerDe(serDe, conf,
-   getSerdeProperties(jobInfo.getTableInfo(),
-  jobInfo.getOutputSchema()),
-   null);
+  throws SerDeException {
+serDe.initialize(conf, getSerdeProperties(jobInfo.getTableInfo(), jobInfo.getOutputSchema()), null);
   }
 
   static void initializeDeserializer(Deserializer deserializer, Configuration conf,
  HCatTableInfo info, HCatSchema schema) throws SerDeException {
 Properties props = getSerdeProperties(info, schema);
 LOG.info("Initializing " + deserializer.getClass().getName() + " with properties " + props);
-SerDeUtils.initializeSerDe(deserializer, conf, props, null);
+AbstractSerDe serde = (AbstractSerDe)deserializer;

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524453)
Time Spent: 1h 40m  (was: 1.5h)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524450
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:13
Start Date: 15/Dec/20 14:13
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543375870



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java
##
@@ -352,8 +351,8 @@ public void generateMapMetaData() throws HiveException {
 try {
   TableDesc keyTableDesc = conf.getKeyTblDesc();
   AbstractSerDe keySerializer = (AbstractSerDe) ReflectionUtil.newInstance(

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524450)
Time Spent: 1h 20m  (was: 1h 10m)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524446
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:12
Start Date: 15/Dec/20 14:12
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543373993



##
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/PartitionDesc.java
##
@@ -171,9 +171,9 @@ public String getDeserializerClassName() {
   public Deserializer getDeserializer(Configuration conf) throws Exception {
 Properties schema = getProperties();
 String clazzName = getDeserializerClassName();
-Deserializer deserializer = ReflectionUtil.newInstance(conf.getClassByName(clazzName)
-.asSubclass(Deserializer.class), conf);
-SerDeUtils.initializeSerDe(deserializer, conf, getTableDesc().getProperties(), schema);
+AbstractSerDe deserializer = ReflectionUtil.newInstance(conf.getClassByName(clazzName)

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524446)
Time Spent: 1h  (was: 50m)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524448&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524448
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:12
Start Date: 15/Dec/20 14:12
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543374966



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/ReduceRecordSource.java
##
@@ -164,10 +165,8 @@ void init(JobConf jconf, Operator<?> reducer, boolean vectorized, TableDesc keyT
 
   // We should initialize the SerDe with the TypeInfo when available.
   this.valueTableDesc = valueTableDesc;
-  inputValueDeserializer = (AbstractSerDe) ReflectionUtils.newInstance(
-  valueTableDesc.getDeserializerClass(), null);
-  SerDeUtils.initializeSerDe(inputValueDeserializer, null,
-  valueTableDesc.getProperties(), null);
+  inputValueDeserializer = (AbstractSerDe) ReflectionUtils.newInstance(valueTableDesc.getSerDeClass(), null);

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524448)
Time Spent: 1h 10m  (was: 1h)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24541) Add config to set a default storage handler class

2020-12-15 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-24541:
-


> Add config to set a default storage handler class
> -
>
> Key: HIVE-24541
> URL: https://issues.apache.org/jira/browse/HIVE-24541
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Add a config param "hive.default.storage.handler.class" so we can set a 
> default storage handler class to be used for all CREATE TABLE statements. By 
> default it would be an empty string and have no effect.
> This would allow existing user queries to be reused with a new table format, 
> such as Iceberg.
> For example, after setting hive.default.storage.handler.class= 
> org.apache.iceberg.mr.hive.HiveIcebergStorageHandler in the config,
> the query: CREATE TABLE abc (a int, b string) LOCATION ...
> would be equivalent to: CREATE TABLE abc (a int, b string) STORED BY 
> 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' LOCATION ...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24332) Make AbstractSerDe Superclass of all Classes

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24332?focusedWorklogId=524445&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524445
 ]

ASF GitHub Bot logged work on HIVE-24332:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:10
Start Date: 15/Dec/20 14:10
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on a change in pull request #1634:
URL: https://github.com/apache/hive/pull/1634#discussion_r543372574



##
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java
##
@@ -775,7 +775,7 @@ public static void setTaskPlan(Path path, String alias,
 if (topOp instanceof TableScanOperator) {
   try {
 Utilities.addSchemaEvolutionToTableScanOperator(
-  (StructObjectInspector) tt_desc.getDeserializer().getObjectInspector(),
+  (StructObjectInspector) tt_desc.getSerDe().getObjectInspector(),

Review comment:
   Done.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524445)
Time Spent: 50m  (was: 40m)

> Make AbstractSerDe Superclass of all Classes
> 
>
> Key: HIVE-24332
> URL: https://issues.apache.org/jira/browse/HIVE-24332
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Rework how the {{AbstractSerDe}}, {{Deserializer}}, and {{Serializer}} 
> classes are designed.
> Simplify them, and consolidate more functionality into {{AbstractSerDe}}.  
> Remove functionality that is not commonly used.  Remove methods that were 
> deprecated in 3.x (or maybe even earlier).
> Make it like Java's {{ByteChannel}}, which simply unifies the 
> {{ReadableByteChannel}} and {{WritableByteChannel}} interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24460) Refactor Get Next Event ID for DbNotificationListener

2020-12-15 Thread David Mollitor (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Mollitor resolved HIVE-24460.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Pushed to master.  Thank you [~aasha] and [~mgergely] for the review!

> Refactor Get Next Event ID for DbNotificationListener
> -
>
> Key: HIVE-24460
> URL: https://issues.apache.org/jira/browse/HIVE-24460
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Refactor event ID generation to match notification log ID generation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24460) Refactor Get Next Event ID for DbNotificationListener

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24460?focusedWorklogId=524441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524441
 ]

ASF GitHub Bot logged work on HIVE-24460:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 14:04
Start Date: 15/Dec/20 14:04
Worklog Time Spent: 10m 
  Work Description: belugabehr merged pull request #1725:
URL: https://github.com/apache/hive/pull/1725


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524441)
Time Spent: 1h  (was: 50m)

> Refactor Get Next Event ID for DbNotificationListener
> -
>
> Key: HIVE-24460
> URL: https://issues.apache.org/jira/browse/HIVE-24460
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Refactor event ID generation to match notification log ID generation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24540) Add test hive shell for simpler execution tests and debugging

2020-12-15 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-24540:
-


> Add test hive shell for simpler execution tests and debugging
> -
>
> Key: HIVE-24540
> URL: https://issues.apache.org/jira/browse/HIVE-24540
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> In the Apache Iceberg project, we've been using a 
> TestHiveShell/TestHiveMetastore class for running query execution unit tests, 
> which has made our lives much easier, both for writing tests and for 
> debugging the code from an IDE. It would be valuable to bring it to the 
> Apache Hive project as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24539) OrcInputFormat schema generation should respect column delimiter

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24539:
--
Labels: pull-request-available  (was: )

> OrcInputFormat schema generation should respect column delimiter
> 
>
> Key: HIVE-24539
> URL: https://issues.apache.org/jira/browse/HIVE-24539
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> OrcInputFormat currently generates schema using the given configuration and 
> the default delimiter – that causes inconsistencies when names contain commas.
> We should follow a similar approach to 
> [OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]
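
The failure mode can be reproduced in isolation (a sketch; Hive's real property names live in serdeConstants, and the NUL delimiter below is an assumption standing in for whatever alternate delimiter the table metadata carries):

{code:java}
import java.util.Arrays;
import java.util.regex.Pattern;

public class DelimiterDemo {
  public static void main(String[] args) {
    String delim = "\u0000";            // alternate delimiter for names containing commas
    String names = "a,b" + delim + "c"; // two columns: "a,b" and "c"
    // Splitting on the default comma mangles the first name:
    System.out.println(Arrays.toString(names.split(",")));
    // Splitting on the configured delimiter keeps it intact:
    System.out.println(Arrays.toString(names.split(Pattern.quote(delim))));
  }
}
{code}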



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24539) OrcInputFormat schema generation should respect column delimiter

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24539?focusedWorklogId=524423&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524423
 ]

ASF GitHub Bot logged work on HIVE-24539:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 13:43
Start Date: 15/Dec/20 13:43
Worklog Time Spent: 10m 
  Work Description: pgaref opened a new pull request #1783:
URL: https://github.com/apache/hive/pull/1783


   ### What changes were proposed in this pull request?
   OrcInputFormat's getDesiredRowTypeDescr method should use the column delimiter when generating column names.
   
   ### Why are the changes needed?
   The current logic can create wrong names when column names contain commas.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Via a q file.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524423)
Remaining Estimate: 0h
Time Spent: 10m

> OrcInputFormat schema generation should respect column delimiter
> 
>
> Key: HIVE-24539
> URL: https://issues.apache.org/jira/browse/HIVE-24539
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> OrcInputFormat currently generates schema using the given configuration and 
> the default delimiter – that causes inconsistencies when names contain commas.
> We should follow a similar approach to 
> [OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24539) OrcInputFormat schema generation should respect column delimiter

2020-12-15 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis updated HIVE-24539:
--
Description: 
OrcInputFormat currently generates schema using the given configuration and the 
default delimiter – that causes inconsistencies when names contain commas.

We should follow a similar approach to 
[OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]

  was:
OrcInputFormat currently generates schema using the given configuration and the 
default delimiter – that causes inconsistencies when names contain commas.

We should follow a similar approach to 
[OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]:


> OrcInputFormat schema generation should respect column delimiter
> 
>
> Key: HIVE-24539
> URL: https://issues.apache.org/jira/browse/HIVE-24539
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>
> OrcInputFormat currently generates schema using the given configuration and 
> the default delimiter – that causes inconsistencies when names contain commas.
> We should follow a similar approach to 
> [OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24539) OrcInputFormat schema generation should respect column delimiter

2020-12-15 Thread Panagiotis Garefalakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Panagiotis Garefalakis reassigned HIVE-24539:
-


> OrcInputFormat schema generation should respect column delimiter
> 
>
> Key: HIVE-24539
> URL: https://issues.apache.org/jira/browse/HIVE-24539
> Project: Hive
>  Issue Type: Bug
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>
> OrcInputFormat currently generates schema using the given configuration and 
> the default delimiter – that causes inconsistencies when names contain commas.
> We should follow a similar approach to 
> [OrcOutputFormat|https://github.com/apache/hive/blob/9563dd63188280f4b7c307f36e1ea0c69aec/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java#L145]:



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24428) Concurrent add_partitions requests may lead to data loss

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24428?focusedWorklogId=524413&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524413
 ]

ASF GitHub Bot logged work on HIVE-24428:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 13:28
Start Date: 15/Dec/20 13:28
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1724:
URL: https://github.com/apache/hive/pull/1724#issuecomment-745286738


   @deniskuzZ I've added a test for the issue; there is an interesting aspect 
to it: the problem only happens when there is at least one known dimension in 
the DPP insert.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524413)
Time Spent: 1h 10m  (was: 1h)

> Concurrent add_partitions requests may lead to data loss
> 
>
> Key: HIVE-24428
> URL: https://issues.apache.org/jira/browse/HIVE-24428
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In case multiple clients are adding partitions to the same table, and the 
> same partition is being added concurrently, there is a chance that the data 
> dir is removed after the other client has already written its data:
> https://github.com/apache/hive/blob/5e96b14a2357c66a0640254d5414bc706d8be852/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3958



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24428) Concurrent add_partitions requests may lead to data loss

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24428?focusedWorklogId=524412&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524412
 ]

ASF GitHub Bot logged work on HIVE-24428:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 13:26
Start Date: 15/Dec/20 13:26
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on a change in pull request #1724:
URL: https://github.com/apache/hive/pull/1724#discussion_r543338891



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##
@@ -747,6 +753,92 @@ private void checkFileFormats(Hive db, LoadTableDesc tbd, Table table)
 }
   }
 
+  class LocalTableLock implements Closeable {
+
+private Optional<HiveLockObject> lock;
+private HiveLock lockObj;
+
+public LocalTableLock(Optional<HiveLockObject> lock) throws LockException {
+
+  this.lock = lock;
+  if (!lock.isPresent()) {
+return;
+  }
+  LOG.info("LocalTableLock; locking: " + lock);
+  HiveLockManager lockMgr = context.getHiveTxnManager().getLockManager();
+  lockObj = lockMgr.lock(lock.get(), HiveLockMode.SEMI_SHARED, true);
+  LOG.info("LocalTableLock; locked: " + lock);
+}
+
+@Override
+public void close() throws IOException {
+  if (!lock.isPresent()) {
+return;
+  }
+  LOG.info("LocalTableLock; unlocking: " + lock);
+  HiveLockManager lockMgr;
+  try {
+lockMgr = context.getHiveTxnManager().getLockManager();
+lockMgr.unlock(lockObj);
+  } catch (LockException e1) {
+throw new IOException(e1);
+  }
+  LOG.info("LocalTableLock; unlocked");
+}
+
+  }
+
+  private LocalTableLock acquireLockForFileMove(LoadTableDesc loadTableWork) throws HiveException {
+// nothing needs to be done
+if (!conf.getBoolVar(HiveConf.ConfVars.HIVE_SUPPORT_CONCURRENCY)) {
+  return new LocalTableLock(Optional.empty());
+}
+String lockFileMoveMode = conf.getVar(HiveConf.ConfVars.HIVE_LOCK_FILE_MOVE_MODE);
+
+if ("none".equalsIgnoreCase(lockFileMoveMode)) {
Review comment:
   the `MoveTask` is created in the `TaskFactory` - and the "work" is only 
set via `setWork` - I chose not to override that method and instead parse the 
enum value right when it's needed
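
For reference, the Closeable-lock pattern the patch introduces can be sketched 
in isolation like this (a self-contained toy under stated assumptions: a String 
key stands in for HiveLockObject, and OptionalLock stands in for LocalTableLock):
```
import java.io.Closeable;
import java.io.IOException;
import java.util.Optional;

// Toy version of the pattern: an Optional-backed Closeable lock that is a
// no-op when no lock key is present (e.g. concurrency support disabled).
class OptionalLock implements Closeable {
  private final Optional<String> key; // stands in for Optional<HiveLockObject>

  OptionalLock(Optional<String> key) {
    this.key = key;
    key.ifPresent(k -> System.out.println("locked " + k));
  }

  @Override
  public void close() throws IOException {
    key.ifPresent(k -> System.out.println("unlocked " + k));
  }

  public static void main(String[] args) throws IOException {
    // try-with-resources guarantees the unlock even if the move throws
    try (OptionalLock l = new OptionalLock(Optional.of("default.tbl"))) {
      System.out.println("moving files under the lock");
    }
  }
}
```
The point of the try-with-resources shape is that the unlock runs even when 
the guarded file move throws.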





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524412)
Time Spent: 1h  (was: 50m)

> Concurrent add_partitions requests may lead to data loss
> 
>
> Key: HIVE-24428
> URL: https://issues.apache.org/jira/browse/HIVE-24428
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> in case multiple clients are adding partitions to the same table - when the 
> same partition is being added, there is a chance that the data dir is removed 
> after the other client has already written its data
> https://github.com/apache/hive/blob/5e96b14a2357c66a0640254d5414bc706d8be852/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L3958



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24322) In case of direct insert, the attempt ID has to be checked when reading the manifest files

2020-12-15 Thread Marta Kuczora (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marta Kuczora resolved HIVE-24322.
--
Resolution: Fixed

> In case of direct insert, the attempt ID has to be checked when reading the 
> manifest files
> --
>
> Key: HIVE-24322
> URL: https://issues.apache.org/jira/browse/HIVE-24322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In IMPALA-10247 there was an exception from Hive when trying to load the data:
> {noformat}
> 2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] 
> exec.Task: Job Commit failed with exception 
> 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
>  at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>  at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627)
>  at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342)
>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>  at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
>  at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>  at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>  at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
>  at java.io.DataInputStream.readInt(DataInputStream.java:392)
>  at 
> org.apache.hadoop.hive.ql.exec.Utilities.handleDirectInsertTableFinalPath(Utilities.java:4587)
>  at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1462)
>  ... 29 more
> {noformat}
> The reason for the exception was that Hive was trying to read an empty 
> manifest file. Manifest files are used in case of direct insert to determine 
> which files need to be kept and which ones need to be cleaned up. They are 
> created by the tasks and they use the task attempt ID as a postfix. In this 
> particular test, what happened is that one of the containers ran out of 
> memory, so Tez decided to kill it right after the manifest file got created 
> but before the paths got written into the manifest file. This was the manifest 
> file for task attempt 0. Then Tez assigned a new container to the task, 
> so a new attempt was made with attemptId=1. This one was successful and 
> wrote the manifest file correctly. But Hive didn't know about this, since 
> the out-of-memory issue got handled by Tez under the hood, so there was no 
> exception in Hive and therefore no clean-up in the manifest folder. And when 
> Hive reads the manifest files, it just reads every file from the defined 
> folder, so it tried to read the manifest files for attempts 0 and 1 as well.
> If there are multiple manifest files with the same name but different 
> attemptId, Hive should 
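
A hedged, self-contained Java sketch of reading only the highest-attempt 
manifest per task; the <taskId>.manifest.<attemptId> name shape is an 
assumption made for illustration, not Hive's actual naming:
{noformat}
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hedged sketch (not the actual Hive fix): given manifest file names that end
// with an attempt id, keep only the highest attempt per task.
public class ManifestPicker {
  // assumed name shape: <taskId>.manifest.<attemptId> (illustrative only)
  private static final Pattern NAME = Pattern.compile("(.+)\\.manifest\\.(\\d+)");

  static Collection<String> latestAttempts(List<String> fileNames) {
    Map<String, String> best = new HashMap<>();
    Map<String, Integer> bestAttempt = new HashMap<>();
    for (String name : fileNames) {
      Matcher m = NAME.matcher(name);
      if (!m.matches()) {
        continue;
      }
      String task = m.group(1);
      int attempt = Integer.parseInt(m.group(2));
      if (attempt >= bestAttempt.getOrDefault(task, -1)) {
        bestAttempt.put(task, attempt);
        best.put(task, name);
      }
    }
    return best.values();
  }

  public static void main(String[] args) {
    // attempt 0 was killed (empty manifest); attempt 1 is the one to read
    System.out.println(latestAttempts(List.of(
        "000000.manifest.0", "000000.manifest.1", "000001.manifest.0")));
  }
}
{noformat}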

[jira] [Commented] (HIVE-24322) In case of direct insert, the attempt ID has to be checked when reading the manifest files

2020-12-15 Thread Marta Kuczora (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249672#comment-17249672
 ] 

Marta Kuczora commented on HIVE-24322:
--

Pushed to master.
Thanks a lot [~szita] for the review!

> In case of direct insert, the attempt ID has to be checked when reading the 
> manifest files
> --
>
> Key: HIVE-24322
> URL: https://issues.apache.org/jira/browse/HIVE-24322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In IMPALA-10247 there was an exception from Hive when trying to load the data:
> {noformat}
> 2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] 
> exec.Task: Job Commit failed with exception 
> 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
>  at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>  at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627)
>  at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342)
>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>  at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
>  at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>  at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>  at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
>  at java.io.DataInputStream.readInt(DataInputStream.java:392)
>  at 
> org.apache.hadoop.hive.ql.exec.Utilities.handleDirectInsertTableFinalPath(Utilities.java:4587)
>  at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1462)
>  ... 29 more
> {noformat}
> The reason for the exception was that Hive was trying to read an empty 
> manifest file. Manifest files are used in case of direct insert to determine 
> which files need to be kept and which ones need to be cleaned up. They are 
> created by the tasks and they use the task attempt ID as a postfix. In this 
> particular test, what happened is that one of the containers ran out of 
> memory, so Tez decided to kill it right after the manifest file got created 
> but before the paths got written into the manifest file. This was the manifest 
> file for task attempt 0. Then Tez assigned a new container to the task, 
> so a new attempt was made with attemptId=1. This one was successful and 
> wrote the manifest file correctly. But Hive didn't know about this, since 
> the out-of-memory issue got handled by Tez under the hood, so there was no 
> exception in Hive and therefore no clean-up in the manifest folder. And when 
> Hive reads the manifest files, it just reads every file from the defined 
> folder, so it tried to read the manifest files for attempts 0 and 1 as well.
> If there are multiple 

[jira] [Work logged] (HIVE-24322) In case of direct insert, the attempt ID has to be checked when reading the manifest files

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24322?focusedWorklogId=524395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524395
 ]

ASF GitHub Bot logged work on HIVE-24322:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 13:00
Start Date: 15/Dec/20 13:00
Worklog Time Spent: 10m 
  Work Description: kuczoram merged pull request #1774:
URL: https://github.com/apache/hive/pull/1774


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524395)
Time Spent: 20m  (was: 10m)

> In case of direct insert, the attempt ID has to be checked when reading the 
> manifest files
> --
>
> Key: HIVE-24322
> URL: https://issues.apache.org/jira/browse/HIVE-24322
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In IMPALA-10247 there was an exception from Hive when trying to load the data:
> {noformat}
> 2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] 
> exec.Task: Job Commit failed with exception 
> 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
>  at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>  at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>  at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627)
>  at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342)
>  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>  at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>  at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
>  at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>  at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>  at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>  at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
>  at java.io.DataInputStream.readInt(DataInputStream.java:392)
>  at 
> org.apache.hadoop.hive.ql.exec.Utilities.handleDirectInsertTableFinalPath(Utilities.java:4587)
>  at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1462)
>  ... 29 more
> {noformat}
> The reason for the exception was that Hive was trying to read an empty 
> manifest file. Manifest files are used in case of direct insert to determine 
> which files need to be kept and which ones need to be cleaned up. They are 
> created by the tasks and they use the task attempt ID as a postfix. In this 
> particular test, what happened is that one of the containers ran out of 
> memory, so Tez decided to kill it right after the manifest file got created but 
> 

[jira] [Commented] (HIVE-24394) Enable printing explain to console at query start

2020-12-15 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249666#comment-17249666
 ] 

Zoltan Haindrich commented on HIVE-24394:
-

[~johang] sorry for the long RTT - December is a bit more crowded (especially 
now)

I see that you are after making things a little bit better - to make debugging 
queries easier; what do you think about providing more details on the HS2 web 
interface?
I think the explain was already present there; I feel that is something we 
really under-use - what do you think we are really missing there; or is it not 
really an option for end users because it's a web UI?

> Enable printing explain to console at query start
> -
>
> Key: HIVE-24394
> URL: https://issues.apache.org/jira/browse/HIVE-24394
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, Query Processor
>Affects Versions: 2.3.7, 3.1.2
>Reporter: Johan Gustavsson
>Assignee: Jesus Camacho Rodriguez
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently there is a hive.log.explain.output option that prints extended 
> explain output to the log. While this is helpful for internal investigations, 
> it limits the information that is available to users. So we should add options 
> to print the non-extended explain output to the console, for general user 
> consumption, to make it easier for users to debug queries and workflows 
> without having to resubmit queries with explain.
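
A hedged sketch of the proposed gating, modelled with plain properties; the 
hive.explain.output.console flag name is hypothetical, only 
hive.log.explain.output exists per the description above:
{noformat}
import java.util.Properties;

// Hedged, self-contained sketch of the proposed behaviour; flag names below
// hive.log.explain.output are hypothetical stand-ins, not real HiveConf vars.
public class ExplainAtStart {

  static void onQueryStart(Properties conf, String nonExtendedPlan) {
    // existing behaviour, modelled with a plain property: explain output
    // goes to the log when hive.log.explain.output is enabled
    if (Boolean.parseBoolean(conf.getProperty("hive.log.explain.output", "false"))) {
      System.err.println("[log] " + nonExtendedPlan);
    }
    // proposed behaviour: also print the non-extended plan to the console
    // at query start, before execution (hypothetical flag name)
    if (Boolean.parseBoolean(conf.getProperty("hive.explain.output.console", "false"))) {
      System.out.println(nonExtendedPlan);
    }
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty("hive.explain.output.console", "true");
    onQueryStart(conf, "Stage-1: TS -> FIL -> SEL -> FS");
  }
}
{noformat}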



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24434) Filter out materialized views for rewriting if plan pattern is not allowed

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24434?focusedWorklogId=524367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524367
 ]

ASF GitHub Bot logged work on HIVE-24434:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 10:58
Start Date: 15/Dec/20 10:58
Worklog Time Spent: 10m 
  Work Description: kasakrisz opened a new pull request #1782:
URL: https://github.com/apache/hive/pull/1782


   ### What changes were proposed in this pull request?
   1. Store the scope of each materialized view in the cache/registry. It 
specifies whether the MV can be used in both Calcite based and SQL text based 
query rewrites, or in SQL text based rewrites only.
   2. The scope value is calculated when the MV definition is parsed while 
adding the MV to the Registry.
   
   ### Why are the changes needed?
   The number of materialized views added to the Calcite based rewrite planner 
affects query compilation performance. Too many view definitions increase the 
planning time without any benefit. There are plan patterns which are not 
supported by the planner. MVs having such patterns in their definition can be 
filtered out before even being added to the planner.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   ```
   mvn test -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver 
-Dqfile=materialized_view_rewrite_11.q -pl itests/qtest -Pitests
   ```
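   
   To make the idea concrete, a hedged, self-contained Java sketch (class and 
method names are illustrative, not the actual Hive registry API):
   ```
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hedged sketch of the idea: compute a rewrite scope once at registration
// time, then filter before handing MVs to the Calcite planner.
public class MvRegistrySketch {

  enum RewriteScope { CALCITE_AND_TEXT, TEXT_ONLY }

  static class MaterializedView {
    final String name;
    final RewriteScope scope;
    MaterializedView(String name, RewriteScope scope) { this.name = name; this.scope = scope; }
    @Override public String toString() { return name; }
  }

  // decided when the MV definition is parsed while adding it to the registry
  static RewriteScope scopeOf(String definition, Predicate<String> plannerSupports) {
    return plannerSupports.test(definition) ? RewriteScope.CALCITE_AND_TEXT : RewriteScope.TEXT_ONLY;
  }

  // only hand the Calcite planner the MVs it can actually use
  static List<MaterializedView> forCalcite(Collection<MaterializedView> registry) {
    return registry.stream()
        .filter(mv -> mv.scope == RewriteScope.CALCITE_AND_TEXT)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Predicate<String> supported = def -> !def.contains("UNION"); // toy support rule
    List<MaterializedView> registry = List.of(
        new MaterializedView("mv1", scopeOf("SELECT a FROM t", supported)),
        new MaterializedView("mv2", scopeOf("SELECT a FROM t UNION ALL SELECT b FROM u", supported)));
    System.out.println(forCalcite(registry)); // -> [mv1]
  }
}
   ```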



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524367)
Remaining Estimate: 0h
Time Spent: 10m

> Filter out materialized views for rewriting if plan pattern is not allowed
> --
>
> Key: HIVE-24434
> URL: https://issues.apache.org/jira/browse/HIVE-24434
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Affects Versions: 4.0.0
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Some materialized views are not enabled for Calcite based rewriting. Rules 
> for validating materialized views are implemented by HIVE-20748. 
> Since text based materialized view query rewrite doesn't have such 
> limitations, some logic must be implemented to flag materialized views as 
> enabled for text based rewrite only or for both.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24434) Filter out materialized views for rewriting if plan pattern is not allowed

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24434:
--
Labels: pull-request-available  (was: )

> Filter out materialized views for rewriting if plan pattern is not allowed
> --
>
> Key: HIVE-24434
> URL: https://issues.apache.org/jira/browse/HIVE-24434
> Project: Hive
>  Issue Type: Improvement
>  Components: Materialized views
>Affects Versions: 4.0.0
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Some materialized views are not enabled for Calcite based rewriting. Rules 
> for validating materialized views are implemented by HIVE-20748. 
> Since text based materialized view query rewrite doesn't have such 
> limitations, some logic must be implemented to flag materialized views as 
> enabled for text based rewrite only or for both.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24538) Enforce strict type checks on all operators relying on comparisons

2020-12-15 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249615#comment-17249615
 ] 

Stamatis Zampetakis commented on HIVE-24538:


After the resolution of HIVE-24534 and HIVE-24538, HIVE-13958 can be closed.

> Enforce strict type checks on all operators relying on comparisons
> --
>
> Key: HIVE-24538
> URL: https://issues.apache.org/jira/browse/HIVE-24538
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> Strict type checks are enforced when {{hive.strict.checks.type.safety}} is 
> enabled to prevent unexpected query results in some common cases where 
> implicit type conversions take place.
> At the moment these checks are enforced in most comparison based operators 
> (=,<,>,<=,>=,<>,<=>) but not in all. For instance, the checks are not active 
> for the BETWEEN and IN operators, and possibly others. 
> The goal of this issue is to review 
> {{FunctionRegistry#getCommonClassForComparison}} and make sure that operators 
> relying on comparisons are covered (at a minimum BETWEEN, IN).
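
A hedged, self-contained Java sketch of the intent (illustrative names, not 
the FunctionRegistry API): the strict check applied to '=' is reused for every 
operand pair of BETWEEN and IN:
{noformat}
import java.util.List;

// Hedged sketch: BETWEEN is two comparisons and IN is one comparison per
// list element, so each operand pair goes through the same strict check.
public class StrictComparisonCheck {

  static void checkComparable(String leftType, String rightType) {
    boolean stringVsNumeric =
        ("string".equals(leftType) && "bigint".equals(rightType))
            || ("bigint".equals(leftType) && "string".equals(rightType));
    if (stringVsNumeric) {
      throw new IllegalArgumentException(
          "strict type safety: comparing " + leftType + " with " + rightType);
    }
  }

  // a BETWEEN c AND d: check (a, c) and (a, d)
  static void checkBetween(String colType, String lowType, String highType) {
    checkComparable(colType, lowType);
    checkComparable(colType, highType);
  }

  // a IN (x, y, ...): check (a, x), (a, y), ...
  static void checkIn(String colType, List<String> elementTypes) {
    elementTypes.forEach(t -> checkComparable(colType, t));
  }

  public static void main(String[] args) {
    checkBetween("bigint", "bigint", "bigint"); // passes
    try {
      checkIn("string", List.of("string", "bigint")); // mixed types
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // rejected under strict mode
    }
  }
}
{noformat}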



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24538) Enforce strict type checks on all operators relying on comparisons

2020-12-15 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-24538:
--


> Enforce strict type checks on all operators relying on comparisons
> --
>
> Key: HIVE-24538
> URL: https://issues.apache.org/jira/browse/HIVE-24538
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> Strict type checks are enforced when {{hive.strict.checks.type.safety}} is 
> enabled to prevent unexpected query results in some common cases where 
> implicit type conversions take place.
> At the moment these checks are enforced in most comparison based operators 
> (=,<,>,<=,>=,<>,<=>) but not in all. For instance, the checks are not active 
> for the BETWEEN and IN operators, and possibly others. 
> The goal of this issue is to review 
> {{FunctionRegistry#getCommonClassForComparison}} and make sure that operators 
> relying on comparisons are covered (at a minimum BETWEEN, IN).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24530) Potential NPE in FileSinkOperator.closeRecordwriters method

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24530?focusedWorklogId=524326&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524326
 ]

ASF GitHub Bot logged work on HIVE-24530:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 09:16
Start Date: 15/Dec/20 09:16
Worklog Time Spent: 10m 
  Work Description: kuczoram commented on a change in pull request #1775:
URL: https://github.com/apache/hive/pull/1775#discussion_r543173097



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
##
@@ -1204,8 +1205,10 @@ private void closeWriters(boolean abort) throws HiveException {
   private void closeRecordwriters(boolean abort) {
     for (RecordWriter writer : rowOutWriters) {

Review comment:
   You're right, I added a null check for the rowOutWriters as well.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524326)
Time Spent: 0.5h  (was: 20m)

> Potential NPE in FileSinkOperator.closeRecordwriters method
> ---
>
> Key: HIVE-24530
> URL: https://issues.apache.org/jira/browse/HIVE-24530
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> During testing an NPE occurred in the FileSinkOperator.closeRecordwriters 
> method.
> After investigating, it turned out there was an underlying IOException while 
> executing the FileSinkOperator.process method. It got caught by the following 
> code part:
> {noformat}
> } catch (IOException e) {
>   closeWriters(true);
>   throw new HiveException(e);
> } catch (SerDeException e) {
>   closeWriters(true);
>   throw new HiveException(e);
> }
> {noformat}
> First the closeWriters method was called:
> {noformat}
>   private void closeWriters(boolean abort) throws HiveException {
> fpaths.closeWriters(true);
> closeRecordwriters(true);
>   }
>   private void closeRecordwriters(boolean abort) {
> for (RecordWriter writer : rowOutWriters) {
>   try {
> LOG.info("Closing {} on exception", writer);
> writer.close(abort);
>   } catch (IOException e) {
> LOG.error("Error closing rowOutWriter" + writer, e);
>   }
> }
> {noformat}
> If the writers had been closed successfully, a HiveException would have been 
> thrown with the original IOException.
> But when the IOException occurred, the writers in rowOutWriters were not 
> yet initialised, so an NPE occurred. This was very misleading, as the NPE was 
> not the real issue, and the original IOException was hidden.
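
A hedged sketch of the guarded cleanup discussed in the pull request (shape 
only; the actual patch may differ):
{noformat}
private void closeRecordwriters(boolean abort) {
  if (rowOutWriters == null) {
    return;               // process() failed before any writer was created
  }
  for (RecordWriter writer : rowOutWriters) {
    if (writer == null) {
      continue;           // this slot was never initialised
    }
    try {
      LOG.info("Closing {} on exception", writer);
      writer.close(abort);
    } catch (IOException e) {
      LOG.error("Error closing rowOutWriter " + writer, e);
    }
  }
}
{noformat}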



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24530) Potential NPE in FileSinkOperator.closeRecordwriters method

2020-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24530?focusedWorklogId=524322&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524322
 ]

ASF GitHub Bot logged work on HIVE-24530:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 08:56
Start Date: 15/Dec/20 08:56
Worklog Time Spent: 10m 
  Work Description: szlta commented on a change in pull request #1775:
URL: https://github.com/apache/hive/pull/1775#discussion_r543159074



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
##
@@ -1204,8 +1205,10 @@ private void closeWriters(boolean abort) throws HiveException {
   private void closeRecordwriters(boolean abort) {
     for (RecordWriter writer : rowOutWriters) {

Review comment:
   So now there is a null check for the RecordWriter instances in this 
list, but shouldn't we also check for the list itself (rowOutWriters) being 
null here?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 524322)
Time Spent: 20m  (was: 10m)

> Potential NPE in FileSinkOperator.closeRecordwriters method
> ---
>
> Key: HIVE-24530
> URL: https://issues.apache.org/jira/browse/HIVE-24530
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Marta Kuczora
>Assignee: Marta Kuczora
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> During testing an NPE occurred in the FileSinkOperator.closeRecordwriters 
> method.
> After investigating, it turned out there was an underlying IOException while 
> executing the FileSinkOperator.process method. It got caught by the following 
> code part:
> {noformat}
> } catch (IOException e) {
>   closeWriters(true);
>   throw new HiveException(e);
> } catch (SerDeException e) {
>   closeWriters(true);
>   throw new HiveException(e);
> }
> {noformat}
> First the closeWriters method was called:
> {noformat}
>   private void closeWriters(boolean abort) throws HiveException {
> fpaths.closeWriters(true);
> closeRecordwriters(true);
>   }
>   private void closeRecordwriters(boolean abort) {
> for (RecordWriter writer : rowOutWriters) {
>   try {
> LOG.info("Closing {} on exception", writer);
> writer.close(abort);
>   } catch (IOException e) {
> LOG.error("Error closing rowOutWriter" + writer, e);
>   }
> }
> {noformat}
> If the writers had been closed successfully, a HiveException would have been 
> thrown with the original IOException.
> But when the IOException occurred, the writers in rowOutWriters were not 
> yet initialised, so an NPE occurred. This was very misleading, as the NPE was 
> not the real issue, and the original IOException was hidden.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)