[jira] [Updated] (HIVE-26047) Vectorized LIKE UDF should use Re2J regex to address JDK-8203458

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26047:
--
Labels: pull-request-available  (was: )

> Vectorized LIKE UDF should use Re2J regex to address JDK-8203458
> 
>
> Key: HIVE-26047
> URL: https://issues.apache.org/jira/browse/HIVE-26047
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The pattern below takes a long time to validate in Java 8, with the same
> stack trace as Java bug
> [JDK-8203458|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8203458]
>  
> {code:java}
> import java.util.regex.Pattern;
> public class Test {
>   public static void main(String args[]) {
>     String pattern = "a_b";
>     Pattern CHAIN_PATTERN = Pattern.compile("(%?[^%_]+%?)+");
>     CHAIN_PATTERN.matcher(pattern).matches();
>   }
> }
> {code}
> The same is reproducible with the following SQL:
> {code:java}
> create table table1(name string);
> insert into table1 (name) values ('a_b');
> select * from table1 where name like "a_b";{code}
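For context, the chained-pattern check quoted above can be exercised with `java.util.regex` alone; the class and method names below are illustrative, not Hive's. Re2J's `com.google.re2j.Pattern` exposes a near-identical compile/matcher API with guaranteed linear-time matching, which is the replacement this issue proposes (the Re2J dependency is not used here so the sketch stays self-contained).

```java
import java.util.regex.Pattern;

// Sketch of the check that triggers JDK-8203458: the nested quantifier in
// (%?[^%_]+%?)+ can backtrack exponentially on long inputs containing '_'.
// Re2J (com.google.re2j.Pattern) offers the same compile()/matcher() calls
// with linear-time matching, which is the fix this issue proposes.
public class LikePatternCheck {
  private static final Pattern CHAIN_PATTERN = Pattern.compile("(%?[^%_]+%?)+");

  // Returns true when the LIKE pattern is a simple chain of literals and '%'.
  public static boolean isChainedPattern(String likePattern) {
    return CHAIN_PATTERN.matcher(likePattern).matches();
  }

  public static void main(String[] args) {
    System.out.println(isChainedPattern("%abc%def%")); // true: chained literals
    System.out.println(isChainedPattern("a_b"));       // false: '_' breaks the chain
  }
}
```

With a short input like "a_b" the mismatch is found quickly; the pathological behavior appears as the input grows, because the backtracking engine explores exponentially many ways to split the literal runs between iterations of the outer `+`.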



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work logged] (HIVE-26047) Vectorized LIKE UDF should use Re2J regex to address JDK-8203458

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26047?focusedWorklogId=743569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-743569
 ]

ASF GitHub Bot logged work on HIVE-26047:
-

Author: ASF GitHub Bot
Created on: 18/Mar/22 03:56
Start Date: 18/Mar/22 03:56
Worklog Time Spent: 10m 
  Work Description: nareshpr opened a new pull request #3117:
URL: https://github.com/apache/hive/pull/3117


   ### What changes were proposed in this pull request?
   The vectorized LIKE UDF takes disproportionately longer as the size of the input string grows.
   
   ### Why are the changes needed?
   To support filter condition based on input data.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added testcase as part of PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 743569)
Remaining Estimate: 0h
Time Spent: 10m

> Vectorized LIKE UDF should use Re2J regex to address JDK-8203458
> 
>
> Key: HIVE-26047
> URL: https://issues.apache.org/jira/browse/HIVE-26047
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The pattern below takes a long time to validate in Java 8, with the same
> stack trace as Java bug
> [JDK-8203458|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8203458]
>  
> {code:java}
> import java.util.regex.Pattern;
> public class Test {
>   public static void main(String args[]) {
>     String pattern = "a_b";
>     Pattern CHAIN_PATTERN = Pattern.compile("(%?[^%_]+%?)+");
>     CHAIN_PATTERN.matcher(pattern).matches();
>   }
> }
> {code}
> The same is reproducible with the following SQL:
> {code:java}
> create table table1(name string);
> insert into table1 (name) values ('a_b');
> select * from table1 where name like "a_b";{code}





[jira] [Updated] (HIVE-26047) Vectorized LIKE UDF should use Re2J regex to address JDK-8203458

2022-03-17 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-26047:
--
Description: 
The pattern below takes a long time to validate in Java 8, with the same stack
trace as the Java bug

[JDK-8203458|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8203458]

 
{code:java}
import java.util.regex.Pattern;
public class Test {
  public static void main(String args[]) {
    String pattern = "a_b";
    Pattern CHAIN_PATTERN = Pattern.compile("(%?[^%_]+%?)+");
    CHAIN_PATTERN.matcher(pattern).matches();
  }
}
{code}
The same is reproducible with the following SQL:
{code:java}
create table table1(name string);
insert into table1 (name) values ('a_b');
select * from table1 where name like "a_b";{code}

  was:
Below pattern is taking a long time to validate regex in java8 with same trace 
as shown in java bug 
[JDK-8203458|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8203458]
import java.util.regex.Pattern;

public class ABCD {

  public static void main(String args[]) {
    String pattern = "a_b";
    Pattern CHAIN_PATTERN = Pattern.compile("(%?[^%_]+%?)+");
    CHAIN_PATTERN.matcher(pattern).matches();
  }
}
Same is reproducible with following SQL
{code:java}
create table table1(name string);
insert into table1 (name) values ('a_b');
select * from table1 where name like "a_b";{code}


> Vectorized LIKE UDF should use Re2J regex to address JDK-8203458
> 
>
> Key: HIVE-26047
> URL: https://issues.apache.org/jira/browse/HIVE-26047
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>
> The pattern below takes a long time to validate in Java 8, with the same
> stack trace as Java bug
> [JDK-8203458|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8203458]
>  
> {code:java}
> import java.util.regex.Pattern;
> public class Test {
>   public static void main(String args[]) {
>     String pattern = "a_b";
>     Pattern CHAIN_PATTERN = Pattern.compile("(%?[^%_]+%?)+");
>     CHAIN_PATTERN.matcher(pattern).matches();
>   }
> }
> {code}
> The same is reproducible with the following SQL:
> {code:java}
> create table table1(name string);
> insert into table1 (name) values ('a_b');
> select * from table1 where name like "a_b";{code}





[jira] [Assigned] (HIVE-26047) Vectorized LIKE UDF should use Re2J regex to address JDK-8203458

2022-03-17 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R reassigned HIVE-26047:
-


> Vectorized LIKE UDF should use Re2J regex to address JDK-8203458
> 
>
> Key: HIVE-26047
> URL: https://issues.apache.org/jira/browse/HIVE-26047
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>
> The pattern below takes a long time to validate in Java 8, with the same
> stack trace as Java bug
> [JDK-8203458|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8203458]
> import java.util.regex.Pattern;
> public class ABCD {
>   public static void main(String args[]) {
>     String pattern = "a_b";
>     Pattern CHAIN_PATTERN = Pattern.compile("(%?[^%_]+%?)+");
>     CHAIN_PATTERN.matcher(pattern).matches();
>   }
> }
> The same is reproducible with the following SQL:
> {code:java}
> create table table1(name string);
> insert into table1 (name) values ('a_b');
> select * from table1 where name like "a_b";{code}





[jira] [Work logged] (HIVE-25667) Unify code managing JDBC databases in tests

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25667?focusedWorklogId=743502&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-743502
 ]

ASF GitHub Bot logged work on HIVE-25667:
-

Author: ASF GitHub Bot
Created on: 18/Mar/22 00:16
Start Date: 18/Mar/22 00:16
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #2919:
URL: https://github.com/apache/hive/pull/2919#issuecomment-1071892104


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 743502)
Time Spent: 2.5h  (was: 2h 20m)

> Unify code managing JDBC databases in tests
> ---
>
> Key: HIVE-25667
> URL: https://issues.apache.org/jira/browse/HIVE-25667
> Project: Hive
>  Issue Type: Task
>  Components: Testing Infrastructure
>Affects Versions: 4.0.0
>Reporter: Stamatis Zampetakis
>Assignee: Mark Bathori
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently there are two class hierarchies managing JDBC databases in tests, 
> [DatabaseRule|https://github.com/apache/hive/blob/d35de014dd49fdcfe0aacb68e6c587beff6d1dea/standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/dbinstall/rules/DatabaseRule.java]
>  and 
> [AbstractExternalDB|https://github.com/apache/hive/blob/d35de014dd49fdcfe0aacb68e6c587beff6d1dea/itests/util/src/main/java/org/apache/hadoop/hive/ql/externalDB/AbstractExternalDB.java].
>  There are many similarities between these hierarchies and certain parts are 
> duplicated. 
> The goal of this JIRA is to refactor the aforementioned hierarchies to reduce 
> code duplication and improve extensibility.





[jira] [Work logged] (HIVE-25575) Add support for JWT authentication in HTTP mode

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25575?focusedWorklogId=743488&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-743488
 ]

ASF GitHub Bot logged work on HIVE-25575:
-

Author: ASF GitHub Bot
Created on: 17/Mar/22 23:42
Start Date: 17/Mar/22 23:42
Worklog Time Spent: 10m 
  Work Description: hsnusonic commented on a change in pull request #3006:
URL: https://github.com/apache/hive/pull/3006#discussion_r829592891



##
File path: 
service/src/java/org/apache/hive/service/auth/jwt/URLBasedJWKSProvider.java
##
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.auth.jwt;
+
+import com.nimbusds.jose.JWSHeader;
+import com.nimbusds.jose.jwk.JWK;
+import com.nimbusds.jose.jwk.JWKMatcher;
+import com.nimbusds.jose.jwk.JWKSelector;
+import com.nimbusds.jose.jwk.JWKSet;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.net.URL;
+import java.text.ParseException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Provides a way to get JWKS json. Hive will use this to verify the incoming 
JWTs.
+ */
+public class URLBasedJWKSProvider {
+
+  private static final Logger LOG = LoggerFactory.getLogger(URLBasedJWKSProvider.class.getName());
+  private final HiveConf conf;
+  private List<JWKSet> jwkSets = new ArrayList<>();
+
+  public URLBasedJWKSProvider(HiveConf conf) throws IOException, ParseException {
+    this.conf = conf;
+    loadJWKSets();
+  }
+
+  /**
+   * Fetches the JWKS and stores into memory. The JWKS are expected to be in 
the standard form as defined here -
+   * https://datatracker.ietf.org/doc/html/rfc7517#appendix-A.
+   */
+  private void loadJWKSets() throws IOException, ParseException {
+    String jwksURL = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_SERVER2_AUTHENTICATION_JWT_JWKS_URL);
+    String[] jwksURLs = jwksURL.split(",");
+    for (String urlString : jwksURLs) {
+      URL url = new URL(urlString);
+      jwkSets.add(JWKSet.load(url));
+      LOG.info("Loaded JWKS from " + urlString);
+    }
+  }
+
+  /**
+   * Returns filtered JWKS by one or more criteria, such as kid, typ, alg.
+   */
+  public List<JWK> getJWKs(JWSHeader header) {
+    List<JWK> jwks = new ArrayList<>();
+    JWKSelector selector = new JWKSelector(JWKMatcher.forJWSHeader(header));
+    for (JWKSet jwkSet : jwkSets) {
+      List<JWK> selectedJwks = selector.select(jwkSet);
+      if (selectedJwks != null) {

Review comment:
   Thanks for noticing this.
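The review thread above concerns Nimbus's `JWKSelector.select`. A minimal sketch of that selection step, using a hypothetical `Jwk` record in place of Nimbus's `JWK` and matching only on `kid` and `alg`, looks like:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of JWKSelector.select(): filter a key set by the
// kid/alg criteria carried in a JWS header. Real code should use
// com.nimbusds.jose.jwk.JWKSelector with JWKMatcher.forJWSHeader(),
// as the patch does; this stand-in only shows the matching semantics.
public class JwksSelectionSketch {
  // Hypothetical stand-in for com.nimbusds.jose.jwk.JWK.
  public record Jwk(String kid, String alg) {}

  // Returns the keys whose kid and alg match the header's criteria.
  public static List<Jwk> selectByHeader(List<Jwk> jwkSet, String kid, String alg) {
    List<Jwk> selected = new ArrayList<>();
    for (Jwk jwk : jwkSet) {
      if (jwk.kid().equals(kid) && jwk.alg().equals(alg)) {
        selected.add(jwk);
      }
    }
    // Like Nimbus's select(), this returns an empty list rather than null
    // when nothing matches, which is why the null check flagged in the
    // review is unnecessary.
    return selected;
  }
}
```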






Issue Time Tracking
---

Worklog Id: (was: 743488)
Time Spent: 4h 10m  (was: 4h)

> Add support for JWT authentication in HTTP mode
> ---
>
> Key: HIVE-25575
> URL: https://issues.apache.org/jira/browse/HIVE-25575
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2, JDBC
>Affects Versions: 4.0.0
>Reporter: Shubham Chaurasia
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> It would be good to support JWT auth mechanism in hive. In order to implement 
> it, we would need the following - 
> On HS2 side -
> 1. Accept JWT in Authorization: Bearer header.
> 2. Fetch JWKS from a public endpoint to verify JWT signature, to start with 
> we can fetch on HS2 start up.
> 3. Verify JWT Signature.
> On JDBC Client side - 
> 1. Hive jdbc client should be able to accept jwt in JDBC url. (will add more 
> details)
> 2. Client should also be able to pick up JWT from an env var 

[jira] [Work logged] (HIVE-25575) Add support for JWT authentication in HTTP mode

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25575?focusedWorklogId=743486&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-743486
 ]

ASF GitHub Bot logged work on HIVE-25575:
-

Author: ASF GitHub Bot
Created on: 17/Mar/22 23:41
Start Date: 17/Mar/22 23:41
Worklog Time Spent: 10m 
  Work Description: hsnusonic commented on a change in pull request #3006:
URL: https://github.com/apache/hive/pull/3006#discussion_r829592567



##
File path: 
service/src/java/org/apache/hive/service/auth/jwt/URLBasedJWKSProvider.java
##
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.auth.jwt;
+
+import com.nimbusds.jose.JWSHeader;
+import com.nimbusds.jose.jwk.JWK;
+import com.nimbusds.jose.jwk.JWKMatcher;
+import com.nimbusds.jose.jwk.JWKSelector;
+import com.nimbusds.jose.jwk.JWKSet;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.net.URL;
+import java.text.ParseException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Implementation of {@link JWKSProvider} which reads JWKS from URL.
+ */
+public class URLBasedJWKSProvider implements JWKSProvider {
+
+  private static final Logger LOG = LoggerFactory.getLogger(URLBasedJWKSProvider.class.getName());
+  private final HiveConf conf;
+  private List<JWKSet> jwkSets = new ArrayList<>();
+
+  public URLBasedJWKSProvider(HiveConf conf) {
+    this.conf = conf;
+    loadJWKSets();
+  }
+
+  private void loadJWKSets() {
+    String jwksURL = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_SERVER2_THRIFT_HTTP_JWT_JWKS_URL);
+    List<String> jwksURLs = Arrays.stream(jwksURL.split(",")).collect(Collectors.toList());
+    for (String urlString : jwksURLs) {
+      try {
+        URL url = new URL(urlString);
+        jwkSets.add(JWKSet.load(url));
+        LOG.info("Loaded JWKS from " + urlString);
+      } catch (IOException | ParseException e) {
+        LOG.info("Failed to retrieve JWKS from {}: {}", urlString, e.getMessage());
+      }
+    }
+  }
+
+  @Override
+  public List<JWK> getJWKs(JWSHeader header) {
+    List<JWK> jwks = new ArrayList<>();
+    JWKSelector selector = new JWKSelector(JWKMatcher.forJWSHeader(header));

Review comment:
   Good catch! Thanks.






Issue Time Tracking
---

Worklog Id: (was: 743486)
Time Spent: 4h  (was: 3h 50m)

> Add support for JWT authentication in HTTP mode
> ---
>
> Key: HIVE-25575
> URL: https://issues.apache.org/jira/browse/HIVE-25575
> Project: Hive
>  Issue Type: New Feature
>  Components: HiveServer2, JDBC
>Affects Versions: 4.0.0
>Reporter: Shubham Chaurasia
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> It would be good to support a JWT auth mechanism in Hive. To implement it, we would need the following:
> On the HS2 side:
> 1. Accept a JWT in the Authorization: Bearer header.
> 2. Fetch JWKS from a public endpoint to verify the JWT signature; to start with, we can fetch on HS2 startup.
> 3. Verify the JWT signature.
> On the JDBC client side:
> 1. The Hive JDBC client should be able to accept a JWT in the JDBC URL. (will add more details)
> 2. The client should also be able to pick up a JWT from an env var if it's defined.
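The HS2-side steps can be illustrated with the JDK alone; `BearerTokenSketch` below is a hypothetical helper that only extracts the token and decodes the payload (to read claims such as `sub`), deliberately omitting the JWKS-based signature verification of steps 2 and 3.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of pulling a bearer token apart. A compact JWT is three
// base64url segments: header.payload.signature. This decodes the payload
// only; it does NOT verify the signature, which the real patch does
// against keys fetched from the configured JWKS URL.
public class BearerTokenSketch {
  // Strips the "Bearer " prefix from an Authorization header value.
  public static String tokenFromHeader(String authorizationHeader) {
    return authorizationHeader.substring("Bearer ".length());
  }

  // Returns the raw JSON payload (claims) of a compact JWT.
  public static String decodePayload(String jwt) {
    String[] parts = jwt.split("\\.");
    byte[] json = Base64.getUrlDecoder().decode(parts[1]);
    return new String(json, StandardCharsets.UTF_8);
  }
}
```

A server that authenticated from this alone would accept any forged token, which is exactly why the issue's steps 2 and 3 (JWKS fetch and signature verification) are required.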





[jira] [Work logged] (HIVE-25575) Add support for JWT authentication in HTTP mode

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25575?focusedWorklogId=743403&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-743403
 ]

ASF GitHub Bot logged work on HIVE-25575:
-

Author: ASF GitHub Bot
Created on: 17/Mar/22 20:16
Start Date: 17/Mar/22 20:16
Worklog Time Spent: 10m 
  Work Description: sourabh912 commented on a change in pull request #3006:
URL: https://github.com/apache/hive/pull/3006#discussion_r829414588



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -4127,7 +4127,9 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         "  (Use with property hive.server2.custom.authentication.class)\n" +
         "  PAM: Pluggable authentication module\n" +
         "  NOSASL:  Raw transport\n" +
-        "  SAML: SAML 2.0 compliant authentication. This is only supported in http transport mode."),
+        "  SAML: SAML 2.0 compliant authentication. This is only supported in http transport mode.\n" +
+        "  JWT: JWT based authentication, JWT needs to contain the user name as subject. This is only supported in\n" +

Review comment:
   nit: Here we should also document that HS2 expects Asymmetric key for 
JWT signature verification. 

##
File path: 
service/src/java/org/apache/hive/service/auth/jwt/URLBasedJWKSProvider.java
##
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hive.service.auth.jwt;
+
+import com.nimbusds.jose.JWSHeader;
+import com.nimbusds.jose.jwk.JWK;
+import com.nimbusds.jose.jwk.JWKMatcher;
+import com.nimbusds.jose.jwk.JWKSelector;
+import com.nimbusds.jose.jwk.JWKSet;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.net.URL;
+import java.text.ParseException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.stream.Collectors;
+
+/**
+ * Provides a way to get JWKS json. Hive will use this to verify the incoming 
JWTs.
+ */
+public class URLBasedJWKSProvider {
+
+  private static final Logger LOG = LoggerFactory.getLogger(URLBasedJWKSProvider.class.getName());
+  private final HiveConf conf;
+  private List<JWKSet> jwkSets = new ArrayList<>();
+
+  public URLBasedJWKSProvider(HiveConf conf) throws IOException, ParseException {
+    this.conf = conf;
+    loadJWKSets();
+  }
+
+  /**
+   * Fetches the JWKS and stores into memory. The JWKS are expected to be in 
the standard form as defined here -
+   * https://datatracker.ietf.org/doc/html/rfc7517#appendix-A.
+   */
+  private void loadJWKSets() throws IOException, ParseException {
+    String jwksURL = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_SERVER2_AUTHENTICATION_JWT_JWKS_URL);
+    String[] jwksURLs = jwksURL.split(",");
+    for (String urlString : jwksURLs) {
+      URL url = new URL(urlString);
+      jwkSets.add(JWKSet.load(url));
+      LOG.info("Loaded JWKS from " + urlString);
+    }
+  }
+
+  /**
+   * Returns filtered JWKS by one or more criteria, such as kid, typ, alg.
+   */
+  public List<JWK> getJWKs(JWSHeader header) {
+    List<JWK> jwks = new ArrayList<>();
+    JWKSelector selector = new JWKSelector(JWKMatcher.forJWSHeader(header));
+    for (JWKSet jwkSet : jwkSets) {
+      List<JWK> selectedJwks = selector.select(jwkSet);
+      if (selectedJwks != null) {

Review comment:
   `select` api in selector never returns null. Therefore we don't need a 
null check 

##
File path: service/src/java/org/apache/hive/service/auth/jwt/JWTValidator.java
##
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LI

[jira] [Work logged] (HIVE-26029) Upgrade netty to 4.1.75.Final due to CVE

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26029?focusedWorklogId=743284&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-743284
 ]

ASF GitHub Bot logged work on HIVE-26029:
-

Author: ASF GitHub Bot
Created on: 17/Mar/22 17:35
Start Date: 17/Mar/22 17:35
Worklog Time Spent: 10m 
  Work Description: hsnusonic closed pull request #3097:
URL: https://github.com/apache/hive/pull/3097


   




Issue Time Tracking
---

Worklog Id: (was: 743284)
Time Spent: 0.5h  (was: 20m)

> Upgrade netty to 4.1.75.Final due to CVE
> 
>
> Key: HIVE-26029
> URL: https://issues.apache.org/jira/browse/HIVE-26029
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As title.





[jira] [Work logged] (HIVE-26028) Upgrade pac4j-saml-opensamlv3 to 4.5.5 due to CVE

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26028?focusedWorklogId=743282&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-743282
 ]

ASF GitHub Bot logged work on HIVE-26028:
-

Author: ASF GitHub Bot
Created on: 17/Mar/22 17:35
Start Date: 17/Mar/22 17:35
Worklog Time Spent: 10m 
  Work Description: hsnusonic closed pull request #3096:
URL: https://github.com/apache/hive/pull/3096


   




Issue Time Tracking
---

Worklog Id: (was: 743282)
Time Spent: 40m  (was: 0.5h)

> Upgrade pac4j-saml-opensamlv3 to 4.5.5 due to CVE
> -
>
> Key: HIVE-26028
> URL: https://issues.apache.org/jira/browse/HIVE-26028
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As title.





[jira] [Resolved] (HIVE-26029) Upgrade netty to 4.1.75.Final due to CVE

2022-03-17 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-26029.
--
Fix Version/s: 4.0.0-alpha-1
   Resolution: Fixed

Fix has been merged to master. Closing the jira. Thank you for the patch 
[~hsnusonic]

> Upgrade netty to 4.1.75.Final due to CVE
> 
>
> Key: HIVE-26029
> URL: https://issues.apache.org/jira/browse/HIVE-26029
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As title.





[jira] [Resolved] (HIVE-26028) Upgrade pac4j-saml-opensamlv3 to 4.5.5 due to CVE

2022-03-17 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam resolved HIVE-26028.
--
Target Version/s: 4.0.0-alpha-1
  Resolution: Fixed

Fix has been merged to master. Closing the jira. Thank you for the fix 
[~hsnusonic]

> Upgrade pac4j-saml-opensamlv3 to 4.5.5 due to CVE
> -
>
> Key: HIVE-26028
> URL: https://issues.apache.org/jira/browse/HIVE-26028
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As title.





[jira] [Work logged] (HIVE-26029) Upgrade netty to 4.1.75.Final due to CVE

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26029?focusedWorklogId=743266&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-743266
 ]

ASF GitHub Bot logged work on HIVE-26029:
-

Author: ASF GitHub Bot
Created on: 17/Mar/22 17:18
Start Date: 17/Mar/22 17:18
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #3097:
URL: https://github.com/apache/hive/pull/3097#issuecomment-1071113615


   Fix has been merged to master. Please close the PR.




Issue Time Tracking
---

Worklog Id: (was: 743266)
Time Spent: 20m  (was: 10m)

> Upgrade netty to 4.1.75.Final due to CVE
> 
>
> Key: HIVE-26029
> URL: https://issues.apache.org/jira/browse/HIVE-26029
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As title.





[jira] [Work logged] (HIVE-26028) Upgrade pac4j-saml-opensamlv3 to 4.5.5 due to CVE

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26028?focusedWorklogId=743264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-743264
 ]

ASF GitHub Bot logged work on HIVE-26028:
-

Author: ASF GitHub Bot
Created on: 17/Mar/22 17:18
Start Date: 17/Mar/22 17:18
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on pull request #3096:
URL: https://github.com/apache/hive/pull/3096#issuecomment-1071113114


   Fix has been merged. Please close the PR.




Issue Time Tracking
---

Worklog Id: (was: 743264)
Time Spent: 0.5h  (was: 20m)

> Upgrade pac4j-saml-opensamlv3 to 4.5.5 due to CVE
> -
>
> Key: HIVE-26028
> URL: https://issues.apache.org/jira/browse/HIVE-26028
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As title.





[jira] [Updated] (HIVE-26044) Remove hardcoded version references from the tests

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26044:
--
Labels: pull-request-available  (was: )

> Remove hardcoded version references from the tests
> --
>
> Key: HIVE-26044
> URL: https://issues.apache.org/jira/browse/HIVE-26044
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are several places where there are hardcoded version references in the 
> tests.
> We should remove them so it is easier to change versions.





[jira] [Work logged] (HIVE-26044) Remove hardcoded version references from the tests

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26044?focusedWorklogId=743243&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-743243
 ]

ASF GitHub Bot logged work on HIVE-26044:
-

Author: ASF GitHub Bot
Created on: 17/Mar/22 16:29
Start Date: 17/Mar/22 16:29
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #3115:
URL: https://github.com/apache/hive/pull/3115


   ### What changes were proposed in this pull request?
   Remove hardcoded version references from the tests
   
   ### Why are the changes needed?
   For easier version changes
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Running the unit tests
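One common way to remove hardcoded version strings from tests is to resolve the version once from a system property (set by the build, e.g. via surefire) and reference it everywhere. A hedged stdlib sketch — the `hive.test.version` property name below is illustrative, not necessarily what this patch uses:

```java
// Resolve the project version from a system property, with a fallback,
// so tests never embed a literal like "4.0.0-alpha-1".
public class ProjectVersion {
    // "hive.test.version" is an illustrative property name, typically set
    // by Maven surefire via <systemPropertyVariables>.
    public static String get() {
        return System.getProperty("hive.test.version", "unknown-version");
    }

    public static void main(String[] args) {
        System.out.println("Testing against version: " + get());
    }
}
```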




Issue Time Tracking
---

Worklog Id: (was: 743243)
Remaining Estimate: 0h
Time Spent: 10m

> Remove hardcoded version references from the tests
> --
>
> Key: HIVE-26044
> URL: https://issues.apache.org/jira/browse/HIVE-26044
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There are several places where there are hardcoded version references in the 
> tests.
> We should remove them so it is easier to change versions.





[jira] [Updated] (HIVE-26044) Remove hardcoded version references from the tests

2022-03-17 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-26044:
--
Summary: Remove hardcoded version references from the tests  (was: Remove 
hard coded version references from the tests)

> Remove hardcoded version references from the tests
> --
>
> Key: HIVE-26044
> URL: https://issues.apache.org/jira/browse/HIVE-26044
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> There are several places where there are hardcoded version references in the 
> tests.
> We should remove them so it is easier to change versions.





[jira] [Assigned] (HIVE-26044) Remove hard coded version references from the tests

2022-03-17 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-26044:
-


> Remove hard coded version references from the tests
> ---
>
> Key: HIVE-26044
> URL: https://issues.apache.org/jira/browse/HIVE-26044
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> There are several places where there are hardcoded version references in the 
> tests.
> We should remove them so it is easier to change versions.





[jira] [Work logged] (HIVE-26002) Preparing for 4.0.0-alpha-1 development

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26002?focusedWorklogId=743237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-743237
 ]

ASF GitHub Bot logged work on HIVE-26002:
-

Author: ASF GitHub Bot
Created on: 17/Mar/22 16:25
Start Date: 17/Mar/22 16:25
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #3081:
URL: https://github.com/apache/hive/pull/3081


   




Issue Time Tracking
---

Worklog Id: (was: 743237)
Time Spent: 1.5h  (was: 1h 20m)

> Preparing for 4.0.0-alpha-1 development
> ---
>
> Key: HIVE-26002
> URL: https://issues.apache.org/jira/browse/HIVE-26002
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> For the release we need to create the appropriate sql scripts for HMS db 
> initialization





[jira] [Resolved] (HIVE-26002) Preparing for 4.0.0-alpha-1 development

2022-03-17 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26002.
---
Resolution: Fixed

Pushed to master.
Thanks for the review [~zabetak]

> Preparing for 4.0.0-alpha-1 development
> ---
>
> Key: HIVE-26002
> URL: https://issues.apache.org/jira/browse/HIVE-26002
> Project: Hive
>  Issue Type: Task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> For the release we need to create the appropriate sql scripts for HMS db 
> initialization





[jira] [Updated] (HIVE-26042) Fix flaky streaming tests

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26042:
--
Labels: pull-request-available  (was: )

> Fix flaky streaming tests
> -
>
> Key: HIVE-26042
> URL: https://issues.apache.org/jira/browse/HIVE-26042
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The TestStreamingDynamicPartitioning / TestStreaming tests are often failing 
> because of problems with creating directories.
> Example:
> {code}
> 2022-03-17T04:03:39,024 ERROR [main] metastore.RetryingHMSHandler: 
> MetaException(message:Unable to create database managed directory 
> pfile:/home/jenkins/agent/workspace/hive-precommit_PR-3081/streaming/target/warehouse/testing5.db,
>  failed to create database testing5)
>   at 
> org.apache.hadoop.hive.metastore.HMSHandler.create_database_core(HMSHandler.java:1269)
>   at 
> org.apache.hadoop.hive.metastore.HMSHandler.create_database(HMSHandler.java:1389)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:146)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy38.create_database(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:1144)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:218)
>   at com.sun.proxy.$Proxy47.createDatabase(Unknown Source)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:608)
>   at 
> org.apache.hadoop.hive.ql.ddl.database.create.CreateDatabaseOperation.execute(CreateDatabaseOperation.java:68)
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:106)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:204)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:153)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:148)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:228)
>   at 
> org.apache.hive.streaming.TestStreaming.runDDL(TestStreaming.java:3202)
>   at 
> org.apache.hive.streaming.TestStreaming.createStoreSales(TestStreaming.java:267)
>   at org.apache.hive.streaming.TestStreaming.setup(TestStreaming.java:250)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.run

[jira] [Work logged] (HIVE-26042) Fix flaky streaming tests

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26042?focusedWorklogId=743133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-743133
 ]

ASF GitHub Bot logged work on HIVE-26042:
-

Author: ASF GitHub Bot
Created on: 17/Mar/22 13:33
Start Date: 17/Mar/22 13:33
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #3114:
URL: https://github.com/apache/hive/pull/3114


   ### What changes were proposed in this pull request?
   We should set the warehouse locations in the tests, so the managed location 
will also be inside the temporary folder. This will prevent the tests from 
failing if previous runs left some directories behind.
   
   ### Why are the changes needed?
   These tests are often failing on the CI
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Ran the tests manually, and also used a breakpoint to make sure that the 
directories are correct
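The fix amounts to pointing the test warehouse (both managed and external locations) at a per-test temporary directory, so stale directories from an earlier run can never collide with a new `create database`. A stdlib-only sketch of the idea (directory names are illustrative; the real patch sets Hive/metastore configuration values):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TestWarehouse {
    // Each run gets a unique warehouse root with separate managed and
    // external locations, so leftovers from previous runs cannot collide.
    static Path[] createWarehouseDirs() {
        try {
            Path root = Files.createTempDirectory("hive-test-warehouse-");
            Path managed = Files.createDirectories(root.resolve("managed"));
            Path external = Files.createDirectories(root.resolve("external"));
            return new Path[] { managed, external };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path[] dirs = createWarehouseDirs();
        System.out.println(Files.isDirectory(dirs[0]) && Files.isDirectory(dirs[1])); // true
    }
}
```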




Issue Time Tracking
---

Worklog Id: (was: 743133)
Remaining Estimate: 0h
Time Spent: 10m

> Fix flaky streaming tests
> -
>
> Key: HIVE-26042
> URL: https://issues.apache.org/jira/browse/HIVE-26042
> Project: Hive
>  Issue Type: Test
>Reporter: Peter Vary
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The TestStreamingDynamicPartitioning / TestStreaming tests are often failing 
> because of problems with creating directories.
> Example:
> {code}
> 2022-03-17T04:03:39,024 ERROR [main] metastore.RetryingHMSHandler: 
> MetaException(message:Unable to create database managed directory 
> pfile:/home/jenkins/agent/workspace/hive-precommit_PR-3081/streaming/target/warehouse/testing5.db,
>  failed to create database testing5)
>   at 
> org.apache.hadoop.hive.metastore.HMSHandler.create_database_core(HMSHandler.java:1269)
>   at 
> org.apache.hadoop.hive.metastore.HMSHandler.create_database(HMSHandler.java:1389)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:146)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy38.create_database(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:1144)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:218)
>   at com.sun.proxy.$Proxy47.createDatabase(Unknown Source)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:608)
>   at 
> org.apache.hadoop.hive.ql.ddl.database.create.CreateDatabaseOperation.execute(CreateDatabaseOperation.java:68)
>   at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:84)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:212)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>   at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:361)
>   at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:334)
>   at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:245)
>   at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:106)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:348)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:204)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:153)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:148)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:185)
>   at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:228)
>   at 
> org.apache.hive.streaming.TestStreaming.runDDL(TestStreaming.java:3202)
>   at 
> org.apac

[jira] [Assigned] (HIVE-26043) Use constraint info when creating RexNodes

2022-03-17 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa reassigned HIVE-26043:
-


> Use constraint info when creating RexNodes
> --
>
> Key: HIVE-26043
> URL: https://issues.apache.org/jira/browse/HIVE-26043
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>
> Prior to HIVE-23100, NOT NULL constraints affected the nullability of newly 
> created RexNode types.
> Nullability enables the subquery rewrite algorithm to generate a more optimal 
> plan.
> [https://github.com/apache/hive/blob/1213ad3f0ae0e21e7519dc28b8b6d1401cdd1441/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java#L324]
> Example:
> {code:java}
> explain cbo
> select ws_sales_price
>  from web_sales, customer, item
>  where ws_bill_customer_sk = c_customer_sk
>   and ws_item_sk = i_item_sk
>   and ( c_customer_sk = 1
> or
> i_item_id in (select i_item_id
>  from item
>  where i_item_sk in (2, 3)
>  )
>   );
> {code}
> Without not null constraints
> {code:java}
> HiveProject(ws_sales_price=[$2])
>   HiveFilter(condition=[OR(AND(<>($6, 0), IS NOT NULL($8)), =($3, 1))])
> HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], 
> ws_sales_price=[$2], c_customer_sk=[$8], i_item_sk=[$3], i_item_id=[$4], 
> c=[$5], i_item_id0=[$6], literalTrue=[$7])
>   HiveJoin(condition=[=($1, $8)], joinType=[inner], algorithm=[none], 
> cost=[not available])
> HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], 
> cost=[not available])
>   HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], 
> ws_sales_price=[$20])
> HiveFilter(condition=[IS NOT NULL($3)])
>   HiveTableScan(table=[[default, web_sales]], 
> table:alias=[web_sales])
>   HiveJoin(condition=[=($1, $3)], joinType=[left], algorithm=[none], 
> cost=[not available])
> HiveJoin(condition=[true], joinType=[inner], algorithm=[none], 
> cost=[not available])
>   HiveProject(i_item_sk=[$0], i_item_id=[$1])
> HiveTableScan(table=[[default, item]], table:alias=[item])
>   HiveProject(c=[$0])
> HiveAggregate(group=[{}], c=[COUNT()])
>   HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
> HiveTableScan(table=[[default, item]], table:alias=[item])
> HiveProject(i_item_id=[$0], literalTrue=[true])
>   HiveAggregate(group=[{1}])
> HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
>   HiveTableScan(table=[[default, item]], table:alias=[item])
> HiveProject(c_customer_sk=[$0])
>   HiveTableScan(table=[[default, customer]], table:alias=[customer])
> {code}
> With not null constraints
> {code:java}
> HiveProject(ws_sales_price=[$2])
>   HiveFilter(condition=[OR(IS NOT NULL($7), =($3, 1))])
> HiveProject(ws_item_sk=[$0], ws_bill_customer_sk=[$1], 
> ws_sales_price=[$2], c_customer_sk=[$7], i_item_sk=[$3], i_item_id=[$4], 
> i_item_id0=[$5], literalTrue=[$6])
>   HiveJoin(condition=[=($1, $7)], joinType=[inner], algorithm=[none], 
> cost=[not available])
> HiveJoin(condition=[=($0, $3)], joinType=[inner], algorithm=[none], 
> cost=[not available])
>   HiveProject(ws_item_sk=[$2], ws_bill_customer_sk=[$3], 
> ws_sales_price=[$20])
> HiveFilter(condition=[IS NOT NULL($3)])
>   HiveTableScan(table=[[default, web_sales]], 
> table:alias=[web_sales])
>   HiveJoin(condition=[=($1, $2)], joinType=[left], algorithm=[none], 
> cost=[not available])
> HiveProject(i_item_sk=[$0], i_item_id=[$1])
>   HiveTableScan(table=[[default, item]], table:alias=[item])
> HiveProject(i_item_id=[$0], literalTrue=[true])
>   HiveAggregate(group=[{1}])
> HiveFilter(condition=[IN($0, 2:BIGINT, 3:BIGINT)])
>   HiveTableScan(table=[[default, item]], table:alias=[item])
> HiveProject(c_customer_sk=[$0])
>   HiveTableScan(table=[[default, customer]], table:alias=[customer])
> {code}
> In the first plan, where NOT NULL constraints were ignored, there is an extra 
> {{item}} table join without a join condition:
> {code:java}
> HiveJoin(condition=[true], joinType=[inner], algorithm=[none], 
> cost=[not available])
>   HiveProject(i_item_sk=[$0], i_item_id=[$1])
> HiveTableScan(table=[[default, item]], table:alias=[item])
>   HiveProject(c=[$0])
> HiveAggregate(group=[{}], c=[COUNT()])
>   HiveFilter(co
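In essence, the NOT NULL constraint lets the planner treat `IS NOT NULL(col)` on the subquery key as always true, which is what allows the rewrite to drop the extra `COUNT(*)` join branch in the second plan. A toy illustration of that folding decision (plain Java; Calcite's actual mechanism is type nullability on RexNodes, not a helper like this):

```java
// Toy model of the rewrite decision: when the scrutinized column is
// declared NOT NULL, the null check folds to TRUE and the COUNT(*)
// branch can be dropped; otherwise the check must stay in the plan.
public class NullabilityFold {
    enum Tri { TRUE, FALSE, UNKNOWN }

    static Tri foldIsNotNull(boolean columnDeclaredNotNull) {
        return columnDeclaredNotNull ? Tri.TRUE : Tri.UNKNOWN;
    }

    public static void main(String[] args) {
        // With the NOT NULL constraint the filter disappears ...
        System.out.println(foldIsNotNull(true));   // TRUE
        // ... without it, the subquery rewrite must keep the COUNT(*) branch.
        System.out.println(foldIsNotNull(false));  // UNKNOWN
    }
}
```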

[jira] [Work logged] (HIVE-26040) Fix DirectSqlUpdateStat.getNextCSIdForMPartitionColumnStatistics for mssql

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26040?focusedWorklogId=743090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-743090
 ]

ASF GitHub Bot logged work on HIVE-26040:
-

Author: ASF GitHub Bot
Created on: 17/Mar/22 11:25
Start Date: 17/Mar/22 11:25
Worklog Time Spent: 10m 
  Work Description: pvary merged pull request #3112:
URL: https://github.com/apache/hive/pull/3112


   




Issue Time Tracking
---

Worklog Id: (was: 743090)
Time Spent: 20m  (was: 10m)

> Fix DirectSqlUpdateStat.getNextCSIdForMPartitionColumnStatistics for mssql
> --
>
> Key: HIVE-26040
> URL: https://issues.apache.org/jira/browse/HIVE-26040
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=list_bucket_dml_9.q 
> -Dtest.metastore.db=mssql
> {code}
> Fails with
> {code}
> 2022-03-15T07:57:17,078 ERROR [2b933b88-6083-4750-b151-2d2c7e04ccce main] 
> metastore.DirectSqlUpdateStat: Unable to 
> getNextCSIdForMPartitionColumnStatistics
> com.microsoft.sqlserver.jdbc.SQLServerException: Line 1: FOR UPDATE clause 
> allowed only for DECLARE CURSOR.
> at 
> com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:258)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1535)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:845)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:752)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7151) 
> ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2478)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:219)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:199)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.SQLServerStatement.executeQuery(SQLServerStatement.java:654)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.zaxxer.hikari.pool.ProxyStatement.executeQuery(ProxyStatement.java:108) 
> ~[HikariCP-2.6.1.jar:?]
> at 
> com.zaxxer.hikari.pool.HikariProxyStatement.executeQuery(HikariProxyStatement.java)
>  ~[HikariCP-2.6.1.jar:?]
> at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.getNextCSIdForMPartitionColumnStatistics(DirectSqlUpdateStat.java:676)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updatePartitionColumnStatisticsBatch(MetaStoreDirectSql.java:2966)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatisticsInBatch(ObjectStore.java:9849)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_261]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_261]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_261]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261]
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) 
> [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> com.sun.proxy.$Proxy60.updatePartitionColumnStatisticsInBatch(Unknown Source) 
> [?:?]
> at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsForOneBatch(HMSHandler.java:7060)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsInBa

[jira] [Resolved] (HIVE-26040) Fix DirectSqlUpdateStat.getNextCSIdForMPartitionColumnStatistics for mssql

2022-03-17 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary resolved HIVE-26040.
---
Resolution: Fixed

Pushed to master.
Thanks for the review [~Marton Bod]!

> Fix DirectSqlUpdateStat.getNextCSIdForMPartitionColumnStatistics for mssql
> --
>
> Key: HIVE-26040
> URL: https://issues.apache.org/jira/browse/HIVE-26040
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=list_bucket_dml_9.q 
> -Dtest.metastore.db=mssql
> {code}
> Fails with
> {code}
> 2022-03-15T07:57:17,078 ERROR [2b933b88-6083-4750-b151-2d2c7e04ccce main] 
> metastore.DirectSqlUpdateStat: Unable to 
> getNextCSIdForMPartitionColumnStatistics
> com.microsoft.sqlserver.jdbc.SQLServerException: Line 1: FOR UPDATE clause 
> allowed only for DECLARE CURSOR.
> at 
> com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:258)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1535)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:845)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:752)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7151) 
> ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2478)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:219)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:199)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.microsoft.sqlserver.jdbc.SQLServerStatement.executeQuery(SQLServerStatement.java:654)
>  ~[mssql-jdbc-6.2.1.jre8.jar:?]
> at 
> com.zaxxer.hikari.pool.ProxyStatement.executeQuery(ProxyStatement.java:108) 
> ~[HikariCP-2.6.1.jar:?]
> at 
> com.zaxxer.hikari.pool.HikariProxyStatement.executeQuery(HikariProxyStatement.java)
>  ~[HikariCP-2.6.1.jar:?]
> at 
> org.apache.hadoop.hive.metastore.DirectSqlUpdateStat.getNextCSIdForMPartitionColumnStatistics(DirectSqlUpdateStat.java:676)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.MetaStoreDirectSql.updatePartitionColumnStatisticsBatch(MetaStoreDirectSql.java:2966)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.updatePartitionColumnStatisticsInBatch(ObjectStore.java:9849)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_261]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_261]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_261]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261]
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) 
> [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> com.sun.proxy.$Proxy60.updatePartitionColumnStatisticsInBatch(Unknown Source) 
> [?:?]
> at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsForOneBatch(HMSHandler.java:7060)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HMSHandler.updatePartitionColStatsInBatch(HMSHandler.java:7113)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.metastore.HMSHandler.set_aggr_stats_for(HMSHandler.java:9137)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_261]
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_261]
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_261]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261]

[jira] [Work logged] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=742997&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-742997
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 17/Mar/22 07:31
Start Date: 17/Mar/22 07:31
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r828821803



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -468,6 +459,25 @@ void findUnknownPartitions(Table table, Set 
partPaths, byte[] filterExp,
 }
   }
 }
+
+Set partPathsInMS = partPaths;
+// don't want the table dir
+partPathsInMS.remove(tablePath);
+// remove partition paths in partPathsInMS, to getPartitionsNotOnFs
+partPathsInMS.removeAll(allPartDirs);
+// There can be an edge case where the user defines a partition directory
+// outside of the table directory. To avoid evicting such partitions,
+// we check whether the partition path exists and, if it is missing, add
+// it to the result for getPartitionsNotOnFs.
+for (Path partPath : partPathsInMS) {
+  FileSystem fs = partPath.getFileSystem(conf);

Review comment:
   As you pointed out, having a multi-threaded check of partition files here 
would be useful.
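A multi-threaded existence check, as suggested in the review, can be sketched with a fixed thread pool. Here `java.nio.file` stands in for Hadoop's `FileSystem`, and the pool size and helper names are illustrative, not the actual HiveMetaStoreChecker API:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionChecker {
    // Check many partition paths in parallel and return the ones that are
    // missing on the filesystem (candidates for getPartitionsNotOnFs).
    static List<Path> findMissing(List<Path> partPaths, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (Path p : partPaths) {
                futures.add(pool.submit(() -> Files.exists(p) ? null : p));
            }
            List<Path> missing = new ArrayList<>();
            for (Future<Path> f : futures) {
                Path p = f.get();          // surfaces any per-path failure
                if (p != null) missing.add(p);
            }
            return missing;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path existing = Files.createTempDirectory("part-");
        Path missing = existing.resolve("p=does-not-exist");
        System.out.println(findMissing(List.of(existing, missing), 4));
    }
}
```

On object stores such as S3, each existence check is a round trip, so issuing them concurrently is where the speedup comes from.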






Issue Time Tracking
---

Worklog Id: (was: 742997)
Time Spent: 4h 40m  (was: 4.5h)

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> MSCK REPAIR TABLE for a table with many partitions can perform slowly on 
> cloud storage such as S3. One case where we observed slowness was in 
> HiveMetaStoreChecker.checkTable.
> {code:java}
> "HiveServer2-Background-Pool: Thread-382" #382 prio=5 os_prio=0 
> tid=0x7f97fc4a4000 nid=0x5c2a runnable [0x7f97c41a8000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>   at java.net.SocketInputStream.read(SocketInputStream.java:171)
>   at java.net.SocketInputStream.read(SocketInputStream.java:141)
>   at 
> sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
>   at 
> sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
>   at 
> sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1341)
>   at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
>   at 
> sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:957)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>   at 
> com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
>   at 
> com.amazonaws.thirdparty.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>   at 
> 

[jira] [Work logged] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=742996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-742996
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 17/Mar/22 07:30
Start Date: 17/Mar/22 07:30
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on pull request #3053:
URL: https://github.com/apache/hive/pull/3053#issuecomment-1070436169


   Try to squash-merge all the commits into a single commit.




Issue Time Tracking
---

Worklog Id: (was: 742996)
Time Spent: 4.5h  (was: 4h 20m)

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-25980) Reduce fs calls in HiveMetaStoreChecker.checkTable

2022-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25980?focusedWorklogId=742995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-742995
 ]

ASF GitHub Bot logged work on HIVE-25980:
-

Author: ASF GitHub Bot
Created on: 17/Mar/22 07:28
Start Date: 17/Mar/22 07:28
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on a change in pull request #3053:
URL: https://github.com/apache/hive/pull/3053#discussion_r828821293



##
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -468,6 +459,25 @@ void findUnknownPartitions(Table table, Set<Path> partPaths, byte[] filterExp,
 }
   }
 }
+
+Set<Path> partPathsInMS = partPaths;
+// don't want the table dir
+partPathsInMS.remove(tablePath);
+// remove partition paths in partPathsInMS, to getPartitionsNotOnFs
+partPathsInMS.removeAll(allPartDirs);

Review comment:
   allPartDirs here contains all partition paths that are on the FileSystem
but not in the metastore; removing them from partPathsInMS doesn't make
sense.
   
   I think this operation should be done before allPartDirs.removeAll(partPaths);
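The ordering concern can be seen with a small, self-contained set example (hypothetical partition names, plain `java.util.HashSet` in place of Hive's path sets): once allPartDirs has been reduced by removeAll(partPaths), it no longer holds the paths needed to compute "in metastore but not on filesystem", so that subtraction has to run against the full filesystem listing first.

```java
import java.util.HashSet;
import java.util.Set;

public class SetDiffOrder {
  public static void main(String[] args) {
    // Hypothetical partition names: what a filesystem listing found...
    Set<String> allPartDirs = new HashSet<>(Set.of("p=1", "p=2", "p=3"));
    // ...and what the metastore has registered.
    Set<String> partPaths = new HashSet<>(Set.of("p=2", "p=3", "p=4"));

    // Correct order: subtract the FULL filesystem listing first.
    Set<String> notOnFs = new HashSet<>(partPaths);
    notOnFs.removeAll(allPartDirs);
    System.out.println(notOnFs); // prints [p=4]

    // After the "partitions not in metastore" step has mutated allPartDirs...
    allPartDirs.removeAll(partPaths); // allPartDirs is now {p=1}
    // ...the same subtraction no longer filters out p=2 and p=3:
    Set<String> wrong = new HashSet<>(partPaths);
    wrong.removeAll(allPartDirs);
    System.out.println(wrong.size()); // prints 3
  }
}
```

In other words, both differences must be taken from independent copies (or in the right order), because removeAll mutates its receiver in place.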

##
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -468,6 +459,25 @@ void findUnknownPartitions(Table table, Set<Path> partPaths, byte[] filterExp,
 }
   }
 }
+
+Set<Path> partPathsInMS = partPaths;
+// don't want the table dir
+partPathsInMS.remove(tablePath);
+// remove partition paths in partPathsInMS, to getPartitionsNotOnFs
+partPathsInMS.removeAll(allPartDirs);
+// There can be an edge case where a user defines a partition directory outside of the table directory.
+// To avoid eviction of such partitions,
+// we check whether each partition path exists and add the missing ones to the result for getPartitionsNotOnFs.
+for (Path partPath : partPathsInMS) {
+  FileSystem fs = partPath.getFileSystem(conf);

Review comment:
   As you pointed out, having a multi-threaded check of the partition files will be useful.

##
File path: ql/src/test/queries/clientpositive/msck_repair_multi_thread.q
##
@@ -0,0 +1,33 @@
+DROP TABLE IF EXISTS repairtable_hive_25980;

Review comment:
   The file name doesn't match the optimization we are doing.






Issue Time Tracking
---

Worklog Id: (was: 742995)
Time Spent: 4h 20m  (was: 4h 10m)

> Reduce fs calls in HiveMetaStoreChecker.checkTable
> --
>
> Key: HIVE-25980
> URL: https://issues.apache.org/jira/browse/HIVE-25980
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Chiran Ravani
>Assignee: Chiran Ravani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>