[jira] [Work logged] (HIVE-23772) Relocate calcite-core to prevent NoSuchFieldError

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23772?focusedWorklogId=452835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452835
 ]

ASF GitHub Bot logged work on HIVE-23772:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 07:08
Start Date: 30/Jun/20 07:08
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #1187:
URL: https://github.com/apache/hive/pull/1187#issuecomment-651592714


   The assertion error is due to the fact that both the shaded and the non-shaded jars are on the classpath.
   ```
   Caused by: java.lang.AssertionError
     at org.apache.calcite.avatica.AvaticaUtils.instantiatePlugin(AvaticaUtils.java:229) ~[avatica-1.12.0.jar:1.12.0]
     at org.apache.calcite.avatica.ConnectionConfigImpl$5.apply(ConnectionConfigImpl.java:392) ~[avatica-1.12.0.jar:1.12.0]
     at org.apache.calcite.avatica.ConnectionConfigImpl$PropEnv.get_(ConnectionConfigImpl.java:173) ~[avatica-1.12.0.jar:1.12.0]
     at org.apache.calcite.avatica.ConnectionConfigImpl$PropEnv.getPlugin(ConnectionConfigImpl.java:296) ~[avatica-1.12.0.jar:1.12.0]
     at org.apache.calcite.avatica.ConnectionConfigImpl$PropEnv.getPlugin(ConnectionConfigImpl.java:282) ~[avatica-1.12.0.jar:1.12.0]
     at org.apache.calcite.config.CalciteConnectionConfigImpl.typeSystem(CalciteConnectionConfigImpl.java:155) ~[calcite-core-1.21.0.jar:1.21.0]
     at org.apache.calcite.jdbc.CalciteConnectionImpl.<init>(CalciteConnectionImpl.java:127) ~[calcite-core-1.21.0.jar:1.21.0]
     at org.apache.calcite.jdbc.CalciteJdbc41Factory$CalciteJdbc41Connection.<init>(CalciteJdbc41Factory.java:115) ~[calcite-core-1.21.0.jar:1.21.0]
     at org.apache.calcite.jdbc.CalciteJdbc41Factory.newConnection(CalciteJdbc41Factory.java:59) ~[calcite-core-1.21.0.jar:1.21.0]
     at org.apache.calcite.jdbc.CalciteJdbc41Factory.newConnection(CalciteJdbc41Factory.java:44) ~[calcite-core-1.21.0.jar:1.21.0]
     at org.apache.calcite.jdbc.CalciteFactory.newConnection(CalciteFactory.java:53) ~[calcite-core-1.21.0.jar:1.21.0]
     at org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:138) ~[avatica-1.12.0.jar:1.12.0]
     at java.sql.DriverManager.getConnection(DriverManager.java:664) ~[?:1.8.0_181]
     at java.sql.DriverManager.getConnection(DriverManager.java:208) ~[?:1.8.0_181]
     at org.apache.hive.org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:175) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
     at org.apache.hive.org.apache.calcite.tools.Frameworks.withPlanner(Frameworks.java:125) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.logicalPlan(CalcitePlanner.java:1555) ~[hive-exec-4.0.0-SNAPSHOT.jar:?]
     at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:540) ~[hive-exec-4.0.0-SNAPSHOT.jar:?]
   ```
   
   After `at java.sql.DriverManager.getConnection(DriverManager.java:664) ~[?:1.8.0_181]`, rather than coming back into the shaded `hive-exec` classes, execution goes back into the non-shaded `calcite-core` jar.
   
   The driver gets registered from the original jar only, so the calls are dispatched to the non-shaded classes.
   Ideally, could we remove the original (non-shaded) jars for the itests?
   Removing them takes away that error, but the driver registration is done as `HiveDriver`, while getting the connection expects the `jdbc:calcite` URL that is hard-coded in the `Frameworks` class, so it fails with:
   ```
   Caused by: java.sql.SQLException: No suitable driver found for jdbc:calcite:
     at java.sql.DriverManager.getConnection(DriverManager.java:689) ~[?:1.8.0_181]
     at java.sql.DriverManager.getConnection(DriverManager.java:208) ~[?:1.8.0_181]
     at org.apache.hive.org.apache.calcite.tools.Frameworks.withPrepare(Frameworks.java:175) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
   ```

   Is there a configuration or workaround for the driver connection? If you have any ideas, let me know.
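   For illustration, a minimal probe of the registration issue (a hedged sketch: the relocated driver class name is an assumption derived from the `org.apache.hive.org.apache.calcite.` prefix in the trace, and shading normally leaves string literals such as `jdbc:calcite:` untouched):
   ```java
   import java.sql.Connection;
   import java.sql.DriverManager;
   
   public class CalciteDriverProbe {
     public static void main(String[] args) throws Exception {
       // Loading the relocated driver class runs its static initializer, which
       // calls DriverManager.registerDriver(...) just like the original
       // org.apache.calcite.jdbc.Driver does.
       Class.forName("org.apache.hive.org.apache.calcite.jdbc.Driver");
   
       // Once a driver accepting this prefix is registered, the URL that
       // Frameworks hard-codes can be resolved again.
       try (Connection conn = DriverManager.getConnection("jdbc:calcite:")) {
         System.out.println("Driver: " + conn.getMetaData().getDriverName());
       }
     }
   }
   ```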



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452835)
Time Spent: 1h 10m  (was: 1h)

> Relocate calcite-core to prevent NoSuchFieldError
> -
>
> Key: HIVE-23772
> URL: https://issues.apache.org/jira/browse/HIVE-23772
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452838&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452838
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 07:13
Start Date: 30/Jun/20 07:13
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447033768



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java
##
@@ -429,6 +451,75 @@ void findUnknownPartitions(Table table, Set<Path> partPaths,
     LOG.debug("Number of partitions not in metastore : " + result.getPartitionsNotInMs().size());
   }
 
+  /**
+   * Calculate the maximum seen writeId from the acid directory structure
+   * @param partPath Path of the partition directory
+   * @param res Partition result to write the max ids
+   * @throws IOException ex
+   */
+  private void setMaxTxnAndWriteIdFromPartition(Path partPath, CheckResult.PartitionResult res) throws IOException {
+    FileSystem fs = partPath.getFileSystem(conf);
+    FileStatus[] deltaOrBaseFiles = fs.listStatus(partPath, HIDDEN_FILES_PATH_FILTER);
+
+    // Read the writeIds from every base and delta directory and find the max
+    long maxWriteId = 0L;
+    long maxVisibilityId = 0L;
+    for (FileStatus fileStatus : deltaOrBaseFiles) {
+      if (!fileStatus.isDirectory()) {
+        continue;
+      }
+      long writeId = 0L;
+      long visibilityId = 0L;
+      String folder = fileStatus.getPath().getName();
+      if (folder.startsWith(BASE_PREFIX)) {
+        visibilityId = getVisibilityTxnId(folder);
+        if (visibilityId > 0) {
+          folder = removeVisibilityTxnId(folder);
+        }
+        writeId = Long.parseLong(folder.substring(BASE_PREFIX.length()));
+      } else if (folder.startsWith(DELTA_PREFIX) || folder.startsWith(DELETE_DELTA_PREFIX)) {
+        // See AcidUtils.parseDelta
+        visibilityId = getVisibilityTxnId(folder);
+        if (visibilityId > 0) {
+          folder = removeVisibilityTxnId(folder);
+        }
+        boolean isDeleteDelta = folder.startsWith(DELETE_DELTA_PREFIX);
+        String rest = folder.substring((isDeleteDelta ? DELETE_DELTA_PREFIX : DELTA_PREFIX).length());
+        int split = rest.indexOf('_');
+        // split2 may be -1 if no statementId
+        int split2 = rest.indexOf('_', split + 1);
+        // We always want the second part (it is either the same or greater if it is a compacted delta)
+        writeId = split2 == -1 ? Long.parseLong(rest.substring(split + 1))
+            : Long.parseLong(rest.substring(split + 1, split2));
+      }
+      if (writeId > maxWriteId) {
+        maxWriteId = writeId;
+      }
+      if (visibilityId > maxVisibilityId) {
+        maxVisibilityId = visibilityId;
+      }
+    }
+    LOG.debug("Max writeId {}, max txnId {} found in partition {}", maxWriteId, maxVisibilityId,
+        partPath.toUri().toString());
+    res.setMaxWriteId(maxWriteId);
+    res.setMaxTxnId(maxVisibilityId);
+  }
+  private long getVisibilityTxnId(String folder) {
+    int idxOfVis = folder.indexOf(VISIBILITY_PREFIX);

Review comment:
   why not use a regex with pattern matching? removeVisibilityTxnId probably wouldn't even be needed (a sketch follows below)
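   For illustration, a hedged sketch of that suggestion (the pattern and group layout are assumptions, not the patch's code): a single regex can capture the prefix, the writeId(s), and the optional visibility suffix in one match, so a separate removeVisibilityTxnId step becomes unnecessary.
   ```java
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;
   
   class AcidDirNameParser {
     // base_<writeId>, delta_<min>_<max>[_<stmtId>] and delete_delta_..., each
     // with an optional trailing _v<visibilityTxnId>; layout is illustrative.
     private static final Pattern ACID_DIR = Pattern.compile(
         "(base|delta|delete_delta)_(\\d+)(?:_(\\d+))?(?:_(\\d+))?(?:_v(\\d+))?");
   
     /** Returns {maxWriteId, visibilityTxnId}, or null for non-acid names. */
     static long[] parse(String folder) {
       Matcher m = ACID_DIR.matcher(folder);
       if (!m.matches()) {
         return null;
       }
       // For deltas we want the second (max) writeId; for base it is group 2.
       long writeId = m.group(3) != null ? Long.parseLong(m.group(3)) : Long.parseLong(m.group(2));
       long visibilityTxnId = m.group(5) != null ? Long.parseLong(m.group(5)) : 0L;
       return new long[] {writeId, visibilityTxnId};
     }
   }
   ```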





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452838)
Time Spent: 4h 40m  (was: 4.5h)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables too well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data cannot be read back 
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying 
> data are not conflicting. If the HMS does not contain allocated writeIds for 
> the table, we can seed the table with the writeIds read from the directory 
> structure.
> Real life use c

[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452841
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 07:17
Start Date: 30/Jun/20 07:17
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447462171



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -229,102 +239,168 @@ public int repair(MsckInfo msckInfo) {
             throw new MetastoreException(e);
           }
         }
+        if (transactionalTable && !MetaStoreServerUtils.isPartitioned(table)) {
+          if (result.getMaxWriteId() > 0) {
+            if (txnId < 0) {
+              // We need the txnId to check against even if we didn't do the locking
+              txnId = getMsc().openTxn(getUserName());
+            }
+
+            validateAndAddMaxTxnIdAndWriteId(result.getMaxWriteId(), result.getMaxTxnId(),
+                table.getDbName(), table.getTableName(), txnId);
+          }
+        }
       }
       success = true;
     } catch (Exception e) {
       LOG.warn("Failed to run metacheck: ", e);
       success = false;
-      ret = 1;
     } finally {
-      if (msckInfo.getResFile() != null) {
-        BufferedWriter resultOut = null;
-        try {
-          Path resFile = new Path(msckInfo.getResFile());
-          FileSystem fs = resFile.getFileSystem(getConf());
-          resultOut = new BufferedWriter(new OutputStreamWriter(fs.create(resFile)));
-
-          boolean firstWritten = false;
-          firstWritten |= writeMsckResult(result.getTablesNotInMs(),
-              "Tables not in metastore:", resultOut, firstWritten);
-          firstWritten |= writeMsckResult(result.getTablesNotOnFs(),
-              "Tables missing on filesystem:", resultOut, firstWritten);
-          firstWritten |= writeMsckResult(result.getPartitionsNotInMs(),
-              "Partitions not in metastore:", resultOut, firstWritten);
-          firstWritten |= writeMsckResult(result.getPartitionsNotOnFs(),
-              "Partitions missing from filesystem:", resultOut, firstWritten);
-          firstWritten |= writeMsckResult(result.getExpiredPartitions(),
-              "Expired partitions (retention period: " + partitionExpirySeconds + "s) :", resultOut, firstWritten);
-          // sorting to stabilize qfile output (msck_repair_drop.q)
-          Collections.sort(repairOutput);
-          for (String rout : repairOutput) {
-            if (firstWritten) {
-              resultOut.write(terminator);
-            } else {
-              firstWritten = true;
-            }
-            resultOut.write(rout);
-          }
-        } catch (IOException e) {
-          LOG.warn("Failed to save metacheck output: ", e);
-          ret = 1;
-        } finally {
-          if (resultOut != null) {
-            try {
-              resultOut.close();
-            } catch (IOException e) {
-              LOG.warn("Failed to close output file: ", e);
-              ret = 1;
-            }
-          }
+      if (result != null) {
+        logResult(result);
+        if (msckInfo.getResFile() != null) {
+          success = writeResultToFile(msckInfo, result, repairOutput, partitionExpirySeconds) && success;
         }
       }
 
-      LOG.info("Tables not in metastore: {}", result.getTablesNotInMs());
-      LOG.info("Tables missing on filesystem: {}", result.getTablesNotOnFs());
-      LOG.info("Partitions not in metastore: {}", result.getPartitionsNotInMs());
-      LOG.info("Partitions missing from filesystem: {}", result.getPartitionsNotOnFs());
-      LOG.info("Expired partitions: {}", result.getExpiredPartitions());
-      if (acquireLock && txnId > 0) {
-        if (success) {
-          try {
-            LOG.info("txnId: {} succeeded. Committing..", txnId);
-            getMsc().commitTxn(txnId);
-          } catch (Exception e) {
-            LOG.warn("Error while committing txnId: {} for table: {}", txnId, qualifiedTableName, e);
-            ret = 1;
-          }
-        } else {
-          try {
-            LOG.info("txnId: {} failed. Aborting..", txnId);
-            getMsc().abortTxns(Lists.newArrayList(txnId));
-          } catch (Exception e) {
-            LOG.warn("Error while aborting txnId: {} for table: {}", txnId, qualifiedTableName, e);
-            ret = 1;
-          }
-        }
+      if (txnId > 0) {
+        success = closeTxn(qualifiedTableName, success, txnId) && success;
       }
       if (getMsc() != null) {
         getMsc().close();
         msc = null;
       }
     }
+    return success ? 0 : 1;
+  }
 
+  private boolean closeTxn(String qualifiedTableName

[jira] [Updated] (HIVE-22255) Hive doesn't trigger Major Compaction automatically if table contains only base files

2020-06-30 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage updated HIVE-22255:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master, PR #1175. Thanks for the patch [~Rajkumar Singh]!

> Hive doesn't trigger Major Compaction automatically if table contains only base 
> files 
> 
>
> Key: HIVE-22255
> URL: https://issues.apache.org/jira/browse/HIVE-22255
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 3.1.2
> Environment: Hive-3.1.1
>Reporter: Rajkumar Singh
>Assignee: Rajkumar Singh
>Priority: Major
> Attachments: HIVE-22255.01.patch, HIVE-22255.02.patch, 
> HIVE-22255.patch
>
>
> A user may run into this issue if the table consists of only base files and no 
> deltas: the following condition will then yield false and automatic major 
> compaction will be skipped.
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java#L313]
>  
> Steps to Reproduce:
>  # create Acid table 
> {code:java}
> //  create table myacid(id int);
> {code}
>  # Run multiple insert overwrite statements 
> {code:java}
> // insert overwrite table myacid values(1);insert overwrite table myacid 
> values(2),(3),(4){code}
>  # DFS ls output
> {code:java}
> // dfs -ls -R /warehouse/tablespace/managed/hive/myacid;
> ++
> |                     DFS Output                     |
> ++
> | drwxrwx---+  - hive hadoop          0 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001 |
> | -rw-rw+  3 hive hadoop          1 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001/_orc_acid_version |
> | -rw-rw+  3 hive hadoop        610 2019-09-27 16:42 
> /warehouse/tablespace/managed/hive/myacid/base_001/bucket_0 |
> | drwxrwx---+  - hive hadoop          0 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002 |
> | -rw-rw+  3 hive hadoop          1 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002/_orc_acid_version |
> | -rw-rw+  3 hive hadoop        633 2019-09-27 16:43 
> /warehouse/tablespace/managed/hive/myacid/base_002/bucket_0 |
> ++{code}
>  
> you will see that Major compaction will not be triggered until you run ALTER 
> TABLE ... COMPACT 'major' manually. A sketch of the skipped condition follows below.
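A hedged sketch of the shape of the Initiator condition referenced above (names and threshold handling are illustrative, not the verbatim Hive code): with only base directories the total delta size is zero, so the ratio test always yields false and major compaction is never initiated.
{code:java}
// Illustrative only -- mirrors the shape of the referenced condition.
static boolean shouldInitiateMajor(long baseSize, long totalDeltaSize, float deltaPctThreshold) {
  // With only base_N directories, totalDeltaSize == 0, the ratio is 0,
  // and the check below always returns false, so compaction is skipped.
  return baseSize > 0 && (float) totalDeltaSize / baseSize > deltaPctThreshold;
}
{code}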



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452848&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452848
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 07:36
Start Date: 30/Jun/20 07:36
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447472989



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java
##
@@ -383,6 +475,7 @@ public Void execute(int size) throws MetastoreException {
   partsToAdd.add(partition);
   lastBatch.add(part);
   addMsgs.add(String.format(addMsgFormat, part.getPartitionName()));
+  LOG.debug(String.format(addMsgFormat, part.getPartitionName()));

Review comment:
   why not log the content of addMsgs instead? (a sketch follows below)
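   For illustration (a sketch, assuming `addMsgs` is the list built just above):
   ```java
   // Log the accumulated messages in one statement instead of re-formatting each entry:
   LOG.debug("Partitions added: {}", addMsgs);
   ```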





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452848)
Time Spent: 5h 10m  (was: 5h)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables too well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data cannot be read back 
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying 
> data are not conflicting. If the HMS does not contain allocated writeIds for 
> the table, we can seed the table with the writeIds read from the directory 
> structure.
> Real life use cases could be:
>  * Copy data files from one cluster to another with different HMS, create the 
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23748) Tez task with File Merge operator generates tmp file with wrong suffix

2020-06-30 Thread Karen Coppage (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148396#comment-17148396
 ] 

Karen Coppage commented on HIVE-23748:
--

Hi [~wanguangping], this was resolved as Fixed; was there a fix?

> Tez task with File Merge operator generates tmp file with wrong suffix
> -
>
> Key: HIVE-23748
> URL: https://issues.apache.org/jira/browse/HIVE-23748
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0
>Reporter: wanguangping
>Priority: Major
>
> h1. background
>  * SQL on TEZ 
>  * it's a Occasional problem
> h1. hiveserver2 log
> SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Connecting to jdbc:hive2://xxx:1/prod
>  Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
>  Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
>  Transaction isolation: TRANSACTION_REPEATABLE_READ
>  INFO : Compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.887 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.197 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (1.096 seconds)
>  No rows affected (0.004 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 1.324 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 12.895 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (14.229 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
> x
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, 
> event
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
> type:string, comment:null), FieldSchema(name:device_id, type:string, 
> comment:null), FieldSchema(name:is_new, type:int, comment:null), 
> FieldSchema(name:first_attribute, type:map, comment:null), 
> FieldSchema(name:first_app_version, type:string, comment:null), 
> FieldSchema(name:first_platform_type, type:string, comment:null), 
> FieldSchema(name:first_manufacturer, type:string, comment:null), 
> FieldSchema(name:first_model, type:string, comment:null), 
> FieldSchema(name:first_ipprovince, type:string, comment:null), 
> FieldSchema(name:first_ipcity, type:string, comment:null), 
> FieldSchema(name:last_attribute, type:map, comment:null), 
> FieldSchema(name:last_app_version, type:string, comment:null), 
> FieldSchema(name:last_platform_type, type:string, comment:null), 
> FieldSchema(name:last_manufacturer, type:string, comment:null), 
> FieldSchema(name:last_model, type:string, comment:null), 
> FieldSchema(name:last_ipprovince, type:string, comment:null), 
> FieldSchema(name:last_ipcity, type:string, comment:null)], properties:null)
>  I

[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452850&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452850
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 07:41
Start Date: 30/Jun/20 07:41
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447476089



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDbUtil.java
##
@@ -313,6 +313,41 @@ private static void resetTxnSequence(Connection conn, Statement stmt) throws SQLException {
     }
   }
 
+  /**
+   * Restarts the txnId sequence with the given seed value.
+   * It is the responsibility of the caller to not set the sequence backward.
+   * @param conn database connection
+   * @param stmt sql statement
+   * @param seedTxnId the seed value for the sequence
+   * @throws SQLException ex
+   */
+  public static void seedTxnSequence(Connection conn, Statement stmt, long seedTxnId) throws SQLException {
+    String dbProduct = conn.getMetaData().getDatabaseProductName();
+    DatabaseProduct databaseProduct = determineDatabaseProduct(dbProduct);
+    switch (databaseProduct) {
+
+    case DERBY:

Review comment:
   minor: I would probably create an EnumMap for SEED_FN and pick the proper entry based on the db type (see the sketch below).
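   For illustration, a hedged sketch of that suggestion (the enum constants and SQL text are illustrative, not the metastore's actual dialect strings):
   ```java
   import java.sql.SQLException;
   import java.sql.Statement;
   import java.util.EnumMap;
   import java.util.Map;
   import java.util.function.LongFunction;
   
   class TxnSequenceSeeder {
     enum DatabaseProduct { DERBY, MYSQL, POSTGRES }
   
     // One seed-statement factory per database product, instead of a switch.
     private static final Map<DatabaseProduct, LongFunction<String>> SEED_FN =
         new EnumMap<>(DatabaseProduct.class);
     static {
       SEED_FN.put(DatabaseProduct.DERBY,
           seed -> "ALTER TABLE TXNS ALTER COLUMN TXN_ID RESTART WITH " + seed);
       SEED_FN.put(DatabaseProduct.MYSQL,
           seed -> "ALTER TABLE TXNS AUTO_INCREMENT = " + seed);
       SEED_FN.put(DatabaseProduct.POSTGRES,
           seed -> "ALTER SEQUENCE txns_txn_id_seq RESTART WITH " + seed);
     }
   
     static void seedTxnSequence(Statement stmt, DatabaseProduct db, long seedTxnId)
         throws SQLException {
       stmt.execute(SEED_FN.get(db).apply(seedTxnId));
     }
   }
   ```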





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452850)
Time Spent: 5h 20m  (was: 5h 10m)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables too well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data cannot be read back 
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying 
> data are not conflicting. If the HMS does not contain allocated writeIds for 
> the table, we can seed the table with the writeIds read from the directory 
> structure.
> Real life use cases could be:
>  * Copy data files from one cluster to another with different HMS, create the 
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452852&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452852
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 07:43
Start Date: 30/Jun/20 07:43
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447477522



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -2015,8 +2019,49 @@ public AllocateTableWriteIdsResponse allocateTableWriteIds(AllocateTableWriteIds
       return allocateTableWriteIds(rqst);
     }
   }
+
+  @Override
+  public MaxAllocatedTableWriteIdResponse getMaxAllocatedTableWrited(MaxAllocatedTableWriteIdRequest rqst) throws MetaException {
+    String dbName = rqst.getDbName();
+    String tableName = rqst.getTableName();
+    try {
+      Connection dbConn = null;
+      PreparedStatement pStmt = null;
+      ResultSet rs = null;
+      try {
+        lockInternal();
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
+        List<String> params = Arrays.asList(dbName, tableName);
+        String query = "SELECT \"NWI_NEXT\" FROM \"NEXT_WRITE_ID\" WHERE \"NWI_DATABASE\" = ? AND \"NWI_TABLE\" = ?";

Review comment:
   should we have a query constant? (a sketch follows below)
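   For illustration, a hedged sketch (the constant name is an assumption):
   ```java
   private static final String SELECT_MAX_ALLOCATED_TABLE_WRITE_ID =
       "SELECT \"NWI_NEXT\" FROM \"NEXT_WRITE_ID\" WHERE \"NWI_DATABASE\" = ? AND \"NWI_TABLE\" = ?";
   ```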





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452852)
Time Spent: 5.5h  (was: 5h 20m)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables too well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data cannot be read back 
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying 
> data are not conflicting. If the HMS does not contain allocated writeIds for 
> the table, we can seed the table with the writeIds read from the directory 
> structure.
> Real life use cases could be:
>  * Copy data files from one cluster to another with different HMS, create the 
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452854&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452854
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 07:46
Start Date: 30/Jun/20 07:46
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447478910



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -2015,8 +2019,49 @@ public AllocateTableWriteIdsResponse allocateTableWriteIds(AllocateTableWriteIds
       return allocateTableWriteIds(rqst);
     }
   }
+
+  @Override
+  public MaxAllocatedTableWriteIdResponse getMaxAllocatedTableWrited(MaxAllocatedTableWriteIdRequest rqst) throws MetaException {
+    String dbName = rqst.getDbName();
+    String tableName = rqst.getTableName();
+    try {
+      Connection dbConn = null;
+      PreparedStatement pStmt = null;
+      ResultSet rs = null;
+      try {
+        lockInternal();
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
+        List<String> params = Arrays.asList(dbName, tableName);
+        String query = "SELECT \"NWI_NEXT\" FROM \"NEXT_WRITE_ID\" WHERE \"NWI_DATABASE\" = ? AND \"NWI_TABLE\" = ?";
+        pStmt = sqlGenerator.prepareStmtWithParameters(dbConn, query, params);
+        LOG.debug("Going to execute query <" + query.replaceAll("\\?", "{}") + ">", quoteString(dbName),
+            quoteString(tableName));
+        rs = pStmt.executeQuery();
+        // If there is no record, we never allocated anything
+        long maxWriteId = 0L;
+        if (rs.next()) {
+          // The row contains the nextId, not the previously allocated id
+          maxWriteId = rs.getLong(1) - 1;
+        }
+        return new MaxAllocatedTableWriteIdResponse(maxWriteId);
+      } catch (SQLException e) {
+        LOG.error(
+            "Exception during reading the max allocated writeId for dbName={}, tableName={}. Will retry if possible.",
+            dbName, tableName, e);
+        rollbackDBConn(dbConn);

Review comment:
   what is there to roll back? you only have a select here





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452854)
Time Spent: 5h 40m  (was: 5.5h)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables too well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data cannot be read back 
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying 
> data are not conflicting. If the HMS does not contain allocated writeIds for 
> the table, we can seed the table with the writeIds read from the directory 
> structure.
> Real life use cases could be:
>  * Copy data files from one cluster to another with different HMS, create the 
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452857&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452857
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 07:49
Start Date: 30/Jun/20 07:49
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447480415



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -2015,8 +2019,49 @@ public AllocateTableWriteIdsResponse allocateTableWriteIds(AllocateTableWriteIds
       return allocateTableWriteIds(rqst);
     }
   }
+
+  @Override
+  public MaxAllocatedTableWriteIdResponse getMaxAllocatedTableWrited(MaxAllocatedTableWriteIdRequest rqst) throws MetaException {
+    String dbName = rqst.getDbName();
+    String tableName = rqst.getTableName();
+    try {
+      Connection dbConn = null;
+      PreparedStatement pStmt = null;
+      ResultSet rs = null;
+      try {
+        lockInternal();

Review comment:
   lockInternal is only required for Derby, to simulate SELECT ... FOR UPDATE (S4U); why use it here? unlockInternal is not needed either (see the sketch below for context)
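   For context, a hedged sketch of what such a helper typically looks like (illustrative, not the verbatim TxnHandler code): a JVM-wide lock taken around the database work because embedded Derby cannot enforce SELECT ... FOR UPDATE semantics across concurrent metastore threads.
   ```java
   import java.util.concurrent.locks.ReentrantLock;
   
   final class DerbyS4USimulation {
     private static final ReentrantLock derbyLock = new ReentrantLock(true);
   
     // Serialize access in-process; real databases rely on row locks instead,
     // which is why taking this lock around a plain select is questionable.
     void lockInternal() {
       derbyLock.lock();
     }
   
     void unlockInternal() {
       derbyLock.unlock();
     }
   }
   ```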





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452857)
Time Spent: 6h  (was: 5h 50m)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables too well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data cannot be read back 
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying 
> data are not conflicting. If the HMS does not contain allocated writeIds for 
> the table, we can seed the table with the writeIds read from the directory 
> structure.
> Real life use cases could be:
>  * Copy data files from one cluster to another with different HMS, create the 
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452858&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452858
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 07:51
Start Date: 30/Jun/20 07:51
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447482052



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -2015,8 +2019,49 @@ public AllocateTableWriteIdsResponse allocateTableWriteIds(AllocateTableWriteIds
       return allocateTableWriteIds(rqst);
     }
   }
+
+  @Override
+  public MaxAllocatedTableWriteIdResponse getMaxAllocatedTableWrited(MaxAllocatedTableWriteIdRequest rqst) throws MetaException {
+    String dbName = rqst.getDbName();
+    String tableName = rqst.getTableName();
+    try {
+      Connection dbConn = null;
+      PreparedStatement pStmt = null;
+      ResultSet rs = null;
+      try {
+        lockInternal();
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);

Review comment:
   minor: I would use try-with-resources instead of explicit resource management (a sketch follows below)
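   For illustration, a hedged sketch of that suggestion applied to this method (helpers such as `getDbConn` and `sqlGenerator` are assumed to come from the surrounding TxnHandler):
   ```java
   try (Connection dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
        PreparedStatement pStmt = sqlGenerator.prepareStmtWithParameters(
            dbConn, query, Arrays.asList(dbName, tableName));
        ResultSet rs = pStmt.executeQuery()) {
     // Resources are closed in reverse order even on exceptions, replacing
     // the explicit close(...) bookkeeping in the finally block.
     long maxWriteId = rs.next() ? rs.getLong(1) - 1 : 0L;
     return new MaxAllocatedTableWriteIdResponse(maxWriteId);
   }
   ```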





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452858)
Time Spent: 6h 10m  (was: 6h)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables too well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data cannot be read back 
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying 
> data are not conflicting. If the HMS does not contain allocated writeIds for 
> the table, we can seed the table with the writeIds read from the directory 
> structure.
> Real life use cases could be:
>  * Copy data files from one cluster to another with different HMS, create the 
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452859&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452859
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 07:53
Start Date: 30/Jun/20 07:53
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447483691



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -2015,8 +2019,49 @@ public AllocateTableWriteIdsResponse allocateTableWriteIds(AllocateTableWriteIds
       return allocateTableWriteIds(rqst);
     }
   }
+
+  @Override
+  public MaxAllocatedTableWriteIdResponse getMaxAllocatedTableWrited(MaxAllocatedTableWriteIdRequest rqst) throws MetaException {
+    String dbName = rqst.getDbName();
+    String tableName = rqst.getTableName();
+    try {
+      Connection dbConn = null;
+      PreparedStatement pStmt = null;
+      ResultSet rs = null;
+      try {
+        lockInternal();
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
+        List<String> params = Arrays.asList(dbName, tableName);
+        String query = "SELECT \"NWI_NEXT\" FROM \"NEXT_WRITE_ID\" WHERE \"NWI_DATABASE\" = ? AND \"NWI_TABLE\" = ?";
+        pStmt = sqlGenerator.prepareStmtWithParameters(dbConn, query, params);

Review comment:
   minor: you can simply pass the params as `Arrays.asList(rqst.getDbName(), rqst.getTableName())` instead of using so many local vars





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452859)
Time Spent: 6h 20m  (was: 6h 10m)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data cannot be read back 
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying 
> data are not conflicting. If the HMS does not contain allocated writeIds for 
> the table, we can seed the table with the writeIds read from the directory 
> structure.
> Real life use cases could be:
>  * Copy data files from one cluster to another with different HMS, create the 
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452860&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452860
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 07:54
Start Date: 30/Jun/20 07:54
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447484200



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -2032,28 +2077,61 @@ public void 
seedWriteIdOnAcidConversion(InitializeTableWriteIdsRequest rqst)
 // The initial value for write id should be 1 and hence we add 1 with 
number of write ids
 // allocated here
 String s = "INSERT INTO \"NEXT_WRITE_ID\" (\"NWI_DATABASE\", 
\"NWI_TABLE\", \"NWI_NEXT\") VALUES (?, ?, "
-+ Long.toString(rqst.getSeeWriteId() + 1) + ")";
-pst = sqlGenerator.prepareStmtWithParameters(dbConn, s, 
Arrays.asList(rqst.getDbName(), rqst.getTblName()));
++ Long.toString(rqst.getSeedWriteId() + 1) + ")";
+pst = sqlGenerator.prepareStmtWithParameters(dbConn, s, 
Arrays.asList(rqst.getDbName(), rqst.getTableName()));
 LOG.debug("Going to execute insert <" + s.replaceAll("\\?", "{}") + 
">",
-quoteString(rqst.getDbName()), quoteString(rqst.getTblName()));
+quoteString(rqst.getDbName()), 
quoteString(rqst.getTableName()));
 pst.execute();
 LOG.debug("Going to commit");
 dbConn.commit();
   } catch (SQLException e) {
-LOG.debug("Going to rollback");
 rollbackDBConn(dbConn);
-checkRetryable(dbConn, e, "seedWriteIdOnAcidConversion(" + rqst + ")");
-throw new MetaException("Unable to update transaction database "
-+ StringUtils.stringifyException(e));
+checkRetryable(dbConn, e, "seedWriteId(" + rqst + ")");
+throw new MetaException("Unable to update transaction database " + 
StringUtils.stringifyException(e));
   } finally {
 close(null, pst, dbConn);
 unlockInternal();

Review comment:
   Not sure why it is used here.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452860)
Time Spent: 6.5h  (was: 6h 20m)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data cannot be read back 
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying 
> data are not conflicting. If the HMS does not contain allocated writeIds for 
> the table, we can seed the table with the writeIds read from the directory 
> structure.
> Real life use cases could be:
>  * Copy data files from one cluster to another with different HMS, create the 
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452861&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452861
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 07:55
Start Date: 30/Jun/20 07:55
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447484596



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -2032,28 +2077,61 @@ public void 
seedWriteIdOnAcidConversion(InitializeTableWriteIdsRequest rqst)
 // The initial value for write id should be 1 and hence we add 1 with 
number of write ids
 // allocated here
 String s = "INSERT INTO \"NEXT_WRITE_ID\" (\"NWI_DATABASE\", 
\"NWI_TABLE\", \"NWI_NEXT\") VALUES (?, ?, "
-+ Long.toString(rqst.getSeeWriteId() + 1) + ")";
-pst = sqlGenerator.prepareStmtWithParameters(dbConn, s, 
Arrays.asList(rqst.getDbName(), rqst.getTblName()));
++ Long.toString(rqst.getSeedWriteId() + 1) + ")";
+pst = sqlGenerator.prepareStmtWithParameters(dbConn, s, 
Arrays.asList(rqst.getDbName(), rqst.getTableName()));
 LOG.debug("Going to execute insert <" + s.replaceAll("\\?", "{}") + 
">",
-quoteString(rqst.getDbName()), quoteString(rqst.getTblName()));
+quoteString(rqst.getDbName()), 
quoteString(rqst.getTableName()));
 pst.execute();
 LOG.debug("Going to commit");
 dbConn.commit();
   } catch (SQLException e) {
-LOG.debug("Going to rollback");
 rollbackDBConn(dbConn);
-checkRetryable(dbConn, e, "seedWriteIdOnAcidConversion(" + rqst + ")");
-throw new MetaException("Unable to update transaction database "
-+ StringUtils.stringifyException(e));
+checkRetryable(dbConn, e, "seedWriteId(" + rqst + ")");
+throw new MetaException("Unable to update transaction database " + 
StringUtils.stringifyException(e));
   } finally {
 close(null, pst, dbConn);
 unlockInternal();
   }
 } catch (RetryException e) {
-  seedWriteIdOnAcidConversion(rqst);
+  seedWriteId(rqst);
 }
+  }
+
+  @Override
+  public void seedTxnId(SeedTxnIdRequest rqst) throws MetaException {
+try {
+  Connection dbConn = null;
+  Statement stmt = null;
+  try {
+lockInternal();

Review comment:
   Same as above: could you please check whether this is needed?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452861)
Time Spent: 6h 40m  (was: 6.5h)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data cannot be read back 
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying 
> data are not conflicting. If the HMS does not contain allocated writeIds for 
> the table, we can seed the table with the writeIds read from the directory 
> structure.
> Real life use cases could be:
>  * Copy data files from one cluster to another with different HMS, create the 
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-30 Thread wanguangping (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148422#comment-17148422
 ] 

wanguangping commented on HIVE-23748:
-

1. Not fixed; the problem still exists.

2. I just want to delete the sensitive information in this issue. Although I 
edited the description section to remove it, the history section still retains 
the sensitive information.

3. If you can delete the sensitive information directly from the history 
section, I will reopen this issue. If the only way to delete it is to delete 
the whole issue, I will open a new issue for the problem.

Please help me delete the sensitive information!



aheartfulguy
Email: aheartful...@126.com

Signature customized by NetEase Mail Master

On 06/30/2020 15:42, Karen Coppage (Jira) wrote:

   [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148396#comment-17148396
 ]

Karen Coppage commented on HIVE-23748:
--

Hi [~wanguangping], this was resolved as Fixed, was there a fix?




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


> tez task with File Merge operator generate tmp file with wrong suffix
> -
>
> Key: HIVE-23748
> URL: https://issues.apache.org/jira/browse/HIVE-23748
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0
>Reporter: wanguangping
>Priority: Major
>
> h1. background
>  * SQL on TEZ 
>  * it's an occasional problem
> h1. hiveserver2 log
> SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Connecting to jdbc:hive2://xxx:1/prod
>  Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
>  Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
>  Transaction isolation: TRANSACTION_REPEATABLE_READ
>  INFO : Compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.887 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.197 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (1.096 seconds)
>  No rows affected (0.004 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 1.324 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 12.895 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (14.229 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
> x
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, 
> event
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
> type:string, comment:null), FieldSchema(name:device_id, type:string, 
> comment:null), FieldSchema(name:is_new, type:int, comment:null), 
> FieldSchema(name:first_attribute, typ

[jira] [Commented] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-30 Thread wanguangping (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148425#comment-17148425
 ] 

wanguangping commented on HIVE-23748:
-

Not fixed; the problem still exists. I just want to delete the sensitive 
information in this issue. Please help me.

> tez task with File Merge operator generate tmp file with wrong suffix
> -
>
> Key: HIVE-23748
> URL: https://issues.apache.org/jira/browse/HIVE-23748
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0
>Reporter: wanguangping
>Priority: Major
>
> h1. background
>  * SQL on TEZ 
>  * it's an occasional problem
> h1. hiveserver2 log
> SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Connecting to jdbc:hive2://xxx:1/prod
>  Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
>  Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
>  Transaction isolation: TRANSACTION_REPEATABLE_READ
>  INFO : Compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.887 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.197 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (1.096 seconds)
>  No rows affected (0.004 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 1.324 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 12.895 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (14.229 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
> x
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, 
> event
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
> type:string, comment:null), FieldSchema(name:device_id, type:string, 
> comment:null), FieldSchema(name:is_new, type:int, comment:null), 
> FieldSchema(name:first_attribute, type:map, comment:null), 
> FieldSchema(name:first_app_version, type:string, comment:null), 
> FieldSchema(name:first_platform_type, type:string, comment:null), 
> FieldSchema(name:first_manufacturer, type:string, comment:null), 
> FieldSchema(name:first_model, type:string, comment:null), 
> FieldSchema(name:first_ipprovince, type:string, comment:null), 
> FieldSchema(name:first_ipcity, type:string, comment:null), 
> FieldSchema(name:last_attribute, type:map, comment:null), 
> FieldSchema(name:last_app_version, type:string, comment:null), 
> FieldSchema(name:last_platform_type, type:string, comment:null), 
> FieldSchema(name:last_manufacturer, type:string, comment:null), 
> FieldSchema(name:last_model, type:string, comment:null), 
> FieldSchema(name:last_ipprovince, type:string, comment:null), 
> FieldSchema(name:last_ipcity, type:string, comment:nu

[jira] [Issue Comment Deleted] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-30 Thread wanguangping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanguangping updated HIVE-23748:

Comment: was deleted

(was: not fix ,problem still exist. I just want delete sensitive information in 
this issue,help me)

> tez task with File Merge operator generate tmp file with wrong suffix
> -
>
> Key: HIVE-23748
> URL: https://issues.apache.org/jira/browse/HIVE-23748
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0
>Reporter: wanguangping
>Priority: Major
>
> h1. background
>  * SQL on TEZ 
>  * it's an occasional problem
> h1. hiveserver2 log
> SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Connecting to jdbc:hive2://xxx:1/prod
>  Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
>  Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
>  Transaction isolation: TRANSACTION_REPEATABLE_READ
>  INFO : Compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.887 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.197 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (1.096 seconds)
>  No rows affected (0.004 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 1.324 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 12.895 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (14.229 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
> x
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, 
> event
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
> type:string, comment:null), FieldSchema(name:device_id, type:string, 
> comment:null), FieldSchema(name:is_new, type:int, comment:null), 
> FieldSchema(name:first_attribute, type:map, comment:null), 
> FieldSchema(name:first_app_version, type:string, comment:null), 
> FieldSchema(name:first_platform_type, type:string, comment:null), 
> FieldSchema(name:first_manufacturer, type:string, comment:null), 
> FieldSchema(name:first_model, type:string, comment:null), 
> FieldSchema(name:first_ipprovince, type:string, comment:null), 
> FieldSchema(name:first_ipcity, type:string, comment:null), 
> FieldSchema(name:last_attribute, type:map, comment:null), 
> FieldSchema(name:last_app_version, type:string, comment:null), 
> FieldSchema(name:last_platform_type, type:string, comment:null), 
> FieldSchema(name:last_manufacturer, type:string, comment:null), 
> FieldSchema(name:last_model, type:string, comment:null), 
> FieldSchema(name:last_ipprovince, type:string, comment:null), 
> FieldSchema(name:last_ipcity, type:string, comment:null)], properties:null)
> 

[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=452867&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452867
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 08:36
Start Date: 30/Jun/20 08:36
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r447509066



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##
@@ -2032,28 +2077,61 @@ public void 
seedWriteIdOnAcidConversion(InitializeTableWriteIdsRequest rqst)
 // The initial value for write id should be 1 and hence we add 1 with 
number of write ids
 // allocated here
 String s = "INSERT INTO \"NEXT_WRITE_ID\" (\"NWI_DATABASE\", 
\"NWI_TABLE\", \"NWI_NEXT\") VALUES (?, ?, "
-+ Long.toString(rqst.getSeeWriteId() + 1) + ")";
-pst = sqlGenerator.prepareStmtWithParameters(dbConn, s, 
Arrays.asList(rqst.getDbName(), rqst.getTblName()));
++ Long.toString(rqst.getSeedWriteId() + 1) + ")";
+pst = sqlGenerator.prepareStmtWithParameters(dbConn, s, 
Arrays.asList(rqst.getDbName(), rqst.getTableName()));
 LOG.debug("Going to execute insert <" + s.replaceAll("\\?", "{}") + 
">",
-quoteString(rqst.getDbName()), quoteString(rqst.getTblName()));
+quoteString(rqst.getDbName()), 
quoteString(rqst.getTableName()));
 pst.execute();
 LOG.debug("Going to commit");
 dbConn.commit();
   } catch (SQLException e) {
-LOG.debug("Going to rollback");
 rollbackDBConn(dbConn);
-checkRetryable(dbConn, e, "seedWriteIdOnAcidConversion(" + rqst + ")");
-throw new MetaException("Unable to update transaction database "
-+ StringUtils.stringifyException(e));
+checkRetryable(dbConn, e, "seedWriteId(" + rqst + ")");
+throw new MetaException("Unable to update transaction database " + 
StringUtils.stringifyException(e));
   } finally {
 close(null, pst, dbConn);
 unlockInternal();
   }
 } catch (RetryException e) {
-  seedWriteIdOnAcidConversion(rqst);
+  seedWriteId(rqst);
 }
+  }
+
+  @Override
+  public void seedTxnId(SeedTxnIdRequest rqst) throws MetaException {
+try {
+  Connection dbConn = null;
+  Statement stmt = null;
+  try {
+lockInternal();
+dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
+stmt = dbConn.createStatement();
+/*
+ * Locking the txnLock an exclusive way, we do not want to set the 
txnId backward accidentally
+ * if there are concurrent open transactions
+ */
+acquireTxnLock(stmt, false);
+long highWaterMark = getHighWaterMark(stmt);
+if (highWaterMark >= rqst.getSeedTxnId()) {

Review comment:
   I don't quite understand this if condition. You already have a check in 
validateAndAddMaxTxnIdAndWriteId() that throws an exception if some writeIds 
are already registered in the HMS when we try to do the repair. Could it 
happen, due to the lack of locking, that there is nothing in the HMS when we 
calculate the writeIds, but by the time we try to seed, some transaction has 
generated a new writeId? Would that cause data loss or other issues? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452867)
Time Spent: 6h 50m  (was: 6h 40m)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data cannot be read back 
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying 
> data are not conflicting. If the HMS does not contain allocated writeIds for 
> the table, we can seed the table with the writeIds read

[jira] [Commented] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-30 Thread Karen Coppage (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148447#comment-17148447
 ] 

Karen Coppage commented on HIVE-23748:
--

Please check the comments under INFRA-12784 and see if they apply. If not, you 
can try opening a ticket requesting deletion in the INFRA project.

> tez task with File Merge operator generate tmp file with wrong suffix
> -
>
> Key: HIVE-23748
> URL: https://issues.apache.org/jira/browse/HIVE-23748
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0
>Reporter: wanguangping
>Priority: Major
>
> h1. background
>  * SQL on TEZ 
>  * it's an occasional problem
> h1. hiveserver2 log
> SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Connecting to jdbc:hive2://xxx:1/prod
>  Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
>  Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
>  Transaction isolation: TRANSACTION_REPEATABLE_READ
>  INFO : Compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.887 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.197 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (1.096 seconds)
>  No rows affected (0.004 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 1.324 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 12.895 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (14.229 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
> x
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, 
> event
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
> type:string, comment:null), FieldSchema(name:device_id, type:string, 
> comment:null), FieldSchema(name:is_new, type:int, comment:null), 
> FieldSchema(name:first_attribute, type:map, comment:null), 
> FieldSchema(name:first_app_version, type:string, comment:null), 
> FieldSchema(name:first_platform_type, type:string, comment:null), 
> FieldSchema(name:first_manufacturer, type:string, comment:null), 
> FieldSchema(name:first_model, type:string, comment:null), 
> FieldSchema(name:first_ipprovince, type:string, comment:null), 
> FieldSchema(name:first_ipcity, type:string, comment:null), 
> FieldSchema(name:last_attribute, type:map, comment:null), 
> FieldSchema(name:last_app_version, type:string, comment:null), 
> FieldSchema(name:last_platform_type, type:string, comment:null), 
> FieldSchema(name:last_manufacturer, type:string, comment:null), 
> FieldSchema(name:last_model, type:string, comment:null), 
> FieldSchema(name:last_ipprovince, type:string, comment:null), 

[jira] [Work logged] (HIVE-23074) SchemaTool sql script execution errors when updating the metadata's schema

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23074?focusedWorklogId=452871&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452871
 ]

ASF GitHub Bot logged work on HIVE-23074:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 08:56
Start Date: 30/Jun/20 08:56
Worklog Time Spent: 10m 
  Work Description: John1Tang commented on pull request #967:
URL: https://github.com/apache/hive/pull/967#issuecomment-651657855


   My pull request only changed the Postgres schematool; this test failure is 
in the Hive JDBC storage handler...



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452871)
Time Spent: 1h 10m  (was: 1h)

> SchemaTool sql script execution errors when updating the metadata's schema
> --
>
> Key: HIVE-23074
> URL: https://issues.apache.org/jira/browse/HIVE-23074
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
> Environment: running machine: centos7.2 
> metadata db: PostgreSQL 11.3 on x86_64-pc-linux-gnu
> hive version: upgrade from version 3.0.0 to 3.1.2
>Reporter: John1Tang
>Assignee: John1Tang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.1.2
>
>   Original Estimate: 1h
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The SchemaTool SQL script failed with conflicts on already-existing indices and 
> columns, and was missing the double quotes (") needed to escape keywords when 
> updating the metastore's schema
> {code:java}
> bin/schematool -dbType postgres -upgradeSchemaFrom 3.0.0{code}
> went like this:
> {code:java}
> ALTER TABLE "GLOBAL_PRIVS" ADD COLUMN "AUTHORIZER" character varying(128) 
> DEFAULT NULL::character varying
> Error: ERROR: column "AUTHORIZER" of relation "GLOBAL_PRIVS" already exists 
> (state=42701,code=0){code}
> {code:java}
> ALTER TABLE COMPLETED_TXN_COMPONENTS ADD COLUMN IF NOT EXISTS 
> CTC_UPDATE_DELETE char(1) NULL
> Error: ERROR: relation "completed_txn_components" does not exist 
> (state=42P01,code=0)
> {code}
> I've already come up with a solution and created a pull request for this 
> issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-30 Thread wanguangping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanguangping reopened HIVE-23748:
-

> tez task with File Merge operator generate tmp file with wrong suffix
> -
>
> Key: HIVE-23748
> URL: https://issues.apache.org/jira/browse/HIVE-23748
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0
>Reporter: wanguangping
>Priority: Major
>
> h1. background
>  * SQL on TEZ 
>  * it's an occasional problem
> h1. hiveserver2 log
> SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Connecting to jdbc:hive2://xxx:1/prod
>  Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
>  Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
>  Transaction isolation: TRANSACTION_REPEATABLE_READ
>  INFO : Compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.887 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.197 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (1.096 seconds)
>  No rows affected (0.004 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 1.324 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 12.895 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (14.229 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
> x
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, 
> event
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
> type:string, comment:null), FieldSchema(name:device_id, type:string, 
> comment:null), FieldSchema(name:is_new, type:int, comment:null), 
> FieldSchema(name:first_attribute, type:map, comment:null), 
> FieldSchema(name:first_app_version, type:string, comment:null), 
> FieldSchema(name:first_platform_type, type:string, comment:null), 
> FieldSchema(name:first_manufacturer, type:string, comment:null), 
> FieldSchema(name:first_model, type:string, comment:null), 
> FieldSchema(name:first_ipprovince, type:string, comment:null), 
> FieldSchema(name:first_ipcity, type:string, comment:null), 
> FieldSchema(name:last_attribute, type:map, comment:null), 
> FieldSchema(name:last_app_version, type:string, comment:null), 
> FieldSchema(name:last_platform_type, type:string, comment:null), 
> FieldSchema(name:last_manufacturer, type:string, comment:null), 
> FieldSchema(name:last_model, type:string, comment:null), 
> FieldSchema(name:last_ipprovince, type:string, comment:null), 
> FieldSchema(name:last_ipcity, type:string, comment:null)], properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f); 
> Time taken: 78.5

[jira] [Resolved] (HIVE-23748) tez task with File Merge operator generate tmp file with wrong suffix

2020-06-30 Thread wanguangping (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wanguangping resolved HIVE-23748.
-
  Assignee: wanguangping
Resolution: Auto Closed

> tez task with File Merge operator generate tmp file with wrong suffix
> -
>
> Key: HIVE-23748
> URL: https://issues.apache.org/jira/browse/HIVE-23748
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 3.1.0
>Reporter: wanguangping
>Assignee: wanguangping
>Priority: Major
>
> h1. background
>  * SQL on TEZ 
>  * it's an occasional problem
> h1. hiveserver2 log
> SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Connecting to jdbc:hive2://xxx:1/prod
>  Connected to: Apache Hive (version 3.1.0.3.1.4.0-315)
>  Driver: Hive JDBC (version 3.1.0.3.1.4.0-315)
>  Transaction isolation: TRANSACTION_REPEATABLE_READ
>  INFO : Compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.887 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2): 
> use prod
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033313_ed882b48-7ab4-42a2-84e4-c9ef764271e2); 
> Time taken: 0.197 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (1.096 seconds)
>  No rows affected (0.004 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>  INFO : Completed compiling 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 1.324 seconds
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : Executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23): 
> drop table if exists temp.shawnlee_newbase_devicebase
>  INFO : Starting task [Stage-0:DDL] in serial mode
>  INFO : Completed executing 
> command(queryId=hive_20200609033314_cba66b08-ad42-4b94-ad61-d15fe48efe23); 
> Time taken: 12.895 seconds
>  INFO : OK
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  No rows affected (14.229 seconds)
>  INFO : Compiling 
> command(queryId=hive_20200609033329_3fbf0a38-e5b0-4e3a-ae8b-ef95f400b50f): 
> x
>  INFO : Concurrency mode is disabled, not creating a lock manager
>  INFO : No Stats for user_profile@dw_uba_event_daily, Columns: attribute, 
> event
>  INFO : Semantic Analysis Completed (retrial = false)
>  INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:day, 
> type:string, comment:null), FieldSchema(name:device_id, type:string, 
> comment:null), FieldSchema(name:is_new, type:int, comment:null), 
> FieldSchema(name:first_attribute, type:map, comment:null), 
> FieldSchema(name:first_app_version, type:string, comment:null), 
> FieldSchema(name:first_platform_type, type:string, comment:null), 
> FieldSchema(name:first_manufacturer, type:string, comment:null), 
> FieldSchema(name:first_model, type:string, comment:null), 
> FieldSchema(name:first_ipprovince, type:string, comment:null), 
> FieldSchema(name:first_ipcity, type:string, comment:null), 
> FieldSchema(name:last_attribute, type:map, comment:null), 
> FieldSchema(name:last_app_version, type:string, comment:null), 
> FieldSchema(name:last_platform_type, type:string, comment:null), 
> FieldSchema(name:last_manufacturer, type:string, comment:null), 
> FieldSchema(name:last_model, type:string, comment:null), 
> FieldSchema(name:last_ipprovince, type:string, comment:null), 
> FieldSchema(name:last_ipcity, type:string, comment:null)], properties:null)
>  INFO : Completed compiling 
> c

[jira] [Commented] (HIVE-23598) Add option to rewrite NTILE and RANK to sketch functions

2020-06-30 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148467#comment-17148467
 ] 

Zoltan Haindrich commented on HIVE-23598:
-

[~jcamachorodriguez] could you please take another look?

> Add option to rewrite NTILE and RANK to sketch functions
> 
>
> Key: HIVE-23598
> URL: https://issues.apache.org/jira/browse/HIVE-23598
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23779?focusedWorklogId=452906&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-452906
 ]

ASF GitHub Bot logged work on HIVE-23779:
-

Author: ASF GitHub Bot
Created on: 30/Jun/20 10:12
Start Date: 30/Jun/20 10:12
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #1191:
URL: https://github.com/apache/hive/pull/1191#issuecomment-651699390


   +1



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 452906)
Time Spent: 40m  (was: 0.5h)

> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are not getting printed in beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23781) Incomplete partition column stats in CachedStore may lead to wrong aggregate stats

2020-06-30 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-23781:
--


> Incomplete partition column stats in CachedStore may lead to wrong aggregate 
> stats
> --
>
> Key: HIVE-23781
> URL: https://issues.apache.org/jira/browse/HIVE-23781
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Major
>
> Requesting aggregate stats from the Metastore 
> ({{RawStore#get_aggr_stats_for}}) may return wrong results when the backing 
> implementation is CachedStore and column statistics are missing from the 
> cache.
>  
> The suspicious code lies inside {{CachedStore#mergeColStatsForPartitions}} 
> that returns an [empty 
> object|https://github.com/apache/hive/blob/31ee14644bf6105360d6266baa8c6c8060d38ea3/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java#L2267]
>  when no stats are found in the cache. This is considered a valid value by 
> the consumer so no additional lookup is performed in the rawstore to fetch 
> the actual values.
> Moreover, in the case where the cache holds values for some of the requested 
> partitions but not all of them, the result will be wrong, assuming that the 
> underlying rawstore has information about the requested partitions.
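A minimal sketch of the failure pattern described above (hypothetical types and 
method names, not the CachedStore API): returning an empty result on a cache 
miss makes the miss indistinguishable from genuinely empty statistics, so the 
caller never falls back to the raw store.
```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class CacheMissVsEmptySketch {
  // List<Long> stands in for the real per-column statistics objects.

  // Buggy shape: a cache miss and "no stats" both come back as an empty list,
  // so the consumer treats a miss as a valid (empty) aggregate and performs
  // no additional lookup against the backing store.
  static List<Long> mergeColStatsBuggy(Map<String, List<Long>> cache, String partition) {
    return cache.getOrDefault(partition, Collections.emptyList());
  }

  // Safer shape: surface the miss explicitly so the consumer can fall back.
  static Optional<List<Long>> mergeColStats(Map<String, List<Long>> cache, String partition) {
    return Optional.ofNullable(cache.get(partition));
  }
}
```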



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23618) NotificationLog should also contain events for default/check constraints

2020-06-30 Thread Adesh Kumar Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adesh Kumar Rao updated HIVE-23618:
---
Description: This should follow a similar approach to the notNull/Unique 
constraints   (was: This should follow similar approach of notNull/Unique 
constraints)

> NotificationLog should also contain events for default/check constraints
> 
>
> Key: HIVE-23618
> URL: https://issues.apache.org/jira/browse/HIVE-23618
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
>
> This should follow a similar approach to the notNull/Unique constraints 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23782) Beeline does not update application id on console if query was killed and started on new application

2020-06-30 Thread Adesh Kumar Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adesh Kumar Rao reassigned HIVE-23782:
--


> Beeline does not update application id on console if query was killed and 
> started on new application
> 
>
> Key: HIVE-23782
> URL: https://issues.apache.org/jira/browse/HIVE-23782
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 4.0.0
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Minor
> Fix For: 4.0.0
>
>
> After HIVE-23619, beeline just prints the application ID once on console. If 
> the query gets killed and is executed with another application, beeline will 
> not update the new application id.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23783) Support compaction in qtests

2020-06-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23783:

Description: From time to time I get back to a scenario where I want to 
investigate a compaction related issue and I have a complete repro SQL script 
for that. In this case, I would prefer a qtest in which I can run manually 
triggered compaction instead of setting up other kinds of unit tests.

> Support compaction in qtests
> 
>
> Key: HIVE-23783
> URL: https://issues.apache.org/jira/browse/HIVE-23783
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Priority: Major
>
> From time to time I get back to a scenario where I want to investigate a 
> compaction related issue and I have a complete repro SQL script for that. In 
> this case, I would prefer a qtest in which I can run manually triggered 
> compaction instead of setting up other kinds of unit tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23783) Support compaction in qtests

2020-06-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor reassigned HIVE-23783:
---

Assignee: László Bodor

> Support compaction in qtests
> 
>
> Key: HIVE-23783
> URL: https://issues.apache.org/jira/browse/HIVE-23783
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> From time to time I get back to a scenario where I want to investigate a 
> compaction related issue and I have a complete repro SQL script for that. In 
> this case, I would prefer a qtest in which I can run manually triggered 
> compaction instead of setting up other kinds of unit tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23783) Support compaction in qtests

2020-06-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-23783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-23783:

Description: 
From time to time I get back to a scenario where I want to investigate a 
compaction related issue and I have a complete repro SQL script for that. In 
this case, I would prefer a qtest in which I can run manually triggered 
compaction instead of setting up other kinds of unit tests.

This can be quite challenging, but the most important constraint is that a 
compactor environment (compactor threads, queue, etc.) cannot be set up on the 
fly in the scripts. Fortunately, we already have some preprocessed meta setup 
in qtest which can run in advance; it is worth trying that for this purpose.

  was:From time to time I get back to a scenario where I want to investigate a 
compaction related issue and I have a complete repro SQL script for that. In 
this case, I would prefer a qtest in which I can run manually triggered 
compaction instead of setting up other kinds of unit tests.


> Support compaction in qtests
> 
>
> Key: HIVE-23783
> URL: https://issues.apache.org/jira/browse/HIVE-23783
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> From time to time I get back to a scenario where I want to investigate a 
> compaction related issue and I have a complete repro SQL script for that. In 
> this case, I would prefer a qtest in which I can run manually triggered 
> compaction instead of setting up other kinds of unit tests.
> This can be quite challenging, but the most important constraint is that a 
> compactor environment (compactor threads, queue, etc.) cannot be set up on the 
> fly in the scripts. Fortunately, we already have some preprocessed meta setup 
> in qtest which can run in advance; it is worth trying that for this purpose.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23757) Pushing TopN Key operator through MAPJOIN

2020-06-30 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23757:
-
Attachment: (was: HIVE-23757.1.patch)

> Pushing TopN Key operator through MAPJOIN
> -
>
> Key: HIVE-23757
> URL: https://issues.apache.org/jira/browse/HIVE-23757
> Project: Hive
>  Issue Type: Improvement
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> So far only MERGEJOIN + JOIN cases are handled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23598) Add option to rewrite NTILE and RANK to sketch functions

2020-06-30 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148767#comment-17148767
 ] 

Jesus Camacho Rodriguez commented on HIVE-23598:


+1

> Add option to rewrite NTILE and RANK to sketch functions
> 
>
> Key: HIVE-23598
> URL: https://issues.apache.org/jira/browse/HIVE-23598
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23784) Fix Replication Metrics Sink to DB

2020-06-30 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi reassigned HIVE-23784:
--


> Fix Replication Metrics Sink to DB
> --
>
> Key: HIVE-23784
> URL: https://issues.apache.org/jira/browse/HIVE-23784
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-20606) hive3.1 beeline to dns complaining about ssl on ip

2020-06-30 Thread Yan Cheng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Cheng updated HIVE-20606:
-
Affects Version/s: 3.1.2

> hive3.1 beeline to dns complaining about ssl on ip
> --
>
> Key: HIVE-20606
> URL: https://issues.apache.org/jira/browse/HIVE-20606
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, HiveServer2
>Affects Versions: 3.1.0, 3.1.2
>Reporter: t oo
>Priority: Blocker
>
> Why is beeline complaining about the IP when I use a DNS name in the 
> connection? I have a valid cert/JKS for the DNS name. The exact same beeline 
> command worked when running on Hive 2.3.2, but this is Hive 3.1.0.
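> A quick way to see which subject alternative names the server certificate 
> actually carries (an illustrative probe, not from the report; the hostname 
> and port are placeholders, and the JKS from the JDBC URL must be passed via 
> -Djavax.net.ssl.trustStore for the handshake to succeed):
> {code:java}
> import java.security.cert.X509Certificate;
> import javax.net.ssl.SSLSocket;
> import javax.net.ssl.SSLSocketFactory;
>
> public class PrintSans {
>   public static void main(String[] args) throws Exception {
>     SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
>     try (SSLSocket socket = (SSLSocket) factory.createSocket("mydns", 10000)) {
>       socket.startHandshake();
>       X509Certificate cert =
>           (X509Certificate) socket.getSession().getPeerCertificates()[0];
>       // Each entry is [type, value]; type 2 = DNS name, type 7 = IP address.
>       // If no type-7 entry matches the resolved IP, the client must connect
>       // (and validate) by hostname, or the cert needs an IP SAN added.
>       System.out.println(cert.getSubjectAlternativeNames());
>     }
>   }
> }
> {code}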
> [ec2-user@ip-10-1-2-3 logs]$ $HIVE_HOME/bin/beeline
>  SLF4J: Class path contains multiple SLF4J bindings.
>  SLF4J: Found binding in 
> [jar:file:/usr/lib/apache-hive-3.1.0-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: Found binding in 
> [jar:file:/usr/lib/hadoop-2.7.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>  SLF4J: See [http://www.slf4j.org/codes.html#multiple_bindings] for an 
> explanation.
>  SLF4J: Actual binding is of type 
> [org.apache.logging.slf4j.Log4jLoggerFactory]
>  Beeline version 3.1.0 by Apache Hive
>  beeline> !connect 
> jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit
>  userhere passhere
>  Connecting to 
> jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit
>  18/09/20 04:49:06 [main]: WARN jdbc.HiveConnection: Failed to connect to 
> mydns:1
>  Unknown HS2 problem when communicating with Thrift server.
>  Error: Could not open client transport with JDBC Uri: 
> jdbc:hive2://mydns:1/default;ssl=true;sslTrustStore=/home/ec2-user/spark_home/conf/app-trust-nonprd.jks;trustStorePassword=changeit:
>  javax.net.ssl.SSLHandshakeException: 
> java.security.cert.CertificateException: No subject alternative names 
> matching IP address 10.1.2.3 found (state=08S01,code=0)
>  beeline>
>
> HiveServer2 logs:
> 2018-09-20T04:50:16,245 ERROR [HiveServer2-Handler-Pool: Thread-79] 
> server.TThreadPoolServer: Error occurred during processing of message.
> java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: 
> javax.net.ssl.SSLHandshakeException: Remote host closed connection during 
> handshake
>  at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:269)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_181]
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_181]
>  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
> Caused by: org.apache.thrift.transport.TTransportException: 
> javax.net.ssl.SSLHandshakeException: Remote host closed connection during 
> handshake
>  at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) 
> ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:178)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) 
> ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
>  ~[hive-exec-3.1.0.jar:3.1.0]
>  ... 4 more
> Caused by: javax.net.ssl.SSLHandshakeException: Remote host closed connection 
> during handshake
>  at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1002) 
> ~[?:1.8.0_181]
>  at 
> sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
>  ~[?:1.8.0_181]
>  at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:938) 
> ~[?:1.8.0_181]
>  at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) 
> ~[?:1.8.0_181]
>  at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
> ~[?:1.8.0_181]
>  at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) 
> ~[?:1.8.0_181]
>  at java.io.BufferedInputStream.read(BufferedInputStream.java:345) 
> ~[?:1

[jira] [Work started] (HIVE-23784) Fix Replication Metrics Sink to DB

2020-06-30 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23784 started by Aasha Medhi.
--
> Fix Replication Metrics Sink to DB
> --
>
> Key: HIVE-23784
> URL: https://issues.apache.org/jira/browse/HIVE-23784
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-23784.01.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23784) Fix Replication Metrics Sink to DB

2020-06-30 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-23784:
---
Attachment: HIVE-23784.01.patch
Status: Patch Available  (was: In Progress)

> Fix Replication Metrics Sink to DB
> --
>
> Key: HIVE-23784
> URL: https://issues.apache.org/jira/browse/HIVE-23784
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-23784.01.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation

2020-06-30 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148863#comment-17148863
 ] 

Aasha Medhi commented on HIVE-23611:


+1

> Mandate fully qualified absolute path for external table base dir during REPL 
> operation
> ---
>
> Key: HIVE-23611
> URL: https://issues.apache.org/jira/browse/HIVE-23611
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23611.01.patch, HIVE-23611.02.patch, 
> HIVE-23611.03.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23785) Database should have a unique id

2020-06-30 Thread Vihang Karajgaonkar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-23785:
--


> Database should have a unique id
> 
>
> Key: HIVE-23785
> URL: https://issues.apache.org/jira/browse/HIVE-23785
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
>
> HIVE-20556 introduced an id field to the Table object. This is useful 
> information, since a table which is dropped and recreated with the same name 
> will have a different id. If an HMS client is caching such a table object, the 
> id can be used to determine whether the table present on the client side 
> matches the one in the HMS.
> We can expand this idea to other HMS objects like Database, Catalog and 
> Partition, and add a new id field to them as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2020-06-30 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-23779 started by Naresh P R.
-
> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are no longer printed in the beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23779) BasicStatsTask Info is not getting printed in beeline console

2020-06-30 Thread Naresh P R (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naresh P R updated HIVE-23779:
--
Status: Patch Available  (was: In Progress)

> BasicStatsTask Info is not getting printed in beeline console
> -
>
> Key: HIVE-23779
> URL: https://issues.apache.org/jira/browse/HIVE-23779
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After HIVE-16061, partition basic stats are no longer printed in the beeline 
> console.
> {code:java}
> INFO : Partition {dt=2020-06-29} stats: [numFiles=21, numRows=22, 
> totalSize=14607, rawDataSize=0]{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23786) HMS Server side filter

2020-06-30 Thread Sam An (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam An reassigned HIVE-23786:
-


> HMS Server side filter
> --
>
> Key: HIVE-23786
> URL: https://issues.apache.org/jira/browse/HIVE-23786
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sam An
>Assignee: Sam An
>Priority: Major
>
> HMS server-side filtering of results based on authorization. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23787) Write all the events present in a task_queue in a single file.

2020-06-30 Thread Amlesh Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amlesh Kumar reassigned HIVE-23787:
---

Assignee: Amlesh Kumar

> Write all the events present in a task_queue in a single file.
> --
>
> Key: HIVE-23787
> URL: https://issues.apache.org/jira/browse/HIVE-23787
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Amlesh Kumar
>Assignee: Amlesh Kumar
>Priority: Major
>
> DAS does not get the event when the queue becomes full, and it ignores the 
> post_exec_hook / pre_exec_hook event. The default capacity is 64, set via the 
> hive.hook.proto.queue.capacity config for HS2.
> Now, we will increase the queue capacity (let's say up to 256).
> Also, as an optimisation, we need to take all the events present in a 
> task_queue and write them in a single file.
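> A minimal sketch of the batching idea (illustrative; the event and writer 
> types are hypothetical stand-ins for the proto hook's internals):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.BlockingQueue;
>
> public class BatchDrainSketch {
>   /** Hypothetical sink; stands in for the proto event writer. */
>   interface Writer<E> {
>     void writeAll(List<E> events);
>   }
>
>   /** Drain everything currently queued and write it out in one go. */
>   static <E> void drainAndWriteOnce(BlockingQueue<E> queue, Writer<E> writer) {
>     List<E> batch = new ArrayList<>();
>     queue.drainTo(batch);      // removes all currently available events
>     if (!batch.isEmpty()) {
>       writer.writeAll(batch);  // one file write per drain, not one per event
>     }
>   }
> }
> {code}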



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23235) Checkpointing in repl dump failing for orc format

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23235?focusedWorklogId=453167&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453167
 ]

ASF GitHub Bot logged work on HIVE-23235:
-

Author: ASF GitHub Bot
Created on: 01/Jul/20 00:31
Start Date: 01/Jul/20 00:31
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #987:
URL: https://github.com/apache/hive/pull/987


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 453167)
Time Spent: 50m  (was: 40m)

> Checkpointing in repl dump failing for orc format
> -
>
> Key: HIVE-23235
> URL: https://issues.apache.org/jira/browse/HIVE-23235
> Project: Hive
>  Issue Type: Bug
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23235.01.patch, HIVE-23235.02.patch, 
> HIVE-23235.03.patch, HIVE-23235.04.patch, HIVE-23235.05.patch, 
> HIVE-23235.06.patch, HIVE-23235.07.patch, HIVE-23235.08.patch, 
> HIVE-23235.09.patch, HIVE-23235.10.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23040) Checkpointing for repl dump incremental phase

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23040?focusedWorklogId=453168&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453168
 ]

ASF GitHub Bot logged work on HIVE-23040:
-

Author: ASF GitHub Bot
Created on: 01/Jul/20 00:31
Start Date: 01/Jul/20 00:31
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #977:
URL: https://github.com/apache/hive/pull/977


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 453168)
Time Spent: 2h 20m  (was: 2h 10m)

> Checkpointing for repl dump incremental phase
> -
>
> Key: HIVE-23040
> URL: https://issues.apache.org/jira/browse/HIVE-23040
> Project: Hive
>  Issue Type: Improvement
>Reporter: Aasha Medhi
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23040.01.patch, HIVE-23040.02.patch, 
> HIVE-23040.03.patch, HIVE-23040.04.patch, HIVE-23040.05.patch, 
> HIVE-23040.06.patch, HIVE-23040.06.patch, HIVE-23040.07.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23755) Fix Ranger Url extra slash

2020-06-30 Thread Anishek Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anishek Agarwal updated HIVE-23755:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1, committed to master.

> Fix Ranger Url extra slash
> --
>
> Key: HIVE-23755
> URL: https://issues.apache.org/jira/browse/HIVE-23755
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23755.01.patch, HIVE-23755.02.patch, 
> HIVE-23755.03.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23611) Mandate fully qualified absolute path for external table base dir during REPL operation

2020-06-30 Thread Anishek Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anishek Agarwal updated HIVE-23611:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to master.

> Mandate fully qualified absolute path for external table base dir during REPL 
> operation
> ---
>
> Key: HIVE-23611
> URL: https://issues.apache.org/jira/browse/HIVE-23611
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23611.01.patch, HIVE-23611.02.patch, 
> HIVE-23611.03.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Issue Comment Deleted] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command

2020-06-30 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-22957:
-
Comment: was deleted

(was: Test Passed after rebased to master
cc: [~jcamachorodriguez])

> Support Partition Filtering In MSCK REPAIR TABLE Command
> 
>
> Key: HIVE-22957
> URL: https://issues.apache.org/jira/browse/HIVE-22957
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Design Doc_ Partition Filtering In MSCK REPAIR 
> TABLE.pdf, HIVE-22957.01.patch, HIVE-22957.02.patch, HIVE-22957.03.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> *Design Doc:*
> [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command

2020-06-30 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140969#comment-17140969
 ] 

Syed Shameerur Rahman edited comment on HIVE-22957 at 7/1/20, 4:55 AM:
---

[~kgyrtkirk] [~jcamachorodriguez] [~vihangk1] Could you please review the patch?


was (Author: srahman):
[~kgyrtkirk] [~prasanth_j] [~jcamachorodriguez] [~rajesh.balamohan] Could you 
please review the patch?

> Support Partition Filtering In MSCK REPAIR TABLE Command
> 
>
> Key: HIVE-22957
> URL: https://issues.apache.org/jira/browse/HIVE-22957
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Design Doc_ Partition Filtering In MSCK REPAIR 
> TABLE.pdf, HIVE-22957.01.patch, HIVE-22957.02.patch, HIVE-22957.03.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> *Design Doc:*
> [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23786) HMS Server side filter

2020-06-30 Thread Peter Vary (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149105#comment-17149105
 ] 

Peter Vary commented on HIVE-23786:
---

[~samuelan]: Are you looking for something like HIVE-20776?

> HMS Server side filter
> --
>
> Key: HIVE-23786
> URL: https://issues.apache.org/jira/browse/HIVE-23786
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sam An
>Assignee: Sam An
>Priority: Major
>
> HMS server-side filtering of results based on authorization. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23784) Fix Replication Metrics Sink to DB

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23784?focusedWorklogId=453224&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453224
 ]

ASF GitHub Bot logged work on HIVE-23784:
-

Author: ASF GitHub Bot
Created on: 01/Jul/20 05:04
Start Date: 01/Jul/20 05:04
Worklog Time Spent: 10m 
  Work Description: pkumarsinha commented on a change in pull request #1193:
URL: https://github.com/apache/hive/pull/1193#discussion_r448113867



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/parse/repl/metric/MetricSink.java
##
@@ -64,8 +64,10 @@ public static MetricSink getInstance() {
   public synchronized void init(HiveConf conf) {
 if (!isInitialised) {
   this.conf = conf;
-  this.executorService.schedule(new MetricSinkWriter(conf), 
getFrequencyInSecs(), TimeUnit.SECONDS);
+  this.executorService.scheduleAtFixedRate(new MetricSinkWriter(conf), 0,
+getFrequencyInSecs(), TimeUnit.SECONDS);

Review comment:
   Could the frequency be in minutes only? Then the conversion to seconds in 
getFrequencyInSecs() wouldn't be required.
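
   For context, the behavioural difference the diff is after (a minimal, 
self-contained contrast; the task body and 60-second period are arbitrary):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduleContrast {
  public static void main(String[] args) {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    Runnable task = () -> System.out.println("flush metrics");

    // schedule(): runs the task exactly once, 60s from now.
    ses.schedule(task, 60, TimeUnit.SECONDS);

    // scheduleAtFixedRate(): first run immediately (initialDelay 0), then
    // every 60s -- which is what a periodic metric sink needs.
    ses.scheduleAtFixedRate(task, 0, 60, TimeUnit.SECONDS);
  }
}
```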





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 453224)
Remaining Estimate: 0h
Time Spent: 10m

> Fix Replication Metrics Sink to DB
> --
>
> Key: HIVE-23784
> URL: https://issues.apache.org/jira/browse/HIVE-23784
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-23784.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23784) Fix Replication Metrics Sink to DB

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23784:
--
Labels: pull-request-available  (was: )

> Fix Replication Metrics Sink to DB
> --
>
> Key: HIVE-23784
> URL: https://issues.apache.org/jira/browse/HIVE-23784
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23784.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23784) Fix Replication Metrics Sink to DB

2020-06-30 Thread Pravin Sinha (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149109#comment-17149109
 ] 

Pravin Sinha commented on HIVE-23784:
-

+1

> Fix Replication Metrics Sink to DB
> --
>
> Key: HIVE-23784
> URL: https://issues.apache.org/jira/browse/HIVE-23784
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23784.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23784) Fix Replication Metrics Sink to DB

2020-06-30 Thread Anishek Agarwal (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anishek Agarwal updated HIVE-23784:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1, committed to master.

> Fix Replication Metrics Sink to DB
> --
>
> Key: HIVE-23784
> URL: https://issues.apache.org/jira/browse/HIVE-23784
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23784.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23509) MapJoin AssertionError: Capacity must be power of 2

2020-06-30 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149128#comment-17149128
 ] 

Zoltan Haindrich commented on HIVE-23509:
-

okay; but I think if someone still uses it - or wants to backport it to some 
older version - it might make sense to have it
+1

> MapJoin AssertionError: Capacity must be power of 2
> ---
>
> Key: HIVE-23509
> URL: https://issues.apache.org/jira/browse/HIVE-23509
> Project: Hive
>  Issue Type: Bug
> Environment: Hive-2.3.6
>Reporter: Shashank Pedamallu
>Assignee: Shashank Pedamallu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Observed AssertionError failures in a Hive query when the rowCount for the 
> join comes out as (2^x)+(2^(x+1)), i.e. 3*2^x, which is not a power of two.
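> A minimal illustration of the failing guard and one common fix (a sketch, 
> not the actual BytesBytesMultiHashMap code):
> {code:java}
> public class CapacityCheck {
>   static void validateCapacity(int capacity) {
>     // The guard that trips: 3 * 2^x has two bits set, so the AND is non-zero.
>     assert (capacity & (capacity - 1)) == 0 : "Capacity must be a power of two";
>   }
>
>   /** Round up to the next power of two, e.g. 12 (= 2^2 + 2^3) -> 16. */
>   static int nextPowerOfTwo(int capacity) {
>     int h = Integer.highestOneBit(capacity);
>     return (h == capacity) ? capacity : h << 1;
>   }
>
>   public static void main(String[] args) {
>     validateCapacity(nextPowerOfTwo(12)); // passes; validateCapacity(12) fails
>   }
> }
> {code}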
> Following is the stacktrace:
> {noformat}
> [2020-05-11 05:43:12,135] {base_task_runner.py:95} INFO - Subtask: ERROR : 
> Vertex failed, vertexName=Map 4, vertexId=vertex_1588729523139_51702_1_06, 
> diagnostics=[Task failed, taskId=task_1588729523139_51702_1_06_001286, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1588729523139_51702_1_06_001286_0:java.lang.RuntimeException: 
> java.lang.AssertionError: Capacity must be a power of two [2020-05-11 
> 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168) 
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.security.AccessController.doPrivileged(Native Method) [2020-05-11 
> 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> javax.security.auth.Subject.doAs(Subject.java:422) [2020-05-11 05:43:12,136] 
> {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) 
> [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266) [2020-05-11 
> 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [2020-05-11 05:43:12,136] {base_task_runner.py:95} INFO - Subtask: at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> java.lang.Thread.run(Thread.java:748) [2020-05-11 05:43:12,137] 
> {base_task_runner.py:95} INFO - Subtask: Caused by: java.lang.AssertionError: 
> Capacity must be a power of two [2020-05-11 05:43:12,137] 
> {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.validateCapacity(BytesBytesMultiHashMap.java:552)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashImpl(BytesBytesMultiHashMap.java:731)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.persistence.BytesBytesMultiHashMap.expandAndRehashToTarget(BytesBytesMultiHashMap.java:545)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.persistence.HybridHashTableContainer$HashPartition.getHashMapFromDisk(HybridHashTableContainer.java:183)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.reloadHashTable(MapJoinOperator.java:641)
>  [2020-05-11 05:43:12,137] {base_task_runner.py:95} INFO - Subtask: at 
> org.apache.hadoop.hive.ql.exec.MapJoi

[jira] [Work logged] (HIVE-23772) Relocate calcite-core to prevent NoSuchFieldError

2020-06-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23772?focusedWorklogId=453249&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-453249
 ]

ASF GitHub Bot logged work on HIVE-23772:
-

Author: ASF GitHub Bot
Created on: 01/Jul/20 06:05
Start Date: 01/Jul/20 06:05
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1187:
URL: https://github.com/apache/hive/pull/1187#issuecomment-652211265


   > Ideally we can remove the original ones for itests??
   
   I think that will not make the problem go away - if someone adds a 
calcite-core to the classpath, this could happen in a production setup as well...
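
   For what it's worth, a duplicate-class situation like this can be detected 
at runtime (a small probe, not Hive code):

```java
import java.net.URL;
import java.util.Enumeration;

public class DuplicateClassProbe {
  public static void main(String[] args) throws Exception {
    // Every jar that provides this class prints one URL; more than one line
    // means the class is bundled by more than one jar on the classpath
    // (e.g. both hive-exec and an unshaded calcite-core).
    Enumeration<URL> urls = DuplicateClassProbe.class.getClassLoader()
        .getResources("org/apache/calcite/jdbc/CalciteConnectionImpl.class");
    while (urls.hasMoreElements()) {
      System.out.println(urls.nextElement());
    }
  }
}
```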



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 453249)
Time Spent: 1h 40m  (was: 1.5h)

> Relocate calcite-core to prevent NoSuchFieldError
> -
>
> Key: HIVE-23772
> URL: https://issues.apache.org/jira/browse/HIVE-23772
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Exception trace due to conflict with {{calcite-core}}
> {noformat}
> Caused by: java.lang.NoSuchFieldError: operands
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:785)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$RexVisitor.visitCall(ASTConverter.java:509)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.calcite.rex.RexCall.accept(RexCall.java:191) 
> ~[calcite-core-1.21.0.jar:1.21.0]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:239)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:437)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:124)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:112)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedAST(CalcitePlanner.java:1620)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:555)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12456)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:433)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:290)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:220) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:184) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:602) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:548) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:542) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:125)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)